Summary of the Invention
To solve the problem of low accuracy in the delay estimation methods of the related art described above, embodiments of the present invention provide a method and a device for estimating the time delay of sound signals. The technical solutions are as follows:
According to a first aspect, a method for estimating the time delay of sound signals is provided, the method including:
acquiring two sound signals;
performing coherence matching on the two sound signals according to their short-time Fourier transforms (STFTs) to obtain a first matching result, the first matching result including a first matched position and a first matching degree of the two sound signals;
performing coherence matching on the two sound signals according to the spectral correlation of their power spectra to obtain a second matching result, the second matching result including a second matched position and a second matching degree of the two sound signals; and
calculating the time delay between the two sound signals according to the first matching result and the second matching result.
Optionally, calculating the time delay between the two sound signals according to the first matching result and the second matching result includes:
for each sound signal, calculating a final matched position from the first matched position and the second matched position using a weighted average algorithm, the weights of the weighted average algorithm being determined according to the first matching degree and the second matching degree; and
calculating the time delay between the two sound signals according to the final matched positions of the two sound signals.
Optionally, performing coherence matching on the two sound signals according to their STFTs to obtain the first matching result includes:
for each sound signal, performing noise tracking on each frame of the sound signal according to a noise-tracking equation to obtain the noise spectrum $N(w,n)$ of each frame, where $X(w,n)$ denotes the STFT of the sound signal; $\alpha_u$ and $\alpha_d$ are preset coefficients with $0 < \alpha_d < \alpha_u < 1$; $w$ denotes the frequency-bin index in the frequency domain; and $n$ denotes the frame index in the time domain;
binarizing the STFT of each frame according to a binarization equation to obtain a binary spectrum $Xb(w,n)$, where $T_b$ is a preset first threshold; and
performing pairwise coherence matching between the $K_a$ binary spectra corresponding to one of the sound signals and the $K_b$ binary spectra corresponding to the other sound signal to obtain the first matching result, the first matching result including the matched position and matching degree of the pair of binary spectra with the highest matching degree, where $K_a$ and $K_b$ are positive integers.
Optionally, performing coherence matching on the two sound signals according to the spectral correlation of their power spectra to obtain the second matching result includes:
for each sound signal, calculating the power spectrum $P(w,n)$ of each frame of the sound signal according to the following equation:
$$P(w,n) = \alpha_p P(w,n-1) + (1-\alpha_p)\,|X(w,n)|^2$$
where $X(w,n)$ denotes the STFT of the sound signal; $\alpha_p$ is a preset coefficient with $0 < \alpha_p < 1$; $w$ denotes the frequency-bin index in the frequency domain; and $n$ denotes the frame index in the time domain;
calculating the spectral correlation $DP(w,n)$ of the power spectrum of each frame according to the following equation:
$$DP(w,n) = |P(w+1,n) - P(w,n)|$$
performing noise tracking on the spectral correlation $DP(w,n)$ according to a noise-tracking equation to obtain the spectral correlation $NDP(w,n)$ of the noise power spectrum of each frame, where $\beta_u$ and $\beta_d$ are preset coefficients with $0 < \beta_d < \beta_u < 1$;
binarizing the spectral correlation $DP(w,n)$ of each frame according to a binarization equation to obtain a correlation binary spectrum $XDb(w,n)$, where $TD_b$ is a preset second threshold; and
performing pairwise coherence matching between the $KD_a$ correlation binary spectra corresponding to one of the sound signals and the $KD_b$ correlation binary spectra corresponding to the other sound signal to obtain the second matching result, the second matching result including the matched position and matching degree of the pair of correlation binary spectra with the highest matching degree, where $KD_a$ and $KD_b$ are positive integers.
Optionally, before performing coherence matching on the two sound signals according to their STFTs to obtain the first matching result, the method further includes:
for each sound signal, preprocessing the sound signal to obtain a preprocessed sound signal, the preprocessing including at least one of noise reduction, enhancement, high-pass filtering, and resampling; and
performing an STFT on the preprocessed sound signal.
According to a second aspect, a device for estimating the time delay of sound signals is provided, the device including:
a signal acquisition module configured to acquire two sound signals;
a first matching module configured to perform coherence matching on the two sound signals according to their STFTs to obtain a first matching result, the first matching result including a first matched position and a first matching degree of the two sound signals;
a second matching module configured to perform coherence matching on the two sound signals according to the spectral correlation of their power spectra to obtain a second matching result, the second matching result including a second matched position and a second matching degree of the two sound signals; and
a time-delay calculation module configured to calculate the time delay between the two sound signals according to the first matching result and the second matching result.
Optionally, the time-delay calculation module includes a position calculation unit and a time-delay calculation unit.
The position calculation unit is configured to calculate, for each sound signal, a final matched position from the first matched position and the second matched position using a weighted average algorithm, the weights of the weighted average algorithm being determined according to the first matching degree and the second matching degree.
The time-delay calculation unit is configured to calculate the time delay between the two sound signals according to the final matched positions of the two sound signals.
Optionally, the first matching module includes a first tracking unit, a first binarization unit, and a first matching unit.
The first tracking unit is configured to perform, for each sound signal, noise tracking on each frame of the sound signal according to a noise-tracking equation to obtain the noise spectrum $N(w,n)$ of each frame, where $X(w,n)$ denotes the STFT of the sound signal; $\alpha_u$ and $\alpha_d$ are preset coefficients with $0 < \alpha_d < \alpha_u < 1$; $w$ denotes the frequency-bin index in the frequency domain; and $n$ denotes the frame index in the time domain.
The first binarization unit is configured to binarize the STFT of each frame according to a binarization equation to obtain a binary spectrum $Xb(w,n)$, where $T_b$ is a preset first threshold.
The first matching unit is configured to perform pairwise coherence matching between the $K_a$ binary spectra corresponding to one of the sound signals and the $K_b$ binary spectra corresponding to the other sound signal to obtain the first matching result, the first matching result including the matched position and matching degree of the pair of binary spectra with the highest matching degree, where $K_a$ and $K_b$ are positive integers.
Optionally, the second matching module includes a power spectrum calculation unit, a correlation calculation unit, a second tracking unit, a second binarization unit, and a second matching unit.
The power spectrum calculation unit is configured to calculate, for each sound signal, the power spectrum $P(w,n)$ of each frame of the sound signal according to the following equation:
$$P(w,n) = \alpha_p P(w,n-1) + (1-\alpha_p)\,|X(w,n)|^2$$
where $X(w,n)$ denotes the STFT of the sound signal; $\alpha_p$ is a preset coefficient with $0 < \alpha_p < 1$; $w$ denotes the frequency-bin index in the frequency domain; and $n$ denotes the frame index in the time domain.
The correlation calculation unit is configured to calculate the spectral correlation $DP(w,n)$ of the power spectrum of each frame according to the following equation:
$$DP(w,n) = |P(w+1,n) - P(w,n)|$$
The second tracking unit is configured to perform noise tracking on the spectral correlation $DP(w,n)$ according to a noise-tracking equation to obtain the spectral correlation $NDP(w,n)$ of the noise power spectrum of each frame, where $\beta_u$ and $\beta_d$ are preset coefficients with $0 < \beta_d < \beta_u < 1$.
The second binarization unit is configured to binarize the spectral correlation $DP(w,n)$ of each frame according to a binarization equation to obtain a correlation binary spectrum $XDb(w,n)$, where $TD_b$ is a preset second threshold.
The second matching unit is configured to perform pairwise coherence matching between the $KD_a$ correlation binary spectra corresponding to one of the sound signals and the $KD_b$ correlation binary spectra corresponding to the other sound signal to obtain the second matching result, the second matching result including the matched position and matching degree of the pair of correlation binary spectra with the highest matching degree, where $KD_a$ and $KD_b$ are positive integers.
Optionally, the device further includes:
a signal preprocessing module configured to preprocess each sound signal to obtain a preprocessed sound signal, the preprocessing including at least one of noise reduction, enhancement, high-pass filtering, and resampling; and
a Fourier transform module configured to perform an STFT on the preprocessed sound signal.
The technical solutions provided by the embodiments of the present invention bring the following beneficial effects:
A first matching result is obtained by analyzing and matching the STFTs of the two sound signals, a second matching result is obtained by analyzing and matching the spectral correlation of their power spectra, and the time delay between the two sound signals is then calculated by combining the first and second matching results. This solves the problem of low accuracy in the delay estimation methods of the related art: the two sound signals are matched from two perspectives, the frequency-domain distribution and the spectral correlation of the power spectrum, and the final matching result is determined from both matching results, which improves matching precision and thus the accuracy of time-delay estimation.
Detailed Description of the Embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Referring to Fig. 1, a flowchart of a method for estimating the time delay of sound signals according to an embodiment of the present invention is shown. This embodiment is described using the example of applying the method to an electronic device such as a mobile phone, tablet computer, laptop, or desktop computer. The method may include the following steps:
Step 102: acquire two sound signals.
Step 104: perform coherence matching on the two sound signals according to their STFTs to obtain a first matching result, the first matching result including a first matched position and a first matching degree of the two sound signals.
Step 106: perform coherence matching on the two sound signals according to the spectral correlation of their power spectra to obtain a second matching result, the second matching result including a second matched position and a second matching degree of the two sound signals.
Step 108: calculate the time delay between the two sound signals according to the first matching result and the second matching result.
It should be noted that step 106 may be performed after step 104, before step 104, or simultaneously with step 104. This embodiment only uses the case in which step 106 follows step 104 as an example and is not limited thereto.
In conclusion the delay time estimation method of voice signal provided in this embodiment, by the short of two-way voice signal
When Fourier transformation carry out analysis matching obtain the first matching result, and phase the spectrum for passing through the power spectrum two-way voice signal
Closing property carries out analysis matching and obtains the second matching result, and two-way sound is calculated then in conjunction with the first matching result and the second matching result
Time delay between sound signal;Solve the problems, such as that accuracy existing for the delay time estimation method that correlation technique is related to is low;From frequency domain point
Two angles of Spectral correlation of cloth and power spectrum respectively carry out two-way voice signal the matching analysis, and comprehensive two matchings knot
Fruit determines final matching results, has reached raising matching precision, improves the effect of time delay accuracy of estimation.
Referring to Fig. 2, a flowchart of a method for estimating the time delay of sound signals according to another embodiment of the present invention is shown. This embodiment is described using the example of applying the method to an electronic device such as a mobile phone, tablet computer, laptop, or desktop computer. The method may include the following steps:
Step 201: acquire two sound signals.
The two sound signals are discrete time-domain signals. In this embodiment, let one sound signal A be $x_{ra}(n)$ and the other sound signal B be $x_{rb}(n)$, where $n$ denotes the time-domain index, $n \in [0, M-1]$, $M \ge 2$, and $n$ and $M$ are integers.
Step 202: preprocess each sound signal to obtain a preprocessed sound signal.
The preprocessing includes, but is not limited to, at least one of noise reduction, enhancement, high-pass filtering, and resampling. Preprocessing is performed so that more accurate and reliable sound features can be extracted in the subsequent process, thereby improving matching precision.
In this embodiment, assuming each sound signal undergoes the preprocessing $F(\cdot)$, the preprocessing result of sound signal A is $x_a(n) = F(x_{ra}(n))$ and that of sound signal B is $x_b(n) = F(x_{rb}(n))$.
It should be noted that the preprocessing methods listed above are only examples. In practical applications, other preprocessing methods may be used as required, and this embodiment is not limited thereto.
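As one possible instance of the preprocessing $F(\cdot)$, the sketch below applies a first-order DC-blocking high-pass filter, $y[n] = x[n] - x[n-1] + r\,y[n-1]$. The choice of this filter and the coefficient `r = 0.995` are assumptions for illustration; the embodiment does not specify which high-pass filter is used.

```python
def high_pass(x, r=0.995):
    """First-order DC-blocking high-pass filter: y[n] = x[n] - x[n-1] + r * y[n-1].

    A hypothetical choice of the preprocessing F(*); r is an assumed coefficient.
    """
    y = []
    prev_x = 0.0
    prev_y = 0.0
    for sample in x:
        out = sample - prev_x + r * prev_y
        y.append(out)
        prev_x, prev_y = sample, out
    return y


# x_a(n) = F(x_ra(n)): a constant (DC) input decays toward zero at the output.
x_a = high_pass([1.0] * 2000)
```

A constant offset is removed because the difference term cancels it, while the feedback term `r * y[n-1]` keeps the passband flat above a very low cut-off frequency.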
Step 203: perform an STFT on each preprocessed sound signal.
The STFT (short-time Fourier transform) applies a preset window function to the preprocessed sound signal and performs an FFT (fast Fourier transform) on each windowed segment, thereby transforming the sound signal from the time domain to the frequency domain. The window function may be, but is not limited to, a Hamming window or a Kaiser window. The window length is a power of 2, such as 128 or 256.
In this embodiment, denote the STFT by $STFT(\cdot)$. The STFT of sound signal A is then $X_a(w,n) = STFT(x_a(n))$ and that of sound signal B is $X_b(w,n) = STFT(x_b(n))$, where $w$ denotes the frequency-bin index in the frequency domain, $w \in [0, N-1]$, $N \ge 2$, and $w$ and $N$ are integers.
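The framing, windowing, and transform described above can be sketched as follows. This is a minimal illustration, assuming a Hamming window and a 50% overlap by default; a naive $O(N^2)$ DFT stands in for the FFT so that the example needs no third-party packages, which is a simplification and not the embodiment's implementation.

```python
import cmath
import math


def hamming(length):
    """Symmetric Hamming window of the given length."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (length - 1)) for k in range(length)]


def dft(frame):
    """Naive DFT; stands in for the FFT to keep the sketch dependency-free."""
    n_pts = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * w * t / n_pts) for t in range(n_pts))
        for w in range(n_pts)
    ]


def stft(x, frame_len=256, overlap=0.5):
    """Return X[n][w], the spectrum of frame n at frequency bin w.

    frame_len is expected to be a power of 2 (e.g. 128 or 256), as an FFT would
    require; overlap is the assumed fraction of samples shared by adjacent frames.
    """
    hop = max(1, int(frame_len * (1 - overlap)))
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        segment = [x[start + t] * window[t] for t in range(frame_len)]
        frames.append(dft(segment))
    return frames
```

Calling `stft(x_a)` and `stft(x_b)` yields the per-frame spectra $X_a(w,n)$ and $X_b(w,n)$ used in the following steps; in practice an FFT routine would replace `dft`.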
Step 204: perform coherence matching on the two sound signals according to their STFTs to obtain a first matching result, the first matching result including a first matched position and a first matching degree of the two sound signals.
The first matching approach matches the two sound signals from the perspective of their frequency-domain distribution. Specifically, this step may include the following sub-steps:
First, for each sound signal, perform noise tracking on each frame of the sound signal according to a noise-tracking equation to obtain the noise spectrum $N(w,n)$ of each frame, where $X(w,n)$ denotes the STFT of the sound signal; $\alpha_u$ and $\alpha_d$ are preset coefficients with $0 < \alpha_d < \alpha_u < 1$; $w$ denotes the frequency-bin index in the frequency domain; and $n$ denotes the frame index in the time domain.
Noise tracking is performed to minimize the influence of noise in the sound signal on the matching result and thereby improve matching precision. In addition, setting $\alpha_u$ and $\alpha_d$ appropriately, using $\alpha_u$ while the signal is rising and $\alpha_d$ while it is falling, improves the noise-tracking effect.
In this embodiment, denote the noise spectrum of the $i$-th frame of sound signal A by $N_a(w,i)$, where $X_a(w,i)$ denotes the STFT of the $i$-th frame of sound signal A, $i \ge 0$, and $i$ is an integer.
Similarly, denote the noise spectrum of the $j$-th frame of sound signal B by $N_b(w,j)$, where $X_b(w,j)$ denotes the STFT of the $j$-th frame of sound signal B, $j \ge 0$, and $j$ is an integer.
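The exact noise-tracking equation is not reproduced here, so the following is only a sketch of an asymmetric recursive tracker consistent with the stated constraint $0 < \alpha_d < \alpha_u < 1$. The update rule $N(w,n) = \alpha N(w,n-1) + (1-\alpha)|X(w,n)|$, with $\alpha = \alpha_u$ when $|X(w,n)|$ exceeds the previous estimate and $\alpha = \alpha_d$ otherwise, the initialization from the first frame, and the default coefficient values are all assumptions.

```python
def track_noise(mag_frames, alpha_u=0.99, alpha_d=0.9):
    """Assumed asymmetric recursive noise tracker.

    mag_frames[n][w] holds |X(w, n)|. With alpha_u > alpha_d the estimate rises
    slowly when the magnitude exceeds it and falls quickly when it drops below it.
    """
    noise = [list(mag_frames[0])]  # assumed initialization: first frame
    for frame in mag_frames[1:]:
        prev = noise[-1]
        current = []
        for w, mag in enumerate(frame):
            alpha = alpha_u if mag > prev[w] else alpha_d
            current.append(alpha * prev[w] + (1 - alpha) * mag)
        noise.append(current)
    return noise
```

The slow rise keeps short speech bursts from inflating the noise floor, while the fast fall lets the estimate follow the floor down when the noise level drops.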
Second, binarize the STFT of each frame according to a binarization equation to obtain a binary spectrum $Xb(w,n)$, where $T_b$ is a preset first threshold.
After binarization, the STFT of each frame is converted into a binary sequence of 0s and 1s of equal length. Binarization greatly increases data-processing speed, making the subsequent matching computation more efficient and robust.
In this embodiment, denote the binary spectrum of the $i$-th frame of sound signal A by $Xb_a(w,i)$.
Similarly, denote the binary spectrum of the $j$-th frame of sound signal B by $Xb_b(w,j)$.
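Because the binarization equation is not reproduced here, the following sketch assumes one plausible rule: a bin is set to 1 when its STFT magnitude exceeds $T_b$ times the tracked noise spectrum $N(w,n)$, and 0 otherwise. Both the comparison rule and the default `t_b = 2.0` are hypothetical.

```python
def binarize(mag_frames, noise_frames, t_b=2.0):
    """Assumed binary spectrum Xb(w, n): 1 where |X(w, n)| > T_b * N(w, n), else 0.

    mag_frames[n][w] is |X(w, n)| and noise_frames[n][w] is N(w, n).
    """
    return [
        [1 if mag > t_b * noise else 0 for mag, noise in zip(mag_row, noise_row)]
        for mag_row, noise_row in zip(mag_frames, noise_frames)
    ]
```

Each frame thus becomes a 0/1 list of length $N$, so frames from the two signals can be compared bin by bin.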
Third, perform pairwise coherence matching between the $K_a$ binary spectra corresponding to sound signal A and the $K_b$ binary spectra corresponding to sound signal B to obtain the first matching result, where $K_a$ and $K_b$ are positive integers.
The first matching result includes the matched position and matching degree of the pair of binary spectra with the highest matching degree. When sound signal A has $K_a$ binary spectra and sound signal B has $K_b$ binary spectra, $K_a \times K_b$ binary-spectrum comparisons are required. For each comparison, record the matching degree of the two binary spectra and the sequence numbers $n_a(i)$ and $n_b(j)$ of the two binary spectra being matched. Here $n_a(i)$ is the sequence number of the binary spectrum corresponding to the $i$-th frame of sound signal A, $n_a(i) \in [0, K_a-1]$, $K_a \ge 2$, and $n_a(i)$ and $K_a$ are integers; $n_b(j)$ is the sequence number of the binary spectrum corresponding to the $j$-th frame of sound signal B, $n_b(j) \in [0, K_b-1]$, $K_b \ge 2$, and $n_b(j)$ and $K_b$ are integers.
For example, if the binary spectrum $Xb_a(w,i)$ of the $i$-th frame of sound signal A is matched against the binary spectrum $Xb_b(w,j)$ of the $j$-th frame of sound signal B, their matching degree $Pb_{ij}$ is:
$$Pb_{ij} = \frac{1}{N}\sum_{w=0}^{N-1} Xb_a(w,i) \odot Xb_b(w,j)$$
where $\odot$ denotes the XNOR operator. The matching degree $Pb_{ij}$ equals the number of frequency-bin pairs whose binarized values are equal at corresponding positions in the two binary spectra, divided by the total number of frequency-bin pairs.
In a specific example, suppose $Xb_a(w,i) = \{1,1,0,0,0,1,1,1\}$ and $Xb_b(w,j) = \{0,1,1,1,0,1,1,1\}$. For $w = 0$, $Xb_a(0,i) = 1$ and $Xb_b(0,j) = 0$, so $Xb_a(0,i) \odot Xb_b(0,j) = 0$; for $w = 1$, $Xb_a(1,i) = 1$ and $Xb_b(1,j) = 1$, so $Xb_a(1,i) \odot Xb_b(1,j) = 1$; and so on. The values agree at $w = 1, 4, 5, 6, 7$, so the matching degree of the two binary sequences is $Pb_{ij} = 5/8 = 0.625$.
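The XNOR-based matching degree defined above can be computed directly: count the bins at which two equal-length binary spectra agree and divide by the number of bins. The function name `matching_degree` is illustrative.

```python
def matching_degree(bits_a, bits_b):
    """Fraction of frequency bins at which two equal-length binary spectra agree (XNOR = 1)."""
    if len(bits_a) != len(bits_b):
        raise ValueError("binary spectra must have the same length")
    agree = sum(1 for a, b in zip(bits_a, bits_b) if a == b)
    return agree / len(bits_a)


print(matching_degree([1, 1, 0, 0, 0, 1, 1, 1], [0, 1, 1, 1, 0, 1, 1, 1]))  # 0.625
```

This reproduces the worked example: the two sequences agree at five of eight bins.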
After the matching degree of every pair of binary spectra has been calculated, select the pair of binary spectra with the highest matching degree and record their sequence numbers and matching degree.
In this embodiment, assume that the $n_a(i)$-th binary spectrum, corresponding to the $i$-th frame of sound signal A, and the $n_b(j)$-th binary spectrum, corresponding to the $j$-th frame of sound signal B, have the highest matching degree, denoted $P_1$; for example, $i = 1$ and $j = 2$.
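The exhaustive $K_a \times K_b$ search described above can be sketched as follows: every binary spectrum of signal A is compared with every binary spectrum of signal B, and the indices and matching degree of the best pair are kept. The helper names are illustrative, and ties are resolved in favor of the first pair found, which is an assumption.

```python
def matching_degree(bits_a, bits_b):
    """Fraction of bins at which two equal-length binary spectra agree (XNOR = 1)."""
    return sum(1 for a, b in zip(bits_a, bits_b) if a == b) / len(bits_a)


def best_match(spectra_a, spectra_b):
    """Return (n_a, n_b, degree) for the pair of binary spectra with the highest matching degree.

    spectra_a holds the K_a binary spectra of signal A, spectra_b the K_b spectra of B;
    K_a * K_b comparisons are made.
    """
    best = (0, 0, -1.0)
    for i, bits_a in enumerate(spectra_a):
        for j, bits_b in enumerate(spectra_b):
            degree = matching_degree(bits_a, bits_b)
            if degree > best[2]:  # strict comparison keeps the first pair on ties
                best = (i, j, degree)
    return best
```

The returned indices play the role of $n_a(i)$ and $n_b(j)$, and the degree plays the role of $P_1$. The same routine applies unchanged to the correlation binary spectra of step 205.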
Step 205: perform coherence matching on the two sound signals according to the spectral correlation of their power spectra to obtain a second matching result, the second matching result including a second matched position and a second matching degree of the two sound signals.
The second matching approach matches the two sound signals from the perspective of the spectral correlation of their power spectra. Specifically, this step may include the following sub-steps:
First, for each sound signal, calculate the power spectrum $P(w,n)$ of each frame of the sound signal according to the following equation:
$$P(w,n) = \alpha_p P(w,n-1) + (1-\alpha_p)\,|X(w,n)|^2$$
where $X(w,n)$ denotes the STFT of the sound signal; $\alpha_p$ is a preset coefficient with $0 < \alpha_p < 1$; $w$ denotes the frequency-bin index in the frequency domain; and $n$ denotes the frame index in the time domain.
In this embodiment, the power spectrum $P_a(w,i)$ of the $i$-th frame of sound signal A is:
$$P_a(w,i) = \alpha_p P_a(w,i-1) + (1-\alpha_p)\,|X_a(w,i)|^2$$
where $X_a(w,i)$ denotes the STFT of the $i$-th frame of sound signal A, $i \ge 0$, and $i$ is an integer.
Similarly, the power spectrum $P_b(w,j)$ of the $j$-th frame of sound signal B is:
$$P_b(w,j) = \alpha_p P_b(w,j-1) + (1-\alpha_p)\,|X_b(w,j)|^2$$
where $X_b(w,j)$ denotes the STFT of the $j$-th frame of sound signal B, $j \ge 0$, and $j$ is an integer.
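The recursive power-spectrum smoothing above can be sketched as follows. The initial value $P(w,-1) = 0$ and the default $\alpha_p = 0.9$ are assumptions; the embodiment only requires $0 < \alpha_p < 1$.

```python
def power_spectrum(stft_frames, alpha_p=0.9):
    """Smoothed power spectrum P(w, n) = alpha_p * P(w, n-1) + (1 - alpha_p) * |X(w, n)|^2.

    stft_frames[n][w] holds X(w, n) (real or complex); P(w, -1) is assumed to be 0.
    """
    power = []
    prev = [0.0] * len(stft_frames[0])
    for frame in stft_frames:
        prev = [alpha_p * p + (1 - alpha_p) * abs(x) ** 2 for p, x in zip(prev, frame)]
        power.append(prev)
    return power
```

A larger $\alpha_p$ gives a smoother but slower-reacting power estimate.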
Second, calculate the spectral correlation $DP(w,n)$ of the power spectrum of each frame according to the following equation:
$$DP(w,n) = |P(w+1,n) - P(w,n)|$$
After the power spectrum of a sound signal is calculated, the energy at each frequency bin is subtracted from the energy at the next higher bin, and the absolute value of this difference is taken as the spectral correlation of the power spectrum.
In this embodiment, the spectral correlation $DP_a(w,i)$ of the power spectrum of the $i$-th frame of sound signal A is:
$$DP_a(w,i) = |P_a(w+1,i) - P_a(w,i)|$$
Similarly, the spectral correlation $DP_b(w,j)$ of the power spectrum of the $j$-th frame of sound signal B is:
$$DP_b(w,j) = |P_b(w+1,j) - P_b(w,j)|$$
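The spectral correlation $DP(w,n)$ is the absolute energy difference between adjacent frequency bins, which the short sketch below computes for every frame. Since bin $w+1$ must exist, each output row has one fewer element than the corresponding power-spectrum row.

```python
def spectral_correlation(power_frames):
    """DP(w, n) = |P(w+1, n) - P(w, n)| for every frame n and every bin w with a successor."""
    return [[abs(row[w + 1] - row[w]) for w in range(len(row) - 1)] for row in power_frames]


dp = spectral_correlation([[1.0, 4.0, 2.0]])  # [[3.0, 2.0]]
```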
Third, perform noise tracking on the spectral correlation $DP(w,n)$ according to a noise-tracking equation to obtain the spectral correlation $NDP(w,n)$ of the noise power spectrum of each frame, where $\beta_u$ and $\beta_d$ are preset coefficients with $0 < \beta_d < \beta_u < 1$.
Noise tracking is applied to the spectral correlation to prevent fluctuating noise from causing false decisions in subsequent matching, reducing the influence of noise on the matching result as much as possible and thereby improving matching precision. In addition, setting $\beta_u$ and $\beta_d$ appropriately improves the noise-tracking effect.
In this embodiment, denote the spectral correlation of the noise power spectrum of the $i$-th frame of sound signal A by $NDP_a(w,i)$.
Similarly, denote the spectral correlation of the noise power spectrum of the $j$-th frame of sound signal B by $NDP_b(w,j)$.
Fourth, binarize the spectral correlation $DP(w,n)$ of each frame according to a binarization equation to obtain a correlation binary spectrum $XDb(w,n)$, where $TD_b$ is a preset second threshold.
After binarization, the spectral correlation of each frame is converted into a binary sequence of 0s and 1s of equal length. Binarization greatly increases data-processing speed, making the subsequent matching computation more efficient and robust.
In this embodiment, denote the correlation binary spectrum of the $i$-th frame of sound signal A by $XDb_a(w,i)$.
Similarly, denote the correlation binary spectrum of the $j$-th frame of sound signal B by $XDb_b(w,j)$.
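The tracking and binarization equations for this second path are not reproduced here, so the sketch below assumes they mirror the first path: $NDP(w,n)$ is tracked with the same asymmetric recursion using $\beta_u$ and $\beta_d$, and a bin of $XDb(w,n)$ is 1 when $DP(w,n) > TD_b \cdot NDP(w,n)$. The update rule, the comparison, the first-frame initialization, and all default values are hypothetical.

```python
def track_noise(frames, coef_u, coef_d):
    """Assumed asymmetric recursive tracker (slow rise with coef_u, fast fall with coef_d)."""
    noise = [list(frames[0])]  # assumed initialization: first frame
    for frame in frames[1:]:
        prev = noise[-1]
        noise.append([
            (coef_u if v > p else coef_d) * p + (1 - (coef_u if v > p else coef_d)) * v
            for v, p in zip(frame, prev)
        ])
    return noise


def correlation_binary_spectrum(dp_frames, beta_u=0.99, beta_d=0.9, td_b=2.0):
    """Assumed XDb(w, n): 1 where DP(w, n) > TD_b * NDP(w, n), else 0."""
    ndp = track_noise(dp_frames, beta_u, beta_d)
    return [
        [1 if dp > td_b * nd else 0 for dp, nd in zip(dp_row, nd_row)]
        for dp_row, nd_row in zip(dp_frames, ndp)
    ]
```

Under this assumed initialization the first frame always binarizes to zeros, since its noise estimate equals its own values; a warm-up period or a different initial estimate would avoid that.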
Fifth, perform pairwise coherence matching between the $KD_a$ correlation binary spectra corresponding to sound signal A and the $KD_b$ correlation binary spectra corresponding to sound signal B to obtain the second matching result, where $KD_a$ and $KD_b$ are positive integers.
The second matching result includes the matched position and matching degree of the pair of correlation binary spectra with the highest matching degree. When sound signal A has $KD_a$ correlation binary spectra and sound signal B has $KD_b$ correlation binary spectra, $KD_a \times KD_b$ correlation-binary-spectrum comparisons are required. For each comparison, record the matching degree of the two correlation binary spectra and the sequence numbers $nd_a(i)$ and $nd_b(j)$ of the two spectra being matched. Here $nd_a(i)$ is the sequence number of the correlation binary spectrum corresponding to the $i$-th frame of sound signal A, $nd_a(i) \in [0, KD_a-1]$, $KD_a \ge 2$, and $nd_a(i)$ and $KD_a$ are integers; $nd_b(j)$ is the sequence number of the correlation binary spectrum corresponding to the $j$-th frame of sound signal B, $nd_b(j) \in [0, KD_b-1]$, $KD_b \ge 2$, and $nd_b(j)$ and $KD_b$ are integers. In addition, under normal conditions $KD_a$ equals $K_a$ in step 204 and $KD_b$ equals $K_b$ in step 204.
For example, if the correlation binary spectrum $XDb_a(w,i)$ of the $i$-th frame of sound signal A is matched against the correlation binary spectrum $XDb_b(w,j)$ of the $j$-th frame of sound signal B, their matching degree $PDb_{ij}$ is computed with the XNOR operator $\odot$ applied bin by bin: $PDb_{ij}$ equals the number of data pairs whose binarized values are equal at corresponding positions in the two correlation binary spectra, divided by the total number of data pairs.
After the matching degree of every pair of correlation binary spectra has been calculated, select the pair of correlation binary spectra with the highest matching degree and record their sequence numbers and matching degree.
In this embodiment, assume that the $nd_a(i)$-th correlation binary spectrum, corresponding to the $i$-th frame of sound signal A, and the $nd_b(j)$-th correlation binary spectrum, corresponding to the $j$-th frame of sound signal B, have the highest matching degree, denoted $P_2$; for example, $i = 1$ and $j = 3$.
Step 206: calculate the time delay between the two sound signals according to the first matching result and the second matching result.
After the first and second matching results have been calculated, the two results are combined into a final matching result, from which the time delay between the two sound signals is calculated. Specifically, this step may include the following two sub-steps:
First, for each sound signal, calculate the final matched position from the first matched position and the second matched position using a weighted average algorithm, the weights of which are determined according to the first matching degree and the second matching degree.
In one possible implementation, let $P_1$ be the first matching degree, obtained from the frequency-domain distribution of the sound signals, and $P_2$ the second matching degree, obtained from the spectral correlation of the power spectra. The weights of the first matched position and the second matched position are then determined from $P_1$ and $P_2$.
The final matched position $nl_a$ of sound signal A is the weighted average of $n_a$, the first matched position of sound signal A, and $nd_a$, the second matched position of sound signal A.
Similarly, the final matched position $nl_b$ of sound signal B is the weighted average of $n_b$, the first matched position of sound signal B, and $nd_b$, the second matched position of sound signal B.
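A minimal sketch of the weighted fusion, assuming normalized weights $P_1/(P_1+P_2)$ for the first matched position and $P_2/(P_1+P_2)$ for the second. This normalization is an assumption; the embodiment states only that the weights are determined by the two matching degrees.

```python
def fuse_position(first_pos, second_pos, p1, p2):
    """Weighted average of two matched positions.

    Assumed weights: p1 / (p1 + p2) for first_pos and p2 / (p1 + p2) for second_pos.
    """
    total = p1 + p2
    if total <= 0:
        raise ValueError("matching degrees must not both be zero")
    return (p1 * first_pos + p2 * second_pos) / total


nl_a = fuse_position(10, 20, 0.75, 0.25)  # 12.5
```

For signal A the call would be `fuse_position(n_a, nd_a, P1, P2)` and for signal B `fuse_position(n_b, nd_b, P1, P2)`. Under this weighting the matching approach with the higher matching degree pulls the final position toward its own estimate.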
Second, calculate the time delay between the two sound signals according to their final matched positions.
The time delay $t$ between the two sound signals is calculated according to the following equation:
$$t = k\,(nl_a - nl_b)$$
where $k$ is a time coefficient.
The time coefficient $k$ can be calculated from the STFT sampling frequency $f$, the number of sample points $Num$ per transform, and the overlap coefficient $\eta$:
$$k = \frac{Num\,(1-\eta)}{f}$$
In a specific example, with a 16 kHz sampling rate and a 256-point FFT with 50% overlap, the time coefficient is $k = 256 \times (1 - 0.5) / 16000 = 0.008$ s, that is, 8 ms per frame.
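The final conversion from matched positions to a time delay can be sketched as follows, assuming that $k$ is the duration of one STFT frame hop, $k = Num\,(1-\eta)/f$, with $\eta$ the fraction of overlapping samples. The function names are illustrative.

```python
def time_coefficient(fs, num, overlap):
    """Assumed k = Num * (1 - eta) / f: seconds between the starts of consecutive frames."""
    return num * (1 - overlap) / fs


def time_delay(nl_a, nl_b, k):
    """t = k * (nl_a - nl_b); positive when signal A's match lies later than signal B's."""
    return k * (nl_a - nl_b)


k = time_coefficient(16000, 256, 0.5)  # about 0.008 s per frame
t = time_delay(12.5, 10.0, k)          # about 0.02 s
```

Because $nl_a$ and $nl_b$ are weighted averages, they need not be integers, so the delay estimate can fall between frame boundaries.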
It should be noted that step 205 may be performed after step 204, before step 204, or simultaneously with step 204. This embodiment only uses the case in which step 205 follows step 204 as an example and is not limited thereto.
In summary, in the delay estimation method for voice signals provided in this embodiment, a first matching result is obtained by analyzing and matching the short-time Fourier transforms of the two voice signals, a second matching result is obtained by analyzing and matching the spectral correlation of the power spectra of the two voice signals, and the time delay between the two voice signals is then calculated by combining the first matching result and the second matching result. This solves the problem of low accuracy in the delay estimation methods of the related art: the two voice signals are matched from two perspectives, namely the frequency-domain distribution and the spectral correlation of the power spectrum, and the final matching result is determined by combining the two matching results, thereby improving matching precision and the accuracy of the delay estimate.
In addition, in the delay estimation method provided in this embodiment, the data are binarized before the matching calculation is performed, which substantially improves matching efficiency and provides a robust delay estimation method.
The following are apparatus embodiments of the present invention, which may be used to perform the method embodiments of the present invention. For details not disclosed in the apparatus embodiments, refer to the method embodiments of the present invention.
Referring to Fig. 3, it shows a structural block diagram of a delay estimation apparatus for voice signals provided by one embodiment of the present invention. The delay estimation apparatus may be implemented as part or all of an electronic device by software, hardware, or a combination of both. The delay estimation apparatus may include: a signal acquisition module 310, a first matching module 320, a second matching module 330, and a delay calculation module 340.
The signal acquisition module 310 is configured to acquire two voice signals.
The first matching module 320 is configured to perform coherence matching on the two voice signals according to the short-time Fourier transforms of the two voice signals to obtain a first matching result, the first matching result including a first matched position and a first matching degree of the two voice signals.
The second matching module 330 is configured to perform coherence matching on the two voice signals according to the spectral correlation of the power spectra of the two voice signals to obtain a second matching result, the second matching result including a second matched position and a second matching degree of the two voice signals.
The delay calculation module 340 is configured to calculate the time delay between the two voice signals according to the first matching result and the second matching result.
In summary, in the delay estimation apparatus for voice signals provided in this embodiment, a first matching result is obtained by analyzing and matching the short-time Fourier transforms of the two voice signals, a second matching result is obtained by analyzing and matching the spectral correlation of the power spectra of the two voice signals, and the time delay between the two voice signals is then calculated by combining the first matching result and the second matching result. This solves the problem of low accuracy in the delay estimation methods of the related art: the two voice signals are matched from two perspectives, namely the frequency-domain distribution and the spectral correlation of the power spectrum, and the final matching result is determined by combining the two matching results, thereby improving matching precision and the accuracy of the delay estimate.
Referring to Fig. 4, it shows a structural block diagram of a delay estimation apparatus for voice signals provided by another embodiment of the present invention. The delay estimation apparatus may be implemented as part or all of an electronic device by software, hardware, or a combination of both. The delay estimation apparatus may include: a signal acquisition module 310, a first matching module 320, a second matching module 330, and a delay calculation module 340.
The signal acquisition module 310 is configured to acquire two voice signals.
The first matching module 320 is configured to perform coherence matching on the two voice signals according to the short-time Fourier transforms of the two voice signals to obtain a first matching result, the first matching result including a first matched position and a first matching degree of the two voice signals.
The first matching module 320 may include: a first tracking unit 320a, a first binarization unit 320b, and a first matching unit 320c.
The first tracking unit 320a is configured to, for each voice signal, perform noise tracking on each frame of the voice signal according to the following equation to obtain the noise spectrum N(w, n) of each frame:
where X(w, n) denotes the short-time Fourier transform of the voice signal; α_u and α_d are preset coefficients with 0 < α_d < α_u < 1; w denotes the frequency bin index in the frequency domain; and n denotes the frame index in the time domain.
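The noise-tracking equation itself is not reproduced above. One common form consistent with the stated constraint 0 < α_d < α_u < 1 is an asymmetric recursive average: when |X(w, n)| rises above the current estimate, the slow coefficient α_u is used, and otherwise the faster α_d lets the estimate drop quickly. The sketch below assumes that form, initializes the estimate from the first frame, and uses hypothetical default coefficients.

```python
from typing import List


def track_noise(mag: List[List[float]], alpha_u: float = 0.995,
                alpha_d: float = 0.9) -> List[List[float]]:
    """Asymmetric recursive noise tracking, one estimate per (frame n, bin w).

    mag[n][w] holds |X(w, n)|.
    ASSUMPTION (form not given in the text):
        N(w, n) = a * N(w, n-1) + (1 - a) * |X(w, n)|,
        with a = alpha_u if |X(w, n)| > N(w, n-1) else alpha_d.
    The default coefficients are hypothetical.
    """
    if not 0.0 < alpha_d < alpha_u < 1.0:
        raise ValueError("require 0 < alpha_d < alpha_u < 1")
    if not mag:
        return []
    prev = list(mag[0])  # ASSUMPTION: start the estimate at the first frame
    noise: List[List[float]] = []
    for frame in mag:
        cur = []
        for w, x in enumerate(frame):
            # Rise slowly toward louder bins (likely speech), fall quickly toward quieter ones.
            a = alpha_u if x > prev[w] else alpha_d
            cur.append(a * prev[w] + (1.0 - a) * x)
        noise.append(cur)
        prev = cur
    return noise


# A single loud frame barely lifts the estimate, which then decays back quickly.
est = track_noise([[1.0], [10.0], [1.0]])
```

The asymmetry is what lets a tracker of this kind follow a slowly varying noise floor without being pulled up by short speech bursts.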
The first binarization unit 320b is configured to binarize the spectrum of each frame of the voice signal according to the following equation to obtain the binary spectrum Xb(w, n):
where T_b is a preset first threshold.
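The binarization rule is also not reproduced above. As an assumption, the sketch below sets Xb(w, n) = 1 where the magnitude exceeds T_b times the tracked noise N(w, n), and packs each binary frame into an integer bit mask, which is one way binarization can make the later matching cheap (bitwise AND/OR plus a bit count). The names and the default threshold are hypothetical.

```python
from typing import List


def binarize(mag: List[List[float]], noise: List[List[float]],
             t_b: float = 2.0) -> List[List[int]]:
    """Binary spectrum Xb(w, n), indexed as [n][w].

    ASSUMPTION (rule not given in the text): Xb(w, n) = 1 where
    |X(w, n)| > t_b * N(w, n), else 0. The default t_b is hypothetical.
    """
    return [[1 if x > t_b * nz else 0 for x, nz in zip(mf, nf)]
            for mf, nf in zip(mag, noise)]


def pack_bits(row: List[int]) -> int:
    """Pack one binary frame into an int, with bit w set when Xb(w, n) == 1."""
    return sum(1 << w for w, bit in enumerate(row) if bit)


frames = binarize([[5.0, 1.0, 3.0]], [[1.0, 1.0, 1.0]])
print(frames)                # [[1, 0, 1]]
print(pack_bits(frames[0]))  # 5
```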
The first matching unit 320c is configured to perform pairwise coherence matching between the K_a binary spectra corresponding to one voice signal and the K_b binary spectra corresponding to the other voice signal to obtain the first matching result, the first matching result including the matched positions and the matching degree of the pair of binary spectra with the highest matching degree, where K_a and K_b are positive integers.
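The pairwise search performed by the first matching unit can be sketched as follows, operating on binary frames packed as bit masks. The measure of coherence between two binary spectra is not specified in this section; the sketch assumes the Jaccard similarity |a AND b| / |a OR b| as the matching degree. The returned frame indices play the role of the matched positions n_a and n_b.

```python
from typing import List, Tuple


def popcount(x: int) -> int:
    """Number of set bits (works on Python versions without int.bit_count)."""
    return bin(x).count("1")


def match_binary_spectra(spec_a: List[int],
                         spec_b: List[int]) -> Tuple[int, int, float]:
    """Pairwise match of K_a packed frames of A against K_b packed frames of B.

    Returns (position in A, position in B, matching degree) of the best pair.
    ASSUMPTION: matching degree = Jaccard similarity |a & b| / |a | b|.
    """
    if not spec_a or not spec_b:
        raise ValueError("both signals need at least one binary spectrum")
    best = (0, 0, -1.0)
    for i, a in enumerate(spec_a):
        for j, b in enumerate(spec_b):
            union = popcount(a | b)
            degree = popcount(a & b) / union if union else 0.0
            if degree > best[2]:  # keep the first pair on ties
                best = (i, j, degree)
    return best


print(match_binary_spectra([0b0001, 0b1110, 0b0100], [0b1000, 0b1110]))
# (1, 1, 1.0)
```

The exhaustive search costs K_a × K_b comparisons, but each comparison is only two bitwise operations and two bit counts on packed integers.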
The second matching module 330 is configured to perform coherence matching on the two voice signals according to the spectral correlation of the power spectra of the two voice signals to obtain a second matching result, the second matching result including a second matched position and a second matching degree of the two voice signals.
The second matching module 330 may include: a power spectrum calculation unit 330a, a correlation calculation unit 330b, a second tracking unit 330c, a second binarization unit 330d, and a second matching unit 330e.
The power spectrum calculation unit 330a is configured to, for each voice signal, calculate the power spectrum P(w, n) of each frame of the voice signal according to the following equation:
P(w, n) = α_p P(w, n-1) + (1 - α_p)|X(w, n)|^2;
where X(w, n) denotes the short-time Fourier transform of the voice signal; α_p is a preset coefficient with 0 < α_p < 1; w denotes the frequency bin index in the frequency domain; and n denotes the frame index in the time domain.
The correlation calculation unit 330b is configured to calculate the spectral correlation DP(w, n) of the power spectrum of each frame according to the following equation:
DP(w, n) = |P(w+1, n) - P(w, n)|.
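The two equations for P(w, n) and DP(w, n) are fully specified, so they can be transcribed directly. The only assumptions in this sketch are the initial value P(w, -1) = 0 and the hypothetical default for α_p; the input is a list of STFT frames, each a list of complex bin values.

```python
from typing import List


def smoothed_power(stft: List[List[complex]], alpha_p: float = 0.8) -> List[List[float]]:
    """P(w, n) = alpha_p * P(w, n-1) + (1 - alpha_p) * |X(w, n)|^2.

    stft[n][w] holds X(w, n). ASSUMPTION: P(w, -1) = 0; default alpha_p is hypothetical.
    """
    if not 0.0 < alpha_p < 1.0:
        raise ValueError("require 0 < alpha_p < 1")
    power: List[List[float]] = []
    prev = [0.0] * (len(stft[0]) if stft else 0)
    for frame in stft:
        # Recursive smoothing over time, bin by bin.
        prev = [alpha_p * p + (1.0 - alpha_p) * abs(x) ** 2
                for p, x in zip(prev, frame)]
        power.append(prev)
    return power


def spectral_correlation(power: List[List[float]]) -> List[List[float]]:
    """DP(w, n) = |P(w+1, n) - P(w, n)|; each frame yields one fewer value."""
    return [[abs(frame[w + 1] - frame[w]) for w in range(len(frame) - 1)]
            for frame in power]


p = smoothed_power([[2 + 0j, 0j], [2 + 0j, 1 + 0j]], alpha_p=0.5)
print(p)                        # [[2.0, 0.0], [3.0, 0.5]]
print(spectral_correlation(p))  # [[2.0], [2.5]]
```

Taking differences between adjacent bins emphasizes the shape of the spectrum rather than its absolute level, which is the second "angle" the method combines with the STFT-based match.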
The second tracking unit 330c is configured to perform noise tracking on the spectral correlation DP(w, n) according to the following equation to obtain the spectral correlation NDP(w, n) of the noise power spectrum of each frame:
where β_u and β_d are preset coefficients with 0 < β_d < β_u < 1.
The second binarization unit 330d is configured to binarize the spectral correlation DP(w, n) of each frame according to the following equation to obtain the correlation binary spectrum XDb(w, n):
where T_Db is a preset second threshold.
The second matching unit 330e is configured to perform pairwise coherence matching between the KD_a correlation binary spectra corresponding to one voice signal and the KD_b correlation binary spectra corresponding to the other voice signal to obtain the second matching result, the second matching result including the matched positions and the matching degree of the pair of correlation binary spectra with the highest matching degree, where KD_a and KD_b are positive integers.
The delay calculation module 340 is configured to calculate the time delay between the two voice signals according to the first matching result and the second matching result.
The delay calculation module 340 may include: a position calculation unit 340a and a delay calculation unit 340b.
The position calculation unit 340a is configured to, for each voice signal, calculate a final matched position from the first matched position and the second matched position using a weighted average algorithm, where the weights of the weighted average algorithm are determined according to the first matching degree and the second matching degree.
The delay calculation unit 340b is configured to calculate the time delay between the two voice signals according to the final matched positions of the two voice signals.
Optionally, the apparatus may further include: a signal preprocessing module 312 and a Fourier transform module 314.
The signal preprocessing module 312 is configured to, for each voice signal, preprocess the voice signal to obtain a preprocessed voice signal, the preprocessing including at least one of noise reduction, amplification, high-pass filtering, and up/down-sampling.
The Fourier transform module 314 is configured to perform a short-time Fourier transform on the preprocessed voice signal.
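As an illustration of modules 312 and 314, the sketch below applies one of the listed preprocessing options, high-pass filtering (here a first-order DC-blocking filter, chosen as an assumption), followed by a Hann-windowed STFT. It uses a naive DFT to stay dependency-free; a practical implementation would more likely call an FFT routine. The 256-point, 50%-overlap defaults mirror the time-coefficient example; the function names and the pole radius r are hypothetical.

```python
import cmath
import math
from typing import List


def high_pass(x: List[float], r: float = 0.995) -> List[float]:
    """First-order DC-blocking filter y[n] = x[n] - x[n-1] + r * y[n-1].

    One possible choice for the high-pass preprocessing step; r is hypothetical.
    """
    y: List[float] = []
    prev_x = prev_y = 0.0
    for s in x:
        prev_y = s - prev_x + r * prev_y
        prev_x = s
        y.append(prev_y)
    return y


def stft(x: List[float], num_points: int = 256,
         overlap: float = 0.5) -> List[List[complex]]:
    """Periodic-Hann-windowed STFT via a naive DFT.

    Returns stft[n][w] for frames n and bins w = 0..num_points // 2.
    """
    hop = int(num_points * (1.0 - overlap))
    if hop <= 0:
        raise ValueError("overlap too large for this frame length")
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / num_points)
              for i in range(num_points)]
    frames: List[List[complex]] = []
    for start in range(0, len(x) - num_points + 1, hop):
        seg = [x[start + i] * window[i] for i in range(num_points)]
        frames.append([
            sum(seg[i] * cmath.exp(-2j * math.pi * w * i / num_points)
                for i in range(num_points))
            for w in range(num_points // 2 + 1)
        ])
    return frames


# A cosine at exactly bin 2 of a 16-point frame; the peak magnitude lands on bin 2.
sig = [math.cos(2.0 * math.pi * 2 * i / 16) for i in range(64)]
spec = stft(high_pass(sig), num_points=16)
peak = max(range(len(spec[0])), key=lambda w: abs(spec[0][w]))
print(peak)  # 2
```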
In summary, in the delay estimation apparatus for voice signals provided in this embodiment, a first matching result is obtained by analyzing and matching the short-time Fourier transforms of the two voice signals, a second matching result is obtained by analyzing and matching the spectral correlation of the power spectra of the two voice signals, and the time delay between the two voice signals is then calculated by combining the first matching result and the second matching result. This solves the problem of low accuracy in the delay estimation methods of the related art: the two voice signals are matched from two perspectives, namely the frequency-domain distribution and the spectral correlation of the power spectrum, and the final matching result is determined by combining the two matching results, thereby improving matching precision and the accuracy of the delay estimate.
In addition, in the delay estimation apparatus provided in this embodiment, the data are binarized before the matching calculation is performed, which substantially improves matching efficiency.
It should be noted that, when the delay estimation apparatus for voice signals provided in the above embodiments calculates the time delay between two voice signals, the division into the functional modules described above is merely an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the delay estimation apparatus provided in the above embodiments and the delay estimation method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.
Referring to Fig. 5, it shows a structural diagram of an electronic device provided by one embodiment of the present invention. The electronic device is used to implement the delay estimation method for voice signals provided in the above embodiments. Specifically:
The electronic device 500 may include components such as an RF (Radio Frequency) circuit 510, a memory 520 including one or more computer-readable storage media, an input unit 530, a display unit 540, a sensor 550, an audio circuit 560, a WiFi (Wireless Fidelity) module 570, a processor 580 including one or more processing cores, and a power supply 590. Those skilled in the art will understand that the structure shown in Fig. 5 does not limit the electronic device, which may include more or fewer components than illustrated, combine some components, or use a different arrangement of components. In detail:
The RF circuit 510 may be used to receive and send signals while information is being received or sent or during a call. In particular, after receiving downlink information from a base station, it passes the information to the one or more processors 580 for processing; in addition, it sends uplink data to the base station. Generally, the RF circuit 510 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. In addition, the RF circuit 510 may also communicate with networks and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, and SMS (Short Messaging Service).
The memory 520 may be used to store software programs and modules, and the processor 580 executes various functional applications and data processing by running the software programs and modules stored in the memory 520. The memory 520 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, application programs required by at least one function (such as a sound playback function or an image playback function), and the like, and the data storage area may store data created through use of the electronic device 500 (such as audio data or a phone book). In addition, the memory 520 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices. Correspondingly, the memory 520 may further include a memory controller to provide the processor 580 and the input unit 530 with access to the memory 520.
The input unit 530 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function control. Specifically, the input unit 530 may include an image input device 531 and other input devices 532. The image input device 531 may be a camera or an optoelectronic scanning device. In addition to the image input device 531, the input unit 530 may further include other input devices 532. Specifically, the other input devices 532 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and power keys), a trackball, a mouse, and a joystick.
The display unit 540 may be used to display information input by the user or provided to the user, as well as the various graphical user interfaces of the electronic device 500, which may be composed of graphics, text, icons, video, and any combination thereof. The display unit 540 may include a display panel 541, which may optionally be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like.
The electronic device 500 may further include at least one sensor 550, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, where the ambient light sensor may adjust the brightness of the display panel 541 according to the brightness of the ambient light, and the proximity sensor may turn off the display panel 541 and/or the backlight when the electronic device 500 is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in all directions (generally three axes) and, when stationary, the magnitude and direction of gravity; it may be used in applications that recognize the phone's posture (such as landscape/portrait switching, related games, and magnetometer posture calibration) and in vibration-recognition functions (such as a pedometer or tap detection). Other sensors that may also be configured in the electronic device 500, such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, are not described here.
The audio circuit 560, a speaker 561, and a microphone 562 may provide an audio interface between the user and the electronic device 500. The audio circuit 560 may transmit an electrical signal converted from received audio data to the speaker 561, which converts it into a sound signal for output; conversely, the microphone 562 converts a collected sound signal into an electrical signal, which the audio circuit 560 receives and converts into audio data. After being processed by the processor 580, the audio data are sent, for example, to another electronic device through the RF circuit 510, or output to the memory 520 for further processing. The audio circuit 560 may also include an earphone jack to provide communication between a peripheral earphone and the electronic device 500.
WiFi is a short-range wireless transmission technology. Through the WiFi module 570, the electronic device 500 can help the user send and receive e-mail, browse web pages, access streaming video, and so on, providing the user with wireless broadband Internet access. Although Fig. 5 shows the WiFi module 570, it can be understood that the module is not an essential component of the electronic device 500 and may be omitted as needed without changing the essence of the invention.
The processor 580 is the control center of the electronic device 500. It connects all parts of the entire phone through various interfaces and lines, and performs the various functions of the electronic device 500 and processes data by running or executing the software programs and/or modules stored in the memory 520 and invoking the data stored in the memory 520, thereby monitoring the phone as a whole. Optionally, the processor 580 may include one or more processing cores; preferably, the processor 580 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 580.
The electronic device 500 further includes a power supply 590 (such as a battery) that supplies power to all components. Preferably, the power supply may be logically connected to the processor 580 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system. The power supply 590 may further include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
Although not shown, the electronic device 500 may further include a Bluetooth module and the like, which are not described here.
Specifically, in this embodiment, the electronic device 500 further includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors to perform the delay estimation method of the embodiment shown in Fig. 1 or Fig. 2 above.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.