CN104700842A - Sound signal time delay estimation method and device - Google Patents

Sound signal time delay estimation method and device

Publication number
CN104700842A
Authority
CN
China
Prior art keywords
voice signal
matching
matching result
ndp
spectrum
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510083890.1A
Other languages
Chinese (zh)
Other versions
CN104700842B (en)
Inventor
陈超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Baiguoyuan Information Technology Co Ltd
Original Assignee
All Kinds Of Fruits Garden Guangzhou Network Technology Co Ltd
Application filed by All Kinds Of Fruits Garden Guangzhou Network Technology Co Ltd
Priority to CN201510083890.1A
Publication of CN104700842A
Application granted
Publication of CN104700842B
Legal status: Active

Landscapes

  • Mobile Radio Communication Systems (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a sound signal time delay estimation method and device, belonging to the technical field of audio processing. The method comprises: obtaining two sound signals; performing coherence matching on the two sound signals according to their short-time Fourier transforms to obtain a first matching result, which includes a first matching position and a first matching degree of the two sound signals; performing coherence matching on the two sound signals according to the spectral correlation of their power spectra to obtain a second matching result, which includes a second matching position and a second matching degree of the two sound signals; and calculating the time delay between the two sound signals according to the first matching result and the second matching result. This solves the problem of low accuracy in related-art time delay estimation methods. The sound signals are matched from two perspectives, frequency-domain distribution and the spectral correlation of the power spectrum, and the two matching results are combined to determine a final matching result, which improves matching precision and time delay estimation accuracy.

Description

Time delay estimation method and device for sound signals
Technical field
The present invention relates to the field of audio processing technology, and in particular to a time delay estimation method and device for sound signals.
Background
Time delay estimation algorithms for sound signals are widely used in many fields, such as sound matching, codec alignment and acoustic ranging.
The prior art provides a number of different time delay estimation methods, one widely used example being the method based on correlation analysis. The basic idea of this method is to use the degree of similarity of two sound signals in the frequency domain to estimate the time delay between them.
In the course of making the present invention, the inventor found that the above technology has at least the following problem: the time delay estimation method based on correlation analysis considers only the degree of similarity of the two sound signals in the frequency domain, so the matching precision of the two sound signals is low and the accuracy of the finally calculated time delay is therefore also low.
Summary of the invention
In order to solve the problem of low accuracy in the time delay estimation method of the above technology, embodiments of the present invention provide a time delay estimation method and device for sound signals. The technical solutions are as follows:
In a first aspect, a time delay estimation method for sound signals is provided, the method comprising:
obtaining two sound signals;
performing coherence matching on the two sound signals according to the short-time Fourier transform of the two sound signals to obtain a first matching result, the first matching result comprising a first matching position and a first matching degree of the two sound signals;
performing coherence matching on the two sound signals according to the spectral correlation of the power spectra of the two sound signals to obtain a second matching result, the second matching result comprising a second matching position and a second matching degree of the two sound signals;
calculating the time delay between the two sound signals according to the first matching result and the second matching result.
Optionally, calculating the time delay between the two sound signals according to the first matching result and the second matching result comprises:
for each sound signal, calculating a final matching position from the first matching position and the second matching position using a weighted average algorithm, the weights of which are determined according to the first matching degree and the second matching degree;
calculating the time delay between the two sound signals according to the final matching positions of the two sound signals.
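The weighted-average step can be sketched as follows. The text states only that the weights are determined from the two matching degrees and that the delay follows from the final matching positions. This sketch therefore assumes that each weight is simply proportional to its matching degree, and that the delay in frames is the difference between the two final positions. The function names `fuse_positions` and `estimate_delay_frames` and the sample numbers are hypothetical.

```python
def fuse_positions(pos1, degree1, pos2, degree2):
    """Weighted average of two matching positions.

    Assumption: each weight is proportional to its matching degree.
    """
    total = degree1 + degree2
    if total == 0:
        raise ValueError("both matching degrees are zero")
    return (degree1 * pos1 + degree2 * pos2) / total


def estimate_delay_frames(first, second):
    """Each argument is (position_in_A, position_in_B, matching_degree).

    Assumption: the delay, in frames, is final position in B minus
    final position in A; the sign convention is illustrative only.
    """
    a1, b1, p1 = first
    a2, b2, p2 = second
    final_a = fuse_positions(a1, p1, a2, p2)
    final_b = fuse_positions(b1, p1, b2, p2)
    return final_b - final_a


# Hypothetical results of the two matching stages.
delay = estimate_delay_frames((10, 14, 0.9), (12, 16, 0.6))
```

Multiplying the frame delay by the frame shift in samples and dividing by the sampling rate would convert it to seconds; both quantities depend on the STFT configuration actually used.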
Optionally, performing coherence matching on the two sound signals according to the short-time Fourier transform of the two sound signals to obtain the first matching result comprises:
for each sound signal, performing noise tracking on each frame of the sound signal according to the following formula to obtain the noise spectrum N(w, n) of each frame:
$$N(w,n)=\begin{cases}(1-\alpha_u)\,|X(w,n)|+\alpha_u\,N(w,n-1), & |X(w,n)|\ge N(w,n-1)\\ (1-\alpha_d)\,|X(w,n)|+\alpha_d\,N(w,n-1), & |X(w,n)|<N(w,n-1)\end{cases}$$
where X(w, n) denotes the short-time Fourier transform of the sound signal; α_u and α_d are preset coefficients with 0 < α_d < α_u < 1; w denotes the frequency bin index in the frequency domain; and n denotes the frame index in the time domain;
binarizing the short-time Fourier transform of each frame according to the following formula to obtain a binary spectrum Xb(w, n):
$$\mathrm{Xb}(w,n)=\begin{cases}1, & |X(w,n)|-N(w,n)>T_b\\ 0, & |X(w,n)|-N(w,n)\le T_b\end{cases}$$
where T_b is a preset first threshold;
performing pairwise coherence matching between the K_a binary spectra corresponding to one of the sound signals and the K_b binary spectra corresponding to the other sound signal to obtain the first matching result, the first matching result comprising the matching position and the matching degree corresponding to the pair of binary spectra with the highest matching degree, K_a and K_b being positive integers.
Optionally, performing coherence matching on the two sound signals according to the spectral correlation of the power spectra of the two sound signals to obtain the second matching result comprises:
for each sound signal, calculating the power spectrum P(w, n) of each frame of the sound signal according to the following formula:
$$P(w,n)=\alpha_p\,P(w,n-1)+(1-\alpha_p)\,|X(w,n)|^2$$
where X(w, n) denotes the short-time Fourier transform of the sound signal; α_p is a preset coefficient with 0 < α_p < 1; w denotes the frequency bin index in the frequency domain; and n denotes the frame index in the time domain;
calculating the spectral correlation DP(w, n) of the power spectrum of each frame according to the following formula:
$$\mathrm{DP}(w,n)=|P(w+1,n)-P(w,n)|$$
performing noise tracking on the spectral correlation DP(w, n) according to the following formula to obtain the spectral correlation NDP(w, n) of the noise power spectrum of each frame:
$$\mathrm{NDP}(w,n)=\begin{cases}(1-\beta_u)\,\mathrm{DP}(w,n)+\beta_u\,\mathrm{NDP}(w,n-1), & \mathrm{DP}(w,n)\ge \mathrm{NDP}(w,n-1)\\ (1-\beta_d)\,\mathrm{DP}(w,n)+\beta_d\,\mathrm{NDP}(w,n-1), & \mathrm{DP}(w,n)<\mathrm{NDP}(w,n-1)\end{cases}$$
where β_u and β_d are preset coefficients with 0 < β_d < β_u < 1;
binarizing the spectral correlation DP(w, n) of each frame according to the following formula to obtain a correlation binary spectrum XDb(w, n):
$$\mathrm{XDb}(w,n)=\begin{cases}1, & \mathrm{DP}(w,n)-\mathrm{NDP}(w,n)>T_{Db}\\ 0, & \mathrm{DP}(w,n)-\mathrm{NDP}(w,n)\le T_{Db}\end{cases}$$
where T_Db is a preset second threshold;
performing pairwise coherence matching between the KD_a correlation binary spectra corresponding to one of the sound signals and the KD_b correlation binary spectra corresponding to the other sound signal to obtain the second matching result, the second matching result comprising the matching position and the matching degree corresponding to the pair of correlation binary spectra with the highest matching degree, KD_a and KD_b being positive integers.
Optionally, before performing coherence matching on the two sound signals according to the short-time Fourier transform of the two sound signals to obtain the first matching result, the method further comprises:
for each sound signal, pre-processing the sound signal to obtain a pre-processed sound signal, the pre-processing comprising at least one of noise reduction, amplification, high-pass filtering, and upsampling or downsampling;
performing a short-time Fourier transform on the pre-processed sound signal.
In a second aspect, a time delay estimation device for sound signals is provided, the device comprising:
a signal acquisition module, configured to obtain two sound signals;
a first matching module, configured to perform coherence matching on the two sound signals according to the short-time Fourier transform of the two sound signals to obtain a first matching result, the first matching result comprising a first matching position and a first matching degree of the two sound signals;
a second matching module, configured to perform coherence matching on the two sound signals according to the spectral correlation of the power spectra of the two sound signals to obtain a second matching result, the second matching result comprising a second matching position and a second matching degree of the two sound signals;
a time delay calculation module, configured to calculate the time delay between the two sound signals according to the first matching result and the second matching result.
Optionally, the time delay calculation module comprises a position calculation unit and a time delay calculation unit;
the position calculation unit is configured to, for each sound signal, calculate a final matching position from the first matching position and the second matching position using a weighted average algorithm, the weights of which are determined according to the first matching degree and the second matching degree;
the time delay calculation unit is configured to calculate the time delay between the two sound signals according to the final matching positions of the two sound signals.
Optionally, the first matching module comprises a first tracking unit, a first binarization unit and a first matching unit;
the first tracking unit is configured to, for each sound signal, perform noise tracking on each frame of the sound signal according to the following formula to obtain the noise spectrum N(w, n) of each frame:
$$N(w,n)=\begin{cases}(1-\alpha_u)\,|X(w,n)|+\alpha_u\,N(w,n-1), & |X(w,n)|\ge N(w,n-1)\\ (1-\alpha_d)\,|X(w,n)|+\alpha_d\,N(w,n-1), & |X(w,n)|<N(w,n-1)\end{cases}$$
where X(w, n) denotes the short-time Fourier transform of the sound signal; α_u and α_d are preset coefficients with 0 < α_d < α_u < 1; w denotes the frequency bin index in the frequency domain; and n denotes the frame index in the time domain;
the first binarization unit is configured to binarize the short-time Fourier transform of each frame according to the following formula to obtain a binary spectrum Xb(w, n):
$$\mathrm{Xb}(w,n)=\begin{cases}1, & |X(w,n)|-N(w,n)>T_b\\ 0, & |X(w,n)|-N(w,n)\le T_b\end{cases}$$
where T_b is a preset first threshold;
the first matching unit is configured to perform pairwise coherence matching between the K_a binary spectra corresponding to one of the sound signals and the K_b binary spectra corresponding to the other sound signal to obtain the first matching result, the first matching result comprising the matching position and the matching degree corresponding to the pair of binary spectra with the highest matching degree, K_a and K_b being positive integers.
Optionally, the second matching module comprises a power spectrum calculation unit, a correlation calculation unit, a second tracking unit, a second binarization unit and a second matching unit;
the power spectrum calculation unit is configured to, for each sound signal, calculate the power spectrum P(w, n) of each frame of the sound signal according to the following formula:
$$P(w,n)=\alpha_p\,P(w,n-1)+(1-\alpha_p)\,|X(w,n)|^2$$
where X(w, n) denotes the short-time Fourier transform of the sound signal; α_p is a preset coefficient with 0 < α_p < 1; w denotes the frequency bin index in the frequency domain; and n denotes the frame index in the time domain;
the correlation calculation unit is configured to calculate the spectral correlation DP(w, n) of the power spectrum of each frame according to the following formula:
$$\mathrm{DP}(w,n)=|P(w+1,n)-P(w,n)|$$
the second tracking unit is configured to perform noise tracking on the spectral correlation DP(w, n) according to the following formula to obtain the spectral correlation NDP(w, n) of the noise power spectrum of each frame:
$$\mathrm{NDP}(w,n)=\begin{cases}(1-\beta_u)\,\mathrm{DP}(w,n)+\beta_u\,\mathrm{NDP}(w,n-1), & \mathrm{DP}(w,n)\ge \mathrm{NDP}(w,n-1)\\ (1-\beta_d)\,\mathrm{DP}(w,n)+\beta_d\,\mathrm{NDP}(w,n-1), & \mathrm{DP}(w,n)<\mathrm{NDP}(w,n-1)\end{cases}$$
where β_u and β_d are preset coefficients with 0 < β_d < β_u < 1;
the second binarization unit is configured to binarize the spectral correlation DP(w, n) of each frame according to the following formula to obtain a correlation binary spectrum XDb(w, n):
$$\mathrm{XDb}(w,n)=\begin{cases}1, & \mathrm{DP}(w,n)-\mathrm{NDP}(w,n)>T_{Db}\\ 0, & \mathrm{DP}(w,n)-\mathrm{NDP}(w,n)\le T_{Db}\end{cases}$$
where T_Db is a preset second threshold;
the second matching unit is configured to perform pairwise coherence matching between the KD_a correlation binary spectra corresponding to one of the sound signals and the KD_b correlation binary spectra corresponding to the other sound signal to obtain the second matching result, the second matching result comprising the matching position and the matching degree corresponding to the pair of correlation binary spectra with the highest matching degree, KD_a and KD_b being positive integers.
Optionally, the device further comprises:
a signal pre-processing module, configured to, for each sound signal, pre-process the sound signal to obtain a pre-processed sound signal, the pre-processing comprising at least one of noise reduction, amplification, high-pass filtering, and upsampling or downsampling;
a Fourier transform module, configured to perform a short-time Fourier transform on the pre-processed sound signal.
The beneficial effects of the technical solutions provided by the embodiments of the present invention are as follows:
A first matching result is obtained by analysing and matching the short-time Fourier transforms of the two sound signals, a second matching result is obtained by analysing and matching the spectral correlation of the power spectra of the two sound signals, and the time delay between the two sound signals is then calculated by combining the first matching result and the second matching result. This solves the problem of low accuracy in related-art time delay estimation methods. The two sound signals are matched from two perspectives, frequency-domain distribution and the spectral correlation of the power spectrum, and the two matching results are combined to determine the final matching result, which improves matching precision and the accuracy of the time delay estimate.
Brief description of the drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of the time delay estimation method for sound signals provided by one embodiment of the present invention;
Fig. 2 is a flow chart of the time delay estimation method for sound signals provided by another embodiment of the present invention;
Fig. 3 is a block diagram of the time delay estimation device for sound signals provided by one embodiment of the present invention;
Fig. 4 is a block diagram of the time delay estimation device for sound signals provided by another embodiment of the present invention;
Fig. 5 is a schematic structural diagram of the electronic device provided by one embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the drawings.
Please refer to Fig. 1, which shows a flow chart of the time delay estimation method for sound signals provided by one embodiment of the present invention. This embodiment is illustrated with the method applied in an electronic device such as a mobile phone, tablet computer, laptop computer or desktop computer. The method may include the following steps:
Step 102: obtain two sound signals.
Step 104: perform coherence matching on the two sound signals according to the short-time Fourier transform of the two sound signals to obtain a first matching result, the first matching result comprising a first matching position and a first matching degree of the two sound signals.
Step 106: perform coherence matching on the two sound signals according to the spectral correlation of the power spectra of the two sound signals to obtain a second matching result, the second matching result comprising a second matching position and a second matching degree of the two sound signals.
Step 108: calculate the time delay between the two sound signals according to the first matching result and the second matching result.
It should be noted that step 106 may be performed after step 104, before step 104, or at the same time as step 104. This embodiment is illustrated with step 106 performed after step 104, and no specific limitation is placed on the order.
In summary, in the time delay estimation method for sound signals provided by this embodiment, a first matching result is obtained by analysing and matching the short-time Fourier transforms of the two sound signals, a second matching result is obtained by analysing and matching the spectral correlation of the power spectra of the two sound signals, and the time delay between the two sound signals is then calculated by combining the first matching result and the second matching result. This solves the problem of low accuracy in related-art time delay estimation methods. The two sound signals are matched from two perspectives, frequency-domain distribution and the spectral correlation of the power spectrum, and the two matching results are combined to determine the final matching result, which improves matching precision and the accuracy of the time delay estimate.
Please refer to Fig. 2, which shows a flow chart of the time delay estimation method for sound signals provided by another embodiment of the present invention. This embodiment is illustrated with the method applied in an electronic device such as a mobile phone, tablet computer, laptop computer or desktop computer. The method may include the following steps:
Step 201: obtain two sound signals.
The two sound signals are discrete signals in the time domain. In this embodiment, it is assumed that one sound signal A is x_ra(n) and the other sound signal B is x_rb(n), where n denotes the frame index in the time domain, n ∈ [0, M-1], M ≥ 2, and n and M are integers.
Step 202: for each sound signal, pre-process the sound signal to obtain a pre-processed sound signal.
The pre-processing includes, but is not limited to, at least one of noise reduction, amplification, high-pass filtering, and upsampling or downsampling. The purpose of pre-processing the sound signal is to allow more accurate and reliable sound features to be extracted in the subsequent steps, so as to improve matching precision.
In this embodiment, the pre-processing applied to each sound signal is denoted F(*); the pre-processing result of sound signal A is then x_a(n) = F(x_ra(n)), and the pre-processing result of the other sound signal B is x_b(n) = F(x_rb(n)).
It should be noted that this embodiment is illustrated only with the pre-processing modes listed above. In practical applications, other pre-processing modes may be adopted as required, and this embodiment places no specific limitation on this.
Step 203: perform a short-time Fourier transform on the pre-processed sound signal.
The short-time Fourier transform (STFT) process consists of windowing the pre-processed sound signal with a predetermined window function and then applying a fast Fourier transform (FFT), thereby transforming the sound signal from the time domain to the frequency domain. The choice of window function includes, but is not limited to, the Hamming window and the Kaiser window. The length of the window function is a power of 2, for example 128 or 256.
In this embodiment, the STFT function is denoted STFT(*); the STFT of sound signal A can then be written X_a(w, n) = STFT(x_a(n)), and the STFT of the other sound signal B can be written X_b(w, n) = STFT(x_b(n)), where w denotes the frequency bin index in the frequency domain, w ∈ [0, N-1], N ≥ 2, and w and N are integers.
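As a rough illustration of Step 203, the sketch below windows a signal with a Hamming window and transforms each frame to the frequency domain. It is only a sketch under stated assumptions: the frame shift `hop` is not specified in the text, a naive DFT stands in for the FFT to keep the example dependency-free, and the function names and the test tone are hypothetical.

```python
import cmath
import math


def hamming(length):
    """Symmetric Hamming window of the given length."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (length - 1))
            for k in range(length)]


def stft_magnitude(x, win_len=8, hop=4):
    """Return |X(w, n)| as a list of frames, each a list over bins w.

    win_len is a power of 2, as the text requires; hop is an assumption.
    """
    window = hamming(win_len)
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + k] * window[k] for k in range(win_len)]
        # Naive DFT for clarity; a real implementation would use an FFT.
        spectrum = [
            abs(sum(seg[k] * cmath.exp(-2j * math.pi * w * k / win_len)
                    for k in range(win_len)))
            for w in range(win_len)
        ]
        frames.append(spectrum)
    return frames


# Hypothetical test tone centred on bin 2 of an 8-point transform.
signal = [math.cos(2 * math.pi * 2 * t / 8) for t in range(32)]
mags = stft_magnitude(signal)
```

With these settings the 32-sample tone yields 7 frames of 8 bins each, and the largest magnitude among the lower bins falls on bin 2.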
Step 204: perform coherence matching on the two sound signals according to the short-time Fourier transform of the two sound signals to obtain a first matching result, the first matching result comprising a first matching position and a first matching degree of the two sound signals.
In the first matching mode, the two sound signals are matched from the perspective of the frequency-domain distribution of the sound signals. Specifically, this step may include the following sub-steps:
First, for each sound signal, perform noise tracking on each frame of the sound signal according to the following formula to obtain the noise spectrum N(w, n) of each frame:
$$N(w,n)=\begin{cases}(1-\alpha_u)\,|X(w,n)|+\alpha_u\,N(w,n-1), & |X(w,n)|\ge N(w,n-1)\\ (1-\alpha_d)\,|X(w,n)|+\alpha_d\,N(w,n-1), & |X(w,n)|<N(w,n-1)\end{cases}$$
where X(w, n) denotes the short-time Fourier transform of the sound signal; α_u and α_d are preset coefficients with 0 < α_d < α_u < 1; w denotes the frequency bin index in the frequency domain; and n denotes the frame index in the time domain.
The purpose of performing noise tracking on the sound signal is to reduce, as far as possible, the influence of noise in the sound signal on the matching result, so as to improve matching precision. In addition, the noise tracking effect can be improved by setting the preset coefficients α_u and α_d appropriately: because α_d < α_u, a smaller update weight (1 - α_u) is applied to the current frame while the signal is rising, and a larger update weight (1 - α_d) while the signal is falling.
In this embodiment, the noise spectrum of the i-th frame of sound signal A is denoted N_a(w, i); then:
$$N_a(w,i)=\begin{cases}(1-\alpha_u)\,|X_a(w,i)|+\alpha_u\,N_a(w,i-1), & |X_a(w,i)|\ge N_a(w,i-1)\\ (1-\alpha_d)\,|X_a(w,i)|+\alpha_d\,N_a(w,i-1), & |X_a(w,i)|<N_a(w,i-1)\end{cases}$$
where X_a(w, i) denotes the short-time Fourier transform of the i-th frame of sound signal A, i ≥ 0 and i is an integer.
Similarly, the noise spectrum of the j-th frame of the other sound signal B is denoted N_b(w, j); then:
$$N_b(w,j)=\begin{cases}(1-\alpha_u)\,|X_b(w,j)|+\alpha_u\,N_b(w,j-1), & |X_b(w,j)|\ge N_b(w,j-1)\\ (1-\alpha_d)\,|X_b(w,j)|+\alpha_d\,N_b(w,j-1), & |X_b(w,j)|<N_b(w,j-1)\end{cases}$$
where X_b(w, j) denotes the short-time Fourier transform of the j-th frame of sound signal B, j ≥ 0 and j is an integer.
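The noise-tracking recursion above can be sketched in a few lines. The initial value N(w, -1) is not given in the text, so the sketch assumes it is 0; the function name `track_noise` and the coefficient values are likewise illustrative only.

```python
def track_noise(mags, alpha_u=0.9, alpha_d=0.5, init=0.0):
    """mags[n][w] holds |X(w, n)|; returns noise[n][w] = N(w, n).

    Assumption: N(w, -1) = init for every bin (not stated in the text).
    """
    assert 0 < alpha_d < alpha_u < 1
    prev = [init] * len(mags[0])
    noise = []
    for frame in mags:
        cur = []
        for w, mag in enumerate(frame):
            # Rising magnitude uses alpha_u, falling magnitude uses alpha_d.
            alpha = alpha_u if mag >= prev[w] else alpha_d
            cur.append((1 - alpha) * mag + alpha * prev[w])
        noise.append(cur)
        prev = cur
    return noise


# One frequency bin over four frames, with a short burst in frame 2.
noise = track_noise([[1.0], [1.0], [10.0], [1.0]])
```

With these values the estimate moves 0.1, 0.19, 1.171, 1.0855: the burst of 10 lifts the noise estimate only slightly, and the estimate drops back faster once the burst has passed.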
Second, binarize the short-time Fourier transform of each frame according to the following formula to obtain a binary spectrum Xb(w, n):
$$\mathrm{Xb}(w,n)=\begin{cases}1, & |X(w,n)|-N(w,n)>T_b\\ 0, & |X(w,n)|-N(w,n)\le T_b\end{cases}$$
where T_b is a preset first threshold.
After the spectrum of the sound signal is binarized, the short-time Fourier transform of each frame is converted into a binary sequence of equal length consisting of 0s and 1s. Binarization can substantially increase data processing speed, so that the subsequent matching calculation can be carried out more efficiently and robustly.
In this embodiment, the binary spectrum corresponding to the i-th frame of sound signal A is denoted Xb_a(w, i); then:
$$\mathrm{Xb}_a(w,i)=\begin{cases}1, & |X_a(w,i)|-N_a(w,i)>T_b\\ 0, & |X_a(w,i)|-N_a(w,i)\le T_b\end{cases}$$
Similarly, the binary spectrum corresponding to the j-th frame of the other sound signal B is denoted Xb_b(w, j); then:
$$\mathrm{Xb}_b(w,j)=\begin{cases}1, & |X_b(w,j)|-N_b(w,j)>T_b\\ 0, & |X_b(w,j)|-N_b(w,j)\le T_b\end{cases}$$
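A minimal sketch of the binarization rule follows. The function name `binarize`, the threshold value and the sample magnitudes are assumptions chosen for illustration.

```python
def binarize(mags, noise, threshold):
    """Return Xb[n][w] = 1 where |X(w, n)| - N(w, n) > threshold, else 0."""
    return [
        [1 if m - nz > threshold else 0 for m, nz in zip(frame, noise_frame)]
        for frame, noise_frame in zip(mags, noise)
    ]


# One frame with three bins; hypothetical magnitudes and noise estimates.
mags = [[5.0, 1.2, 3.0]]
noise = [[1.0, 1.0, 2.5]]
bits = binarize(mags, noise, threshold=0.5)
```

Only the first bin exceeds its noise estimate by more than the threshold. The third bin exceeds it by exactly 0.5 and is therefore set to 0, since the comparison is strict.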
Third, perform pairwise coherence matching between the K_a binary spectra corresponding to one of the sound signals and the K_b binary spectra corresponding to the other sound signal to obtain the first matching result, K_a and K_b being positive integers.
The first matching result comprises the matching position and the matching degree corresponding to the pair of binary spectra with the highest matching degree. When the number of binary spectra corresponding to sound signal A is K_a and the number of binary spectra corresponding to the other sound signal B is K_b, K_a × K_b binary spectrum matches need to be performed. For each match, the matching degree between the two binary spectra is recorded, together with the sequence numbers n_a(i) and n_b(j) of the two binary spectra being matched. Here, n_a(i) denotes the sequence number of the binary spectrum corresponding to the i-th frame of sound signal A, n_a(i) ∈ [0, K_a-1], K_a ≥ 2, and n_a(i) and K_a are integers; n_b(j) denotes the sequence number of the binary spectrum corresponding to the j-th frame of the other sound signal B, n_b(j) ∈ [0, K_b-1], K_b ≥ 2, and n_b(j) and K_b are integers.
For example, if the binary spectrum Xb_a(w, i) corresponding to the i-th frame of sound signal A is matched with the binary spectrum Xb_b(w, j) corresponding to the j-th frame of the other sound signal B, the matching degree Pb_ij of the two is:
$$\mathrm{Pb}_{ij}=\frac{1}{N}\sum_{w=0}^{N-1}\mathrm{Xb}_a(w,i)\odot \mathrm{Xb}_b(w,j)$$
where ⊙ denotes the XNOR operation and N is the number of frequency bins (w ∈ [0, N-1]). The matching degree Pb_ij equals the ratio of the number of frequency bins at which the two binary spectra have equal binarization results to the total number of frequency bins.
In a concrete example, suppose Xb_a(w, i) = {1, 1, 0, 0, 0, 1, 1, 1} and Xb_b(w, j) = {0, 1, 1, 1, 0, 1, 1, 1}. When w = 0, since Xb_a(0, i) = 1 and Xb_b(0, j) = 0, Xb_a(0, i) ⊙ Xb_b(0, j) = 0; when w = 1, since Xb_a(1, i) = 1 and Xb_b(1, j) = 1, Xb_a(1, i) ⊙ Xb_b(1, j) = 1; and so on. The matching degree of the two binary sequences is then Pb_ij = 5/8 = 0.625.
After the matching degree between every pair of binary spectra has been calculated, the pair of binary spectra with the highest matching degree is selected, and the sequence numbers and matching degree corresponding to this pair are recorded.
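The matching degree and the exhaustive K_a × K_b search described above can be sketched as follows. The function names are hypothetical, and keeping the first pair found when several pairs tie is an assumption, since the text does not say how ties are resolved.

```python
def matching_degree(bits_a, bits_b):
    """Fraction of frequency bins where the two binary spectra agree (XNOR)."""
    assert len(bits_a) == len(bits_b)
    agree = sum(1 for a, b in zip(bits_a, bits_b) if a == b)
    return agree / len(bits_a)


def best_match(spectra_a, spectra_b):
    """Compare every pair and return (degree, index_in_A, index_in_B).

    Assumption: on a tie, the first pair encountered is kept.
    """
    best = (-1.0, None, None)
    for i, sa in enumerate(spectra_a):
        for j, sb in enumerate(spectra_b):
            degree = matching_degree(sa, sb)
            if degree > best[0]:
                best = (degree, i, j)
    return best


# The two binary sequences from the concrete example in the text.
xb_a = [1, 1, 0, 0, 0, 1, 1, 1]
xb_b = [0, 1, 1, 1, 0, 1, 1, 1]
print(matching_degree(xb_a, xb_b))  # 0.625
```

The printed value reproduces the 5/8 = 0.625 of the worked example. Because every pair is compared, the cost grows with K_a × K_b, which is one reason the binarized representation is convenient.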
In the present embodiment, suppose the i-th frame voice signal in a wherein road voice signal A corresponding n-th ai () individual two-value composes corresponding with the jth frame voice signal in the voice signal B of another road n-th bj the matching degree of () individual two-value spectrum is the highest, be designated as P 1.Such as, as i=1 and j=2 time,
Step 205, carry out coherence's coupling according to the Spectral correlation of the power spectrum of two-way voice signal to two-way voice signal and obtain the second matching result, this second matching result comprises the second matched position and second matching degree of two-way voice signal.
In the second matching way, from the Spectral correlation angle of power spectrum, two-way voice signal is mated.Specifically, this step can comprise following a few sub-steps:
The first, for each road voice signal, calculate the power spectrum P (w, n) of each the frame voice signal in voice signal according to the following equation:
P(w,n)=α pP(w,n-1)+(1-α p)|X(w,n)| 2
Wherein, X (w, n) represents the Short Time Fourier Transform of voice signal; α pfor predetermined coefficient and 0 < α p< 1; W represents the frequency sequence number on frequency domain; N represents the frame number in time domain.
In this embodiment, let the power spectrum of the $i$-th frame of sound signal A be $P_a(w,i)$; then:
$$P_a(w,i)=\alpha_p\,P_a(w,i-1)+(1-\alpha_p)\,|X_a(w,i)|^2$$
where $X_a(w,i)$ denotes the short-time Fourier transform of the $i$-th frame of sound signal A, and $i\ge 0$ is an integer.
Similarly, let the power spectrum of the $j$-th frame of the other sound signal B be $P_b(w,j)$; then:
$$P_b(w,j)=\alpha_p\,P_b(w,j-1)+(1-\alpha_p)\,|X_b(w,j)|^2$$
where $X_b(w,j)$ denotes the short-time Fourier transform of the $j$-th frame of sound signal B, and $j\ge 0$ is an integer.
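The recursion above is a first-order smoothing of the squared STFT magnitude along the frame axis. The following is a minimal sketch of it, under assumptions the text does not state: the function name `smoothed_power_spectrum` is hypothetical, each frame is a list of complex STFT bins, and the value before the first frame, $P(w,-1)$, is taken to be zero.

```python
def smoothed_power_spectrum(stft_frames, alpha_p=0.9):
    """Apply P(w,n) = alpha_p * P(w,n-1) + (1 - alpha_p) * |X(w,n)|^2 frame by frame.

    stft_frames: list of frames, each a list of complex STFT bins X(w, n).
    """
    if not 0.0 < alpha_p < 1.0:
        raise ValueError("alpha_p must satisfy 0 < alpha_p < 1")
    if not stft_frames:
        return []
    prev = [0.0] * len(stft_frames[0])  # assumption: P(w, -1) = 0
    power = []
    for frame in stft_frames:
        prev = [alpha_p * p + (1.0 - alpha_p) * abs(x) ** 2
                for p, x in zip(prev, frame)]
        power.append(prev)
    return power


frames = [[1 + 0j, 2 + 0j], [1 + 0j, 0j]]
p = smoothed_power_spectrum(frames, alpha_p=0.5)
```

A larger `alpha_p` gives a smoother but slower-reacting power spectrum; the default of 0.9 is only a placeholder, since the text requires no more than $0<\alpha_p<1$.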
Second, calculate the spectral correlation $DP(w,n)$ of the power spectrum of each frame according to the following formula:
$$DP(w,n)=|P(w+1,n)-P(w,n)|$$
That is, once the power spectrum of the sound signal has been calculated, the energy at each frequency bin is subtracted from the energy at the next higher bin, and the absolute value of the difference is taken as the spectral correlation of the power spectrum.
In this embodiment, let the spectral correlation of the power spectrum of the $i$-th frame of sound signal A be $DP_a(w,i)$; then:
$$DP_a(w,i)=|P_a(w+1,i)-P_a(w,i)|$$
Similarly, let the spectral correlation of the power spectrum of the $j$-th frame of the other sound signal B be $DP_b(w,j)$; then:
$$DP_b(w,j)=|P_b(w+1,j)-P_b(w,j)|$$
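The spectral correlation is simply the absolute difference between adjacent frequency bins of one frame's power spectrum. A minimal sketch follows; the function name `spectral_correlation` is hypothetical, and the handling of the last bin (where $w+1$ does not exist) is an assumption, since the text does not specify it: here the result has one element fewer than the input.

```python
def spectral_correlation(power_frame):
    """Return DP(w) = |P(w+1) - P(w)| for one frame's power spectrum.

    Assumption: the last bin has no upper neighbour, so the output
    is one element shorter than the input.
    """
    return [abs(hi - lo) for lo, hi in zip(power_frame, power_frame[1:])]


dp = spectral_correlation([1.0, 4.0, 2.0, 2.0])
print(dp)  # [3.0, 2.0, 0.0]
```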
Third, perform noise tracking on the spectral correlation $DP(w,n)$ according to the following formula to obtain the spectral correlation $NDP(w,n)$ of the noise power spectrum of each frame:
$$NDP(w,n)=\begin{cases}(1-\beta_u)\,DP(w,n)+\beta_u\,NDP(w,n-1), & DP(w,n)\ge NDP(w,n-1)\\ (1-\beta_d)\,DP(w,n)+\beta_d\,NDP(w,n-1), & DP(w,n)<NDP(w,n-1)\end{cases}$$
where $\beta_u$ and $\beta_d$ are preset coefficients with $0<\beta_d<\beta_u<1$.
The purpose of noise tracking on the spectral correlation is to prevent fluctuating noise from causing misjudgments in the subsequent matching and to reduce the influence of noise on the matching result as far as possible, thereby improving matching precision. In addition, choosing the preset coefficients $\beta_u$ and $\beta_d$ appropriately improves the noise tracking.
In this embodiment, let the spectral correlation of the noise power spectrum of the $i$-th frame of sound signal A be $NDP_a(w,i)$; then:
$$NDP_a(w,i)=\begin{cases}(1-\beta_u)\,DP_a(w,i)+\beta_u\,NDP_a(w,i-1), & DP_a(w,i)\ge NDP_a(w,i-1)\\ (1-\beta_d)\,DP_a(w,i)+\beta_d\,NDP_a(w,i-1), & DP_a(w,i)<NDP_a(w,i-1)\end{cases}$$
Similarly, let the spectral correlation of the noise power spectrum of the $j$-th frame of the other sound signal B be $NDP_b(w,j)$; then:
$$NDP_b(w,j)=\begin{cases}(1-\beta_u)\,DP_b(w,j)+\beta_u\,NDP_b(w,j-1), & DP_b(w,j)\ge NDP_b(w,j-1)\\ (1-\beta_d)\,DP_b(w,j)+\beta_d\,NDP_b(w,j-1), & DP_b(w,j)<NDP_b(w,j-1)\end{cases}$$
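Because $\beta_d<\beta_u$, the tracked noise level rises slowly when the input exceeds it and falls quickly when the input drops below it. The sketch below illustrates this asymmetric smoothing. The function name `track_noise`, the default coefficients, and the initialisation of the noise estimate with the first frame are all assumptions; the text only constrains $0<\beta_d<\beta_u<1$.

```python
def track_noise(dp_frames, beta_u=0.95, beta_d=0.5):
    """Asymmetric recursive smoothing of DP(w, n) along the frame axis."""
    if not 0.0 < beta_d < beta_u < 1.0:
        raise ValueError("coefficients must satisfy 0 < beta_d < beta_u < 1")
    if not dp_frames:
        return []
    noise = list(dp_frames[0])  # assumption: NDP(w, 0) = DP(w, 0)
    tracked = [noise]
    for frame in dp_frames[1:]:
        noise = [
            (1.0 - beta_u) * d + beta_u * n if d >= n   # rising: follow slowly
            else (1.0 - beta_d) * d + beta_d * n        # falling: follow quickly
            for d, n in zip(frame, noise)
        ]
        tracked.append(noise)
    return tracked


ndp = track_noise([[1.0, 4.0], [3.0, 2.0]], beta_u=0.75, beta_d=0.25)
```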
Fourth, binarize the spectral correlation $DP(w,n)$ of each frame according to the following formula to obtain the correlation binary spectrum $XDb(w,n)$:
$$XDb(w,n)=\begin{cases}1, & DP(w,n)-NDP(w,n)>T_{Db}\\ 0, & DP(w,n)-NDP(w,n)\le T_{Db}\end{cases}$$
where $T_{Db}$ is a preset second threshold.
After binarization, the spectral correlation of each frame is converted into a binary sequence of equal length consisting of 0s and 1s. Binarization substantially increases data processing speed, so that the subsequent matching calculation can be carried out more efficiently and robustly.
In this embodiment, let the correlation binary spectrum corresponding to the $i$-th frame of sound signal A be $XDb_a(w,i)$; then:
$$XDb_a(w,i)=\begin{cases}1, & DP_a(w,i)-NDP_a(w,i)>T_{Db}\\ 0, & DP_a(w,i)-NDP_a(w,i)\le T_{Db}\end{cases}$$
Similarly, let the correlation binary spectrum corresponding to the $j$-th frame of the other sound signal B be $XDb_b(w,j)$; then:
$$XDb_b(w,j)=\begin{cases}1, & DP_b(w,j)-NDP_b(w,j)>T_{Db}\\ 0, & DP_b(w,j)-NDP_b(w,j)\le T_{Db}\end{cases}$$
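The binarization rule compares each bin's excess over the tracked noise level with the threshold. A minimal sketch, in which the function name `binarize` and the sample values are illustrative assumptions:

```python
def binarize(dp_frame, ndp_frame, threshold):
    """Return 1 where DP(w) - NDP(w) > threshold, otherwise 0."""
    return [1 if d - n > threshold else 0
            for d, n in zip(dp_frame, ndp_frame)]


xdb = binarize([3.0, 2.0, 5.0], [1.5, 2.5, 4.0], threshold=1.0)
print(xdb)  # [1, 0, 0]
```

Note that the comparison is strict: a bin whose excess equals the threshold exactly (the third bin above) is mapped to 0, as in the formula.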
Fifth, perform pairwise coherence matching between the $KD_a$ correlation binary spectra corresponding to one sound signal and the $KD_b$ correlation binary spectra corresponding to the other sound signal to obtain the second matching result, where $KD_a$ and $KD_b$ are positive integers.
The second matching result includes the matching positions and the matching degree of the pair of correlation binary spectra with the highest matching degree. When sound signal A has $KD_a$ correlation binary spectra and the other sound signal B has $KD_b$, a total of $KD_a\times KD_b$ correlation binary spectrum matches must be performed. For each match, the matching degree of the two correlation binary spectra is recorded, together with the sequence numbers $nd_a(i)$ and $nd_b(j)$ of the two spectra being matched. Here, $nd_a(i)$ denotes the sequence number of the correlation binary spectrum corresponding to the $i$-th frame of sound signal A, with $nd_a(i)\in[0,KD_a-1]$, $KD_a\ge 2$, and $nd_a(i)$ and $KD_a$ both integers; $nd_b(j)$ denotes the sequence number of the correlation binary spectrum corresponding to the $j$-th frame of the other sound signal B, with $nd_b(j)\in[0,KD_b-1]$, $KD_b\ge 2$, and $nd_b(j)$ and $KD_b$ both integers. In addition, $KD_a$ normally equals $K_a$ in step 204 above, and $KD_b$ normally equals $K_b$ in step 204 above.
For example, if the correlation binary spectrum $XDb_a(w,i)$ corresponding to the $i$-th frame of sound signal A is matched against the correlation binary spectrum $XDb_b(w,j)$ corresponding to the $j$-th frame of the other sound signal B, then their matching degree $PDb_{ij}$ is:
Here, $\odot$ denotes the XNOR (exclusive-NOR) operator. The matching degree $PDb_{ij}$ equals the number of positions at which the two correlation binary spectra hold the same binarized value, divided by the total number of positions.
After the matching degree of every pair of correlation binary spectra has been calculated, the pair with the highest matching degree is selected, and the sequence numbers and the matching degree of that pair are recorded.
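The exhaustive $KD_a\times KD_b$ search and the selection of the best pair can be sketched as follows. The function name `best_match`, the tuple it returns, and the tie-breaking rule (the first pair encountered wins) are assumptions for illustration; the text does not say how ties are resolved.

```python
def best_match(spectra_a, spectra_b):
    """Compare every binary spectrum of A with every one of B.

    Returns (matching degree, sequence number in A, sequence number in B)
    for the pair with the highest XNOR-based matching degree.
    """
    best = (-1.0, None, None)
    for i, sa in enumerate(spectra_a):
        for j, sb in enumerate(spectra_b):
            degree = sum(1 for x, y in zip(sa, sb) if x == y) / len(sa)
            if degree > best[0]:  # strict '>' keeps the first pair on ties
                best = (degree, i, j)
    return best


spectra_a = [[1, 0, 1, 0], [1, 1, 0, 0]]
spectra_b = [[0, 0, 1, 1], [0, 1, 0, 1], [1, 1, 0, 0]]
p2, nd_a, nd_b = best_match(spectra_a, spectra_b)
print(p2, nd_a, nd_b)  # 1.0 1 2
```

The cost grows with the product of the two spectrum counts, which is one reason the binarized representation matters: each comparison reduces to counting equal bits.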
In this embodiment, suppose that the $nd_a(i)$-th correlation binary spectrum, corresponding to the $i$-th frame of sound signal A, and the $nd_b(j)$-th correlation binary spectrum, corresponding to the $j$-th frame of the other sound signal B, have the highest matching degree, denoted $P_2$. For example, when $i=1$ and $j=3$,
Step 206: calculate the time delay between the two sound signals according to the first matching result and the second matching result.
After the first and second matching results have been calculated, the two are combined into a final matching result, from which the time delay between the two sound signals is then calculated. Specifically, this step may include the following two sub-steps:
First, for each sound signal, calculate a final matching position from the first matching position and the second matching position using a weighted average, the weights of which are determined by the first matching degree and the second matching degree.
In one possible implementation, suppose the first matching degree, obtained from the frequency-domain distribution of the sound signals, is $P_1$, and the second matching degree, obtained from the spectral correlation of the power spectra, is $P_2$. Then the weight of the first matching position is $\frac{P_1}{P_1+P_2}$ and the weight of the second matching position is $\frac{P_2}{P_1+P_2}$.
The final matching position $nl_a$ of sound signal A is:
$$nl_a=\frac{P_1}{P_1+P_2}\times n_a+\frac{P_2}{P_1+P_2}\times nd_a$$
where $n_a$ denotes the first matching position of sound signal A and $nd_a$ denotes its second matching position.
Similarly, the final matching position $nl_b$ of the other sound signal B is:
$$nl_b=\frac{P_1}{P_1+P_2}\times n_b+\frac{P_2}{P_1+P_2}\times nd_b$$
where $n_b$ denotes the first matching position of sound signal B and $nd_b$ denotes its second matching position.
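The weighted average gives more influence to whichever matching approach produced the higher matching degree. A minimal sketch, with the function name `fuse_positions` and the sample numbers chosen purely for illustration:

```python
def fuse_positions(n_first, n_second, p1, p2):
    """Weighted average of two matching positions with weights p1 and p2."""
    total = p1 + p2
    if total <= 0:
        raise ValueError("at least one matching degree must be positive")
    return (p1 * n_first + p2 * n_second) / total


# Hypothetical values: first approach found frame 10 with degree 0.75,
# second approach found frame 14 with degree 0.25.
nl_a = fuse_positions(10, 14, 0.75, 0.25)
print(nl_a)  # 11.0
```

The result is generally not an integer frame index, which is acceptable because it is only multiplied by the time coefficient in the next sub-step.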
Second, calculate the time delay between the two sound signals from their final matching positions.
The time delay $t$ between the two sound signals is calculated according to the following formula:
$$t=k\,(nl_a-nl_b)$$
where $k$ is the time coefficient.
The time coefficient $k$ can be calculated from the sampling frequency $f$, the number of sampling points $Num$ of the STFT, and the overlap coefficient $\eta$: $k=\frac{Num\times\eta}{f}$.
As a concrete example, if a signal sampled at 16 kHz is transformed with a 256-point FFT and 50% overlap, the time coefficient is $k=\frac{256\times 50\%}{16000}=8\ \text{ms}$.
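The arithmetic of the example, together with the delay formula $t=k(nl_a-nl_b)$, can be sketched as below. The function names are hypothetical, the overlap is assumed to be given as a fraction (0.5 for 50%), and the final matching positions used at the end are made-up values for illustration only.

```python
def time_coefficient(sample_rate_hz, fft_size, overlap):
    """Seconds per frame step: k = Num * eta / f, following the example in the text."""
    return fft_size * overlap / sample_rate_hz


def delay_seconds(nl_a, nl_b, k):
    """Time delay t = k * (nl_a - nl_b)."""
    return k * (nl_a - nl_b)


k = time_coefficient(16000, 256, 0.5)
print(k * 1000)  # 8.0 (milliseconds)

# Hypothetical final matching positions for the two signals.
t = delay_seconds(12.5, 10.0, k)
```

With 50% overlap, the overlapped length and the frame advance are both 128 samples, so either reading of the coefficient gives 8 ms here. A negative result simply means that signal A leads signal B under this sign convention.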
It should be noted that step 205 may be performed after step 204, before step 204, or simultaneously with step 204. This embodiment describes step 205 as following step 204 only by way of example, and no specific limitation is placed on the order.
In summary, the sound signal time delay estimation method provided by this embodiment obtains a first matching result by analyzing and matching the short-time Fourier transforms of the two sound signals, obtains a second matching result by analyzing and matching the spectral correlation of their power spectra, and then calculates the time delay between the two sound signals by combining the two matching results. This solves the problem of low accuracy in the time delay estimation methods of the related art: the two sound signals are matched from two perspectives, the frequency-domain distribution and the spectral correlation of the power spectrum, and the two matching results are combined to determine the final matching result, which improves both matching precision and the accuracy of the time delay estimate.
In addition, the time delay estimation method provided by this embodiment binarizes the data before the matching calculation, which greatly improves matching efficiency and yields a robust time delay estimation method.
The following are device embodiments of the present invention, which may be used to carry out the method embodiments of the present invention. For details not disclosed in the device embodiments, refer to the method embodiments.
Refer to Fig. 3, which shows a block diagram of a sound signal time delay estimation device provided by one embodiment of the present invention. The device may be implemented as part or all of an electronic device by software, hardware, or a combination of the two. The device may include a signal acquisition module 310, a first matching module 320, a second matching module 330, and a time delay calculation module 340.
The signal acquisition module 310 is configured to obtain two sound signals.
The first matching module 320 is configured to perform coherence matching on the two sound signals according to their short-time Fourier transforms to obtain a first matching result, which includes a first matching position and a first matching degree of the two sound signals.
The second matching module 330 is configured to perform coherence matching on the two sound signals according to the spectral correlation of their power spectra to obtain a second matching result, which includes a second matching position and a second matching degree of the two sound signals.
The time delay calculation module 340 is configured to calculate the time delay between the two sound signals according to the first matching result and the second matching result.
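One way to picture how the four modules fit together is as a small pipeline in which modules 310, 320, and 330 are pluggable callables and module 340 fuses their outputs. This is only an architectural sketch under stated assumptions: the class name `DelayEstimator`, the `(position in A, position in B, matching degree)` result tuple, and the stub lambdas are all hypothetical and are not part of the patent.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical result type: (position in A, position in B, matching degree).
MatchResult = Tuple[float, float, float]
Signal = Sequence[float]


@dataclass
class DelayEstimator:
    acquire: Callable[[], Tuple[Signal, Signal]]           # signal acquisition module 310
    first_match: Callable[[Signal, Signal], MatchResult]   # first matching module 320
    second_match: Callable[[Signal, Signal], MatchResult]  # second matching module 330
    time_coefficient: float                                # k, seconds per frame step

    def estimate(self) -> float:
        """Time delay calculation module 340: fuse both results, return the delay."""
        sig_a, sig_b = self.acquire()
        n_a, n_b, p1 = self.first_match(sig_a, sig_b)
        nd_a, nd_b, p2 = self.second_match(sig_a, sig_b)
        total = p1 + p2
        if total <= 0:
            raise ValueError("at least one matching degree must be positive")
        nl_a = (p1 * n_a + p2 * nd_a) / total
        nl_b = (p1 * n_b + p2 * nd_b) / total
        return self.time_coefficient * (nl_a - nl_b)


# Stub modules standing in for real matchers, for illustration only.
estimator = DelayEstimator(
    acquire=lambda: ([0.0], [0.0]),
    first_match=lambda a, b: (10.0, 6.0, 0.75),
    second_match=lambda a, b: (14.0, 6.0, 0.25),
    time_coefficient=0.008,
)
delay = estimator.estimate()
```

Keeping the two matchers independent of each other reflects the statement in the method embodiment that the two matching steps may run in either order or simultaneously.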
In summary, the sound signal time delay estimation device provided by this embodiment obtains a first matching result by analyzing and matching the short-time Fourier transforms of the two sound signals, obtains a second matching result by analyzing and matching the spectral correlation of their power spectra, and then calculates the time delay between the two sound signals by combining the two matching results. This solves the problem of low accuracy in the time delay estimation methods of the related art: the two sound signals are matched from two perspectives, the frequency-domain distribution and the spectral correlation of the power spectrum, and the two matching results are combined to determine the final matching result, which improves both matching precision and the accuracy of the time delay estimate.
Refer to Fig. 4, which shows a block diagram of a sound signal time delay estimation device provided by another embodiment of the present invention. The device may be implemented as part or all of an electronic device by software, hardware, or a combination of the two. The device may include a signal acquisition module 310, a first matching module 320, a second matching module 330, and a time delay calculation module 340.
The signal acquisition module 310 is configured to obtain two sound signals.
The first matching module 320 is configured to perform coherence matching on the two sound signals according to their short-time Fourier transforms to obtain a first matching result, which includes a first matching position and a first matching degree of the two sound signals.
The first matching module 320 may include a first tracking unit 320a, a first binarization unit 320b, and a first matching unit 320c.
The first tracking unit 320a is configured to, for each sound signal, perform noise tracking on each frame of the sound signal according to the following formula to obtain the noise spectrum $N(w,n)$ of each frame:
$$N(w,n)=\begin{cases}(1-\alpha_u)\,|X(w,n)|+\alpha_u\,N(w,n-1), & |X(w,n)|\ge N(w,n-1)\\ (1-\alpha_d)\,|X(w,n)|+\alpha_d\,N(w,n-1), & |X(w,n)|<N(w,n-1)\end{cases}$$
where $X(w,n)$ denotes the short-time Fourier transform of the sound signal; $\alpha_u$ and $\alpha_d$ are preset coefficients with $0<\alpha_d<\alpha_u<1$; $w$ is the frequency bin index in the frequency domain; and $n$ is the frame index in the time domain.
The first binarization unit 320b is configured to binarize the spectrum of each frame according to the following formula to obtain the binary spectrum $Xb(w,n)$:
$$Xb(w,n)=\begin{cases}1, & |X(w,n)|-N(w,n)>T_b\\ 0, & |X(w,n)|-N(w,n)\le T_b\end{cases}$$
where $T_b$ is a preset first threshold.
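The first tracking unit and the first binarization unit operate on the STFT magnitude and can be chained frame by frame. The sketch below combines the two under assumptions not stated in the text: the function name `binary_spectra` is hypothetical, the default coefficients and threshold are placeholders, and the noise spectrum is initialised with the magnitude of the first frame.

```python
def binary_spectra(stft_frames, alpha_u=0.95, alpha_d=0.5, t_b=0.0):
    """Track the noise spectrum N(w, n) and binarize |X(w, n)| - N(w, n) per frame."""
    if not 0.0 < alpha_d < alpha_u < 1.0:
        raise ValueError("coefficients must satisfy 0 < alpha_d < alpha_u < 1")
    noise = None
    result = []
    for frame in stft_frames:
        mags = [abs(x) for x in frame]
        if noise is None:
            noise = mags[:]  # assumption: N(w, 0) = |X(w, 0)|
        else:
            noise = [
                (1.0 - alpha_u) * m + alpha_u * n if m >= n
                else (1.0 - alpha_d) * m + alpha_d * n
                for m, n in zip(mags, noise)
            ]
        result.append([1 if m - n > t_b else 0 for m, n in zip(mags, noise)])
    return result


frames = [[1 + 0j, 4 + 0j], [3 + 0j, 2 + 0j]]
xb = binary_spectra(frames, alpha_u=0.75, alpha_d=0.25, t_b=1.0)
print(xb)  # [[0, 0], [1, 0]]
```

With this initialisation the first frame always binarizes to zeros for a non-negative threshold, so in practice the first few frames serve mainly to warm up the noise estimate.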
The first matching unit 320c is configured to perform pairwise coherence matching between the $K_a$ binary spectra corresponding to one sound signal and the $K_b$ binary spectra corresponding to the other sound signal to obtain the first matching result, which includes the matching positions and the matching degree of the pair of binary spectra with the highest matching degree, where $K_a$ and $K_b$ are positive integers.
The second matching module 330 is configured to perform coherence matching on the two sound signals according to the spectral correlation of their power spectra to obtain a second matching result, which includes a second matching position and a second matching degree of the two sound signals.
The second matching module 330 may include a power spectrum calculation unit 330a, a correlation calculation unit 330b, a second tracking unit 330c, a second binarization unit 330d, and a second matching unit 330e.
The power spectrum calculation unit 330a is configured to, for each sound signal, calculate the power spectrum $P(w,n)$ of each frame of the sound signal according to the following formula:
$$P(w,n)=\alpha_p\,P(w,n-1)+(1-\alpha_p)\,|X(w,n)|^2$$
where $X(w,n)$ denotes the short-time Fourier transform of the sound signal; $\alpha_p$ is a preset coefficient with $0<\alpha_p<1$; $w$ is the frequency bin index in the frequency domain; and $n$ is the frame index in the time domain.
The correlation calculation unit 330b is configured to calculate the spectral correlation $DP(w,n)$ of the power spectrum of each frame according to the following formula:
$$DP(w,n)=|P(w+1,n)-P(w,n)|$$
The second tracking unit 330c is configured to perform noise tracking on the spectral correlation $DP(w,n)$ according to the following formula to obtain the spectral correlation $NDP(w,n)$ of the noise power spectrum of each frame:
$$NDP(w,n)=\begin{cases}(1-\beta_u)\,DP(w,n)+\beta_u\,NDP(w,n-1), & DP(w,n)\ge NDP(w,n-1)\\ (1-\beta_d)\,DP(w,n)+\beta_d\,NDP(w,n-1), & DP(w,n)<NDP(w,n-1)\end{cases}$$
where $\beta_u$ and $\beta_d$ are preset coefficients with $0<\beta_d<\beta_u<1$.
The second binarization unit 330d is configured to binarize the spectral correlation $DP(w,n)$ of each frame according to the following formula to obtain the correlation binary spectrum $XDb(w,n)$:
$$XDb(w,n)=\begin{cases}1, & DP(w,n)-NDP(w,n)>T_{Db}\\ 0, & DP(w,n)-NDP(w,n)\le T_{Db}\end{cases}$$
where $T_{Db}$ is a preset second threshold.
The second matching unit 330e is configured to perform pairwise coherence matching between the $KD_a$ correlation binary spectra corresponding to one sound signal and the $KD_b$ correlation binary spectra corresponding to the other sound signal to obtain the second matching result, which includes the matching positions and the matching degree of the pair of correlation binary spectra with the highest matching degree, where $KD_a$ and $KD_b$ are positive integers.
The time delay calculation module 340 is configured to calculate the time delay between the two sound signals according to the first matching result and the second matching result.
The time delay calculation module 340 may include a position calculation unit 340a and a time delay calculation unit 340b.
The position calculation unit 340a is configured to, for each sound signal, calculate a final matching position from the first matching position and the second matching position using a weighted average, the weights of which are determined by the first matching degree and the second matching degree.
The time delay calculation unit 340b is configured to calculate the time delay between the two sound signals from the final matching positions of the two sound signals.
Optionally, the device may further include a signal preprocessing module 312 and a Fourier transform module 314.
The signal preprocessing module 312 is configured to, for each sound signal, preprocess the sound signal to obtain a preprocessed sound signal, the preprocessing including at least one of noise reduction, amplification, high-pass filtering, and resampling (up-sampling or down-sampling).
The Fourier transform module 314 is configured to perform a short-time Fourier transform on the preprocessed sound signal.
In summary, the sound signal time delay estimation device provided by this embodiment obtains a first matching result by analyzing and matching the short-time Fourier transforms of the two sound signals, obtains a second matching result by analyzing and matching the spectral correlation of their power spectra, and then calculates the time delay between the two sound signals by combining the two matching results. This solves the problem of low accuracy in the time delay estimation methods of the related art: the two sound signals are matched from two perspectives, the frequency-domain distribution and the spectral correlation of the power spectrum, and the two matching results are combined to determine the final matching result, which improves both matching precision and the accuracy of the time delay estimate.
In addition, the time delay estimation device provided by this embodiment binarizes the data before the matching calculation, which greatly improves matching efficiency.
It should be noted that, when the sound signal time delay estimation device provided by the above embodiment calculates the time delay between two sound signals, the division into the functional modules described above is only an example. In practice, the functions may be assigned to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. In addition, the time delay estimation device provided by the above embodiment and the method embodiments of the time delay estimation method belong to the same concept; for its specific implementation, refer to the method embodiments, which is not repeated here.
Refer to Fig. 5, which shows a schematic structural diagram of an electronic device provided by one embodiment of the present invention. The electronic device is used to implement the sound signal time delay estimation method provided in the above embodiments. Specifically:
The electronic device 500 may include an RF (Radio Frequency) circuit 510, a memory 520 including one or more computer-readable storage media, an input unit 530, a display unit 540, a sensor 550, an audio circuit 560, a WiFi (Wireless Fidelity) module 570, a processor 580 including one or more processing cores, a power supply 590, and other components. Those skilled in the art will understand that the structure shown in Fig. 5 does not limit the electronic device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently. Specifically:
The RF circuit 510 may be used to receive and send signals while information is being sent or received or during a call. In particular, it receives downlink information from a base station and passes it to the one or more processors 580 for processing, and it sends uplink data to the base station. Typically, the RF circuit 510 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. The RF circuit 510 may also communicate with networks and other devices by wireless communication, which may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (Short Messaging Service), and the like.
The memory 520 may be used to store software programs and modules; the processor 580 executes various functional applications and data processing by running the software programs and modules stored in the memory 520. The memory 520 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, application programs required by at least one function (such as a sound playback function or an image playback function), and the like; the data storage area may store data created through use of the electronic device 500 (such as audio data or a phone book). In addition, the memory 520 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Accordingly, the memory 520 may also include a memory controller to provide the processor 580 and the input unit 530 with access to the memory 520.
The input unit 530 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. Specifically, the input unit 530 may include an image input device 531 and other input devices 532. The image input device 531 may be a camera or a photoelectric scanning device. The other input devices 532 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys or an on/off key), a trackball, a mouse, a joystick, and the like.
The display unit 540 may be used to display information entered by the user or provided to the user, as well as the various graphical user interfaces of the electronic device 500, which may be composed of graphics, text, icons, video, and any combination thereof. The display unit 540 may include a display panel 541, which may optionally be configured as an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode) display, or the like.
The electronic device 500 may also include at least one sensor 550, such as a light sensor, a motion sensor, or another sensor. Specifically, the light sensor may include an ambient light sensor, which adjusts the brightness of the display panel 541 according to the ambient light level, and a proximity sensor, which turns off the display panel 541 and/or the backlight when the electronic device 500 is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally along three axes) and, when stationary, the magnitude and direction of gravity; it can be used in applications that recognize the attitude of the mobile phone (such as landscape/portrait switching, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer or tap detection). Other sensors with which the electronic device 500 may be equipped, such as a gyroscope, barometer, hygrometer, thermometer, or infrared sensor, are not described further here.
The audio circuit 560, a loudspeaker 561, and a microphone 562 may provide an audio interface between the user and the electronic device 500. The audio circuit 560 may convert received audio data into an electrical signal and transmit it to the loudspeaker 561, which converts it into a sound signal for output. Conversely, the microphone 562 converts a collected sound signal into an electrical signal, which the audio circuit 560 receives and converts into audio data; after the audio data is output to the processor 580 and processed, it is sent through the RF circuit 510 to, for example, another electronic device, or output to the memory 520 for further processing. The audio circuit 560 may also include an earphone jack to allow a peripheral earphone to communicate with the electronic device 500.
WiFi is a short-range wireless transmission technology. Through the WiFi module 570, the electronic device 500 can help the user send and receive e-mail, browse web pages, access streaming media, and so on, providing the user with wireless broadband Internet access. Although Fig. 5 shows the WiFi module 570, it is not an essential part of the electronic device 500 and may be omitted as needed without changing the essence of the invention.
The processor 580 is the control center of the electronic device 500. It connects the various parts of the entire mobile phone through various interfaces and lines, and performs the various functions of the electronic device 500 and processes data by running or executing the software programs and/or modules stored in the memory 520 and calling the data stored in the memory 520, thereby monitoring the mobile phone as a whole. Optionally, the processor 580 may include one or more processing cores. Preferably, the processor 580 may integrate an application processor, which mainly handles the operating system, user interface, and application programs, and a modem processor, which mainly handles wireless communication. It will be understood that the modem processor need not be integrated into the processor 580.
The electronic device 500 also includes a power supply 590 (such as a battery) that powers the components. Preferably, the power supply is logically connected to the processor 580 through a power management system, so that charging, discharging, and power consumption are managed through the power management system. The power supply 590 may also include one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and any other such components.
Although not shown, the electronic device 500 may also include a Bluetooth module and the like, which are not described further here.
Specifically, in this embodiment, the electronic device 500 further includes a memory and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by one or more processors to perform the time delay estimation method of the embodiment shown in Fig. 1 or Fig. 2 above.
The sequence numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be carried out by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A sound signal time delay estimation method, characterized in that the method comprises:
obtaining two sound signals;
performing coherence matching on the two sound signals according to the short-time Fourier transforms of the two sound signals to obtain a first matching result, the first matching result comprising a first matching position and a first matching degree of the two sound signals;
performing coherence matching on the two sound signals according to the spectral correlation of the power spectra of the two sound signals to obtain a second matching result, the second matching result comprising a second matching position and a second matching degree of the two sound signals;
calculating the time delay between the two sound signals according to the first matching result and the second matching result.
2. The method according to claim 1, characterized in that calculating the time delay between the two sound signals according to the first matching result and the second matching result comprises:
For each sound signal, calculating a final matching position from the first matching position and the second matching position using a weighted average algorithm, the weights of the weighted average algorithm being determined according to the first matching degree and the second matching degree;
Calculating the time delay between the two sound signals according to the final matching positions of the two sound signals.
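The fusion step of claim 2 can be illustrated with a minimal sketch. It assumes that the weights are simply proportional to the two matching degrees (the claim only states that the weights are determined from them), that matching positions are frame indices, and that a hypothetical `hop_seconds` parameter converts frames to seconds. The function names and the sign convention (a positive result means the second signal lags the first) are illustrative assumptions, not taken from the patent.

```python
def fuse_position(pos1, deg1, pos2, deg2):
    """Weighted average of two matching positions (frame indices).

    Assumption: the weights are proportional to the matching degrees.
    """
    total = deg1 + deg2
    if total <= 0:
        raise ValueError("at least one matching degree must be positive")
    return (deg1 * pos1 + deg2 * pos2) / total


def estimate_delay(first, second, hop_seconds):
    """Fuse two matching results and convert the offset to seconds.

    Each result is a tuple (position_a, position_b, degree): the matching
    position in signal A, the matching position in signal B, and the
    matching degree of that result (a hypothetical layout).
    """
    pos_a1, pos_b1, deg1 = first
    pos_a2, pos_b2, deg2 = second
    final_a = fuse_position(pos_a1, deg1, pos_a2, deg2)
    final_b = fuse_position(pos_b1, deg1, pos_b2, deg2)
    # Positive delay: signal B lags signal A (assumed sign convention).
    return (final_b - final_a) * hop_seconds


# Example: the first result is trusted three times as much as the second.
delay = estimate_delay((10, 14, 0.9), (12, 16, 0.3), hop_seconds=0.01)
```

With these inputs the fused positions are about 10.5 and 14.5 frames, so the estimated delay is about 4 frames, or 0.04 s at a 10 ms hop.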
3. The method according to claim 1, characterized in that performing coherence matching on the two sound signals according to the short-time Fourier transforms of the two sound signals to obtain the first matching result comprises:
For each sound signal, performing noise tracking on each frame of the sound signal according to the following formula to obtain the noise spectrum N(w, n) of each frame:
$$N(w,n)=\begin{cases}(1-\alpha_u)\,|X(w,n)|+\alpha_u\,N(w,n-1), & |X(w,n)|\ge N(w,n-1)\\ (1-\alpha_d)\,|X(w,n)|+\alpha_d\,N(w,n-1), & |X(w,n)|<N(w,n-1)\end{cases}$$
where X(w, n) denotes the short-time Fourier transform of the sound signal; $\alpha_u$ and $\alpha_d$ are preset coefficients with $0<\alpha_d<\alpha_u<1$; w denotes the frequency index in the frequency domain; and n denotes the frame index in the time domain;
Binarizing the short-time Fourier transform of each frame according to the following formula to obtain a binary spectrum Xb(w, n):
$$\mathrm{Xb}(w,n)=\begin{cases}1, & |X(w,n)|-N(w,n)>T_b\\ 0, & |X(w,n)|-N(w,n)\le T_b\end{cases}$$ where $T_b$ is a preset first threshold;
Performing pairwise coherence matching between the $K_a$ binary spectra corresponding to one of the sound signals and the $K_b$ binary spectra corresponding to the other sound signal to obtain the first matching result, the first matching result comprising the matching position and matching degree corresponding to the pair of binary spectra with the highest matching degree, where $K_a$ and $K_b$ are positive integers.
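The three steps of claim 3 (noise tracking, binarization, pairwise matching) can be sketched as follows, using only the Python standard library. Several choices here are assumptions rather than part of the claim: the initial noise estimate N(w, -1) is set to the first frame's magnitudes, the default coefficient values are arbitrary, and the matching degree is taken to be the share of bins set in both frames among the bins set in either frame. The claim itself does not fix how the matching degree is measured.

```python
def track_noise(mag, alpha_u=0.98, alpha_d=0.8):
    """Asymmetric recursive noise tracking, one estimate per bin and frame.

    mag[n][w] holds |X(w, n)|. Returns noise[n][w] = N(w, n).
    Assumption: N(w, -1) is initialised to the first frame's magnitudes.
    """
    if not 0 < alpha_d < alpha_u < 1:
        raise ValueError("need 0 < alpha_d < alpha_u < 1")
    prev = list(mag[0])
    noise = []
    for frame in mag:
        cur = []
        for x, p in zip(frame, prev):
            # Rise slowly when the magnitude is at or above the estimate,
            # fall faster when it is below.
            a = alpha_u if x >= p else alpha_d
            cur.append((1 - a) * x + a * p)
        noise.append(cur)
        prev = cur
    return noise


def binarize(mag, noise, threshold):
    """Xb(w, n) = 1 where |X(w, n)| - N(w, n) > threshold, else 0."""
    return [[1 if x - nz > threshold else 0 for x, nz in zip(f, nf)]
            for f, nf in zip(mag, noise)]


def best_match(bin_a, bin_b):
    """Compare every frame of bin_a with every frame of bin_b.

    Returns (position_a, position_b, degree) for the best pair.
    Assumption: the degree is (bins set in both) / (bins set in either).
    """
    best = (0, 0, -1.0)
    for i, fa in enumerate(bin_a):
        for j, fb in enumerate(bin_b):
            both = sum(1 for u, v in zip(fa, fb) if u and v)
            either = sum(1 for u, v in zip(fa, fb) if u or v)
            degree = both / either if either else 0.0
            if degree > best[2]:
                best = (i, j, degree)
    return best


# A one-bin example: the noise estimate rises slowly and falls quickly.
mag = [[1.0], [3.0], [0.0]]
noise = track_noise(mag, alpha_u=0.9, alpha_d=0.5)
binary = binarize(mag, noise, threshold=1.0)
```

Scoring all-zero frames as 0 keeps silent frames from matching each other perfectly, which is one reason to prefer this measure over a plain count of equal bins.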
4. The method according to claim 1, characterized in that performing coherence matching on the two sound signals according to the spectral correlation of the power spectra of the two sound signals to obtain the second matching result comprises:
For each sound signal, calculating the power spectrum P(w, n) of each frame of the sound signal according to the following formula:
$$P(w,n)=\alpha_p\,P(w,n-1)+(1-\alpha_p)\,|X(w,n)|^2$$
where X(w, n) denotes the short-time Fourier transform of the sound signal; $\alpha_p$ is a preset coefficient with $0<\alpha_p<1$; w denotes the frequency index in the frequency domain; and n denotes the frame index in the time domain;
Calculating the spectral correlation DP(w, n) of the power spectrum of each frame according to the following formula:
$$\mathrm{DP}(w,n)=|P(w+1,n)-P(w,n)|$$
Performing noise tracking on the spectral correlation DP(w, n) according to the following formula to obtain the spectral correlation NDP(w, n) of the noise power spectrum of each frame:
$$\mathrm{NDP}(w,n)=\begin{cases}(1-\beta_u)\,\mathrm{DP}(w,n)+\beta_u\,\mathrm{NDP}(w,n-1), & \mathrm{DP}(w,n)\ge \mathrm{NDP}(w,n-1)\\ (1-\beta_d)\,\mathrm{DP}(w,n)+\beta_d\,\mathrm{NDP}(w,n-1), & \mathrm{DP}(w,n)<\mathrm{NDP}(w,n-1)\end{cases}$$
where $\beta_u$ and $\beta_d$ are preset coefficients with $0<\beta_d<\beta_u<1$;
Binarizing the spectral correlation DP(w, n) of each frame according to the following formula to obtain a correlation binary spectrum XDb(w, n):
$$\mathrm{XDb}(w,n)=\begin{cases}1, & \mathrm{DP}(w,n)-\mathrm{NDP}(w,n)>T_{Db}\\ 0, & \mathrm{DP}(w,n)-\mathrm{NDP}(w,n)\le T_{Db}\end{cases}$$ where $T_{Db}$ is a preset second threshold;
Performing pairwise coherence matching between the $KD_a$ correlation binary spectra corresponding to one of the sound signals and the $KD_b$ correlation binary spectra corresponding to the other sound signal to obtain the second matching result, the second matching result comprising the matching position and matching degree corresponding to the pair of correlation binary spectra with the highest matching degree, where $KD_a$ and $KD_b$ are positive integers.
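The formulas of claim 4 can be traced with a small standard-library sketch. The initial conditions are assumptions (P(w, -1) is set to the first frame's power and NDP(w, -1) to the first frame's DP values), as are the default coefficient values and all function names. The final pairwise matching of the correlation binary spectra is not repeated here.

```python
def smooth_power(mag, alpha_p=0.9):
    """P(w, n) = alpha_p * P(w, n-1) + (1 - alpha_p) * |X(w, n)|**2.

    mag[n][w] holds |X(w, n)|.
    Assumption: P(w, -1) is initialised to the first frame's power.
    """
    prev = [x * x for x in mag[0]]
    power = []
    for frame in mag:
        cur = [alpha_p * p + (1 - alpha_p) * x * x
               for x, p in zip(frame, prev)]
        power.append(cur)
        prev = cur
    return power


def spectral_difference(power):
    """DP(w, n) = |P(w+1, n) - P(w, n)|; each frame loses one bin."""
    return [[abs(f[w + 1] - f[w]) for w in range(len(f) - 1)]
            for f in power]


def track_floor(dp, beta_u=0.98, beta_d=0.8):
    """Asymmetric recursive tracking of DP, giving NDP(w, n).

    Assumption: NDP(w, -1) is initialised to the first frame's DP values.
    """
    prev = list(dp[0])
    floor = []
    for frame in dp:
        cur = []
        for x, p in zip(frame, prev):
            b = beta_u if x >= p else beta_d
            cur.append((1 - b) * x + b * p)
        floor.append(cur)
        prev = cur
    return floor


def binarize_dp(dp, ndp, threshold):
    """XDb(w, n) = 1 where DP(w, n) - NDP(w, n) > threshold, else 0."""
    return [[1 if x - nz > threshold else 0 for x, nz in zip(f, nf)]
            for f, nf in zip(dp, ndp)]


# A spectral peak appears in the middle bin of the second frame.
mag = [[1.0, 1.0, 1.0], [1.0, 3.0, 1.0]]
power = smooth_power(mag, alpha_p=0.5)
dp = spectral_difference(power)
ndp = track_floor(dp, beta_u=0.5, beta_d=0.25)
xdb = binarize_dp(dp, ndp, threshold=1.0)
```

In this example the flat first frame yields an all-zero correlation binary spectrum, while the peak in the second frame sets both adjacent-bin differences to 1.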
5. The method according to any one of claims 1 to 4, characterized in that, before performing coherence matching on the two sound signals according to the short-time Fourier transforms of the two sound signals to obtain the first matching result, the method further comprises:
For each sound signal, pre-processing the sound signal to obtain a pre-processed sound signal, the pre-processing comprising at least one of noise reduction, amplification, high-pass filtering, and up-sampling or down-sampling;
Performing a short-time Fourier transform on the pre-processed sound signal.
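As a rough illustration of claim 5, the sketch below applies two of the listed pre-processing options (amplification and high-pass filtering) and then computes a short-time Fourier transform with a direct DFT. The specific filter (a first-order DC blocker), the Hann window, the frame length, the hop size, and all function names are assumptions chosen for brevity. The claim does not prescribe any of them, and a practical implementation would use an FFT.

```python
import cmath
import math


def amplify(samples, gain=2.0):
    """Scale every sample by a constant gain."""
    return [gain * s for s in samples]


def high_pass(samples, r=0.95):
    """First-order DC-blocking filter: y[k] = x[k] - x[k-1] + r * y[k-1]."""
    out, prev_x, prev_y = [], 0.0, 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


def stft(samples, frame_len=8, hop=4):
    """Hann-windowed short-time Fourier transform via a direct DFT.

    Returns frames[n][w] = X(w, n) for w = 0 .. frame_len // 2.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * k / frame_len)
              for k in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        seg = [samples[start + k] * window[k] for k in range(frame_len)]
        frames.append([
            sum(seg[k] * cmath.exp(-2j * math.pi * w * k / frame_len)
                for k in range(frame_len))
            for w in range(frame_len // 2 + 1)
        ])
    return frames


# A sinusoid with a period of 4 samples falls on bin 2 of an 8-point frame.
tone = [1.0, 0.0, -1.0, 0.0] * 4
frames = stft(tone, frame_len=8, hop=4)
peak_bin = max(range(len(frames[0])), key=lambda w: abs(frames[0][w]))
```

The magnitudes `abs(frames[n][w])` are what the noise-tracking and power-spectrum formulas of the other claims take as |X(w, n)|.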
6. A time delay estimation device for sound signals, characterized in that the device comprises:
A signal acquisition module, configured to obtain two sound signals;
A first matching module, configured to perform coherence matching on the two sound signals according to the short-time Fourier transforms of the two sound signals to obtain a first matching result, the first matching result comprising a first matching position and a first matching degree of the two sound signals;
A second matching module, configured to perform coherence matching on the two sound signals according to the spectral correlation of the power spectra of the two sound signals to obtain a second matching result, the second matching result comprising a second matching position and a second matching degree of the two sound signals;
A time delay calculation module, configured to calculate the time delay between the two sound signals according to the first matching result and the second matching result.
7. The device according to claim 6, characterized in that the time delay calculation module comprises a position calculation unit and a time delay calculation unit;
The position calculation unit is configured to, for each sound signal, calculate a final matching position from the first matching position and the second matching position using a weighted average algorithm, the weights of the weighted average algorithm being determined according to the first matching degree and the second matching degree;
The time delay calculation unit is configured to calculate the time delay between the two sound signals according to the final matching positions of the two sound signals.
8. The device according to claim 6, characterized in that the first matching module comprises a first tracking unit, a first binarization unit, and a first matching unit;
The first tracking unit is configured to, for each sound signal, perform noise tracking on each frame of the sound signal according to the following formula to obtain the noise spectrum N(w, n) of each frame:
$$N(w,n)=\begin{cases}(1-\alpha_u)\,|X(w,n)|+\alpha_u\,N(w,n-1), & |X(w,n)|\ge N(w,n-1)\\ (1-\alpha_d)\,|X(w,n)|+\alpha_d\,N(w,n-1), & |X(w,n)|<N(w,n-1)\end{cases}$$
where X(w, n) denotes the short-time Fourier transform of the sound signal; $\alpha_u$ and $\alpha_d$ are preset coefficients with $0<\alpha_d<\alpha_u<1$; w denotes the frequency index in the frequency domain; and n denotes the frame index in the time domain;
The first binarization unit is configured to binarize the short-time Fourier transform of each frame according to the following formula to obtain a binary spectrum Xb(w, n):
$$\mathrm{Xb}(w,n)=\begin{cases}1, & |X(w,n)|-N(w,n)>T_b\\ 0, & |X(w,n)|-N(w,n)\le T_b\end{cases}$$ where $T_b$ is a preset first threshold;
The first matching unit is configured to perform pairwise coherence matching between the $K_a$ binary spectra corresponding to one of the sound signals and the $K_b$ binary spectra corresponding to the other sound signal to obtain the first matching result, the first matching result comprising the matching position and matching degree corresponding to the pair of binary spectra with the highest matching degree, where $K_a$ and $K_b$ are positive integers.
9. The device according to claim 6, characterized in that the second matching module comprises a power spectrum calculation unit, a correlation calculation unit, a second tracking unit, a second binarization unit, and a second matching unit;
The power spectrum calculation unit is configured to, for each sound signal, calculate the power spectrum P(w, n) of each frame of the sound signal according to the following formula:
$$P(w,n)=\alpha_p\,P(w,n-1)+(1-\alpha_p)\,|X(w,n)|^2$$
where X(w, n) denotes the short-time Fourier transform of the sound signal; $\alpha_p$ is a preset coefficient with $0<\alpha_p<1$; w denotes the frequency index in the frequency domain; and n denotes the frame index in the time domain;
The correlation calculation unit is configured to calculate the spectral correlation DP(w, n) of the power spectrum of each frame according to the following formula:
$$\mathrm{DP}(w,n)=|P(w+1,n)-P(w,n)|$$
The second tracking unit is configured to perform noise tracking on the spectral correlation DP(w, n) according to the following formula to obtain the spectral correlation NDP(w, n) of the noise power spectrum of each frame:
$$\mathrm{NDP}(w,n)=\begin{cases}(1-\beta_u)\,\mathrm{DP}(w,n)+\beta_u\,\mathrm{NDP}(w,n-1), & \mathrm{DP}(w,n)\ge \mathrm{NDP}(w,n-1)\\ (1-\beta_d)\,\mathrm{DP}(w,n)+\beta_d\,\mathrm{NDP}(w,n-1), & \mathrm{DP}(w,n)<\mathrm{NDP}(w,n-1)\end{cases}$$
where $\beta_u$ and $\beta_d$ are preset coefficients with $0<\beta_d<\beta_u<1$;
The second binarization unit is configured to binarize the spectral correlation DP(w, n) of each frame according to the following formula to obtain a correlation binary spectrum XDb(w, n):
$$\mathrm{XDb}(w,n)=\begin{cases}1, & \mathrm{DP}(w,n)-\mathrm{NDP}(w,n)>T_{Db}\\ 0, & \mathrm{DP}(w,n)-\mathrm{NDP}(w,n)\le T_{Db}\end{cases}$$ where $T_{Db}$ is a preset second threshold;
The second matching unit is configured to perform pairwise coherence matching between the $KD_a$ correlation binary spectra corresponding to one of the sound signals and the $KD_b$ correlation binary spectra corresponding to the other sound signal to obtain the second matching result, the second matching result comprising the matching position and matching degree corresponding to the pair of correlation binary spectra with the highest matching degree, where $KD_a$ and $KD_b$ are positive integers.
10. The device according to any one of claims 6 to 9, characterized in that the device further comprises:
A signal pre-processing module, configured to, for each sound signal, pre-process the sound signal to obtain a pre-processed sound signal, the pre-processing comprising at least one of noise reduction, amplification, high-pass filtering, and up-sampling or down-sampling;
A Fourier transform module, configured to perform a short-time Fourier transform on the pre-processed sound signal.
CN201510083890.1A 2015-02-13 2015-02-13 The delay time estimation method and device of voice signal Active CN104700842B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510083890.1A CN104700842B (en) 2015-02-13 2015-02-13 The delay time estimation method and device of voice signal

Publications (2)

Publication Number Publication Date
CN104700842A true CN104700842A (en) 2015-06-10
CN104700842B CN104700842B (en) 2018-05-08

Family

ID=53347896

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510083890.1A Active CN104700842B (en) 2015-02-13 2015-02-13 The delay time estimation method and device of voice signal

Country Status (1)

Country Link
CN (1) CN104700842B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030093265A1 (en) * 2001-11-12 2003-05-15 Bo Xu Method and system of chinese speech pitch extraction
CN102606891A (en) * 2012-04-11 2012-07-25 广州东芝白云自动化系统有限公司 Water leakage detector, water leakage detecting system and water leakage detecting method
CN102854494A (en) * 2012-08-08 2013-01-02 Tcl集团股份有限公司 Sound source locating method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
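周氏青香: "听觉特性及噪声估计在语音增强算法中的研究" [Research on auditory characteristics and noise estimation in speech enhancement algorithms], 《中国优秀硕士学位论文全文数据库 信息科技辑》 [China Master's Theses Full-text Database, Information Science and Technology] *
袁文浩: "基于噪声轨迹的语音增强方法研究" [Research on speech enhancement methods based on noise trajectories], 《中国博士学位论文全文数据库 信息科技辑》 [China Doctoral Dissertations Full-text Database, Information Science and Technology] *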

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105469484A (en) * 2015-11-20 2016-04-06 宁波大业产品造型艺术设计有限公司 App intelligent home lock
CN105567434A (en) * 2015-12-31 2016-05-11 山东泰德新能源有限公司 Production apparatus of high cleanness biodiesel, and method thereof
CN105726130A (en) * 2016-01-26 2016-07-06 高玮 Maintenance prompter for medical equipment
CN105872275A (en) * 2016-03-22 2016-08-17 Tcl集团股份有限公司 Speech signal time delay estimation method and system used for echo cancellation
CN105872275B (en) * 2016-03-22 2019-10-11 Tcl集团股份有限公司 A kind of speech signal time delay estimation method and system for echo cancellor
CN106057211A (en) * 2016-05-27 2016-10-26 广州多益网络股份有限公司 Signal matching method and device
CN106209491B (en) * 2016-06-16 2019-07-02 苏州科达科技股份有限公司 A kind of time delay detecting method and device
CN106209491A (en) * 2016-06-16 2016-12-07 苏州科达科技股份有限公司 A kind of time delay detecting method and device
CN107745207A (en) * 2017-10-17 2018-03-02 桂林电子科技大学 A kind of three-dimensional welding robot mixing control method
CN107862917A (en) * 2017-10-27 2018-03-30 湖南城市学院 Application system and method for the form vocabulary test in children and adults' English teaching
CN107908211A (en) * 2017-11-14 2018-04-13 朱宪民 A kind of solar energy irrigation sprinkler water intaking pressurizing control system
CN107993501A (en) * 2017-12-04 2018-05-04 菏泽学院 A kind of human anatomy teaching system
CN108200526A (en) * 2017-12-29 2018-06-22 广州励丰文化科技股份有限公司 A kind of sound equipment adjustment method and device based on confidence level curve
CN108200526B (en) * 2017-12-29 2020-09-22 广州励丰文化科技股份有限公司 Sound debugging method and device based on reliability curve
CN108399946A (en) * 2018-03-05 2018-08-14 湖北省第三人民医院 A kind of nursing work load distribution assistance system
CN109388067A (en) * 2018-11-01 2019-02-26 长沙理工大学 A kind of intelligent home control system towards function
CN109451254A (en) * 2018-12-14 2019-03-08 广州市科虎电子有限公司 A kind of smart television digital receiver
CN110085259A (en) * 2019-05-07 2019-08-02 国家广播电视总局中央广播电视发射二台 Audio comparison method, device and equipment
CN110085259B (en) * 2019-05-07 2021-09-17 国家广播电视总局中央广播电视发射二台 Audio comparison method, device and equipment
CN110310661A (en) * 2019-07-03 2019-10-08 云南康木信科技有限责任公司 A kind of calculation method of two-way real-time broadcast audio delay and similarity
CN110310661B (en) * 2019-07-03 2021-06-11 云南康木信科技有限责任公司 Method for calculating two-path real-time broadcast audio time delay and similarity

Also Published As

Publication number Publication date
CN104700842B (en) 2018-05-08

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20161207

Address after: 511442 Guangzhou City, Guangdong Province, Panyu District, South Village, Huambo Road, No. 79, Huambo Business District Wanda Commercial Plaza, B-1 building, room, room 2705, room

Applicant after: Guangzhou Baiguoyuan Information Technology Co. Ltd.

Address before: 511442 Guangzhou City, Guangdong Province, Panyu District, South Village, Huambo, No. 79, No. two road, business district, Wanda Plaza, North building, B-1 floor, floor

Applicant before: All kinds of fruits garden, Guangzhou network technology company limited

GR01 Patent grant
GR01 Patent grant