CN108986832A - Binaural speech dereverberation method and device based on speech presence probability and coherence - Google Patents
- Publication number
- CN108986832A CN108986832A CN201810765266.3A CN201810765266A CN108986832A CN 108986832 A CN108986832 A CN 108986832A CN 201810765266 A CN201810765266 A CN 201810765266A CN 108986832 A CN108986832 A CN 108986832A
- Authority
- CN
- China
- Prior art keywords
- voice
- signal
- reverberation
- power spectrum
- frequency
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS › G10—MUSICAL INSTRUMENTS; ACOUSTICS › G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING › G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility › G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation › G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02165—Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
Abstract
The present invention discloses a binaural speech dereverberation method and device based on speech presence probability and coherence. The method comprises: 1) applying delay compensation to the speech signals received by two microphones to obtain time-aligned signals; 2) windowing and framing the signals and transforming them from the time domain to the frequency domain by Fourier transform; 3) estimating the reverberation power spectrum of the low-frequency band based on the speech presence probability; 4) computing the coherence of the different components of the speech signal; 5) estimating the reverberation power spectrum of the high-frequency band based on this coherence; 6) combining the low- and high-band estimates according to a band-splitting threshold; 7) computing the final reverberation power spectrum with a recursive smoothing algorithm; 8) obtaining the dereverberated frequency-domain signal through a gain function; 9) obtaining the dereverberated time-domain signal by the inverse short-time Fourier transform. The invention can effectively remove reverberation over the entire frequency band and improve perceptual speech quality.
Description
Technical field
The invention belongs to the fields of audio signal processing and computer audio. It relates in particular to a dual-microphone speech dereverberation method and device for reverberant environments: the reverberation of the low-frequency band is removed with a model that computes the reverberation power spectrum from the speech presence probability, while the reverberation of the high-frequency band is removed with a speech coherence model. The method can effectively remove reverberation over the entire frequency band and improve perceptual speech quality.
Background
Binaural audio naturally benefits many communication and multimedia applications. In daily human interaction, auditory perception is one of the most effective and direct modes of communication. In real environments, however, speech — the key information carrier between people and between people and machines — is inevitably corrupted by reverberation, ambient noise and other interference, which greatly reduces its clarity, intelligibility and comfort, and degrades both human listening and downstream speech processing systems. Besides the direct-path sound, a microphone also receives reflections that arrive through multipath propagation (signals reflected by the floor, walls, ceiling and furnishings of a room). Acoustically, reflections delayed by roughly 50 ms or more are called echoes, while the effect produced by the remaining reflections other than the direct sound is called reverberation; both impair the reception of the desired speech signal. To counter the resulting loss of sound quality, researchers have proposed dereverberation (reverberation cancellation) techniques, which aim to improve speech quality and intelligibility.
Speech dereverberation has very wide applications. With the development of modern signal processing and intelligent systems, robots are becoming increasingly capable, yet in practice they often operate in complex acoustic environments where various kinds of noise and reverberation interfere with speech capture. Under reverberation the speech recognition rate drops rapidly, affecting subsequent operations and functions, and may even make practical use impossible. Reducing the influence of reverberation with binaural dereverberation is therefore of great importance for robots in real applications. Binaural dereverberation can also serve as a preprocessing stage for many speech technologies, such as binaural sound source localization and speech recognition. In addition, people with hearing impairment often rely on hearing aids or cochlear implants to communicate, but in reverberant environments the benefit of a hearing aid is greatly reduced. Applying a dereverberation algorithm before the corrupted speech is amplified removes the reverberant component to some extent and helps hearing-impaired listeners communicate better.
Speech dereverberation techniques are usually divided into single-channel and multi-channel enhancement. Single-channel algorithms use one microphone; thanks to their simple models and low cost they have been widely applied and are well developed, but they can only suppress reverberation by exploiting the statistics of the single-channel signal. Multi-channel systems acquire sound with several microphones, i.e. a microphone array, and therefore obtain multiple signals. With more input channels, the algorithm can exploit the correlation between channels for enhancement, overcoming the limitation of single-channel methods, which can only rely on time-frequency differences between speech and reverberation. In general, increasing the number of microphones improves dereverberation: an array can exploit not only the time-frequency information of the signal but also its spatial information, and has therefore attracted wide attention. Its drawbacks are the large physical size and the high computational complexity and cost. Weighing equipment cost, the real-time capability of the enhancement algorithm and its performance, dual-channel dereverberation — i.e. dereverberation with two microphones — is a good compromise.
Dual-microphone dereverberation algorithms mainly include methods based on a coherence model and methods based on two-channel Wiener filtering. Coherence-based algorithms design the filter according to the difference in coherence between clean speech and reverberant speech. Assuming that the clean and reverberant components are uncorrelated, the reverberation power in the received signal is estimated from the coherence of the clean speech, the reverberant speech and the microphone signals, and the filter gain is computed from this estimate to obtain the dereverberated speech. A coherence-based dual-channel dereverberation method mainly comprises the following steps:
1. Speech input, pre-filtering and A/D conversion. The recorded analog signal is first pre-filtered: high-pass filtering suppresses 50 Hz mains hum, and low-pass filtering removes components above half the sampling frequency to prevent aliasing. The analog signal is then sampled and quantized to obtain a digital signal.
2. Pre-emphasis. The signal is passed through a high-frequency emphasis filter to compensate for the high-frequency attenuation introduced by lip radiation.
3. Framing and windowing. Speech is slowly time-varying: non-stationary as a whole but locally stationary, so it is commonly regarded as stationary within 10-30 ms and split into frames of about 20 ms. The framing function is:
x_k(n) = w(n) s(Nk + n), n = 0, 1, ..., N-1; k = 0, 1, ..., L-1 (1)
where N is the frame length, L the number of frames and s the speech signal. w(n) is the window function; its shape and length strongly affect the short-time analysis parameters. Common windows include the rectangular, Hanning and Hamming windows. The Hamming window, which reflects the characteristics of the speech signal well, is usually chosen:
w(n) = 0.54 - 0.46 cos(2πn/(N-1)), 0 ≤ n ≤ N-1 (2)
4. Reverberation power spectrum estimation. The coherence of clean speech and of reverberant speech is taken from forms established in prior work, and the coherence of the microphone signals is computed directly from the definition of coherence.
5. Computation of the filter gain and filtering of the two-channel signal.
6. Transformation of the filtered speech back to the time domain with the inverse Fourier transform.
Summary of the invention
The present invention proposes a new binaural speech dereverberation method and device to improve the dereverberation performance of the coherence-based dual-microphone algorithm in the low-frequency band.
The traditional coherence-based dual-microphone algorithm assumes that reverberation forms a diffuse sound field with low coherence, while clean speech has high coherence, so reverberation can be removed according to the level of coherence. In the low-frequency band, however, the coherence of reverberant speech is also high, so little reverberation is removed there. Moreover, conventional methods compute the coherence of each signal component with free-field formulas; for binaural microphones, the head shadow effect means that the coherence of each component is affected by the head, and the free-field form no longer applies. To address these two problems, the invention proposes a binaural speech dereverberation method based on speech presence probability and coherence.
The technical solution adopted by the invention is as follows:
A binaural speech dereverberation method based on speech presence probability and coherence, mainly comprising the steps of:
1) applying delay compensation to the speech signals received by two microphones to obtain time-aligned speech signals;
2) windowing and framing the time-aligned signals and transforming them from the time domain to the frequency domain by Fourier transform;
3) estimating the reverberation power spectrum of the low-frequency band of the speech signal based on the speech presence probability;
4) computing the coherence of the different components of the speech signal;
5) estimating the reverberation power spectrum of the high-frequency band of the speech signal based on this coherence;
6) combining the low-band and high-band reverberation power spectra into a full-band estimate according to a band-splitting threshold;
7) computing the final reverberation power spectrum from the combined estimate with a recursive smoothing algorithm;
8) computing a gain function from the final reverberation power spectrum and obtaining the dereverberated frequency-domain signal through this gain;
9) obtaining the dereverberated time-domain signal from the dereverberated frequency-domain signal by the inverse short-time Fourier transform.
The steps are described in detail as follows:
1) Apply delay compensation to the signals received by the two microphones to obtain time-aligned speech. Because the speech reaches the two microphones at different times, the signals must be aligned before further processing. The generalized cross-correlation method GCC-PHAT-ργ is used for time delay estimation: the interaural time difference is determined by locating the peak of the cross-correlation function. This method is robust against disturbances in the environment, such as correlated noise and reverberation, that shift the peak of the cross-correlation function.
In the time domain, the two-channel speech model can be described as:
x_i(n) = s_i(n) + v_i(n), (3)
where x_i(n) is the signal received by a microphone, s_i(n) the clean speech signal and v_i(n) the noise (reverberation) signal; the subscript i ∈ {l, r} denotes the first and second microphone signals.
Using the short-time Fourier transform, the two-channel model in the frequency domain is:
X_i(λ, μ) = S_i(λ, μ) + V_i(λ, μ), (4)
where λ and μ denote the frame index and the frequency bin. The generalized cross-correlation of the two received signals can then be expressed as:
R(Δτ) = ∫ W(ω) G(ω) e^{jωΔτ} dω, (5)
where Δτ is the time difference, * denotes the complex conjugate and ω the angular frequency. W(ω) is the frequency-domain weighting function used to sharpen the peak of the cross-correlation function; it combines the factor |G(ω)|^{-ρ}, where the parameter ρ is a reverberation factor determined by the signal-to-noise ratio, with the coherence function γ(ω) of the microphone signals (detailed in step 4)). Both factors adapt automatically to the environment. G(ω) is the cross-power spectrum, G(ω) = X_l(ω) X_r*(ω). The time delay is then obtained by maximizing the generalized cross-correlation function:
Δτ̂ = argmax_{Δτ} R(Δτ). (6)
2) Window and frame the two aligned signals, and apply the Fourier transform to move them from the time domain to the frequency domain.
3) Estimate the reverberation power spectrum of the low-frequency band based on the speech presence probability. This step estimates the low-band reverberation power spectrum separately, so that low-frequency reverberation is also removed. For each channel, denote the speech power and the reverberation power by φ_ss(λ, μ) and φ_vv(λ, μ). Since the presence of speech is uncertain, the minimum mean-square error estimate of the reverberation power spectrum E(|V|²|X) is:
E(|V|²|X) = P(H0|X) E(|V|²|X, H0) + P(H1|X) E(|V|²|X, H1), (7)
where X and V are the discrete Fourier transforms of the microphone signal and the reverberation signal, H1 denotes speech presence and H0 speech absence, P(H0|X) is the speech absence probability, E(|V|²|X, H0) the reverberation power spectrum when speech is absent, P(H1|X) the speech presence probability and E(|V|²|X, H1) the reverberation power spectrum when speech is present.
Define the a posteriori signal-to-reverberation ratio:
ξ = φ_ss / φ_vv. (8)
The speech presence probability is computed with:
P(H1|X) = [1 + (1 + ξ_opt) exp(-(|X|²/φ_vv) · ξ_opt/(1 + ξ_opt))]^{-1}, (9)
where ξ_opt is the optimal signal-to-reverberation ratio. Studies show that for true a posteriori SNRs between -∞ and 20 dB, the estimation error of the speech presence probability is smallest when 10 log10(ξ_opt) = 15 dB. After computing P(H1|X), the speech absence probability follows from:
P(H0|X) = 1 - P(H1|X). (10)
When speech is absent, the microphone signal is regarded as pure reverberation noise, so the reverberation power spectrum is:
E(|V|²|X, H0) = E(|V|²|V) = |V|² = |X|². (11)
When speech is present, the reverberation power spectrum is taken from the estimate of the previous frame:
E(|V|²|X, H1) = φ̂_vv(λ-1, μ), (12)
where φ̂_vv is the estimated reverberation auto-power spectrum. The reverberation power spectrum E(|V|²|X) can therefore be rewritten as:
E(|V|²|X) = P(H0|X) |X|² + P(H1|X) φ̂_vv(λ-1, μ). (13)
The reverberation power spectrum is smoothed across frames:
φ̂_vv(λ, μ) = α φ̂_vv(λ-1, μ) + (1-α) E(|V|²|X), (14)
where α is the smoothing factor.
The reverberation power spectrum is updated only when the larger of the speech presence probabilities of the two channels (i.e. the two microphones) is below a threshold; otherwise it is not updated:
1) if max(P(H1|X_l), P(H1|X_r)) < p_0 and P(H1|X_l) < P(H1|X_r), the reverberation power spectrum is updated from the first microphone signal X_l; (15)
2) if max(P(H1|X_l), P(H1|X_r)) < p_0 and P(H1|X_l) > P(H1|X_r), the reverberation power spectrum is updated from the second microphone signal X_r; (16)
3) otherwise, the previous estimate is kept.
Here P(H1|X_l) and P(H1|X_r) are the speech presence probabilities of the first and second microphone signals and p_0 is a threshold.
The low-frequency part of the reverberant speech signal is processed with this method to estimate the reverberation power spectrum; the result is denoted φ̂_vv^low.
4) Compute the coherence of the different signal components. The reverberation signal and the speech signal differ markedly in coherence in the high-frequency band, so coherence is used to estimate the high-band reverberation. First the coherence between the speech components must be computed. The coherence of the microphone signals can be computed directly from the definition of coherence; in the frequency domain, the coherence between two signals is defined as:
Γ_{x1x2}(λ, μ) = φ_{x1x2}(λ, μ) / sqrt(φ_{x1x1}(λ, μ) φ_{x2x2}(λ, μ)), (17)
where φ_{x1x1} and φ_{x2x2} are the auto-power spectra of the signals x_1 and x_2 and φ_{x1x2} is their cross-power spectrum, computed by recursive averaging:
φ_{x1x2}(λ, μ) = α_PSD φ_{x1x2}(λ-1, μ) + (1-α_PSD) X_1(λ, μ) X_2*(λ, μ), (18)
where α_PSD is a smoothing factor and * denotes the complex conjugate.
Reverberant speech is usually modeled as a diffuse sound field, generated by innumerable uncorrelated signals of equal energy propagating simultaneously in all directions. In conventional methods the coherence of an ideal diffuse field is computed as:
Γ_vv(f) = sinc(2πf d_mic / c), (19)
where f is the frequency, d_mic the distance between the two microphones and c the speed of sound. When the two microphones sit at the left and right ears of a dummy head, however, the head's shadowing makes the diffuse-field coherence more complex. The curve-fitting approximation proposed by M. Jeub et al. is therefore used to model it (Eq. (20)), where the fitted constants a_p, b_p and c_p take the values 2.38×10^{-3}, 1371 and 151.5 respectively, and P is the model order, set to 3.
For clean speech the coherence is high. Assuming the speech reaches the two microphones at angle θ, the coherence between the clean speech components can be expressed as:
Γ_ss(f) = exp(-j 2πf d_mic sin(θ) / c), (21)
where f is the frequency, c the propagation speed of sound in air and d_mic the distance between the two microphones.
5) Estimate the reverberation power spectrum of the high-frequency band from the signal coherence. Since the reverberant field is assumed to be a diffuse sound field, each microphone receives the same reverberation power spectrum φ_vv. Because of the head shadow effect, the difference between the clean speech power spectra received by the two binaural microphones cannot simply be ignored. With the left and right transfer functions H_l and H_r and the source signal S, the clean components received by the left and right microphones are S_l = H_l S and S_r = H_r S, so the clean-speech power spectra can be expressed as:
φ_{s_l s_l} = |H_l|² φ_ss, φ_{s_r s_r} = |H_r|² φ_ss, (22)-(24)
which can be combined with the binaural coherence function γ. For the left and right clean speech signals s_l and s_r, the reverberation signals v_l and v_r and the microphone signals x_l and x_r, and assuming that reverberation and speech are uncorrelated, the auto- and cross-power spectra satisfy:
φ_{x_l x_l} = φ_{s_l s_l} + φ_vv, φ_{x_r x_r} = φ_{s_r s_r} + φ_vv, φ_{x_l x_r} = φ_{s_l s_r} + φ_{v_l v_r}. (25)-(27)
Combining these relations with the definition of the binaural coherence, Eq. (28), yields an equation in the unknown reverberation power φ_vv, Eq. (29); rewriting it gives a quadratic equation, Eq. (30), whose solution is the reverberation power spectrum estimate, Eq. (31). In theory, since the coherence of the speech signal is strong and that of the reverberation weak, the coherence of the received signal is no larger than that of the clean speech signal, so Eq. (31) can be considered to have a solution. To guarantee that the reverberation power spectrum φ_vv is positive, it is computed with Eq. (32); the auto-power spectra and the cross-power spectrum are likewise computed by recursive averaging.
The high-frequency part of the reverberant speech is processed with this method to estimate the reverberation power spectrum; the result is denoted φ̂_vv^high.
Because the theoretical signal coherence differs somewhat from the actual one, which in turn affects the result of the reverberation power estimation, the coherence of the signals is updated here to further improve the estimate.
When the larger of the two speech presence probabilities is below a threshold, the coherence of the reverberation signal is updated from the coherence of the received microphone signals:
if max(P(H1|X_l), P(H1|X_r)) < p_0, update the reverberation coherence from γ_{x_l x_r} with smoothing factor α_γ. (33)-(34)
When the smaller of the two speech presence probabilities is above a threshold, the coherence of the clean speech signal is updated from the coherence of the received signals, which follows from Eq. (29):
if min(P(H1|X_l), P(H1|X_r)) > p_1, update the clean speech coherence accordingly. (35)
Here p_0 and p_1 are thresholds, the updated quantity in Eqs. (33)-(34) is the coherence of the reverberation signal, α_γ is a smoothing factor, γ_{x_l x_r} the coherence between the two microphone signals, the quantity updated in Eq. (35) the coherence of the clean speech signal, φ_{x_l x_r} the cross-power spectrum of the two microphone signals, φ_{x_l x_l} and φ_{x_r x_r} the auto-power spectra of the left and right microphone signals, and φ_vv the auto-power spectrum of the reverberation. Since the coherence-based reverberation power estimation only uses the square of the clean speech coherence, only the update of Eq. (35) is required.
6) Combine the low- and high-band reverberation power spectrum estimates. When the frequency μ is below a set value μ_s (the threshold separating the low and high bands), the reverberation power spectrum is φ̂_vv^low; when the frequency exceeds the threshold μ_s, the reverberation power spectrum is φ̂_vv^high, i.e.:
φ̂_vv(λ, μ) = φ̂_vv^low(λ, μ) if μ < μ_s, else φ̂_vv^high(λ, μ). (36)
7) From the combined low/high-band reverberation power spectrum estimated in step 6), compute the final reverberation power spectrum with an existing recursive smoothing algorithm.
8) Compute the gain function. Once the power spectrum of the reverberation signal has been estimated, a gain function can be designed from it; multiplying the signal received by the microphone by the gain function yields the dereverberated signal. Dereverberation based on reverberation power spectrum estimation is usually performed with spectral subtraction, which rests on a simple principle: the reverberation is treated as additive noise, and subtracting the estimated reverberation spectrum from the received reverberant speech spectrum yields an estimate of the clean speech spectrum. The gain function is:
G(λ, μ) = max(1 - β/ξ²(λ, μ), G_min), (37)
where β is the subtraction factor, ξ²(λ) = φ_xx/φ̂_vv is the square of the a posteriori signal-to-reverberation ratio, φ̂_vv the estimated reverberation power spectrum and φ_xx the computed power spectrum of the signal received by the microphone. To avoid over-subtraction, a lower bound G_min is imposed. The dereverberated speech signal in the frequency domain is:
Ŝ(λ, μ) = G(λ, μ) X(λ, μ). (38)
9) Finally, the dereverberated time-domain signal is obtained with the inverse short-time Fourier transform.
Correspondingly to the above method, the present invention also provides a binaural speech dereverberation device based on speech presence probability and coherence, comprising:
a preprocessing unit, responsible for applying delay compensation to the speech signals received by the two microphones to obtain time-aligned speech signals, windowing and framing the time-aligned signals, and transforming the speech from the time domain to the frequency domain by Fourier transform;
a low-band reverberation power spectrum estimation unit, responsible for estimating the reverberation power spectrum of the low-frequency band of the speech signal based on the speech presence probability;
a high-band reverberation power spectrum estimation unit, responsible for computing the coherence of the different components of the speech signal and estimating the reverberation power spectrum of the high-frequency band based on this coherence;
a band-combining reverberation power spectrum estimation unit, responsible for combining the low-band and high-band reverberation power spectra into a full-band estimate according to the band-splitting threshold;
a dereverberation unit, responsible for computing the final reverberation power spectrum from the combined estimate with a recursive smoothing algorithm, computing the gain function from the final reverberation power spectrum and obtaining the dereverberated frequency-domain signal through this gain, and obtaining the dereverberated time-domain signal from the dereverberated frequency-domain signal by the inverse short-time Fourier transform.
The beneficial effects of the present invention are:
Exploiting the difference in coherence between the reverberation and the clean speech received by the two microphones, the invention applies different reverberation power spectrum estimators to the low and high bands: the low-frequency reverberation is removed with a model that computes the reverberation power spectrum from the speech presence probability, and the high-frequency reverberation is removed with a speech coherence model. The method can effectively remove reverberation over the entire frequency band and improve perceptual speech quality.
Brief description of the drawings
Fig. 1 is a flow diagram of the binaural speech dereverberation method based on speech presence probability and coherence of the present invention.
Fig. 2 compares, for an embodiment of the invention, the true reverberation power with the reverberation power spectrum estimated by the coherence-based dereverberation method before and after the improvement.
Fig. 3(a)-3(c) show, respectively, the spectrograms of the speech corrupted by reverberation, of the speech after coherence-based dereverberation before the improvement, and of the speech after dereverberation with the improved method based on speech presence probability and coherence.
Specific embodiment
Below with reference to examples and drawings, the present invention is clearly and completely described.
The databases used in this embodiment are among the most authoritative and widely used in the field of speech enhancement. The clean speech comes from the TSP database; 80 utterances are used for testing. The signals received by the microphones are obtained by convolving the clean speech with room impulse responses from the AIR (Aachen Impulse Response) database. The AIR database was recorded by the Institute of Communication Systems of RWTH Aachen University, Germany, using an HMS2 dummy head system, and covers scenes of different types such as offices, meeting rooms and lecture halls, for research on signal processing algorithms in reverberant environments. The two microphones are located at the left and right ears of the dummy head, about 0.17 m apart.
This embodiment uses the binaural speech dereverberation method based on speech presence probability and coherence shown in Fig. 1 to evaluate speech dereverberation under different reverberation scenes. The algorithm parameters are set as shown in Table 1.
Table 1. Algorithm parameter settings

Parameter | Value
---|---
Sampling rate fs | 16 kHz
Frame length L | 320
Frame shift M | 160
Spectral smoothing parameter α | 50%
Subtraction factor β | 0.85
Spectral floor Gmin | -10 dB
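The analysis front end implied by Table 1 (20 ms frames with 50% overlap at 16 kHz) can be sketched as follows; the Hann window is an assumption, since the table does not name the window type:

```python
import numpy as np

FS, FRAME_LEN, FRAME_SHIFT = 16000, 320, 160   # values from Table 1

def stft(x, frame_len=FRAME_LEN, frame_shift=FRAME_SHIFT):
    """Windowed framing plus FFT: Hann-windowed frames,
    one row per frame, frame_len//2 + 1 frequency bins."""
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    frames = np.stack([x[i * frame_shift : i * frame_shift + frame_len] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

x = np.random.default_rng(1).standard_normal(FS)  # 1 s of test signal
X = stft(x)                                       # time-frequency representation
```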
Table 2 compares the perceptual speech quality (PESQ) and the improvement in speech-to-reverberation modulation ratio (ΔSRMR) obtained by the pre-improvement method, which uses coherence alone to estimate and remove the reverberation, with those of the improved method, which uses both speech presence probability and coherence. The comparison of ΔSRMR before and after the improvement shows that the method based on speech presence probability and coherence removes noticeably more reverberation, and therefore also achieves higher PESQ values.
Table 2. PESQ and ΔSRMR before and after the improvement under different reverberation scenes
Reverberation scene | Office | Lecture room | Corridor | Auditorium
---|---|---|---|---
Reverberation time | 0.45 s | 0.85 s | 0.83 s | 5.16 s
Initial PESQ | 1.89 | 1.62 | 1.74 | 1.44
PESQ (before improvement) | 2.19 | 1.78 | 1.92 | 1.61
PESQ (after improvement) | 2.42 | 2.00 | 2.07 | 1.78
ΔSRMR (before improvement) | 1.05 | 1.11 | 1.19 | 0.90
ΔSRMR (after improvement) | 1.32 | 1.37 | 1.41 | 1.18
Fig. 2 shows, for the office reverberation scene in an embodiment of the present invention, the power spectrum of the true reverberant signal together with the reverberation power spectra estimated by the coherence-based method before and after the improvement. It is evident from Fig. 2 that the power spectrum estimated by the improved method is closer to the true reverberation power spectrum.
The effect of speech dereverberation is best observed from the spectrograms of the dereverberated speech signals. Fig. 3(a)-Fig. 3(c) give an example: they show the spectrograms of, respectively, the speech signal polluted by reverberation, the speech dereverberated with the pre-improvement coherence-based method, and the speech dereverberated with the improved method using speech presence probability and coherence. The spectrograms show that the method of the present invention removes more reverberation, especially in the low-frequency part.
Another embodiment of the present invention provides a binaural speech dereverberation device based on speech presence probability and coherence, comprising:
a preprocessing unit, responsible for applying delay compensation to the speech signals received by the two microphones to obtain time-aligned speech signals, applying windowing and framing to the time-aligned speech signals, and transforming the speech signals from the time domain to the frequency domain by the Fourier transform;
a low-band reverberation power spectrum estimation unit, responsible for estimating the reverberation power spectrum of the low-frequency part of the speech signal based on the speech presence probability;
a high-band reverberation power spectrum estimation unit, responsible for computing the coherence of the different signal components of the speech signal, and estimating the reverberation power spectrum of the high-frequency part of the speech signal based on that coherence;
a band-combining reverberation power spectrum estimation unit, responsible for combining the reverberation power spectrum of the low-frequency part and that of the high-frequency part into a full-band reverberation power spectrum estimate according to a low/high band division threshold; and
a dereverberation unit, responsible for computing the final reverberation power spectrum from the combined estimate by a recursive smoothing algorithm, computing a gain function from the final reverberation power spectrum, obtaining the dereverberated frequency-domain signal through the gain function, and obtaining the dereverberated time-domain signal from the dereverberated frequency-domain signal by the inverse short-time Fourier transform.
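The gain-function stage of the dereverberation unit can be sketched with the Table 1 constants (β = 0.85, Gmin = -10 dB). The Wiener-style spectral-subtraction form below is an assumption, since this section does not spell out the gain formula:

```python
import numpy as np

def dereverb_gain(x_psd, rev_psd, beta=0.85, g_min=10 ** (-10 / 20)):
    """Spectral-subtraction gain per time-frequency bin, using the
    Table 1 constants: subtraction factor beta and spectral floor
    Gmin = -10 dB (expressed here as a linear amplitude)."""
    gain = 1.0 - beta * rev_psd / np.maximum(x_psd, 1e-12)
    return np.maximum(gain, g_min)

x_psd = np.array([1.0, 1.0, 1.0])
rev_psd = np.array([0.0, 0.5, 2.0])   # none / moderate / dominant reverberation
g = dereverb_gain(x_psd, rev_psd)
# The dereverberated spectrum would then be g * X, followed by the
# inverse STFT with overlap-add.
```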
The above embodiments are examples of the present invention. Although examples of the invention are disclosed for the purpose of illustration, those skilled in the art will understand that various substitutions, variations and modifications are possible without departing from the spirit and scope of the invention and the appended claims. Therefore, the present invention should not be limited to the content of the examples, and the protection scope of the present invention shall be as defined in the claims.
Claims (10)
1. A binaural speech dereverberation method based on speech presence probability and coherence, the steps comprising:
1) applying delay compensation to the speech signals received by two microphones to obtain time-aligned speech signals;
2) applying windowing and framing to the time-aligned speech signals, and transforming the speech signals from the time domain to the frequency domain by the Fourier transform;
3) estimating the reverberation power spectrum of the low-frequency part of the speech signal based on the speech presence probability;
4) computing the coherence of the different signal components of the speech signal;
5) estimating the reverberation power spectrum of the high-frequency part of the speech signal based on said coherence;
6) combining the reverberation power spectrum of the low-frequency part and that of the high-frequency part into a full-band reverberation power spectrum estimate according to a low/high band division threshold;
7) computing the final reverberation power spectrum from the combined estimate by a recursive smoothing algorithm;
8) computing a gain function from the final reverberation power spectrum, and obtaining the dereverberated frequency-domain signal through the gain function;
9) obtaining the dereverberated time-domain signal from the dereverberated frequency-domain signal by the inverse short-time Fourier transform.
2. The method of claim 1, wherein in step 1) the delay compensation of the two speech signals is performed with the GCC-PHAT-ργ method, so as to overcome the influence of interfering factors in the environment on the position of the cross-correlation spectrum peak.
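The standard GCC-PHAT core underlying the claimed GCC-PHAT-ργ variant (which adds further weighting not sketched here) estimates the inter-microphone delay from the peak of the phase-normalized cross-correlation:

```python
import numpy as np

def gcc_phat_delay(x_l, x_r, max_lag=32):
    """Estimate the delay of x_r relative to x_l via plain GCC-PHAT:
    normalize the cross-spectrum magnitude to 1 so the inverse FFT
    concentrates at the true lag."""
    n = len(x_l) + len(x_r)
    cross = np.fft.rfft(x_l, n=n) * np.conj(np.fft.rfft(x_r, n=n))
    cc = np.fft.irfft(cross / np.maximum(np.abs(cross), 1e-12), n=n)
    cc = np.concatenate((cc[-max_lag:], cc[: max_lag + 1]))  # lags -max..+max
    return int(np.argmax(cc)) - max_lag

rng = np.random.default_rng(2)
s = rng.standard_normal(4096)
x_l = s
x_r = np.concatenate((np.zeros(5), s[:-5]))  # right channel lags by 5 samples
lag = gcc_phat_delay(x_l, x_r)               # negative: x_r lags x_l
```

Delay compensation then shifts one channel by the estimated lag before the two signals are compared.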
3. The method of claim 1, wherein step 3) estimates the reverberation power spectrum of the low band separately, so as to guarantee the removal of the low-band reverberation.
4. The method of claim 3, wherein in step 3) the reverberation power spectrum is updated when the larger of the speech presence probabilities of the two channels is below a threshold, and is not updated otherwise; the update of the reverberation power spectrum is:
1) if max(P(H1|Xl), P(H1|Xr)) < p0 and P(H1|Xl) < P(H1|Xr), then
2) if max(P(H1|Xl), P(H1|Xr)) < p0 and P(H1|Xl) > P(H1|Xr), then
3) otherwise,
where P(H1|Xl) denotes the speech presence probability of the first microphone signal Xl, P(H1|Xr) denotes the speech presence probability of the second microphone signal Xr, p0 denotes the threshold, λ and μ denote the frame index and the frequency respectively, H1 denotes speech, H0 denotes non-speech, and φ̂vv denotes the estimated auto-power spectrum of the reverberation.
5. The method of claim 1, wherein step 4) assumes the reverberation to be a diffuse sound field, and computes the coherence with a reverberation coherence model that accounts for head shadowing.
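Without head shadowing, the coherence of a spherically isotropic (diffuse) field between two microphones spaced d = 0.17 m apart is the classical sinc curve shown below; the head-shadowed binaural model referenced by the claim lowers this further at high frequencies, which is what makes the high-band coherence estimate discriminative. The free-field curve is given here as a simplification:

```python
import numpy as np

def diffuse_coherence(f, d=0.17, c=343.0):
    """Free-field diffuse-field coherence sin(2*pi*f*d/c)/(2*pi*f*d/c).
    np.sinc(x) computes sin(pi*x)/(pi*x), hence the argument 2*f*d/c."""
    return np.sinc(2 * f * d / c)

f = np.array([0.0, 500.0, 4000.0])   # Hz
gamma = diffuse_coherence(f)         # high at low f, near zero at high f
```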
6. The method of claim 1, wherein step 5) comprises the sub-steps:
5-1) updating the coherence of the signal at every frequency according to the speech presence probability;
5-2) taking the head shadowing effect into account by assuming that the clean speech power spectra received by the two microphones differ, and estimating the reverberation power spectrum with the coherence function.
7. The method of claim 6, wherein in step 5) the auto-power spectra and the cross-power spectrum of the clean speech received by the two microphones are expressed as:
where Hl and Hr denote the transfer functions of the left and right ears respectively, S denotes the source signal, Γ denotes the binaural signal coherence function, Sl denotes the speech signal received by the left microphone, and Sr denotes the speech signal received by the right microphone.
8. the method for claim 7, which is characterized in that step 5-1) include:
A) consistency of reverberation voice is updated, i.e., when the larger value in two voice probabilities of occurrence is lower than some threshold value
When, the consistency of the voice signal received using microphone obtains consistent update to reverb signal are as follows:
If max (P (H1|Xl),P(H1|Xr))<p0
Then
Wherein,Indicate the consistency of reverb signal, αγIndicate smoothing factor,Indicate two voices that microphone receives
Between consistency, p0Indicate threshold value;
B) consistency of clean speech is updated, i.e., when the smaller value in two voice probabilities of occurrence is higher than some threshold value
When, the consistent update of the consistency of the voice signal received using microphone to clean speech signal are as follows:
If min (P (H1|Xl),P(H1|Xr)) > p1
Then
Wherein,Indicate the consistency of clean speech signal,Indicate the crosspower spectrum for the voice that two microphones receive,Indicate the auto-power spectrum for the voice that left microphone receives,Indicate the auto-power spectrum for the voice that right microphone receives,
φvvIndicate the auto-power spectrum of reverb signal, p1Indicate threshold value;
Step 5-2) estimation to reverberation power spectrum are as follows:
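The step 5-2) formula is an image in the original document and may differ from the following. One classical coherence-based estimate, assuming real-valued coherences for a mixture of coherent speech (coherence Γss) and diffuse reverberation (coherence Γvv), attributes the fraction (Γss − Γx)/(Γss − Γvv) of the observed power to reverberation:

```python
import numpy as np

def reverb_psd_from_coherence(phi_x, gamma_x, gamma_s, gamma_v):
    """Coherence-based reverberation PSD estimate: linearly interpolate
    the measured coherence gamma_x between the speech coherence gamma_s
    and the diffuse-field coherence gamma_v, and scale the observed
    power phi_x by the resulting reverberant fraction (clipped to [0, 1])."""
    frac = np.clip((gamma_s - gamma_x) / (gamma_s - gamma_v), 0.0, 1.0)
    return frac * phi_x

phi = reverb_psd_from_coherence(phi_x=2.0, gamma_x=0.7, gamma_s=1.0, gamma_v=0.2)
```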
9. The method of claim 8, wherein the combined full-band reverberation power spectrum estimated in step 6) is:
where μ denotes a frequency and μs denotes the frequency value dividing the low and high bands; below μs the reverberation power spectrum of the low-frequency part, estimated from the speech presence probability, is used, and above μs the reverberation power spectrum of the high-frequency part, estimated from the coherence, is used.
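The claim-9 combination can be sketched as a frequency-wise switch; the split frequency of 1.5 kHz below is an assumed value, since the patent leaves the low/high division threshold as a parameter:

```python
import numpy as np

def combine_bands(psd_low, psd_high, freqs, f_split=1500.0):
    """Below the split frequency use the speech-presence-probability
    estimate, above it the coherence-based estimate."""
    return np.where(freqs < f_split, psd_low, psd_high)

freqs = np.array([100.0, 1000.0, 2000.0, 6000.0])
combined = combine_bands(np.full(4, 1.0), np.full(4, 2.0), freqs)
```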
10. A binaural speech dereverberation device based on speech presence probability and coherence, characterized by comprising:
a preprocessing unit, responsible for applying delay compensation to the speech signals received by two microphones to obtain time-aligned speech signals, applying windowing and framing to the time-aligned speech signals, and transforming the speech signals from the time domain to the frequency domain by the Fourier transform;
a low-band reverberation power spectrum estimation unit, responsible for estimating the reverberation power spectrum of the low-frequency part of the speech signal based on the speech presence probability;
a high-band reverberation power spectrum estimation unit, responsible for computing the coherence of the different signal components of the speech signal, and estimating the reverberation power spectrum of the high-frequency part of the speech signal based on said coherence;
a band-combining reverberation power spectrum estimation unit, responsible for combining the reverberation power spectrum of the low-frequency part and that of the high-frequency part into a full-band reverberation power spectrum estimate according to a low/high band division threshold; and
a dereverberation unit, responsible for computing the final reverberation power spectrum from the combined estimate by a recursive smoothing algorithm, computing a gain function from the final reverberation power spectrum, obtaining the dereverberated frequency-domain signal through the gain function, and obtaining the dereverberated time-domain signal from the dereverberated frequency-domain signal by the inverse short-time Fourier transform.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810765266.3A CN108986832B (en) | 2018-07-12 | 2018-07-12 | Binaural voice dereverberation method and device based on voice occurrence probability and consistency |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108986832A true CN108986832A (en) | 2018-12-11 |
CN108986832B CN108986832B (en) | 2020-12-15 |
Family
ID=64537944
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810765266.3A Active CN108986832B (en) | 2018-07-12 | 2018-07-12 | Binaural voice dereverberation method and device based on voice occurrence probability and consistency |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108986832B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110012331A (en) * | 2019-04-11 | 2019-07-12 | 杭州微纳科技股份有限公司 | An infrared-triggered far-field dual-microphone speech recognition method |
CN110095755A (en) * | 2019-04-01 | 2019-08-06 | 北京云知声信息技术有限公司 | A sound source localization method |
CN110691296A (en) * | 2019-11-27 | 2020-01-14 | 深圳市悦尔声学有限公司 | Channel mapping method for built-in earphone of microphone |
CN110718230A (en) * | 2019-08-29 | 2020-01-21 | 云知声智能科技股份有限公司 | Method and system for eliminating reverberation |
CN111128213A (en) * | 2019-12-10 | 2020-05-08 | 展讯通信(上海)有限公司 | Noise suppression method and system for processing in different frequency bands |
CN113613112A (en) * | 2021-09-23 | 2021-11-05 | 三星半导体(中国)研究开发有限公司 | Method and electronic device for suppressing wind noise of microphone |
CN115831145A (en) * | 2023-02-16 | 2023-03-21 | 之江实验室 | Double-microphone speech enhancement method and system |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006243290A (en) * | 2005-03-02 | 2006-09-14 | Advanced Telecommunication Research Institute International | Disturbance component suppressing device, computer program, and speech recognition system |
WO2009151062A1 (en) * | 2008-06-10 | 2009-12-17 | ヤマハ株式会社 | Acoustic echo canceller and acoustic echo cancel method |
JP2011065128A (en) * | 2009-08-20 | 2011-03-31 | Mitsubishi Electric Corp | Reverberation removing device |
CN102347028A (en) * | 2011-07-14 | 2012-02-08 | 瑞声声学科技(深圳)有限公司 | Double-microphone speech enhancer and speech enhancement method thereof |
CN102800322A (en) * | 2011-05-27 | 2012-11-28 | 中国科学院声学研究所 | Method for estimating noise power spectrum and voice activity |
JP2013044908A (en) * | 2011-08-24 | 2013-03-04 | Nippon Telegr & Teleph Corp <Ntt> | Background sound suppressor, background sound suppression method and program |
CN106297817A (en) * | 2015-06-09 | 2017-01-04 | 中国科学院声学研究所 | A speech enhancement method based on binaural information |
CN106971740A (en) * | 2017-03-28 | 2017-07-21 | 吉林大学 | Speech enhancement method based on speech presence probability and phase estimation |
Non-Patent Citations (2)
Title |
---|
ZHANG LONG ET AL.: "Supervised single-channel speech dereverberation and denoising using a two-stage model based sparse representation", 《Speech Communication》 *
CHEN JIANRONG ET AL.: "Reverberation reduction processing based on a microphone array", 《电声技术 (Audio Engineering)》 *
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |