CN108986832A - Binaural speech dereverberation method and device based on speech presence probability and coherence - Google Patents

Binaural speech dereverberation method and device based on speech presence probability and coherence Download PDF

Info

Publication number
CN108986832A
CN108986832A (application CN201810765266.3A)
Authority
CN
China
Prior art keywords
voice
signal
reverberation
power spectrum
frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810765266.3A
Other languages
Chinese (zh)
Other versions
CN108986832B (en)
Inventor
刘宏
王秀玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University Shenzhen Graduate School
Original Assignee
Peking University Shenzhen Graduate School
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Shenzhen Graduate School filed Critical Peking University Shenzhen Graduate School
Priority to CN201810765266.3A priority Critical patent/CN108986832B/en
Publication of CN108986832A publication Critical patent/CN108986832A/en
Application granted granted Critical
Publication of CN108986832B publication Critical patent/CN108986832B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02165 Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal

Abstract

The present invention discloses a binaural speech dereverberation method and device based on speech presence probability and coherence. The method comprises: 1) applying delay compensation to the speech signals received by two microphones to obtain time-aligned speech signals; 2) windowing and framing the signals, and transforming them from the time domain to the frequency domain by Fourier transform; 3) estimating the reverberation power spectrum of the low-frequency band based on the speech presence probability; 4) computing the coherence of the different components of the speech signal; 5) estimating the reverberation power spectrum of the high-frequency band based on the coherence; 6) combining the low-band and high-band reverberation power spectrum estimates according to a band-splitting threshold; 7) computing the final reverberation power spectrum with a recursive smoothing algorithm; 8) obtaining the dereverberated frequency-domain signal through a gain function; 9) obtaining the dereverberated time-domain signal by the inverse short-time Fourier transform. The invention can effectively remove reverberation over the entire frequency band and improve perceptual speech quality.

Description

Binaural speech dereverberation method and device based on speech presence probability and coherence
Technical field
The invention belongs to the technical fields of audio signal processing and computer audio, and in particular relates to a dual-microphone speech dereverberation method and device suitable for reverberant environments. Reverberation in the low-frequency band is removed with a model that computes the reverberation power spectrum from the speech presence probability, while reverberation in the high-frequency band is removed with a speech coherence model, so that reverberation over the entire frequency band can be effectively removed and perceptual speech quality improved.
Background technique
Binaural audio naturally offers many advantages for communication and multimedia experiences. In daily human interaction, auditory perception is one of the most effective and direct channels of communication. In real environments, however, speech, the key information carrier between people and between people and machines, is inevitably corrupted by reverberation, ambient noise and other interference, so that its clarity, intelligibility and comfort are greatly reduced, severely affecting both human listening and the performance of downstream speech processing systems. Besides the direct-path component of the source, a microphone also receives reflections that arrive via multipath propagation (e.g. signals reflected by the floor, walls, ceiling and furnishings of a room). Acoustically, reflections delayed by roughly 50 ms or more are called echoes, while the effect produced by the remaining reflections other than the direct sound is called reverberation; both degrade the reception of the desired speech signal. To counteract the loss of sound quality caused by reverberation, researchers have proposed dereverberation (reverberation cancellation) techniques that aim to improve the quality and intelligibility of speech.
Speech dereverberation technology has a very wide range of applications. With the development of modern signal processing and intelligent systems, the degree of intelligence of robots keeps rising, yet in practical applications robots usually operate in complex acoustic environments and are disturbed by various kinds of noise and reverberation when acquiring speech. Under reverberant conditions the speech recognition rate drops rapidly, affecting subsequent operations and functions, and may even fail to meet practical requirements. Using binaural speech dereverberation technology to reduce the influence of reverberation on robots in practical applications is therefore of great importance. Binaural speech dereverberation can also serve as a preprocessing stage for many speech processing technologies, such as binaural sound source localization and speech recognition. Furthermore, people with hearing impairment often need to communicate through hearing aids or cochlear implants, but in reverberant environments the benefit of a hearing aid is greatly reduced. A speech dereverberation algorithm can then preprocess the corrupted speech signal before amplification, removing the reverberation to a certain extent and helping hearing-impaired people communicate better.
Speech dereverberation techniques can usually be divided into single-channel and multi-channel speech enhancement. Single-channel dereverberation algorithms enhance speech with a single microphone; thanks to their simple models and low cost, such methods have been widely applied and have matured, but they can suppress reverberation only by exploiting the statistical properties of the single-channel speech signal. Multi-channel dereverberation systems collect sound with several microphones, i.e. a microphone array, obtaining multiple signals. With more input channels, the signal processing algorithm can exploit the correlation between the channel signals for speech enhancement. Compared with the single-channel case, which can only exploit the differences between speech and reverberation in the time-frequency domain, a microphone array compensates for the deficiencies of single-channel dereverberation. In general, increasing the number of microphones improves dereverberation performance: beyond the time-frequency information of the signal, array-based methods can also exploit its spatial information, and have therefore attracted wide attention. Their drawbacks, however, are bulky structures and excessive computational complexity. Weighing equipment cost, the real-time capability of the algorithm and its performance, dual-channel dereverberation, i.e. performing speech dereverberation with two microphones, is a good compromise.
Dual-microphone speech dereverberation algorithms mainly include methods based on the coherence model and on two-channel Wiener filtering. Coherence-based dereverberation algorithms design the filter according to the different coherence of clean speech and reverberant speech. These methods assume that the clean speech part and the reverberant part are uncorrelated, use the coherence of the clean speech, of the reverberant speech and of the microphone signals to estimate the reverberation power in the received speech, and compute the filter gain from the estimated reverberation power to obtain the dereverberated speech. The coherence-based dual-channel dereverberation method mainly comprises the following steps:
1. Speech input, pre-filtering, analog-to-digital conversion. The recorded analog sound signal is first pre-filtered: high-pass filtering suppresses the 50 Hz mains hum, and low-pass filtering removes the frequency components above half the sampling frequency to prevent aliasing. The analog signal is then sampled and quantized into a digital signal.
2. Pre-emphasis. The signal is passed through a high-frequency emphasis filter to compensate for the high-frequency attenuation caused by lip radiation.
3. Framing and windowing. Speech is slowly time-varying: non-stationary as a whole but locally stationary, and is usually considered stationary within 10-30 ms, so the signal can be framed with a length of 20 ms. The framing function is:
x_k(n) = w(n) s(Nk + n),  n = 0, 1, ..., N-1;  k = 0, 1, ..., L-1   (1)
where N is the frame length, L the number of frames, and s the speech signal. w(n) is the window function; its choice (shape and length) strongly affects the short-time analysis parameters. Common window functions include the rectangular, Hanning and Hamming windows. The Hamming window, which reflects the characteristic variations of the speech signal well, is usually chosen; its expression is:
w(n) = 0.54 - 0.46 cos(2πn/(N-1)),  0 ≤ n ≤ N-1   (2)
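As a quick illustration, the framing and Hamming-windowing step can be sketched in a few lines of NumPy. The function name `frame_signal` and the non-overlapping hop (equal to the frame length, as in formula (1)) are illustrative assumptions; practical front ends usually use overlapping frames.

```python
import numpy as np

def frame_signal(s, frame_len, window=None):
    """Split s into non-overlapping frames and apply a window.

    Implements x_k(n) = w(n) * s(N*k + n), n = 0..N-1, k = 0..L-1."""
    N = frame_len
    L = len(s) // N                       # number of complete frames
    if window is None:
        # Hamming window: w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))
        window = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(N) / (N - 1))
    return s[:L * N].reshape(L, N) * window

# 20 ms frames at 16 kHz -> N = 320 samples
fs = 16000
t = np.arange(fs) / fs
s = np.sin(2 * np.pi * 440 * t)           # 1 s test tone
frames = frame_signal(s, int(0.02 * fs))
print(frames.shape)                       # (50, 320)
```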
4. Reverberation power spectrum estimation. The coherence of clean speech and of reverberant speech is taken from models established in prior work, while the coherence of the signals received by the microphones is computed directly from the defining formula of coherence.
5. Compute the filter gain and filter the two-channel signal.
6. Transform the filtered speech back to the time domain with the inverse Fourier transform.
Summary of the invention
The present invention proposes a new binaural speech dereverberation method and device that improve the low-frequency dereverberation performance of the coherence-based dual-microphone dereverberation algorithm.
The traditional coherence-based dual-microphone dereverberation algorithm assumes that reverberation forms a diffuse sound field with low coherence, while clean speech has high coherence, so reverberation can be removed according to the coherence level. In the low-frequency band, however, the coherence of reverberant speech is also high, so little low-band reverberation is removed. In addition, the conventional method computes the coherence of the individual speech components with free-field formulas; for binaural microphones, the head-shadow effect means that the coherence of each component is affected by the head's occlusion, and the free-field form no longer applies. To address these two problems, the present invention proposes a binaural speech dereverberation method based on speech presence probability and coherence.
The technical solution adopted by the invention is as follows:
A binaural speech dereverberation method based on speech presence probability and coherence mainly comprises the following steps:
1) applying delay compensation to the speech signals received by the two microphones to obtain time-aligned speech signals;
2) windowing and framing the time-aligned speech signals, and transforming them from the time domain to the frequency domain by Fourier transform;
3) estimating the reverberation power spectrum of the low-frequency band of the speech signal based on the speech presence probability;
4) computing the coherence of the different components of the speech signal;
5) estimating the reverberation power spectrum of the high-frequency band of the speech signal based on the coherence;
6) estimating a combined reverberation power spectrum from the low-band and high-band estimates according to the band-splitting threshold;
7) computing the final reverberation power spectrum from the combined estimate with a recursive smoothing algorithm;
8) computing a gain function from the final reverberation power spectrum, and obtaining the dereverberated frequency-domain signal through the gain function;
9) obtaining the dereverberated time-domain signal from the dereverberated frequency-domain signal by the inverse short-time Fourier transform.
Each of the above steps is described in detail below:
1) Apply delay compensation to the speech signals received by the two microphones to obtain time-aligned speech. Since the speech signal reaches the two microphones at different times, the signals must be aligned before further processing. The GCC-PHAT-ργ method based on generalized cross-correlation is used for time delay estimation: the binaural time difference is determined by locating the spectral peak of the cross-correlation function. This method is relatively robust against the influence of correlated noise, reverberation and other disturbances on the peak position of the cross-correlation function.
In the time domain, the two-channel speech model can be described as:
x_i(n) = s_i(n) + v_i(n),   (3)
where x_i(n) denotes the signal received by a microphone, s_i(n) the clean speech signal and v_i(n) the noise signal, with subscript i ∈ {l, r} indexing the first and second microphone signals.
Using the short-time Fourier transform, the two-channel speech model can be expressed in the frequency domain as:
X_i(λ, μ) = S_i(λ, μ) + V_i(λ, μ),   (4)
where λ and μ denote the frame index and the frequency bin. The generalized cross-correlation function of the two received signals can then be expressed as:
R(Δτ) = (1/2π) ∫ W(ω) G(ω) e^{jωΔτ} dω,   (5)
where Δτ is the time difference, * denotes the complex conjugate, and ω the angular frequency. W(ω) is a frequency-domain weighting function used to sharpen the spectral peak of the cross-correlation function; the parameter ρ is a reverberation factor determined by the signal-to-noise ratio, γ(ω) is the coherence function of the microphone signals (discussed in detail in step 4)), and both adapt automatically to the environment. G(ω) is the cross-power spectrum, G(ω) = X_l(ω)X_r*(ω). The time delay is then obtained by maximizing the generalized cross-correlation function:
Δτ̂ = argmax_Δτ R(Δτ).   (6)
2) Window and frame the two aligned signals, and apply the Fourier transform so that the signals are transformed from the time domain to the frequency domain.
3) Estimate the reverberation power spectrum of the low-frequency band based on the speech presence probability. This step estimates the low-band reverberation power spectrum separately, to guarantee that low-band reverberation is removed as well. For each channel, denote the speech power spectrum and the reverberation power spectrum by φ_ss(λ, μ) and φ_vv(λ, μ). Since speech presence is uncertain, the reverberation power spectrum E(|V|²|X) is obtained with the minimum mean-square error method:
E(|V|²|X) = P(H₀|X) E(|V|²|X, H₀) + P(H₁|X) E(|V|²|X, H₁)   (7)
where X and V denote the discrete Fourier transforms of the signal received by the microphone and of the reverberation signal, H₁ denotes speech presence and H₀ speech absence, P(H₀|X) is the speech absence probability, E(|V|²|X, H₀) the reverberation power spectrum when speech is absent, P(H₁|X) the speech presence probability, and E(|V|²|X, H₁) the reverberation power spectrum when speech is present.
Define the a posteriori signal-to-reverberation ratio as:
ξ = φ_ss / φ_vv   (8)
The speech presence probability can be computed with formula (9):
P(H₁|X) = [1 + (1 + ξ_opt) exp(−(|X|²/φ̂_vv) · ξ_opt/(1 + ξ_opt))]⁻¹   (9)
where ξ_opt denotes the optimal a posteriori signal-to-reverberation ratio. Studies have shown that when the true a posteriori SNR lies between −∞ and 20 dB, taking 10log₁₀(ξ_opt) = 15 dB minimizes the error of the speech presence probability. After computing the speech presence probability P(H₁|X), the speech absence probability P(H₀|X) follows as:
P(H₀|X) = 1 − P(H₁|X)   (10)
When speech is absent, the signal received by the microphone is considered to be pure reverberation noise, so the reverberation power spectrum is:
E(|V|²|X, H₀) = E(|V|²|V) = |V|² = |X|²   (11)
When speech is present, the reverberation power spectrum is taken from the reverberation estimate of the previous frame:
E(|V|²|X, H₁) = φ̂_vv(λ−1, μ)   (12)
where φ̂_vv is the auto-power spectrum of the estimated reverberation. The computation of the reverberation power spectrum E(|V|²|X) can therefore be rewritten as:
E(|V|²|X) = P(H₀|X) |X|² + P(H₁|X) φ̂_vv(λ−1, μ)   (13)
The reverberation power spectrum is smoothed across frames:
φ̂_vv(λ, μ) = α φ̂_vv(λ−1, μ) + (1 − α) E(|V|²|X)   (14)
where α is a smoothing factor.
The reverberation power spectrum is updated only when the larger of the speech presence probabilities of the two channels (i.e. the two microphones) is below a threshold; otherwise it is held:
1) If max(P(H₁|X_l), P(H₁|X_r)) < p₀ and P(H₁|X_l) < P(H₁|X_r), update φ̂_vv from the first (left) microphone observation;
2) if max(P(H₁|X_l), P(H₁|X_r)) < p₀ and P(H₁|X_l) > P(H₁|X_r), update φ̂_vv from the second (right) microphone observation;
3) otherwise, keep the previous estimate unchanged.
Here P(H₁|X_l) denotes the speech presence probability of the first microphone signal, P(H₁|X_r) that of the second microphone signal, and p₀ the threshold.
The low-frequency part of the reverberant speech signal is processed with this method for reverberation power spectrum estimation; the result is the low-band reverberation power spectrum estimate.
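Under the formulas of this step, the low-band estimator can be sketched per channel as follows. The SPP expression implements the fixed-prior form of formula (9) with 10log₁₀(ξ_opt) = 15 dB; the function name, initialization and smoothing constant are illustrative assumptions.

```python
import numpy as np

def spp_reverb_psd(X_mag2, alpha=0.9, xi_opt_db=15.0):
    """Frame-recursive reverberation PSD estimate for one channel.

    X_mag2: (frames, bins) array of |X(lambda, mu)|^2.
    Returns (phi_vv, p_h1): the smoothed reverberation PSD and the
    speech presence probability per frame and bin."""
    xi_opt = 10 ** (xi_opt_db / 10)
    n_frames, n_bins = X_mag2.shape
    phi_vv = np.zeros((n_frames, n_bins))
    p_h1 = np.zeros((n_frames, n_bins))
    prev = X_mag2[0]                     # initialize with the first frame
    for lam in range(n_frames):
        post = X_mag2[lam] / np.maximum(prev, 1e-12)        # a posteriori SNR
        p = 1.0 / (1.0 + (1.0 + xi_opt) * np.exp(-post * xi_opt / (1 + xi_opt)))
        # MMSE combination of the two hypotheses (formula (13))
        e_v = (1 - p) * X_mag2[lam] + p * prev
        # inter-frame recursive smoothing (formula (14))
        prev = alpha * prev + (1 - alpha) * e_v
        phi_vv[lam], p_h1[lam] = prev, p
    return phi_vv, p_h1

rng = np.random.default_rng(1)
noise = rng.standard_normal((200, 64)) ** 2     # reverberation-like frames
phi, p = spp_reverb_psd(noise)
print(phi.shape, float(p.mean()))
```

On pure noise-like frames the speech presence probability stays low, so the estimate keeps tracking the observed power, as intended.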
4) Compute the coherence of the different signal components. Reverberation and speech differ markedly in coherence in the high-frequency band, so coherence is used to estimate the high-band reverberation. First the coherence between the different speech components must be computed. The coherence of the signals received by the microphones can be computed directly from the definition of coherence; in the frequency domain, the coherence between two signals is defined as:
γ_{x₁x₂}(λ, μ) = φ_{x₁x₂}(λ, μ) / sqrt(φ_{x₁x₁}(λ, μ) φ_{x₂x₂}(λ, μ))   (15)
where φ_{x₁x₁} and φ_{x₂x₂} denote the auto-power spectra of the signals x₁ and x₂, and φ_{x₁x₂} their cross-power spectrum, computed by recursive averaging:
φ_{x₁x₂}(λ, μ) = α_PSD φ_{x₁x₂}(λ−1, μ) + (1 − α_PSD) X₁(λ, μ) X₂*(λ, μ)   (16)
where α_PSD is a smoothing factor and * denotes the complex conjugate.
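A direct sketch of formulas (15)-(16), tracking all three power spectra by recursive averaging (the function name and smoothing constant are illustrative):

```python
import numpy as np

def recursive_coherence(Xl, Xr, alpha_psd=0.8, eps=1e-12):
    """Short-time coherence of two STFT signals (formulas (15)-(16)).

    Xl, Xr: complex STFTs of shape (frames, bins)."""
    phi_ll = phi_rr = eps
    phi_lr = 0.0
    gamma = np.zeros(Xl.shape, dtype=complex)
    for lam in range(Xl.shape[0]):
        phi_ll = alpha_psd * phi_ll + (1 - alpha_psd) * np.abs(Xl[lam]) ** 2
        phi_rr = alpha_psd * phi_rr + (1 - alpha_psd) * np.abs(Xr[lam]) ** 2
        phi_lr = alpha_psd * phi_lr + (1 - alpha_psd) * Xl[lam] * np.conj(Xr[lam])
        gamma[lam] = phi_lr / np.sqrt(phi_ll * phi_rr + eps)
    return gamma

rng = np.random.default_rng(2)
common = rng.standard_normal((300, 32)) + 1j * rng.standard_normal((300, 32))
indep = rng.standard_normal((300, 32)) + 1j * rng.standard_normal((300, 32))
g_coh = recursive_coherence(common, common)   # identical signals -> |gamma| near 1
g_inc = recursive_coherence(common, indep)    # independent signals -> |gamma| small
print(float(np.abs(g_coh[-1]).mean()), float(np.abs(g_inc[-1]).mean()))
```

Note that with a finite smoothing window the coherence of independent signals does not reach zero exactly; it is only driven well below that of coherent signals.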
Reverberant speech is usually modeled as a diffuse sound field, produced by countless mutually incoherent signals of equal power propagating simultaneously in all directions. The conventional coherence of an ideal diffuse field is:
γ_diff(f) = sin(2πf d_mic/c) / (2πf d_mic/c)   (17)
where f is the frequency, d_mic the distance between the two microphones, and c the speed of sound. When the two microphones are located at the left and right ears of a head, however, the head's shadowing makes the diffuse-field coherence more complex. The curve-fitting approximation proposed by M. Jeub et al. is therefore used to model it, with constants a_p, b_p and c_p taking the values 2.38×10⁻³, 1371 and 151.5, and model order P = 3.
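The free-field diffuse coherence of formula (17) is just a sinc curve; a sketch follows (the head-shadow fit of Jeub et al. is not reproduced, since its closed form is not printed in the text). Note that NumPy's `np.sinc(x)` is the normalized sinc sin(πx)/(πx), so the argument passed is 2 f d_mic / c.

```python
import numpy as np

def diffuse_coherence(f, d_mic=0.17, c=343.0):
    """Ideal diffuse-field coherence, formula (17):
    gamma_diff(f) = sin(2*pi*f*d/c) / (2*pi*f*d/c)."""
    return np.sinc(2.0 * f * d_mic / c)

f = np.linspace(0, 8000, 512)
g = diffuse_coherence(f)
print(float(g[0]), float(np.abs(g[-1])))
```

With the binaural spacing d_mic = 0.17 m used later in the embodiment, the first zero falls near c/(2 d_mic) ≈ 1 kHz, which is why reverberation remains highly coherent, and hard to separate by coherence alone, in the low band.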
For clean speech the coherence is high. Assuming the speech arrives at the two microphones from angle θ, the coherence between the clean speech components can be expressed by formula (19), where f denotes the frequency, c the propagation speed of sound in air, and d_mic the distance between the two microphones.
5) Estimate the reverberation power spectrum of the high-frequency band based on signal coherence. Since the reverberant field is assumed to be diffuse, each microphone receives a noise signal with the same power spectrum φ_vv. Because of the head-shadow effect, the difference between the clean speech power spectra received by the two ears cannot simply be ignored; the clean speech power spectra are expressed through the transfer functions H_l and H_r of the left and right ears, where S denotes the source signal, S_l the speech signal received by the left microphone and S_r the speech signal received by the right microphone, and combined with the binaural coherence function γ.
For the left and right clean speech signals s_l and s_r, the reverberation signals v_l and v_r, and the microphone signals x_l and x_r, the relationships between the auto-power spectra and the cross-power spectra can be written down accordingly. Since the reverberation is assumed uncorrelated with the speech, combining formulas (23), (25) and (26) yields formula (28), and together with the definition of binaural coherence this gives formula (29). Solving formula (29) yields the reverberation power spectrum estimate φ_vv: rewriting formula (29) as a quadratic and solving it gives formula (31). In theory, since the coherence of the speech signal is strong and that of the reverberation weak, the coherence of the received signal is no larger than that of the clean speech signal, so formula (31) can be considered to have a solution. To guarantee that the reverberation power spectrum φ_vv is positive, it is computed with formula (32), in which the auto-power spectra and the cross-power spectrum are likewise computed by recursive averaging.
The high-frequency part of the reverberant speech is processed with this method for reverberation power spectrum estimation, giving the high-band reverberation power spectrum estimate.
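Since the exact binaural equations (29)-(32) are not printed in the text, the following sketch solves only a simplified free-field case: equal speech power at both microphones, speech with known coherence γ_s, and diffuse reverberation with coherence γ_d. Under those assumptions the coherence balance γ_x·φ_x = γ_s·φ_s + γ_d·φ_v is linear in φ_vv, rather than the patent's quadratic; the function name and clipping rule are illustrative.

```python
import numpy as np

def reverb_psd_coherent(phi_x, gamma_x, gamma_s, gamma_d):
    """Simplified coherence-based reverberation PSD (free-field sketch).

    Model: phi_x = phi_s + phi_v and
           gamma_x * phi_x = gamma_s * phi_s + gamma_d * phi_v,
    which gives phi_v = phi_x * (gamma_s - gamma_x) / (gamma_s - gamma_d),
    clipped to [0, phi_x] so the estimate stays a valid power."""
    num = gamma_s - gamma_x
    den = gamma_s - gamma_d
    phi_v = phi_x * np.real(num / np.where(np.abs(den) < 1e-6, 1e-6, den))
    return np.clip(phi_v, 0.0, phi_x)

# sanity check on a synthetic mixture: phi_s = 3, phi_v = 1
gamma_s, gamma_d = 1.0 + 0j, 0.2            # assumed coherences
phi_s_true, phi_v_true = 3.0, 1.0
phi_x = phi_s_true + phi_v_true
gamma_x = (gamma_s * phi_s_true + gamma_d * phi_v_true) / phi_x
print(reverb_psd_coherent(phi_x, gamma_x, gamma_s, gamma_d))
```

The clipping plays the same role as formula (32): it keeps the estimated reverberation power non-negative and no larger than the observed power.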
Since the theoretical signal coherence differs somewhat from the actual signal coherence, which in turn affects the reverberation power spectrum estimate, the coherence values are updated here to further improve the estimation.
When the larger of the two speech presence probabilities is below a threshold, the coherence of the signals received by the microphones is used to update the coherence of the reverberation signal:
if max(P(H₁|X_l), P(H₁|X_r)) < p₀, then update the reverberation coherence by recursive averaging with the measured coherence.
When the smaller of the two speech presence probabilities is above a threshold, the coherence of the signals received by the microphones is used to update the coherence of the clean speech signal, which can be obtained from formula (29):
if min(P(H₁|X_l), P(H₁|X_r)) > p₁, then update the clean speech coherence accordingly.
Here p₀ and p₁ denote thresholds, γ̂_v the coherence of the reverberation signal, α_γ a smoothing factor, γ_{x_l x_r} the coherence between the two microphone signals, γ̂_s the coherence of the clean speech signal, φ_{x_l x_r} the cross-power spectrum of the two microphone signals, φ_{x_l x_l} the auto-power spectrum of the left microphone signal, φ_{x_r x_r} the auto-power spectrum of the right microphone signal, and φ_vv the auto-power spectrum of the reverberation signal. Since the coherence-based reverberation power estimation uses only the square of the clean speech coherence, only formula (35) needs to be used for the update.
6) Combine the low-band and high-band reverberation power spectrum estimates. When the frequency μ is below a set value μ_s (the threshold separating the low and high bands), the reverberation power spectrum is the low-band estimate from step 3); when the frequency is above the threshold μ_s, the reverberation power spectrum is the high-band estimate from step 5).
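The band-combination rule of step 6) is a simple per-bin selection; a sketch (names illustrative):

```python
import numpy as np

def combine_bands(phi_low, phi_high, freqs, f_split):
    """Use the SPP-based low-band estimate at or below the split frequency
    and the coherence-based high-band estimate above it.

    phi_low, phi_high: (frames, bins); freqs: (bins,) bin center frequencies."""
    return np.where(freqs[None, :] <= f_split, phi_low, phi_high)

freqs = np.linspace(0, 8000, 257)
phi_low = np.full((10, 257), 1.0)
phi_high = np.full((10, 257), 2.0)
phi = combine_bands(phi_low, phi_high, freqs, f_split=1500.0)
print(float(phi[0, 0]), float(phi[0, -1]))    # 1.0 2.0
```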
7) From the combined low-/high-band reverberation power spectrum obtained in step 6), the final reverberation power spectrum is computed with an existing recursive smoothing algorithm.
8) Compute the gain function. Once the power spectrum of the reverberation signal has been computed, a gain function can be designed from it; multiplying the signal received by the microphone by the gain function yields the dereverberated signal. Speech dereverberation based on reverberation power spectrum estimation is usually performed by spectral subtraction, which rests on a simple principle: reverberation is treated as additive noise, and subtracting the estimated reverberation spectrum from the received reverberant speech spectrum gives an estimate of the clean speech spectrum. The gain function is given by formula (37), where φ̂_vv denotes the estimated reverberation power spectrum, φ_xx the computed power spectrum of the signal received by the microphone, and ξ²(λ) the square of the a posteriori SNR. To avoid over-subtraction, a lower bound G_min is set. The dereverberated speech signal is expressed in the frequency domain as:
Ŝ(λ, μ) = G(λ, μ) X(λ, μ)   (38)
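Since the body of the gain formula (37) is not printed in the text, the sketch below uses a generic power spectral-subtraction gain with the lower bound G_min described above; it is an assumption standing in for the patent's exact gain.

```python
import numpy as np

def spectral_subtraction_gain(X_mag2, phi_vv, g_min=0.1):
    """Generic power spectral-subtraction gain with lower bound G_min:
        G = max( sqrt(1 - phi_vv / |X|^2), G_min )
    The dereverberated spectrum is then S_hat = G * X (formula (38))."""
    ratio = np.clip(1.0 - phi_vv / np.maximum(X_mag2, 1e-12), 0.0, None)
    return np.maximum(np.sqrt(ratio), g_min)

X_mag2 = np.array([4.0, 1.0, 0.25])       # observed power per bin
phi_vv = np.array([1.0, 1.0, 1.0])        # estimated reverberation power
G = spectral_subtraction_gain(X_mag2, phi_vv)
print(G)
```

The floor g_min keeps bins where the estimated reverberation exceeds the observed power from being zeroed out, which would otherwise cause musical-noise artifacts.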
9) Finally, the dereverberated time-domain signal is obtained with the inverse short-time Fourier transform.
Correspondingly with the above method, the present invention also provides a binaural speech dereverberation device based on speech presence probability and coherence, which comprises:
a preprocessing unit, responsible for applying delay compensation to the speech signals received by the two microphones to obtain time-aligned speech signals, for windowing and framing the time-aligned signals, and for transforming them from the time domain to the frequency domain by Fourier transform;
a low-band reverberation power spectrum estimation unit, responsible for estimating the reverberation power spectrum of the low-frequency band of the speech signal based on the speech presence probability;
a high-band reverberation power spectrum estimation unit, responsible for computing the coherence of the different components of the speech signal and for estimating the reverberation power spectrum of the high-frequency band based on that coherence;
a band-combining reverberation power spectrum estimation unit, responsible for combining the low-band and high-band reverberation power spectra into a full-band estimate according to the band-splitting threshold;
a dereverberation unit, responsible for computing the final reverberation power spectrum from the combined estimate with a recursive smoothing algorithm, for computing a gain function from the final reverberation power spectrum, for obtaining the dereverberated frequency-domain signal through the gain function, and for obtaining the dereverberated time-domain signal from it by the inverse short-time Fourier transform.
The beneficial effects of the present invention are as follows:
The present invention exploits the difference in coherence between the reverberation and the clean speech received by the two microphones, applying different reverberation power spectrum estimators to the low and high bands: reverberation in the low band is removed with a model that computes the reverberation power spectrum from the speech presence probability, while reverberation in the high band is removed with a speech coherence model. Reverberation over the entire frequency band can thus be effectively removed and perceptual speech quality improved.
Description of the drawings
Fig. 1 is a flow diagram of the binaural speech dereverberation method based on speech presence probability and coherence according to the present invention.
Fig. 2 compares the true reverberation power spectrum with the reverberation power spectra estimated by the coherence-based dereverberation method before and after the improvement, in an embodiment of the present invention.
Fig. 3(a)-Fig. 3(c) are, respectively, the spectrograms of the speech signal contaminated by reverberation, of the speech dereverberated with the coherence-based method before the improvement, and of the speech dereverberated with the improved method based on speech presence probability and coherence.
Detailed description of the embodiments
The present invention is described clearly and completely below with reference to the embodiments and the drawings.
The databases used in this embodiment are among the most authoritative and widely used in the field of speech enhancement. The clean speech comes from the TSP database; 80 utterances are used for testing. The signals received by the microphones are obtained by convolving the clean speech with room impulse responses from the AIR (Aachen Impulse Response) database. The AIR database was recorded by the Institute of Communication Systems of RWTH Aachen University using an HMS II artificial head, and covers different types of scenes such as offices, meeting rooms, and lecture halls, for research on signal processing algorithms in reverberant environments. The two microphones are located at the left and right ears of the dummy head, about 0.17 m apart.
This embodiment evaluates the speech dereverberation algorithm under different reverberation scenes using the binaural speech dereverberation method based on speech presence probability and coherence shown in Fig. 1. The algorithm parameters are set as listed in Table 1.
Table 1. Algorithm parameter settings

Parameter                        Value
Sampling rate fs                 16 kHz
Frame length L                   320
Frame shift M                    160
Spectral smoothing parameter α   50%
Subtraction factor β             0.85
Spectral floor Gmin              -10 dB
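The analysis front end implied by Table 1 can be sketched as follows; the Hann window is an assumption, since the patent specifies only the frame length and shift (320 samples and 160 samples at 16 kHz, i.e. 20 ms frames with 50% overlap):

```python
import numpy as np

# Analysis parameters from Table 1: fs = 16 kHz, frame length L = 320, hop M = 160.
fs, L, M = 16000, 320, 160

def stft_frames(x, frame_len=L, hop=M):
    """Windowed short-time spectra of a 1-D signal (illustrative sketch)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    win = np.hanning(frame_len)  # window choice is an assumption
    frames = np.stack([x[i * hop : i * hop + frame_len] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # one row of 161 bins per frame

x = np.random.randn(fs)   # one second of audio
X = stft_frames(x)        # shape: (frames, frequency bins)
```

Each of the per-band reverberation power spectrum estimators below operates on these short-time spectra.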
Table 2 lists the perceptual speech quality (PESQ) and the improvement in the speech-to-reverberation modulation energy ratio (ΔSRMR) obtained by the pre-improvement method, which uses coherence alone to estimate and remove the reverberation, and by the improved method, which uses both the speech presence probability and coherence. Comparing ΔSRMR before and after the improvement shows that the method based on speech presence probability and coherence removes noticeably more reverberation, and accordingly achieves higher PESQ values.
Table 2. PESQ and ΔSRMR before and after the improvement

Reverberation scene    Office   Lecture room   Corridor   Auditorium
Reverberation time     0.45 s   0.85 s         0.83 s     5.16 s
Initial PESQ           1.89     1.62           1.74       1.44
PESQ (before)          2.19     1.78           1.92       1.61
PESQ (after)           2.42     2.00           2.07       1.78
ΔSRMR (before)         1.05     1.11           1.19       0.90
ΔSRMR (after)          1.32     1.37           1.41       1.18
Fig. 2 shows, for the office scene, the power spectrum of the true reverberant signal together with the reverberation power spectra estimated by the coherence-based method before and after the improvement. It is evident from Fig. 2 that the power spectrum estimated with the improved method is closer to the true reverberation power spectrum.
The effect of dereverberation is best observed in the spectrograms of the processed speech signals; Fig. 3(a)-Fig. 3(c) give an example. They show, respectively, the spectrogram of the speech signal contaminated by reverberation, of the speech dereverberated with the coherence-based method before the improvement, and of the speech dereverberated with the improved method based on speech presence probability and coherence. The spectrograms show that the method of the present invention removes more reverberation, especially in the low-frequency part.
Another embodiment of the present invention provides a binaural speech dereverberation device based on speech presence probability and coherence, comprising:
a preprocessing unit, which applies delay compensation to the speech signals received by the two microphones to obtain time-aligned speech signals, applies windowing and framing to the time-aligned signals, and transforms the speech signals from the time domain to the frequency domain by Fourier transform;
a low-frequency-band reverberation power spectrum estimation unit, which estimates the reverberation power spectrum of the low-frequency part of the speech signal based on the speech presence probability;
a high-frequency-band reverberation power spectrum estimation unit, which computes the coherence between the different signal components of the speech signal and estimates the reverberation power spectrum of the high-frequency part of the speech signal based on this coherence;
a combined low/high-frequency reverberation power spectrum estimation unit, which merges the reverberation power spectra of the low-frequency part and the high-frequency part, according to the threshold frequency dividing the low and high bands, into a combined low/high-frequency reverberation power spectrum estimate;
a dereverberation unit, which computes the final reverberation power spectrum from the combined low/high-frequency reverberation power spectrum using a recursive smoothing algorithm, computes a gain function from the final reverberation power spectrum and applies it to obtain the dereverberated frequency-domain signal, and obtains the dereverberated time-domain signal from the dereverberated frequency-domain signal by inverse short-time Fourier transform.
The above examples illustrate the invention. Although examples of the invention are disclosed for the purpose of illustration, those skilled in the art will appreciate that various substitutions, variations, and modifications are possible without departing from the spirit and scope of the invention and the appended claims. Therefore, the invention should not be limited to the content of the examples; its scope of protection is defined by the claims.

Claims (10)

1. A binaural speech dereverberation method based on speech presence probability and coherence, the steps comprising:
1) applying delay compensation to the speech signals received by two microphones to obtain time-aligned speech signals;
2) applying windowing and framing to the time-aligned speech signals, and transforming the speech signals from the time domain to the frequency domain by Fourier transform;
3) estimating the reverberation power spectrum of the low-frequency part of the speech signal based on the speech presence probability;
4) computing the coherence between the different signal components of the speech signal;
5) estimating the reverberation power spectrum of the high-frequency part of the speech signal based on said coherence;
6) merging the reverberation power spectra of the low-frequency part and the high-frequency part, according to the threshold frequency dividing the low and high bands, into a combined low/high-frequency reverberation power spectrum estimate;
7) computing the final reverberation power spectrum from the combined low/high-frequency reverberation power spectrum using a recursive smoothing algorithm;
8) computing a gain function from the final reverberation power spectrum and applying it to obtain the dereverberated frequency-domain signal;
9) obtaining the dereverberated time-domain signal from the dereverberated frequency-domain signal by inverse short-time Fourier transform.
2. The method of claim 1, characterized in that in step 1) the delay compensation between the two speech signals is performed with the GCC-PHAT-ργ method, to overcome the influence of interfering factors in the environment on the position of the peak of the cross-correlation function.
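The claim names a robustified GCC-PHAT-ργ variant; only the baseline phase-transform cross-correlation, which the variant builds on, is sketched below. The sign convention of the returned delay is an implementation choice, so only its magnitude should be relied on here.

```python
import numpy as np

def gcc_phat_delay(sig_l, sig_r, fs, max_tau=None):
    """Inter-microphone delay via standard GCC-PHAT (baseline sketch only).

    The patent's GCC-PHAT-rho-gamma variant adds robustness terms that are
    not reproduced here.
    """
    n = len(sig_l) + len(sig_r)
    L = np.fft.rfft(sig_l, n=n)
    R = np.fft.rfft(sig_r, n=n)
    cross = L * np.conj(R)
    cross /= np.maximum(np.abs(cross), 1e-12)   # phase transform weighting
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2 if max_tau is None else int(fs * max_tau)
    # Re-center the circular correlation around lag zero
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs
```

The estimated delay (a few samples for a 0.17 m ear spacing) is then compensated before framing.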
3. The method of claim 1, characterized in that step 3) estimates the reverberation power spectrum of the low-frequency band separately, to ensure that the reverberation in the low-frequency band is removed.
4. The method of claim 3, characterized in that in step 3) the reverberation power spectrum is updated when the larger of the speech presence probabilities of the two channels is below a threshold, and is otherwise left unchanged; the update rule for the reverberation power spectrum is:
1) if max(P(H1|Xl), P(H1|Xr)) < p0 and P(H1|Xl) < P(H1|Xr),
then φ̂vv(λ,μ) = α·φ̂vv(λ-1,μ) + (1-α)·|Xl(λ,μ)|²;
2) if max(P(H1|Xl), P(H1|Xr)) < p0 and P(H1|Xl) > P(H1|Xr),
then φ̂vv(λ,μ) = α·φ̂vv(λ-1,μ) + (1-α)·|Xr(λ,μ)|²;
3) otherwise, φ̂vv(λ,μ) = φ̂vv(λ-1,μ);
where P(H1|Xl) denotes the speech presence probability of the first microphone signal Xl, P(H1|Xr) denotes the speech presence probability of the second microphone signal Xr, p0 denotes the threshold, λ and μ denote the frame index and the frequency respectively, H1 denotes speech presence, H0 denotes speech absence, α denotes a smoothing factor, and φ̂vv denotes the estimated auto-power spectrum of the reverberation.
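The gating logic of claim 4 can be sketched as below: when both channels' speech presence probabilities fall under the threshold p0, the reverberation PSD is recursively updated from the channel with the lower probability; otherwise the previous estimate is held. The smoothing constant and the exact update expression are assumptions, since the claim's formulas appear as images in this record.

```python
import numpy as np

def update_reverb_psd(psd_prev, Xl, Xr, p_l, p_r, p0=0.3, alpha=0.9):
    """Speech-probability-gated recursive reverberation-PSD update (sketch).

    p0 and alpha are illustrative values, not taken from the patent.
    """
    if max(p_l, p_r) < p0:
        # Update from the channel that is more likely non-speech
        x = Xl if p_l < p_r else Xr
        return alpha * psd_prev + (1 - alpha) * np.abs(x) ** 2
    return psd_prev  # hold the previous estimate during likely speech
```

Calling this once per frame for each low-band frequency bin yields the low-frequency reverberation power spectrum estimate of step 3).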
5. The method of claim 1, characterized in that step 4) assumes the reverberation to be a diffuse sound field, and computes the coherence using a reverberation coherence model that accounts for head shadowing.
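Claim 5 uses a head-shadow-corrected coherence model for the dummy head; the free-field sinc curve below is the standard diffuse-field starting point that such models modify, evaluated here with the 0.17 m ear spacing from the embodiment.

```python
import numpy as np

def diffuse_coherence(freqs, mic_dist=0.17, c=343.0):
    """Coherence of an ideal (free-field) diffuse sound field (sketch).

    The head-shadow correction of the patent's model is not included;
    this is only the uncorrected baseline.
    """
    # np.sinc(x) = sin(pi*x)/(pi*x), so this is sin(2*pi*f*d/c)/(2*pi*f*d/c)
    return np.sinc(2.0 * freqs * mic_dist / c)
```

The coherence is close to 1 at low frequencies and decays toward 0 at high frequencies, which is why the coherence-based estimator is applied only to the high band while the probability-based estimator handles the low band.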
6. The method of claim 1, characterized in that step 5) comprises the following sub-steps:
5-1) updating the coherence of the signal at all frequencies according to the speech presence probability;
5-2) taking the head shadowing effect into account, assuming that the clean speech power spectra received by the two microphones differ, and estimating the reverberation power spectrum using the coherence function.
7. The method of claim 6, characterized in that the auto- and cross-power spectra of the clean speech received by the two microphones in step 5) are expressed as:
φSlSl(λ,μ) = |Hl(μ)|²·φSS(λ,μ), φSrSr(λ,μ) = |Hr(μ)|²·φSS(λ,μ), φSlSr(λ,μ) = Hl(μ)·Hr*(μ)·φSS(λ,μ),
where Hl and Hr denote the transfer functions of the left and right ears, S denotes the source signal (with auto-power spectrum φSS), ΓSlSr denotes the binaural signal coherence function, Sl denotes the speech signal received by the left microphone, and Sr denotes the speech signal received by the right microphone.
8. The method of claim 7, characterized in that step 5-1) comprises:
a) updating the coherence of the reverberant speech: when the larger of the two speech presence probabilities is below a threshold, the coherence of the reverberant signal is updated using the coherence of the speech signals received by the microphones:
if max(P(H1|Xl), P(H1|Xr)) < p0,
then Γ̂v(λ,μ) = αγ·Γ̂v(λ-1,μ) + (1-αγ)·Γx(λ,μ),
where Γ̂v denotes the coherence of the reverberant signal, αγ denotes the smoothing factor, Γx denotes the coherence between the two speech signals received by the microphones, and p0 denotes the threshold;
b) updating the coherence of the clean speech: when the smaller of the two speech presence probabilities is above a threshold, the coherence of the clean speech signal is updated using the coherence of the speech signals received by the microphones:
if min(P(H1|Xl), P(H1|Xr)) > p1,
then Γ̂s(λ,μ) = (φxlxr(λ,μ) - φvv(λ,μ)·Γ̂v(λ,μ)) / sqrt((φxlxl(λ,μ) - φvv(λ,μ))·(φxrxr(λ,μ) - φvv(λ,μ))),
where Γ̂s denotes the coherence of the clean speech signal, φxlxr denotes the cross-power spectrum of the speech received by the two microphones, φxlxl denotes the auto-power spectrum of the speech received by the left microphone, φxrxr denotes the auto-power spectrum of the speech received by the right microphone, φvv denotes the auto-power spectrum of the reverberant signal, and p1 denotes the threshold;
the reverberation power spectrum in step 5-2) is then estimated from these coherence estimates together with the measured auto- and cross-power spectra.
9. The method of claim 8, characterized in that the combined low/high-frequency reverberation power spectrum estimated in step 6) is:
φ̂vv(λ,μ) = φ̂vv,low(λ,μ) for μ ≤ μs, and φ̂vv(λ,μ) = φ̂vv,high(λ,μ) for μ > μs,
where μ denotes a frequency, μs denotes the frequency separating the low and high bands, φ̂vv,low denotes the reverberation power spectrum of the low-frequency part estimated from the speech presence probability, and φ̂vv,high denotes the reverberation power spectrum of the high-frequency part estimated from the coherence.
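The piecewise combination of claim 9 is straightforward to sketch; the split frequency μs below (1500 Hz) is an illustrative assumption, not a value taken from the patent:

```python
import numpy as np

def combine_reverb_psd(psd_low, psd_high, freqs, split_hz=1500.0):
    """Piecewise merge of the two per-band reverberation PSD estimates.

    Below split_hz the probability-based low-band estimate is used,
    above it the coherence-based high-band estimate.
    """
    return np.where(freqs <= split_hz, psd_low, psd_high)
```

The combined spectrum is then recursively smoothed (step 7) before the gain function is computed.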
10. A binaural speech dereverberation device based on speech presence probability and coherence, characterized by comprising:
a preprocessing unit, which applies delay compensation to the speech signals received by two microphones to obtain time-aligned speech signals, applies windowing and framing to the time-aligned signals, and transforms the speech signals from the time domain to the frequency domain by Fourier transform;
a low-frequency-band reverberation power spectrum estimation unit, which estimates the reverberation power spectrum of the low-frequency part of the speech signal based on the speech presence probability;
a high-frequency-band reverberation power spectrum estimation unit, which computes the coherence between the different signal components of the speech signal and estimates the reverberation power spectrum of the high-frequency part of the speech signal based on said coherence;
a combined low/high-frequency reverberation power spectrum estimation unit, which merges the reverberation power spectra of the low-frequency part and the high-frequency part, according to the threshold frequency dividing the low and high bands, into a combined low/high-frequency reverberation power spectrum estimate;
a dereverberation unit, which computes the final reverberation power spectrum from the combined low/high-frequency reverberation power spectrum using a recursive smoothing algorithm, computes a gain function from the final reverberation power spectrum and applies it to obtain the dereverberated frequency-domain signal, and obtains the dereverberated time-domain signal from the dereverberated frequency-domain signal by inverse short-time Fourier transform.
CN201810765266.3A 2018-07-12 2018-07-12 Binaural voice dereverberation method and device based on voice occurrence probability and consistency Active CN108986832B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810765266.3A CN108986832B (en) 2018-07-12 2018-07-12 Binaural voice dereverberation method and device based on voice occurrence probability and consistency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810765266.3A CN108986832B (en) 2018-07-12 2018-07-12 Binaural voice dereverberation method and device based on voice occurrence probability and consistency

Publications (2)

Publication Number Publication Date
CN108986832A true CN108986832A (en) 2018-12-11
CN108986832B CN108986832B (en) 2020-12-15

Family

ID=64537944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810765266.3A Active CN108986832B (en) 2018-07-12 2018-07-12 Binaural voice dereverberation method and device based on voice occurrence probability and consistency

Country Status (1)

Country Link
CN (1) CN108986832B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110012331A (en) * 2019-04-11 2019-07-12 杭州微纳科技股份有限公司 A kind of far field diamylose far field audio recognition method of infrared triggering
CN110095755A (en) * 2019-04-01 2019-08-06 北京云知声信息技术有限公司 A kind of sound localization method
CN110691296A (en) * 2019-11-27 2020-01-14 深圳市悦尔声学有限公司 Channel mapping method for built-in earphone of microphone
CN110718230A (en) * 2019-08-29 2020-01-21 云知声智能科技股份有限公司 Method and system for eliminating reverberation
CN111128213A (en) * 2019-12-10 2020-05-08 展讯通信(上海)有限公司 Noise suppression method and system for processing in different frequency bands
CN113613112A (en) * 2021-09-23 2021-11-05 三星半导体(中国)研究开发有限公司 Method and electronic device for suppressing wind noise of microphone
CN115831145A (en) * 2023-02-16 2023-03-21 之江实验室 Double-microphone speech enhancement method and system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006243290A (en) * 2005-03-02 2006-09-14 Advanced Telecommunication Research Institute International Disturbance component suppressing device, computer program, and speech recognition system
WO2009151062A1 (en) * 2008-06-10 2009-12-17 ヤマハ株式会社 Acoustic echo canceller and acoustic echo cancel method
JP2011065128A (en) * 2009-08-20 2011-03-31 Mitsubishi Electric Corp Reverberation removing device
CN102347028A (en) * 2011-07-14 2012-02-08 瑞声声学科技(深圳)有限公司 Double-microphone speech enhancer and speech enhancement method thereof
CN102800322A (en) * 2011-05-27 2012-11-28 中国科学院声学研究所 Method for estimating noise power spectrum and voice activity
JP2013044908A (en) * 2011-08-24 2013-03-04 Nippon Telegr & Teleph Corp <Ntt> Background sound suppressor, background sound suppression method and program
CN106297817A (en) * 2015-06-09 2017-01-04 中国科学院声学研究所 A kind of sound enhancement method based on binaural information
CN106971740A (en) * 2017-03-28 2017-07-21 吉林大学 Probability and the sound enhancement method of phase estimation are had based on voice


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHANG, Long et al.: "Supervised single-channel speech dereverberation and denoising using a two-stage model based sparse representation", Speech Communication *
CHEN, Jianrong et al.: "Microphone-array-based reverberation reduction", Audio Engineering (《电声技术》) *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110095755B (en) * 2019-04-01 2021-03-12 云知声智能科技股份有限公司 Sound source positioning method
CN110095755A (en) * 2019-04-01 2019-08-06 北京云知声信息技术有限公司 A kind of sound localization method
CN110012331B (en) * 2019-04-11 2021-05-25 杭州微纳科技股份有限公司 Infrared-triggered far-field double-microphone far-field speech recognition method
CN110012331A (en) * 2019-04-11 2019-07-12 杭州微纳科技股份有限公司 A kind of far field diamylose far field audio recognition method of infrared triggering
CN110718230A (en) * 2019-08-29 2020-01-21 云知声智能科技股份有限公司 Method and system for eliminating reverberation
CN110718230B (en) * 2019-08-29 2021-12-17 云知声智能科技股份有限公司 Method and system for eliminating reverberation
CN110691296A (en) * 2019-11-27 2020-01-14 深圳市悦尔声学有限公司 Channel mapping method for built-in earphone of microphone
CN111128213A (en) * 2019-12-10 2020-05-08 展讯通信(上海)有限公司 Noise suppression method and system for processing in different frequency bands
WO2021114733A1 (en) * 2019-12-10 2021-06-17 展讯通信(上海)有限公司 Noise suppression method for processing at different frequency bands, and system thereof
CN111128213B (en) * 2019-12-10 2022-09-27 展讯通信(上海)有限公司 Noise suppression method and system for processing in different frequency bands
CN113613112A (en) * 2021-09-23 2021-11-05 三星半导体(中国)研究开发有限公司 Method and electronic device for suppressing wind noise of microphone
CN113613112B (en) * 2021-09-23 2024-03-29 三星半导体(中国)研究开发有限公司 Method for suppressing wind noise of microphone and electronic device
CN115831145A (en) * 2023-02-16 2023-03-21 之江实验室 Double-microphone speech enhancement method and system

Also Published As

Publication number Publication date
CN108986832B (en) 2020-12-15

Similar Documents

Publication Publication Date Title
CN105869651B (en) Binary channels Wave beam forming sound enhancement method based on noise mixing coherence
CN108986832A (en) Ears speech dereverberation method and device based on voice probability of occurrence and consistency
CN107479030B (en) Frequency division and improved generalized cross-correlation based binaural time delay estimation method
Jeub et al. Model-based dereverberation preserving binaural cues
CN111161751A (en) Distributed microphone pickup system and method under complex scene
US8880396B1 (en) Spectrum reconstruction for automatic speech recognition
US20140025374A1 (en) Speech enhancement to improve speech intelligibility and automatic speech recognition
US20130322643A1 (en) Multi-Microphone Robust Noise Suppression
CN110728989B (en) Binaural speech separation method based on long-time and short-time memory network L STM
CN105741849A (en) Voice enhancement method for fusing phase estimation and human ear hearing characteristics in digital hearing aid
Liu et al. A two-microphone dual delay-line approach for extraction of a speech sound in the presence of multiple interferers
Mosayyebpour et al. Single-microphone LP residual skewness-based inverse filtering of the room impulse response
Ren et al. A Causal U-Net Based Neural Beamforming Network for Real-Time Multi-Channel Speech Enhancement.
Aroudi et al. Cognitive-driven binaural LCMV beamformer using EEG-based auditory attention decoding
Sadjadi et al. Blind reverberation mitigation for robust speaker identification
CN115424627A (en) Voice enhancement hybrid processing method based on convolution cycle network and WPE algorithm
Pak et al. Multichannel speech reinforcement based on binaural unmasking
Feng et al. Preservation Of Interaural Level Difference Cue In A Deep Learning-Based Speech Separation System For Bilateral And Bimodal Cochlear Implants Users
CN113936687B (en) Method for real-time voice separation voice transcription
Hongo et al. Binaural speech enhancement method by wavelet transform based on interaural level and argument differences
Choi Speech processing system using a noise reduction neural network based on FFT spectrums
Unoki et al. Unified denoising and dereverberation method used in restoration of MTF-based power envelope
Yang et al. Environment-Aware Reconfigurable Noise Suppression
Hussain et al. A novel psychoacoustically motivated multichannel speech enhancement system
Chen et al. Early Reflections Based Speech Enhancement

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant