CN108986832A - Binaural speech dereverberation method and device based on speech presence probability and coherence - Google Patents
- Publication number
- CN108986832A CN108986832A CN201810765266.3A CN201810765266A CN108986832A CN 108986832 A CN108986832 A CN 108986832A CN 201810765266 A CN201810765266 A CN 201810765266A CN 108986832 A CN108986832 A CN 108986832A
- Authority
- CN
- China
- Prior art keywords
- voice
- signal
- reverberation
- power spectrum
- frequency
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS › G10—MUSICAL INSTRUMENTS; ACOUSTICS › G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING › G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility › G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation › G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02165—Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
Abstract
The present invention discloses a binaural speech dereverberation method and device based on speech presence probability and coherence. The method comprises: 1) applying delay compensation to the speech signals received by two microphones to obtain time-aligned signals; 2) windowing and framing the signals and transforming them from the time domain to the frequency domain by Fourier transform; 3) estimating the reverberation power spectrum of the low-frequency band based on the speech presence probability; 4) computing the coherence of the different components of the speech signal; 5) estimating the reverberation power spectrum of the high-frequency band based on this coherence; 6) combining the low- and high-band estimates according to a band-splitting threshold; 7) computing the final reverberation power spectrum with a recursive smoothing algorithm; 8) obtaining the dereverberated frequency-domain signal through a gain function; 9) obtaining the dereverberated time-domain signal by the inverse short-time Fourier transform. The invention can effectively remove reverberation over the entire frequency band and improve perceptual speech quality.
Description
Technical field
The invention belongs to the fields of audio signal processing and computer audio. It relates in particular to a dual-microphone speech dereverberation method and device for reverberant environments: the reverberation of the low-frequency band is removed with a model that computes the reverberation power spectrum from the speech presence probability, while the reverberation of the high-frequency band is removed with a speech coherence model. The method can effectively remove reverberation over the entire frequency band and improve perceptual speech quality.
Background
Binaural audio naturally benefits many communication and multimedia applications. In daily human interaction, auditory perception is one of the most effective and direct modes of communication. In real environments, however, speech — the key information carrier between people and between people and machines — is inevitably corrupted by reverberation, ambient noise and other interference, which greatly reduces its clarity, intelligibility and comfort, and degrades both human listening and downstream speech processing systems. Besides the direct-path sound, a microphone also receives reflections that arrive through multipath propagation (signals reflected by the floor, walls, ceiling and furnishings of a room). Acoustically, reflections delayed by roughly 50 ms or more are called echoes, while the effect produced by the remaining reflections other than the direct sound is called reverberation; both impair the reception of the desired speech signal. To counter the resulting loss of sound quality, researchers have proposed dereverberation (reverberation cancellation) techniques, which aim to improve speech quality and intelligibility.
Speech dereverberation has very wide applications. With the development of modern signal processing and intelligent systems, robots are becoming increasingly capable, yet in practice they often operate in complex acoustic environments where various kinds of noise and reverberation interfere with speech capture. Under reverberation the speech recognition rate drops rapidly, affecting subsequent operations and functions, and may even make practical use impossible. Reducing the influence of reverberation with binaural dereverberation is therefore of great importance for robots in real applications. Binaural dereverberation can also serve as a preprocessing stage for many speech technologies, such as binaural sound source localization and speech recognition. In addition, people with hearing impairment often rely on hearing aids or cochlear implants to communicate, but in reverberant environments the benefit of a hearing aid is greatly reduced. Applying a dereverberation algorithm before the corrupted speech is amplified removes the reverberant component to some extent and helps hearing-impaired listeners communicate better.
Speech dereverberation techniques are usually divided into single-channel and multi-channel enhancement. Single-channel algorithms use one microphone; thanks to their simple models and low cost they have been widely applied and are well developed, but they can only suppress reverberation by exploiting the statistics of the single-channel signal. Multi-channel systems acquire sound with several microphones, i.e. a microphone array, and therefore obtain multiple signals. With more input channels, the algorithm can exploit the correlation between channels for enhancement, overcoming the limitation of single-channel methods, which can only rely on time-frequency differences between speech and reverberation. In general, increasing the number of microphones improves dereverberation: an array can exploit not only the time-frequency information of the signal but also its spatial information, and has therefore attracted wide attention. Its drawbacks are the large physical size and the high computational complexity and cost. Weighing equipment cost, the real-time capability of the enhancement algorithm and its performance, dual-channel dereverberation — i.e. dereverberation with two microphones — is a good compromise.
Dual-microphone dereverberation algorithms mainly include methods based on a coherence model and methods based on two-channel Wiener filtering. Coherence-based algorithms design the filter according to the difference in coherence between clean speech and reverberant speech. Assuming that the clean and reverberant components are uncorrelated, the reverberation power in the received signal is estimated from the coherence of the clean speech, the reverberant speech and the microphone signals, and the filter gain is computed from this estimate to obtain the dereverberated speech. A coherence-based dual-channel dereverberation method mainly comprises the following steps:
1. Speech input, pre-filtering and A/D conversion. The recorded analog signal is first pre-filtered: high-pass filtering suppresses 50 Hz mains hum, and low-pass filtering removes components above half the sampling frequency to prevent aliasing. The analog signal is then sampled and quantized to obtain a digital signal.
2. Pre-emphasis. The signal is passed through a high-frequency emphasis filter to compensate for the high-frequency attenuation introduced by lip radiation.
3. Framing and windowing. Speech is slowly time-varying: non-stationary as a whole but locally stationary, so it is commonly regarded as stationary within 10-30 ms and split into frames of about 20 ms. The framing function is:
x_k(n) = w(n) s(Nk + n), n = 0, 1, ..., N-1; k = 0, 1, ..., L-1 (1)
where N is the frame length, L the number of frames and s the speech signal. w(n) is the window function; its shape and length strongly affect the short-time analysis parameters. Common windows include the rectangular, Hanning and Hamming windows. The Hamming window, which reflects the characteristics of the speech signal well, is usually chosen:
w(n) = 0.54 - 0.46 cos(2πn/(N-1)), 0 ≤ n ≤ N-1 (2)
4. Reverberation power spectrum estimation. The coherence of clean speech and of reverberant speech is taken from forms established in prior work, and the coherence of the microphone signals is computed directly from the definition of coherence.
5. Computation of the filter gain and filtering of the two-channel signal.
6. Transformation of the filtered speech back to the time domain with the inverse Fourier transform.
Summary of the invention
The present invention proposes a new binaural speech dereverberation method and device to improve the dereverberation performance of the coherence-based dual-microphone algorithm in the low-frequency band.
The traditional coherence-based dual-microphone algorithm assumes that reverberation forms a diffuse sound field with low coherence, while clean speech has high coherence, so reverberation can be removed according to the level of coherence. In the low-frequency band, however, the coherence of reverberant speech is also high, so little reverberation is removed there. Moreover, conventional methods compute the coherence of each signal component with free-field formulas; for binaural microphones, the head shadow effect means that the coherence of each component is affected by the head, and the free-field form no longer applies. To address these two problems, the invention proposes a binaural speech dereverberation method based on speech presence probability and coherence.
The technical solution adopted by the invention is as follows:
A binaural speech dereverberation method based on speech presence probability and coherence, mainly comprising the steps of:
1) applying delay compensation to the speech signals received by two microphones to obtain time-aligned speech signals;
2) windowing and framing the time-aligned signals and transforming them from the time domain to the frequency domain by Fourier transform;
3) estimating the reverberation power spectrum of the low-frequency band of the speech signal based on the speech presence probability;
4) computing the coherence of the different components of the speech signal;
5) estimating the reverberation power spectrum of the high-frequency band of the speech signal based on this coherence;
6) combining the low-band and high-band reverberation power spectra into a full-band estimate according to a band-splitting threshold;
7) computing the final reverberation power spectrum from the combined estimate with a recursive smoothing algorithm;
8) computing a gain function from the final reverberation power spectrum and obtaining the dereverberated frequency-domain signal through this gain;
9) obtaining the dereverberated time-domain signal from the dereverberated frequency-domain signal by the inverse short-time Fourier transform.
The steps are described in detail as follows:
1) Apply delay compensation to the signals received by the two microphones to obtain time-aligned speech. Because the speech reaches the two microphones at different times, the signals must be aligned before further processing. The generalized cross-correlation method GCC-PHAT-ργ is used for time delay estimation: the interaural time difference is determined by locating the peak of the cross-correlation function. This method is robust against disturbances in the environment, such as correlated noise and reverberation, that shift the peak of the cross-correlation function.
In the time domain, the two-channel speech model can be described as:
x_i(n) = s_i(n) + v_i(n), (3)
where x_i(n) is the signal received by a microphone, s_i(n) the clean speech signal and v_i(n) the noise (reverberation) signal; the subscript i ∈ {l, r} denotes the first and second microphone signals.
Using the short-time Fourier transform, the two-channel model in the frequency domain is:
X_i(λ, μ) = S_i(λ, μ) + V_i(λ, μ), (4)
where λ and μ denote the frame index and the frequency bin. The generalized cross-correlation of the two received signals can then be expressed as:
R(Δτ) = ∫ W(ω) G(ω) e^{jωΔτ} dω, (5)
where Δτ is the time difference, * denotes the complex conjugate and ω the angular frequency. W(ω) is the frequency-domain weighting function used to sharpen the peak of the cross-correlation function; it combines the factor |G(ω)|^{-ρ}, where the parameter ρ is a reverberation factor determined by the signal-to-noise ratio, with the coherence function γ(ω) of the microphone signals (detailed in step 4)). Both factors adapt automatically to the environment. G(ω) is the cross-power spectrum, G(ω) = X_l(ω) X_r*(ω). The time delay is then obtained by maximizing the generalized cross-correlation function:
Δτ̂ = argmax_{Δτ} R(Δτ). (6)
2) Window and frame the two aligned signals, and apply the Fourier transform to move them from the time domain to the frequency domain.
3) Estimate the reverberation power spectrum of the low-frequency band based on the speech presence probability. This step estimates the low-band reverberation power spectrum separately, so that low-frequency reverberation is also removed. For each channel, denote the speech power and the reverberation power by φ_ss(λ, μ) and φ_vv(λ, μ). Since the presence of speech is uncertain, the minimum mean-square error estimate of the reverberation power spectrum E(|V|²|X) is:
E(|V|²|X) = P(H0|X) E(|V|²|X, H0) + P(H1|X) E(|V|²|X, H1), (7)
where X and V are the discrete Fourier transforms of the microphone signal and the reverberation signal, H1 denotes speech presence and H0 speech absence, P(H0|X) is the speech absence probability, E(|V|²|X, H0) the reverberation power spectrum when speech is absent, P(H1|X) the speech presence probability and E(|V|²|X, H1) the reverberation power spectrum when speech is present.
Define the a posteriori signal-to-reverberation ratio:
ξ = φ_ss / φ_vv. (8)
The speech presence probability is computed with:
P(H1|X) = [1 + (1 + ξ_opt) exp(-(|X|²/φ_vv) · ξ_opt/(1 + ξ_opt))]^{-1}, (9)
where ξ_opt is the optimal signal-to-reverberation ratio. Studies show that for true a posteriori SNRs between -∞ and 20 dB, the estimation error of the speech presence probability is smallest when 10 log10(ξ_opt) = 15 dB. After computing P(H1|X), the speech absence probability follows from:
P(H0|X) = 1 - P(H1|X). (10)
When speech is absent, the microphone signal is regarded as pure reverberation noise, so the reverberation power spectrum is:
E(|V|²|X, H0) = E(|V|²|V) = |V|² = |X|². (11)
When speech is present, the reverberation power spectrum is taken from the estimate of the previous frame:
E(|V|²|X, H1) = φ̂_vv(λ-1, μ), (12)
where φ̂_vv is the estimated reverberation auto-power spectrum. The reverberation power spectrum E(|V|²|X) can therefore be rewritten as:
E(|V|²|X) = P(H0|X) |X|² + P(H1|X) φ̂_vv(λ-1, μ). (13)
The reverberation power spectrum is smoothed across frames:
φ̂_vv(λ, μ) = α φ̂_vv(λ-1, μ) + (1-α) E(|V|²|X), (14)
where α is the smoothing factor.
The reverberation power spectrum is updated only when the larger of the speech presence probabilities of the two channels (i.e. the two microphones) is below a threshold; otherwise it is not updated:
1) if max(P(H1|X_l), P(H1|X_r)) < p_0 and P(H1|X_l) < P(H1|X_r), the reverberation power spectrum is updated from the first microphone signal X_l; (15)
2) if max(P(H1|X_l), P(H1|X_r)) < p_0 and P(H1|X_l) > P(H1|X_r), the reverberation power spectrum is updated from the second microphone signal X_r; (16)
3) otherwise, the previous estimate is kept.
Here P(H1|X_l) and P(H1|X_r) are the speech presence probabilities of the first and second microphone signals and p_0 is a threshold.
The low-frequency part of the reverberant speech signal is processed with this method to estimate the reverberation power spectrum; the result is denoted φ̂_vv^low.
4) Compute the coherence of the different signal components. The reverberation signal and the speech signal differ markedly in coherence in the high-frequency band, so coherence is used to estimate the high-band reverberation. First the coherence between the speech components must be computed. The coherence of the microphone signals can be computed directly from the definition of coherence; in the frequency domain, the coherence between two signals is defined as:
Γ_{x1x2}(λ, μ) = φ_{x1x2}(λ, μ) / sqrt(φ_{x1x1}(λ, μ) φ_{x2x2}(λ, μ)), (17)
where φ_{x1x1} and φ_{x2x2} are the auto-power spectra of the signals x_1 and x_2 and φ_{x1x2} is their cross-power spectrum, computed by recursive averaging:
φ_{x1x2}(λ, μ) = α_PSD φ_{x1x2}(λ-1, μ) + (1-α_PSD) X_1(λ, μ) X_2*(λ, μ), (18)
where α_PSD is a smoothing factor and * denotes the complex conjugate.
Reverberant speech is usually modeled as a diffuse sound field, generated by innumerable uncorrelated signals of equal energy propagating simultaneously in all directions. In conventional methods the coherence of an ideal diffuse field is computed as:
Γ_vv(f) = sinc(2πf d_mic / c), (19)
where f is the frequency, d_mic the distance between the two microphones and c the speed of sound. When the two microphones sit at the left and right ears of a dummy head, however, the head's shadowing makes the diffuse-field coherence more complex. The curve-fitting approximation proposed by M. Jeub et al. is therefore used to model it (Eq. (20)), where the fitted constants a_p, b_p and c_p take the values 2.38×10^{-3}, 1371 and 151.5 respectively, and P is the model order, set to 3.
For clean speech the coherence is high. Assuming the speech reaches the two microphones at angle θ, the coherence between the clean speech components can be expressed as:
Γ_ss(f) = exp(-j 2πf d_mic sin(θ) / c), (21)
where f is the frequency, c the propagation speed of sound in air and d_mic the distance between the two microphones.
5) Estimate the reverberation power spectrum of the high-frequency band from the signal coherence. Since the reverberant field is assumed to be a diffuse sound field, each microphone receives the same reverberation power spectrum φ_vv. Because of the head shadow effect, the difference between the clean speech power spectra received by the two binaural microphones cannot simply be ignored. With the left and right transfer functions H_l and H_r and the source signal S, the clean components received by the left and right microphones are S_l = H_l S and S_r = H_r S, so the clean-speech power spectra can be expressed as:
φ_{s_l s_l} = |H_l|² φ_ss, φ_{s_r s_r} = |H_r|² φ_ss, (22)-(24)
which can be combined with the binaural coherence function γ. For the left and right clean speech signals s_l and s_r, the reverberation signals v_l and v_r and the microphone signals x_l and x_r, and assuming that reverberation and speech are uncorrelated, the auto- and cross-power spectra satisfy:
φ_{x_l x_l} = φ_{s_l s_l} + φ_vv, φ_{x_r x_r} = φ_{s_r s_r} + φ_vv, φ_{x_l x_r} = φ_{s_l s_r} + φ_{v_l v_r}. (25)-(27)
Combining these relations with the definition of the binaural coherence, Eq. (28), yields an equation in the unknown reverberation power φ_vv, Eq. (29); rewriting it gives a quadratic equation, Eq. (30), whose solution is the reverberation power spectrum estimate, Eq. (31). In theory, since the coherence of the speech signal is strong and that of the reverberation weak, the coherence of the received signal is no larger than that of the clean speech signal, so Eq. (31) can be considered to have a solution. To guarantee that the reverberation power spectrum φ_vv is positive, it is computed with Eq. (32); the auto-power spectra and the cross-power spectrum are likewise computed by recursive averaging.
The high-frequency part of the reverberant speech is processed with this method to estimate the reverberation power spectrum; the result is denoted φ̂_vv^high.
Because the theoretical signal coherence differs somewhat from the actual one, which in turn affects the result of the reverberation power estimation, the coherence of the signals is updated here to further improve the estimate.
When the larger of the two speech presence probabilities is below a threshold, the coherence of the reverberation signal is updated from the coherence of the received microphone signals:
if max(P(H1|X_l), P(H1|X_r)) < p_0, update the reverberation coherence from γ_{x_l x_r} with smoothing factor α_γ. (33)-(34)
When the smaller of the two speech presence probabilities is above a threshold, the coherence of the clean speech signal is updated from the coherence of the received signals, which follows from Eq. (29):
if min(P(H1|X_l), P(H1|X_r)) > p_1, update the clean speech coherence accordingly. (35)
Here p_0 and p_1 are thresholds, the updated quantity in Eqs. (33)-(34) is the coherence of the reverberation signal, α_γ is a smoothing factor, γ_{x_l x_r} the coherence between the two microphone signals, the quantity updated in Eq. (35) the coherence of the clean speech signal, φ_{x_l x_r} the cross-power spectrum of the two microphone signals, φ_{x_l x_l} and φ_{x_r x_r} the auto-power spectra of the left and right microphone signals, and φ_vv the auto-power spectrum of the reverberation. Since the coherence-based reverberation power estimation only uses the square of the clean speech coherence, only the update of Eq. (35) is required.
6) Combine the low- and high-band reverberation power spectrum estimates. When the frequency μ is below a set value μ_s (the threshold separating the low and high bands), the reverberation power spectrum is φ̂_vv^low; when the frequency exceeds the threshold μ_s, the reverberation power spectrum is φ̂_vv^high, i.e.:
φ̂_vv(λ, μ) = φ̂_vv^low(λ, μ) if μ < μ_s, else φ̂_vv^high(λ, μ). (36)
7) From the combined low/high-band reverberation power spectrum estimated in step 6), compute the final reverberation power spectrum with an existing recursive smoothing algorithm.
8) Compute the gain function. Once the power spectrum of the reverberation signal has been estimated, a gain function can be designed from it; multiplying the signal received by the microphone by the gain function yields the dereverberated signal. Dereverberation based on reverberation power spectrum estimation is usually performed with spectral subtraction, which rests on a simple principle: the reverberation is treated as additive noise, and subtracting the estimated reverberation spectrum from the received reverberant speech spectrum yields an estimate of the clean speech spectrum. The gain function is:
G(λ, μ) = max(1 - β/ξ²(λ, μ), G_min), (37)
where β is the subtraction factor, ξ²(λ) = φ_xx/φ̂_vv is the square of the a posteriori signal-to-reverberation ratio, φ̂_vv the estimated reverberation power spectrum and φ_xx the computed power spectrum of the signal received by the microphone. To avoid over-subtraction, a lower bound G_min is imposed. The dereverberated speech signal in the frequency domain is:
Ŝ(λ, μ) = G(λ, μ) X(λ, μ). (38)
9) Finally, the dereverberated time-domain signal is obtained with the inverse short-time Fourier transform.
Correspondingly to the above method, the present invention also provides a binaural speech dereverberation device based on speech presence probability and coherence, comprising:
a preprocessing unit, responsible for applying delay compensation to the speech signals received by the two microphones to obtain time-aligned speech signals, windowing and framing the time-aligned signals, and transforming the speech from the time domain to the frequency domain by Fourier transform;
a low-band reverberation power spectrum estimation unit, responsible for estimating the reverberation power spectrum of the low-frequency band of the speech signal based on the speech presence probability;
a high-band reverberation power spectrum estimation unit, responsible for computing the coherence of the different components of the speech signal and estimating the reverberation power spectrum of the high-frequency band based on this coherence;
a band-combining reverberation power spectrum estimation unit, responsible for combining the low-band and high-band reverberation power spectra into a full-band estimate according to the band-splitting threshold;
a dereverberation unit, responsible for computing the final reverberation power spectrum from the combined estimate with a recursive smoothing algorithm, computing the gain function from the final reverberation power spectrum and obtaining the dereverberated frequency-domain signal through this gain, and obtaining the dereverberated time-domain signal from the dereverberated frequency-domain signal by the inverse short-time Fourier transform.
The beneficial effects of the present invention are:
Exploiting the difference in coherence between the reverberation and the clean speech received by the two microphones, the invention applies different reverberation power spectrum estimators to the low and high bands: the low-frequency reverberation is removed with a model that computes the reverberation power spectrum from the speech presence probability, and the high-frequency reverberation is removed with a speech coherence model. The method can effectively remove reverberation over the entire frequency band and improve perceptual speech quality.
Brief description of the drawings
Fig. 1 is a flow diagram of the binaural speech dereverberation method based on speech presence probability and coherence of the present invention.
Fig. 2 compares, for an embodiment of the invention, the true reverberation power with the reverberation power spectrum estimated by the coherence-based dereverberation method before and after the improvement.
Fig. 3(a)-3(c) show, respectively, the spectrograms of the speech corrupted by reverberation, of the speech after coherence-based dereverberation before the improvement, and of the speech after dereverberation with the improved method based on speech presence probability and coherence.
Specific embodiment
Below with reference to examples and drawings, the present invention is clearly and completely described.
The databases used in this embodiment are among the most authoritative and widely used in the field of speech enhancement. The clean speech comes from the TSP database; 80 utterances are used for testing. The signals received by the microphones are obtained by convolving the clean speech with room impulse responses from the AIR (Aachen Impulse Response) database. The AIR database was recorded by the Institute of Communication Systems of RWTH Aachen University, Germany, using an HMS2 dummy head system, and covers scenes of different types such as offices, meeting rooms and lecture halls, for research on signal processing algorithms in reverberant environments. The two microphones are located at the left and right ears of the dummy head, about 0.17 m apart.
This embodiment uses the binaural speech dereverberation method based on speech presence probability and coherence shown in Fig. 1 to evaluate speech dereverberation under different reverberation scenes. The algorithm parameters are set as shown in Table 1.
Table 1. Algorithm parameter settings

Parameter | Value
---|---
Sampling rate fs | 16 kHz
Frame length L | 320
Frame shift M | 160
Spectral smoothing parameter α | 50%
Subtraction factor β | 0.85
Spectral floor Gmin | -10 dB
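The analysis front end implied by Table 1 (20 ms frames with 50% overlap at 16 kHz) can be sketched as follows; the Hann window is an assumption, since the table does not name the window type:

```python
import numpy as np

FS, FRAME_LEN, FRAME_SHIFT = 16000, 320, 160   # values from Table 1

def stft(x, frame_len=FRAME_LEN, frame_shift=FRAME_SHIFT):
    """Windowed framing plus FFT: Hann-windowed frames,
    one row per frame, frame_len//2 + 1 frequency bins."""
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    frames = np.stack([x[i * frame_shift : i * frame_shift + frame_len] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

x = np.random.default_rng(1).standard_normal(FS)  # 1 s of test signal
X = stft(x)                                       # time-frequency representation
```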
Table 2 compares the perceptual speech quality (PESQ) and the improvement in speech-to-reverberation modulation ratio (ΔSRMR) obtained by the pre-improvement method, which uses coherence alone to estimate and remove the reverberation, with those of the improved method, which uses both speech presence probability and coherence. The comparison of ΔSRMR before and after the improvement shows that the method based on speech presence probability and coherence removes noticeably more reverberation, and therefore also achieves higher PESQ values.
Table 2. PESQ and ΔSRMR before and after the improvement under different reverberation scenes
Reverberation scene | Office | Lecture room | Corridor | Auditorium
---|---|---|---|---
Reverberation time | 0.45 s | 0.85 s | 0.83 s | 5.16 s
Initial PESQ | 1.89 | 1.62 | 1.74 | 1.44
PESQ (before improvement) | 2.19 | 1.78 | 1.92 | 1.61
PESQ (after improvement) | 2.42 | 2.00 | 2.07 | 1.78
ΔSRMR (before improvement) | 1.05 | 1.11 | 1.19 | 0.90
ΔSRMR (after improvement) | 1.32 | 1.37 | 1.41 | 1.18
Fig. 2 shows, for the office reverberation scene in an embodiment of the present invention, the power spectrum of the true reverberant signal together with the reverberation power spectra estimated by the coherence-based method before and after the improvement. It is evident from Fig. 2 that the power spectrum estimated by the improved method is closer to the true reverberation power spectrum.
The effect of speech dereverberation is best observed from the spectrograms of the dereverberated speech signals. Fig. 3(a)-Fig. 3(c) give an example: they show the spectrograms of, respectively, the speech signal polluted by reverberation, the speech dereverberated with the pre-improvement coherence-based method, and the speech dereverberated with the improved method using speech presence probability and coherence. The spectrograms show that the method of the present invention removes more reverberation, especially in the low-frequency part.
Another embodiment of the present invention provides a binaural speech dereverberation device based on speech presence probability and coherence, comprising:
a preprocessing unit, responsible for applying delay compensation to the speech signals received by the two microphones to obtain time-aligned speech signals, applying windowing and framing to the time-aligned speech signals, and transforming the speech signals from the time domain to the frequency domain by the Fourier transform;
a low-band reverberation power spectrum estimation unit, responsible for estimating the reverberation power spectrum of the low-frequency part of the speech signal based on the speech presence probability;
a high-band reverberation power spectrum estimation unit, responsible for computing the coherence of the different signal components of the speech signal, and estimating the reverberation power spectrum of the high-frequency part of the speech signal based on that coherence;
a band-combining reverberation power spectrum estimation unit, responsible for combining the reverberation power spectrum of the low-frequency part and that of the high-frequency part into a full-band reverberation power spectrum estimate according to a low/high band division threshold; and
a dereverberation unit, responsible for computing the final reverberation power spectrum from the combined estimate by a recursive smoothing algorithm, computing a gain function from the final reverberation power spectrum, obtaining the dereverberated frequency-domain signal through the gain function, and obtaining the dereverberated time-domain signal from the dereverberated frequency-domain signal by the inverse short-time Fourier transform.
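The gain-function stage of the dereverberation unit can be sketched with the Table 1 constants (β = 0.85, Gmin = -10 dB). The Wiener-style spectral-subtraction form below is an assumption, since this section does not spell out the gain formula:

```python
import numpy as np

def dereverb_gain(x_psd, rev_psd, beta=0.85, g_min=10 ** (-10 / 20)):
    """Spectral-subtraction gain per time-frequency bin, using the
    Table 1 constants: subtraction factor beta and spectral floor
    Gmin = -10 dB (expressed here as a linear amplitude)."""
    gain = 1.0 - beta * rev_psd / np.maximum(x_psd, 1e-12)
    return np.maximum(gain, g_min)

x_psd = np.array([1.0, 1.0, 1.0])
rev_psd = np.array([0.0, 0.5, 2.0])   # none / moderate / dominant reverberation
g = dereverb_gain(x_psd, rev_psd)
# The dereverberated spectrum would then be g * X, followed by the
# inverse STFT with overlap-add.
```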
The above embodiments are examples of the present invention. Although examples of the invention are disclosed for the purpose of illustration, those skilled in the art will understand that various substitutions, variations and modifications are possible without departing from the spirit and scope of the invention and the appended claims. Therefore, the present invention should not be limited to the content of the examples, and the protection scope of the present invention shall be as defined in the claims.
Claims (10)
1. A binaural speech dereverberation method based on speech presence probability and coherence, the steps comprising:
1) applying delay compensation to the speech signals received by two microphones to obtain time-aligned speech signals;
2) applying windowing and framing to the time-aligned speech signals, and transforming the speech signals from the time domain to the frequency domain by the Fourier transform;
3) estimating the reverberation power spectrum of the low-frequency part of the speech signal based on the speech presence probability;
4) computing the coherence of the different signal components of the speech signal;
5) estimating the reverberation power spectrum of the high-frequency part of the speech signal based on said coherence;
6) combining the reverberation power spectrum of the low-frequency part and that of the high-frequency part into a full-band reverberation power spectrum estimate according to a low/high band division threshold;
7) computing the final reverberation power spectrum from the combined estimate by a recursive smoothing algorithm;
8) computing a gain function from the final reverberation power spectrum, and obtaining the dereverberated frequency-domain signal through the gain function;
9) obtaining the dereverberated time-domain signal from the dereverberated frequency-domain signal by the inverse short-time Fourier transform.
2. The method of claim 1, wherein in step 1) the delay compensation of the two speech signals is performed with the GCC-PHAT-ργ method, so as to overcome the influence of interfering factors in the environment on the position of the cross-correlation spectrum peak.
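The standard GCC-PHAT core underlying the claimed GCC-PHAT-ργ variant (which adds further weighting not sketched here) estimates the inter-microphone delay from the peak of the phase-normalized cross-correlation:

```python
import numpy as np

def gcc_phat_delay(x_l, x_r, max_lag=32):
    """Estimate the delay of x_r relative to x_l via plain GCC-PHAT:
    normalize the cross-spectrum magnitude to 1 so the inverse FFT
    concentrates at the true lag."""
    n = len(x_l) + len(x_r)
    cross = np.fft.rfft(x_l, n=n) * np.conj(np.fft.rfft(x_r, n=n))
    cc = np.fft.irfft(cross / np.maximum(np.abs(cross), 1e-12), n=n)
    cc = np.concatenate((cc[-max_lag:], cc[: max_lag + 1]))  # lags -max..+max
    return int(np.argmax(cc)) - max_lag

rng = np.random.default_rng(2)
s = rng.standard_normal(4096)
x_l = s
x_r = np.concatenate((np.zeros(5), s[:-5]))  # right channel lags by 5 samples
lag = gcc_phat_delay(x_l, x_r)               # negative: x_r lags x_l
```

Delay compensation then shifts one channel by the estimated lag before the two signals are compared.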
3. The method of claim 1, wherein step 3) estimates the reverberation power spectrum of the low band separately, so as to guarantee the removal of the low-band reverberation.
4. The method of claim 3, wherein in step 3) the reverberation power spectrum is updated when the larger of the speech presence probabilities of the two channels is below a threshold, and is not updated otherwise; the update of the reverberation power spectrum is:
1) if max(P(H1|Xl), P(H1|Xr)) < p0 and P(H1|Xl) < P(H1|Xr), then
2) if max(P(H1|Xl), P(H1|Xr)) < p0 and P(H1|Xl) > P(H1|Xr), then
3) otherwise,
where P(H1|Xl) denotes the speech presence probability of the first microphone signal Xl, P(H1|Xr) denotes the speech presence probability of the second microphone signal Xr, p0 denotes the threshold, λ and μ denote the frame index and the frequency respectively, H1 denotes speech, H0 denotes non-speech, and φ̂vv denotes the estimated auto-power spectrum of the reverberation.
5. The method of claim 1, wherein step 4) assumes the reverberation to be a diffuse sound field, and computes the coherence with a reverberation coherence model that accounts for head shadowing.
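Without head shadowing, the coherence of a spherically isotropic (diffuse) field between two microphones spaced d = 0.17 m apart is the classical sinc curve shown below; the head-shadowed binaural model referenced by the claim lowers this further at high frequencies, which is what makes the high-band coherence estimate discriminative. The free-field curve is given here as a simplification:

```python
import numpy as np

def diffuse_coherence(f, d=0.17, c=343.0):
    """Free-field diffuse-field coherence sin(2*pi*f*d/c)/(2*pi*f*d/c).
    np.sinc(x) computes sin(pi*x)/(pi*x), hence the argument 2*f*d/c."""
    return np.sinc(2 * f * d / c)

f = np.array([0.0, 500.0, 4000.0])   # Hz
gamma = diffuse_coherence(f)         # high at low f, near zero at high f
```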
6. The method of claim 1, wherein step 5) comprises the sub-steps:
5-1) updating the coherence of the signal at every frequency according to the speech presence probability;
5-2) taking the head shadowing effect into account by assuming that the clean speech power spectra received by the two microphones differ, and estimating the reverberation power spectrum with the coherence function.
7. The method of claim 6, wherein in step 5) the auto-power spectra and the cross-power spectrum of the clean speech received by the two microphones are expressed as:
where Hl and Hr denote the transfer functions of the left and right ears respectively, S denotes the source signal, Γ denotes the binaural signal coherence function, Sl denotes the speech signal received by the left microphone, and Sr denotes the speech signal received by the right microphone.
8. the method for claim 7, which is characterized in that step 5-1) include:
A) consistency of reverberation voice is updated, i.e., when the larger value in two voice probabilities of occurrence is lower than some threshold value
When, the consistency of the voice signal received using microphone obtains consistent update to reverb signal are as follows:
If max (P (H1|Xl),P(H1|Xr))<p0
Then
Wherein,Indicate the consistency of reverb signal, αγIndicate smoothing factor,Indicate two voices that microphone receives
Between consistency, p0Indicate threshold value;
B) consistency of clean speech is updated, i.e., when the smaller value in two voice probabilities of occurrence is higher than some threshold value
When, the consistent update of the consistency of the voice signal received using microphone to clean speech signal are as follows:
If min (P (H1|Xl),P(H1|Xr)) > p1
Then
Wherein,Indicate the consistency of clean speech signal,Indicate the crosspower spectrum for the voice that two microphones receive,Indicate the auto-power spectrum for the voice that left microphone receives,Indicate the auto-power spectrum for the voice that right microphone receives,
φvvIndicate the auto-power spectrum of reverb signal, p1Indicate threshold value;
Step 5-2) estimation to reverberation power spectrum are as follows:
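The step 5-2) formula is an image in the original document and may differ from the following. One classical coherence-based estimate, assuming real-valued coherences for a mixture of coherent speech (coherence Γss) and diffuse reverberation (coherence Γvv), attributes the fraction (Γss − Γx)/(Γss − Γvv) of the observed power to reverberation:

```python
import numpy as np

def reverb_psd_from_coherence(phi_x, gamma_x, gamma_s, gamma_v):
    """Coherence-based reverberation PSD estimate: linearly interpolate
    the measured coherence gamma_x between the speech coherence gamma_s
    and the diffuse-field coherence gamma_v, and scale the observed
    power phi_x by the resulting reverberant fraction (clipped to [0, 1])."""
    frac = np.clip((gamma_s - gamma_x) / (gamma_s - gamma_v), 0.0, 1.0)
    return frac * phi_x

phi = reverb_psd_from_coherence(phi_x=2.0, gamma_x=0.7, gamma_s=1.0, gamma_v=0.2)
```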
9. The method of claim 8, wherein the combined full-band reverberation power spectrum estimated in step 6) is:
where μ denotes a frequency and μs denotes the frequency value dividing the low and high bands; below μs the reverberation power spectrum of the low-frequency part, estimated from the speech presence probability, is used, and above μs the reverberation power spectrum of the high-frequency part, estimated from the coherence, is used.
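The claim-9 combination can be sketched as a frequency-wise switch; the split frequency of 1.5 kHz below is an assumed value, since the patent leaves the low/high division threshold as a parameter:

```python
import numpy as np

def combine_bands(psd_low, psd_high, freqs, f_split=1500.0):
    """Below the split frequency use the speech-presence-probability
    estimate, above it the coherence-based estimate."""
    return np.where(freqs < f_split, psd_low, psd_high)

freqs = np.array([100.0, 1000.0, 2000.0, 6000.0])
combined = combine_bands(np.full(4, 1.0), np.full(4, 2.0), freqs)
```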
10. A binaural speech dereverberation device based on speech presence probability and coherence, characterized by comprising:
a preprocessing unit, responsible for applying delay compensation to the speech signals received by two microphones to obtain time-aligned speech signals, applying windowing and framing to the time-aligned speech signals, and transforming the speech signals from the time domain to the frequency domain by the Fourier transform;
a low-band reverberation power spectrum estimation unit, responsible for estimating the reverberation power spectrum of the low-frequency part of the speech signal based on the speech presence probability;
a high-band reverberation power spectrum estimation unit, responsible for computing the coherence of the different signal components of the speech signal, and estimating the reverberation power spectrum of the high-frequency part of the speech signal based on said coherence;
a band-combining reverberation power spectrum estimation unit, responsible for combining the reverberation power spectrum of the low-frequency part and that of the high-frequency part into a full-band reverberation power spectrum estimate according to a low/high band division threshold; and
a dereverberation unit, responsible for computing the final reverberation power spectrum from the combined estimate by a recursive smoothing algorithm, computing a gain function from the final reverberation power spectrum, obtaining the dereverberated frequency-domain signal through the gain function, and obtaining the dereverberated time-domain signal from the dereverberated frequency-domain signal by the inverse short-time Fourier transform.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810765266.3A CN108986832B (en) | 2018-07-12 | 2018-07-12 | Binaural voice dereverberation method and device based on voice occurrence probability and consistency |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108986832A true CN108986832A (en) | 2018-12-11 |
CN108986832B CN108986832B (en) | 2020-12-15 |
Family
ID=64537944
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810765266.3A Active CN108986832B (en) | 2018-07-12 | 2018-07-12 | Binaural voice dereverberation method and device based on voice occurrence probability and consistency |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108986832B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110012331A (en) * | 2019-04-11 | 2019-07-12 | 杭州微纳科技股份有限公司 | An infrared-triggered far-field dual-microphone speech recognition method |
CN110095755A (en) * | 2019-04-01 | 2019-08-06 | 北京云知声信息技术有限公司 | A sound source localization method |
CN110691296A (en) * | 2019-11-27 | 2020-01-14 | 深圳市悦尔声学有限公司 | Channel mapping method for built-in earphone of microphone |
CN110718230A (en) * | 2019-08-29 | 2020-01-21 | 云知声智能科技股份有限公司 | Method and system for eliminating reverberation |
CN111128213A (en) * | 2019-12-10 | 2020-05-08 | 展讯通信(上海)有限公司 | Noise suppression method and system for processing in different frequency bands |
CN113613112A (en) * | 2021-09-23 | 2021-11-05 | 三星半导体(中国)研究开发有限公司 | Method and electronic device for suppressing wind noise of microphone |
CN115831145A (en) * | 2023-02-16 | 2023-03-21 | 之江实验室 | Double-microphone speech enhancement method and system |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006243290A (en) * | 2005-03-02 | 2006-09-14 | Advanced Telecommunication Research Institute International | Disturbance component suppressing device, computer program, and speech recognition system |
WO2009151062A1 (en) * | 2008-06-10 | 2009-12-17 | ヤマハ株式会社 | Acoustic echo canceller and acoustic echo cancel method |
JP2011065128A (en) * | 2009-08-20 | 2011-03-31 | Mitsubishi Electric Corp | Reverberation removing device |
CN102347028A (en) * | 2011-07-14 | 2012-02-08 | 瑞声声学科技(深圳)有限公司 | Double-microphone speech enhancer and speech enhancement method thereof |
CN102800322A (en) * | 2011-05-27 | 2012-11-28 | 中国科学院声学研究所 | Method for estimating noise power spectrum and voice activity |
JP2013044908A (en) * | 2011-08-24 | 2013-03-04 | Nippon Telegr & Teleph Corp <Ntt> | Background sound suppressor, background sound suppression method and program |
CN106297817A (en) * | 2015-06-09 | 2017-01-04 | 中国科学院声学研究所 | A speech enhancement method based on binaural information |
CN106971740A (en) * | 2017-03-28 | 2017-07-21 | 吉林大学 | Speech enhancement method based on speech presence probability and phase estimation |
Non-Patent Citations (2)
Title |
---|
ZHANG LONG ET AL.: "Supervised single-channel speech dereverberation and denoising using a two-stage model based sparse representation", 《Speech Communication》 *
CHEN JIANRONG ET AL.: "Reverberation reduction processing based on a microphone array", 《电声技术 (Audio Engineering)》 *
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |