CN105869651A - Two-channel beam forming speech enhancement method based on noise mixed coherence - Google Patents

Two-channel beam forming speech enhancement method based on noise mixed coherence Download PDF

Info

Publication number
CN105869651A
CN105869651A (application CN201610167885.3A); granted as CN105869651B
Authority
CN
China
Prior art keywords
noise
signal
coherence
power spectrum
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610167885.3A
Other languages
Chinese (zh)
Other versions
CN105869651B (en)
Inventor
刘宏 (Hong Liu)
孙淼 (Miao Sun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University Shenzhen Graduate School
Original Assignee
Peking University Shenzhen Graduate School
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Shenzhen Graduate School filed Critical Peking University Shenzhen Graduate School
Priority to CN201610167885.3A priority Critical patent/CN105869651B/en
Publication of CN105869651A publication Critical patent/CN105869651A/en
Application granted granted Critical
Publication of CN105869651B publication Critical patent/CN105869651B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02165 Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
    • G10L2021/02166 Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a two-channel beamforming speech enhancement method based on the mixed coherence of noise. Adaptive beamforming can effectively suppress directional noise in reverberation-free conditions, but its suppression performance degrades sharply when reverberation is present. To address this problem, the invention provides a two-channel beamforming method based on mixed noise coherence. Since coherent and diffuse noise coexist in the sound field, the method replaces the traditional diffuse sound-field assumption with a mixed noise field. Specifically, the method first estimates the noise coherence in the mixed noise field, uses this coherence to estimate the noise power spectrum, and uses the estimate to compute the gain function of a frequency-domain filter; the noise and reverberant components are attenuated by frequency-domain filtering, and the residual noise is further processed by a minimum variance distortionless response beamformer. Experiments show that the quality of speech enhanced by the proposed method is markedly higher than that obtained with traditional methods.

Description

Dual-channel beam forming voice enhancement method based on noise mixing coherence
Technical Field
The invention belongs to the technical field of information, relates to a voice enhancement method suitable for a complex acoustic environment, and particularly relates to a dual-channel beam forming voice enhancement method based on noise mixing coherence.
Background
Speech is an important carrier of information in spoken communication. In daily life, however, people often communicate in noisy environments, so the speech of interest is degraded by noise, reverberation, and other factors; its clarity, intelligibility, and listening comfort are greatly reduced, and the listening experience suffers. Improving speech quality is therefore desirable, particularly to reduce auditory fatigue during prolonged exposure to noise. Speech enhancement extracts speech that is as close to the clean original as possible from noisy speech; through a speech enhancement algorithm, the noise can be filtered out to a certain extent, improving the quality and intelligibility of the speech.
Speech enhancement is needed in many situations. For example, during mobile-phone voice communication, one party may speak in a scene filled with background noise, such as a road, an airport, or a restaurant, so the other party hears a speech signal corrupted by various noises; processing the noisy signal with a speech enhancement algorithm improves the speech quality at the receiving end. As another example, in a teleconference system, noise collected at one terminal is transmitted to all receiving terminals, and the effect is worse if the room containing the terminal is reverberant; overall system performance can therefore be improved if the noisy speech is enhanced before the audio signal is broadcast to the other terminals. People with hearing impairment often rely on hearing aids or cochlear implants, whose effectiveness is greatly reduced in noisy environments; pre-processing the noisy signal with a speech enhancement algorithm before amplification reduces the interference of the noise to a certain extent and helps hearing-impaired people communicate better.
Speech enhancement techniques can generally be divided into single-channel and multi-channel algorithms. Single-channel speech enhancement uses a single microphone; owing to its simple model and low cost, it has been widely applied and is well developed. However, a single-channel algorithm can suppress noise only by exploiting the statistical characteristics of one noisy signal, and its performance drops sharply under non-stationary noise or strong interference. A multi-channel speech enhancement system collects sound with several microphones — a microphone array — and obtains multi-channel signals; with more input channels, the processing algorithm can exploit the correlation between the channel signals. Whereas a single channel can only exploit time-frequency differences between speech and noise, a microphone array can additionally exploit spatial information, compensating for the deficiencies of single-channel enhancement, and it has therefore received wide attention; in general, increasing the number of microphones improves the enhancement. Its disadvantages, however, are large physical size, system complexity, and excessive computational load.
The cost of equipment, the real-time performance of the voice enhancement algorithm and the effect of the algorithm are comprehensively considered, and the dual-channel voice enhancement, namely the voice enhancement by using two microphones, is a better compromise scheme.
So-called dual-channel speech enhancement processes two-channel speech data to enhance the sound source of interest while suppressing unwanted sources and noise. Basic dual-channel methods include fixed beamforming, adaptive beamforming, post-filtering, sub-band beamforming, near-field beamforming, and the like. Among them, beamforming is the earliest and most classical method: it synchronizes the speech signals in the channels by delay-compensating each channel, then weights and sums them, and outputs the result. Depending on whether the weights depend on the input signals, beamforming is divided into fixed and adaptive beamforming. Beamforming extracts the signal from a particular direction by weighting each channel so as to emphasize that direction and attenuate others. A dual-channel beamforming method mainly comprises the following steps:
1. Speech input, pre-filtering, and analog-to-digital conversion. The input analog sound signal is first pre-filtered: high-pass filtering suppresses the 50 Hz mains interference, and low-pass filtering removes components above half the sampling frequency to prevent aliasing. The analog signal is then sampled and quantized into a digital signal.
2. Pre-emphasis. The signal is passed through a high-frequency emphasis filter to compensate for the high-frequency attenuation caused by lip radiation.
3. Framing and windowing. Because speech is slowly time-varying — non-stationary globally but approximately stationary locally — it is generally regarded as stationary over 10-30 ms, so the signal can be framed with a length of 20 ms. The framing function is:
x_k(n) = w(n) s(Nk + n),  n = 0, 1, …, N−1;  k = 0, 1, …, L−1  (1)
where N is the frame length, L is the number of frames, and s denotes the speech signal. w(n) is a window function whose choice (shape and length) strongly influences the short-time analysis parameters; commonly used windows include the rectangular, Hanning, and Hamming windows. The Hamming window is generally selected because it reflects the characteristic variation of the speech signal well; its expression is:
w(n) = 0.54 − 0.46 cos(2πn/(N−1)),  0 ≤ n ≤ N−1  (2)
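The framing and Hamming-windowing step can be sketched in a few lines of numpy. This is an illustrative sketch, not part of the patent; the function name, the non-overlapping hop of Eq. (1), and the 320-sample example frame length are assumptions.

```python
import numpy as np

def frame_signal(s, frame_len, hop=None):
    """Split a speech signal into Hamming-windowed frames (Eqs. 1-2).

    With a 16 kHz sampling rate, frame_len=320 corresponds to the 20 ms
    frames mentioned in the text (an assumed example value).
    """
    if hop is None:
        hop = frame_len  # non-overlapping frames, matching Eq. (1)
    n = np.arange(frame_len)
    w = 0.54 - 0.46 * np.cos(2 * np.pi * n / (frame_len - 1))  # Hamming window
    n_frames = 1 + (len(s) - frame_len) // hop
    return np.stack([w * s[k * hop:k * hop + frame_len] for k in range(n_frames)])
```

Each row of the returned array is one windowed frame, ready for the short-time Fourier transform used below.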
4. Time-delay estimation. Features such as the time difference and energy difference between channels are extracted from each frame so that the speech signals in the channels can be synchronized.
5. Beamforming for speech enhancement. The synchronized two-channel signals are weighted, summed, and output.
Disclosure of Invention
The invention provides a novel dual-channel beam forming method which is used for improving the suppression effect of self-adaptive beam forming on noise and reverberation in a complex acoustic environment (the reverberation and the directional noise exist at the same time). Although the adaptive beamforming can effectively suppress the directional noise signals under the condition of no reverberation, under the condition that the reverberation exists, the noise signals come from all directions due to the multipath reflection of the room walls on the signals, so that the noise suppression effect of the adaptive beamforming method is greatly reduced. Aiming at the problem, the invention provides a two-channel beam forming method based on noise mixing coherence.
Traditional methods treat the noise sound field in a reverberant environment as a purely diffuse (scattered) field. However, if the noise comes from a particular direction, the signals received by the microphones include not only a coherent component (the direct signal) but also a diffuse component (the reflections). Based on the assumption that a mixed noise field replaces the traditional diffuse field, the mixed noise coherence is first estimated; it is used to improve the traditional coherence-based noise estimation method and obtain a more accurate noise estimate, which is then used to design the frequency-domain filter. After the noise and reverberant components are attenuated by frequency-domain filtering, the residual noise is processed by a minimum variance distortionless response beamformer. In this way, the performance of traditional adaptive beamforming in reverberant environments can be improved.
The technical scheme adopted by the invention is a dual-channel beam forming voice enhancement method based on noise mixing coherence, which mainly comprises the following steps:
1) Mixed coherence estimation. In the time domain, the two-channel speech model can be described as:
xi(n)=si(n)+vi(n),i=1,2 (3)
where x_i(n) denotes the noisy signal received by microphone i, s_i(n) the clean speech signal, and v_i(n) the noise signal; the subscripts 1 and 2 denote the first and second microphone signals, respectively.
With a short-time fourier transform, the two-channel speech model can be represented in the frequency domain as:
Xi(λ,μ)=Si(λ,μ)+Vi(λ,μ),i=1,2 (4)
where λ and μ denote the frame number and frequency, respectively.
The coherence of the two signals is defined in the frequency domain as:
Γ_{v1v2}(λ,μ) = φ_{v1v2}(λ,μ) / sqrt(φ_{v1v1}(λ,μ) φ_{v2v2}(λ,μ))  (5)
where φ_{v1v1} and φ_{v2v2} denote the auto-power spectra of the signals v1 and v2, and φ_{v1v2} their cross-power spectrum.
Coherent noise sound fields are generated by a noise source in a particular direction. Suppose the noise arrives at angle θ_v; then the coherence of the noise signals received by the two microphones is:
Γ_{dir,v1v2} = e^{j 2πf (d_mic / c) cos θ_v}  (6)
where f denotes the frequency variable, c = 340 m/s the speed of sound in air, and d_mic the distance between the two microphones.
As room reverberation time increases, numerous uncorrelated point source signals propagate through the air simultaneously, and scattered noise exists in the noise sound field. Diffuse noise soundfields are often considered to be a good approximation of the soundfield in reverberant environments. The ideal diffuse noise field coherence is:
Γ_{diff,v1v2} = sinc(2π f d_mic / c)  (7)
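The two coherence models of Eqs. (6) and (7) can be written directly in numpy; this is an illustrative sketch with assumed function names and defaults. Note that numpy's `sinc(x)` is normalized, sin(πx)/(πx), so the argument passed must be 2 f d_mic / c to obtain sin(2πf d_mic/c)/(2πf d_mic/c).

```python
import numpy as np

def coherence_direct(f, d_mic, theta_v, c=340.0):
    """Coherence of a purely directional (coherent) noise field, Eq. (6)."""
    return np.exp(1j * 2 * np.pi * f * (d_mic / c) * np.cos(theta_v))

def coherence_diffuse(f, d_mic, c=340.0):
    """Coherence of an ideal diffuse noise field, Eq. (7).

    numpy's sinc is normalized: sinc(x) = sin(pi*x)/(pi*x), so passing
    2*f*d_mic/c yields sin(2*pi*f*d_mic/c) / (2*pi*f*d_mic/c).
    """
    return np.sinc(2 * f * d_mic / c)
```

The directional coherence always has unit magnitude (only its phase carries the direction), while the diffuse coherence is real and decays with frequency.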
Since the noise comes from a specific direction and the microphones simultaneously receive coherent (direct) and diffuse (scattered) components, the noise sound field is better modeled as a mixed sound field. For a mixed noise field, the auto-power spectrum of the noise equals the sum of the auto-power spectra of the noise in the different fields, and likewise for the cross-power spectrum. Thus, the noise coherence in the mixed noise field is:
Γ_{hyb,v1v2} = (φ_{diff,v1v2} + φ_{dir,v1v2}) / sqrt((φ_{diff,v1v1} + φ_{dir,v1v1})(φ_{diff,v2v2} + φ_{dir,v2v2}))  (8)
Assuming that the auto-power spectra of the noise received by the two microphones are equal, i.e.
φ_{diff,v1v1} = φ_{diff,v2v2} = φ_{diff,v}  (9)
φ_{dir,v1v1} = φ_{dir,v2v2} = φ_{dir,v}  (10)
and combining equations (5), (9) and (10), equation (8) can be rewritten as:
Γ_{hyb,v1v2} = [1/(1 + φ_{dir,v}/φ_{diff,v})] · Γ_{diff,v1v2} + [(φ_{dir,v}/φ_{diff,v})/(1 + φ_{dir,v}/φ_{diff,v})] · Γ_{dir,v1v2}  (11)
From equation (11), the noise coherence in a mixed noise field can be regarded as a weighted sum of the coherence of the coherent (directional) noise and the coherence of the diffuse noise. Although the power spectra φ_{dir,v} and φ_{diff,v} in the formula cannot be obtained directly, the energy ratio of coherent noise to diffuse noise can be estimated by:
Ψ̂ = (Γ_{diff,v1v2} Re{Γ_{x1x2}} − |Γ_{x1x2}|² − sqrt(Γ_{diff,v1v2}² Re{Γ_{x1x2}}² − Γ_{diff,v1v2}² |Γ_{x1x2}|² + Γ_{diff,v1v2}² − 2 Γ_{diff,v1v2} Re{Γ_{x1x2}} + |Γ_{x1x2}|²)) / (|Γ_{x1x2}|² − 1)  (12)
where Γ_{x1x2} denotes the coherence of the two noisy signals and Ψ̂ the estimated energy ratio of the coherent noise to the diffuse noise.
Since the direction of the coherent noise is unknown, Γ_{dir,v1v2} cannot be computed directly from equation (6). From equation (5) it follows that the argument of the coherence of two signals equals the argument of their cross-power spectrum, i.e.:
arg Γ_{v1v2} = arg φ_{v1v2}  (13)
Thus, the cross-power spectrum of the noise signal can be used to compute Γ_{dir,v1v2}, namely:
Γ_{dir,v1v2} = e^{j∠φ_{v1v2}}  (14)
The cross-power spectrum of the noise signal can be computed from the leading non-speech frames of the noisy speech signal and is denoted φ̄^{NIS}_{x1x2}, so that Γ_{dir,v1v2} can be calculated using equation (15):
Γ_{dir,v1v2} = e^{j∠φ̄^{NIS}_{x1x2}}  (15)
Finally, the estimation formula for the mixed noise coherence is obtained as:
Γ̂_{hyb,v1v2} = [1/(1 + Ψ̂)] · sinc(2π f d_mic / c) + [Ψ̂/(1 + Ψ̂)] · e^{j∠φ̄^{NIS}_{x1x2}}  (16)
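Equation (16) combines the diffuse and directional coherence models with the estimated energy ratio. A minimal numpy sketch (the function name and argument layout are assumptions; `phase_x12` stands for the angle ∠φ̄^{NIS}_{x1x2} measured on leading non-speech frames):

```python
import numpy as np

def mixed_coherence(psi, f, d_mic, phase_x12, c=340.0):
    """Estimated mixed-field noise coherence, Eq. (16).

    phase_x12 plays the role of the angle of the cross-power spectrum
    averaged over leading non-speech frames (the NIS estimate in the text).
    """
    g_diff = np.sinc(2 * f * d_mic / c)   # diffuse-field term
    g_dir = np.exp(1j * phase_x12)        # directional term, unit modulus
    return g_diff / (1 + psi) + (psi / (1 + psi)) * g_dir
```

With Ψ̂ = 0 the estimate reduces to the pure diffuse model, and for very large Ψ̂ it approaches the purely directional coherence.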
2) Frequency-domain filtering. The noisy signal is processed by frequency-domain filtering before the beamformer. The noise power spectrum required by the frequency-domain filter is derived from the mixed noise coherence. The mixed-coherence-based noise power spectrum estimate is:
φ̂_v = (sqrt(φ_{x1x1} φ_{x2x2}) − Re{φ_{x1x2}}) / (1 − Re{Γ̂_{hyb,v1v2}})  (17)
where φ̂_v denotes the estimated noise power spectrum, φ_{x1x1} and φ_{x2x2} the auto-power spectra of the noisy signals x1 and x2, and φ_{x1x2} their cross-power spectrum. The auto- and cross-power spectra of the signals are obtained by recursive averaging:
φ̂_{xixj}(λ,μ) = α φ̂_{xixj}(λ−1,μ) + (1 − α) X_i(λ,μ) X_j*(λ,μ)  (18)
where α is the smoothing factor, X_i(λ,μ) is the short-time spectrum of signal x_i, and X_j*(λ,μ) is the complex conjugate of the short-time spectrum of signal x_j.
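The recursive averaging of Eq. (18) is a one-line update per frame. An illustrative sketch; the default smoothing factor 0.7 is an assumed example value, not a value taken from the patent.

```python
import numpy as np

def update_psd(phi_prev, X_i, X_j, alpha=0.7):
    """One recursive-averaging step for an auto/cross power spectrum, Eq. (18).

    alpha is the smoothing factor; 0.7 is an assumed illustrative default.
    X_i, X_j are same-bin short-time spectral values of the two channels
    (pass X_i == X_j to track an auto-power spectrum).
    """
    return alpha * phi_prev + (1 - alpha) * X_i * np.conj(X_j)
```

For a stationary input, repeated updates converge geometrically to the true (cross-)power spectral value.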
The accuracy of the noise estimate can be measured by the logarithmic error between the true noise power spectrum φ_v and the estimated noise power spectrum φ̂_v:
logErr_PSD = (1/(KM)) Σ_{λ=1..K} Σ_{μ=1..M} |10 log10[φ_v(λ,μ) / φ̂_v(λ,μ)]|  (19)
after estimating the noise power spectrum, the frequency filtering gain function can be calculated by the following formula:
G_i(λ,μ) = 1 − β · φ̂_v(λ,μ) / φ̂_{xixi}(λ,μ)  (20)
where β denotes the subtraction factor; to avoid negative values, G_min is used as a lower bound on the gain function.
The noisy speech signal after the frequency filtering process becomes:
Ẑ_i(λ,μ) = X_i(λ,μ) G_i(λ,μ)  (21)
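Equations (20)-(21) amount to a floored spectral-subtraction gain applied per channel. An illustrative sketch; the β and G_min defaults here are assumed example values, not the settings of the patent's Table 2.

```python
import numpy as np

def spectral_gain(phi_v, phi_xx, beta=1.0, g_min=0.1):
    """Frequency-domain filtering gain, Eq. (20), floored at G_min.

    beta (subtraction factor) and g_min are assumed illustrative values.
    """
    g = 1.0 - beta * phi_v / phi_xx
    return np.maximum(g, g_min)  # lower-bound the gain to avoid negative values

def apply_gain(X, g):
    """Filtered noisy spectrum, Eq. (21)."""
    return X * g
```

Bins where the estimated noise power exceeds the noisy power are clipped to the floor G_min rather than going negative.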
3) Post-filtering-based beamforming. The frequency-domain-filtered signal still contains residual noise and is therefore further processed by a minimum variance distortionless response beamformer. For ease of derivation, the beamformer input signal is written as:
Ẑ(λ,μ) = [Ẑ_1(λ,μ), Ẑ_2(λ,μ)]^T  (22)
According to the minimum variance distortionless response criterion, the beamformer weights are:
W(λ,μ) = R_ẐẐ^{−1}(λ,μ) d_s(μ) / (d_s^H(μ) R_ẐẐ^{−1}(λ,μ) d_s(μ))  (23)
where R_ẐẐ(λ,μ) is the autocorrelation matrix of the input signal Ẑ(λ,μ), and d_s(μ) is the steering (direction) vector of the target speech signal, uniquely determined by the target position relative to the receiver. The final beamformed speech signal is:
Ŝ(λ,μ) = W^H(λ,μ) Ẑ(λ,μ)  (24)
The enhanced time-domain speech signal is then obtained by inverse short-time Fourier transform and overlap-add.
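The MVDR weights of Eq. (23) and the output of Eq. (24) for a single time-frequency bin can be sketched as follows (illustrative; assumes the autocorrelation matrix R is invertible, and function names are assumptions):

```python
import numpy as np

def mvdr_weights(R, d):
    """MVDR weights for one time-frequency bin, Eq. (23)."""
    Ri_d = np.linalg.inv(R) @ d        # R^{-1} d_s
    return Ri_d / (np.conj(d) @ Ri_d)  # normalize by d_s^H R^{-1} d_s

def mvdr_output(w, z):
    """Beamformer output for one bin, Eq. (24): w^H z."""
    return np.conj(w) @ z
```

The normalization in Eq. (23) enforces the distortionless constraint W^H d_s = 1, so a signal arriving exactly from the target direction passes through unchanged.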
Unlike traditional methods, the present method models the noise field produced by directional noise in a reverberant environment as a mixed sound field rather than a purely diffuse one, derives a mixed-coherence estimation method for the noise signal, and uses the mixed coherence to estimate the noise power spectrum in the frequency-domain filter, yielding a more accurate estimate and thus a better enhancement result. In addition, whereas traditional methods improve adaptive beamforming under reverberation by adding a post-filter after the beamformer, the present method instead proposes a new dual-channel beamformer consisting of a frequency-domain filter followed by a minimum variance distortionless response beamformer: part of the reverberation and the direct noise is removed by the frequency-domain filter, and the residual signal is processed by the beamformer. This order is chosen because a beamformer applied directly to a reverberant signal performs poorly, while filtering in the frequency domain first removes the reverberant interference to some extent. Compared with traditional methods, the proposed method effectively suppresses noise in reverberant environments, and both the signal-to-noise ratio and the perceived quality of the enhanced speech are markedly improved.
Drawings
FIG. 1 is a flow chart of a dual channel beamforming speech enhancement method of the present invention.
Fig. 2 compares the signal-to-noise-ratio improvement of the enhanced speech signal obtained in the experiments of the embodiment of the present invention with that of three other existing speech enhancement methods: the classical beamforming-plus-post-filtering method proposed by Zelinski (Zelinski post-filter), the coherence-based method proposed by Yousefian and Loizou in 2013 (COH based), and the method based on the energy ratio of direct to diffuse signals proposed by Schwarz and Kellermann in 2015 (CDR based).
Fig. 3 is a comparison of the perceived speech quality of the enhanced speech compared to the speech signal before enhancement, obtained by the experiment of the above three other speech enhancement methods in the embodiment of the present invention.
Figs. 4(a)-4(b) are the spectrogram of a clean speech signal and the spectrogram of a speech signal contaminated with reverberation and noise, respectively. Figs. 4(c)-4(f) are spectrograms of speech enhanced by the method of the present invention and the three other speech enhancement methods: Fig. 4(c) corresponds to the classical beamforming-plus-post-filtering method of Zelinski (Zelinski post-filter), Fig. 4(d) to the coherence-based method of Yousefian and Loizou, 2013 (COH based), Fig. 4(e) to the direct-to-diffuse energy ratio method of Schwarz and Kellermann, 2015 (CDR based), and Fig. 4(f) to the method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The database adopted in this embodiment is internationally authoritative in speech enhancement and among the most widely used. Clean speech was taken from the TSP database, 80 utterances in total, for testing. The noise signals were taken from the NOISEX database, and the two noisy microphone signals were obtained by convolving the speech and noise signals with room impulse responses from the AIR (Aachen Impulse Response) database. The AIR database was recorded by the Institute of Communication Systems of RWTH Aachen University in Germany using an HMS II artificial head; it contains scenes of different types, such as offices, meeting rooms, and lecture halls, and is intended for studying signal processing algorithms in reverberant environments. The two microphones are located at the left and right ears of the artificial head, about 0.17 m apart. The room size, reverberation time, sound-source position, source-to-head distance, and other parameters differ across the experimental scenes.
Considering that the speech enhancement algorithm often faces different noise environments in practical application, the robustness of the algorithm is tested by setting different reverberation times, sound sources in different directions and different noise types in experiments. Table 1 gives the settings of different noise scenes, the type of the inclusion noise, the room reverberation time, the angle of the target sound source and the angle of the noise source.
TABLE 1 different noise scene settings
In the implementation example, the speech enhancement algorithms were evaluated in the different noise scenarios using the two-channel beamforming speech enhancement method shown in Fig. 1. The parameter settings of the algorithm are given in Table 2.
TABLE 2 Algorithm parameter set
Table 3 shows the logarithmic errors obtained by estimating the noise power spectrum with the mixture coherence (after improvement) and with the scattering noise coherence (before improvement), respectively, and it can be seen that the logarithmic errors obtained by estimation based on the mixture coherence are significantly reduced.
TABLE 3 noise Power Spectrum estimation Log error before and after noise Power Spectrum estimation Algorithm improvement
Fig. 2 compares the signal-to-noise-ratio improvement of the speech enhanced by the method of the present invention with that of the three other methods above. Using the proposed noise-mixed-coherence-based dual-channel beamforming method to enhance the microphone signals under reverberation and directional noise yields a larger signal-to-noise-ratio improvement than the other methods.
Fig. 3 is a comparison of the perceived speech quality of the enhanced speech compared to the speech signal before enhancement, obtained by the experiment of the above three other speech enhancement methods in the embodiment of the present invention. It can be seen that speech enhancement of the signal received by the microphone under reverberant and directional noise conditions using the speech enhancement method proposed herein results in better speech quality.
The voice enhancement effect can be better observed by utilizing the spectrogram of the enhanced voice signal. Examples are given in fig. 4(a) -4 (f). Fig. 4(a) -4(b) are the spectrogram of a clean speech signal and the spectrogram of a speech signal contaminated with reverberation and noise, respectively. FIGS. 4(c) -4(f) are spectrogram diagrams of speech enhanced using several different speech enhancement algorithms, respectively. As can be seen from the spectrogram, the spectrogram of the voice signal obtained by performing voice enhancement by using the method of the invention is closer to the spectrogram of a pure voice signal.
The above examples are merely illustrative of the present invention, and although examples of the present invention are disclosed for illustrative purposes, those skilled in the art will appreciate that: various substitutions, changes and modifications are possible without departing from the spirit and scope of the present invention and the appended claims. Therefore, the present invention should not be limited to the contents of this example.

Claims (9)

1. A dual channel beamforming speech enhancement method, comprising the steps of:
1) sequentially framing and windowing the dual-channel noisy speech signal with a fixed-length time window, transforming the signal to the frequency domain by short-time Fourier transform, and performing subsequent processing in the frequency domain; then calculating the energy ratio of the coherent signal to the scattered signal from the noisy speech signal; then estimating the mixed coherence of the noise from this energy ratio;
2) estimating the noise power spectrum using the mixed coherence, and calculating the gain function of a frequency-domain filter from the estimation result;
3) processing the noisy speech signal with the frequency-domain filter, and then further suppressing residual noise with a minimum variance distortionless response beamformer to obtain the final enhanced speech signal, thereby completing the speech enhancement.
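The front end of step 1) (framing, windowing, and short-time Fourier transform) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the frame length, hop size, Hann window, and the synthetic two-channel input are all illustrative choices.

```python
import numpy as np

def stft_frames(x, frame_len=512, hop=256):
    """Frame, window (Hann), and short-time Fourier transform a signal.

    Returns an array of shape (n_frames, frame_len // 2 + 1).
    frame_len and hop are illustrative, not values from the claims.
    """
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

# Synthetic two-channel noisy input -> per-channel short-time spectra.
fs = 16000
t = np.arange(fs) / fs
x1 = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(fs)
x2 = np.roll(x1, 3)  # crude stand-in for an inter-channel delay
X1, X2 = stft_frames(x1), stft_frames(x2)
```

All later quantities (auto- and cross-power spectra, coherence, gains) would then be computed on these per-channel frame-by-bin arrays.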
2. The dual channel beamforming speech enhancement method of claim 1, wherein step 1) treats a noise sound field in which reverberation and directional noise are both present as a mixed noise field.
3. The dual channel beamforming speech enhancement method of claim 1, wherein, the noise cross-power spectrum being unknown, step 1) uses the cross-power spectrum of the leading non-speech frames of the noisy speech signal in place of the cross-power spectrum of the noise.
4. The dual channel beamforming speech enhancement method according to claim 3, wherein step 1) obtains the coherence of the directional noise from the cross-power spectrum of the noise signal, with the direction of the noise source unknown:
$$\Gamma_{\mathrm{dir},v_1v_2}=e^{\,j\angle\bar{\varphi}^{\mathrm{NIS}}_{x_1x_2}}.$$
5. The dual channel beamforming speech enhancement method of claim 4, wherein step 1) estimates the mixed coherence of the noise using the energy ratio of the coherent signal to the scattered signal, the coherence calculation accounting for the effect of the noise-source incidence angle as follows:
$$\hat{\Gamma}_{\mathrm{hyb},v_1v_2}=\frac{1}{1+\hat{\Psi}}\cdot\operatorname{sinc}\!\left(\frac{2\pi f d_{\mathrm{mic}}}{c}\right)+\frac{\hat{\Psi}}{1+\hat{\Psi}}\cdot e^{\,j\angle\bar{\varphi}^{\mathrm{NIS}}_{x_1x_2}},$$
wherein $\hat{\Psi}$ represents the energy ratio of coherent noise to scattered noise, $f$ represents the frequency, $c=340\,\mathrm{m/s}$ represents the speed of sound propagation in air, and $d_{\mathrm{mic}}$ represents the distance between the two microphones.
6. The dual channel beamforming speech enhancement method of claim 1, wherein the noise power spectrum estimation of step 2) based on the mixed coherence is:
$$\hat{\varphi}_{v}=\frac{\sqrt{\varphi_{x_1x_1}\varphi_{x_2x_2}}-\operatorname{Re}\{\varphi_{x_1x_2}\}}{1-\operatorname{Re}\{\hat{\Gamma}_{\mathrm{hyb},v_1v_2}\}},$$
wherein $\hat{\varphi}_{v}$ represents the estimated noise power spectrum, $\varphi_{x_1x_1}$ and $\varphi_{x_2x_2}$ represent the auto-power spectra of the noisy signals $x_1$ and $x_2$, respectively, and $\varphi_{x_1x_2}$ represents the cross-power spectrum of $x_1$ and $x_2$; the auto-power and cross-power spectra of the signals are obtained by recursive averaging:
$$\hat{\varphi}_{x_ix_j}(\lambda,\mu)=\alpha\,\hat{\varphi}_{x_ix_j}(\lambda-1,\mu)+(1-\alpha)\,X_i(\lambda,\mu)X_j^{*}(\lambda,\mu),$$
where $\alpha$ is a smoothing factor, $\lambda$ and $\mu$ denote the frame index and the frequency bin, respectively, $X_i(\lambda,\mu)$ represents the short-time spectrum of signal $x_i$, and $X_j^{*}(\lambda,\mu)$ represents the complex conjugate of the short-time spectrum of signal $x_j$.
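A sketch of the recursive power-spectrum averaging and the coherence-based noise estimate of claim 6. Reading the numerator as the geometric mean of the auto-power spectra minus the real part of the cross-power spectrum is an interpretation of the reconstructed formula, and the small-denominator guard is an added safeguard, not part of the claim.

```python
import numpy as np

def smooth_psd(prev, Xi, Xj, alpha=0.9):
    """Recursive average: phi(l) = alpha*phi(l-1) + (1-alpha)*Xi*conj(Xj)."""
    return alpha * prev + (1 - alpha) * Xi * np.conj(Xj)

def noise_psd(phi11, phi22, phi12, gamma_hyb):
    """Coherence-based noise power spectrum estimate (sketch of claim 6)."""
    num = np.sqrt(phi11 * phi22) - np.real(phi12)
    den = 1 - np.real(gamma_hyb)
    return num / np.maximum(den, 1e-8)  # guard against a near-zero denominator
```

The guard matters in practice: at low frequencies the diffuse coherence approaches 1, so the denominator can become very small.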
7. The dual channel beamforming speech enhancement method of claim 6, wherein step 2) uses the result of the mixed-coherence-based noise power spectrum estimation to calculate the gain function of the frequency-domain filter, so that reverberation and noise are suppressed simultaneously, the gain function of the frequency-domain filter being:
$$G_i(\lambda,\mu)=1-\beta\cdot\frac{\hat{\varphi}_{v}(\lambda,\mu)}{\hat{\varphi}_{x_ix_i}(\lambda,\mu)},$$
wherein $\beta$ denotes the subtraction factor; to avoid negative values, $G_{\min}$ is used as the lower bound of the gain function.
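The floored subtraction gain of claim 7 can be sketched as follows; the default values of `beta` and `g_min` are illustrative, not values specified by the claims.

```python
import numpy as np

def sub_gain(phi_v, phi_x, beta=1.0, g_min=0.1):
    """Spectral-subtraction-style gain of claim 7, floored at G_min.

    phi_v : estimated noise power spectrum
    phi_x : auto-power spectrum of the noisy channel
    """
    g = 1.0 - beta * phi_v / phi_x
    return np.maximum(g, g_min)  # floor prevents negative (invalid) gains
```

The floor trades residual noise for reduced musical-noise artifacts, which is the usual motivation for a lower bound in subtractive gain rules.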
8. The dual channel beamforming speech enhancement method of claim 1, wherein step 3) cascades the frequency-domain filter and the minimum variance distortionless response beamformer into a dual-channel beamformer, which processes the noisy signal to obtain the speech-enhanced signal; the frequency-domain filter is placed before the minimum variance distortionless response beamformer.
9. The dual channel beamforming speech enhancement method of claim 8, wherein step 3) comprises the sub-steps of:
3-1) multiplying the magnitude spectrum of the noisy speech signal by the gain function of the frequency-domain filter to obtain the filtered magnitude spectrum;
3-2) multiplying the resulting magnitude spectrum by the weights of the minimum variance distortionless response beamformer to obtain the magnitude spectrum of the final enhanced speech signal;
3-3) transforming the signal back to the time domain by inverse short-time Fourier transform with overlap-add to obtain the enhanced time-domain signal.
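Sub-step 3-3) (inverse short-time Fourier transform with overlap-add) might look like the following sketch. The Hann windowing scheme and the normalization are illustrative choices, and the commented lines only indicate where the gains of 3-1) and the beamformer weights of 3-2) would be applied; all names there are hypothetical.

```python
import numpy as np

def istft_overlap_add(S, frame_len=512, hop=256):
    """Inverse STFT with overlap-add (sketch of sub-step 3-3).

    S is a (n_frames, frame_len // 2 + 1) array of one-sided spectra
    whose analysis used a Hann window; frame_len and hop are illustrative.
    """
    win = np.hanning(frame_len)
    out = np.zeros(hop * (S.shape[0] - 1) + frame_len)
    norm = np.zeros_like(out)
    for i, spec in enumerate(S):
        frame = np.fft.irfft(spec, n=frame_len) * win  # synthesis window
        out[i * hop : i * hop + frame_len] += frame
        norm[i * hop : i * hop + frame_len] += win ** 2
    return out / np.maximum(norm, 1e-8)  # undo analysis+synthesis windowing

# Sub-steps 3-1) and 3-2) would precede the inverse transform, e.g.:
#   S = w1 * (G1 * X1) + w2 * (G2 * X2)   # gains G_i, then MVDR weights w_i
# where X1, X2 are the per-channel spectra (all names illustrative).
```

With matched analysis/synthesis Hann windows and 50% overlap, the normalization reconstructs the interior of the signal exactly.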
CN201610167885.3A 2016-03-23 2016-03-23 Two-channel beamforming speech enhancement method based on noise mixed coherence Expired - Fee Related CN105869651B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610167885.3A CN105869651B (en) Two-channel beamforming speech enhancement method based on noise mixed coherence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610167885.3A CN105869651B (en) Two-channel beamforming speech enhancement method based on noise mixed coherence

Publications (2)

Publication Number Publication Date
CN105869651A true CN105869651A (en) 2016-08-17
CN105869651B CN105869651B (en) 2019-05-31

Family

ID=56625443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610167885.3A Expired - Fee Related CN105869651B (en) Two-channel beamforming speech enhancement method based on noise mixed coherence

Country Status (1)

Country Link
CN (1) CN105869651B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106448693A (en) * 2016-09-05 2017-02-22 华为技术有限公司 Speech signal processing method and apparatus
US9906859B1 (en) 2016-09-30 2018-02-27 Bose Corporation Noise estimation for dynamic sound adjustment
CN107785029A (en) * 2017-10-23 2018-03-09 科大讯飞股份有限公司 Target voice detection method and device
CN109473118A * 2018-12-24 2019-03-15 苏州思必驰信息科技有限公司 Dual-channel speech enhancement method and device
CN109712637A * 2018-12-21 2019-05-03 珠海慧联科技有限公司 Reverberation suppression system and method
CN109754803A (en) * 2019-01-23 2019-05-14 上海华镇电子科技有限公司 Vehicle multi-sound area voice interactive system and method
CN110415718A * 2019-09-05 2019-11-05 腾讯科技(深圳)有限公司 Artificial-intelligence-based signal generation method, speech recognition method and device
CN110636423A (en) * 2018-06-22 2019-12-31 西万拓私人有限公司 Method for enhancing signal directionality in a hearing device
CN111341339A (en) * 2019-12-31 2020-06-26 深圳海岸语音技术有限公司 Target voice enhancement method based on acoustic vector sensor adaptive beam forming and deep neural network technology
CN111866439A (en) * 2020-07-21 2020-10-30 厦门亿联网络技术股份有限公司 Conference device and system for optimizing audio and video experience and operation method thereof
CN113362808A (en) * 2021-06-02 2021-09-07 云知声智能科技股份有限公司 Target direction voice extraction method and device, electronic equipment and storage medium
CN114143668A (en) * 2020-09-04 2022-03-04 阿里巴巴集团控股有限公司 Audio signal processing, reverberation detection and conference method, apparatus and storage medium
US11295718B2 (en) 2018-11-02 2022-04-05 Bose Corporation Ambient volume control in open audio device
TWI767696B (en) * 2020-09-08 2022-06-11 英屬開曼群島商意騰科技股份有限公司 Apparatus and method for own voice suppression
WO2023016032A1 (en) * 2021-08-12 2023-02-16 北京荣耀终端有限公司 Video processing method and electronic device

Citations (6)

Publication number Priority date Publication date Assignee Title
US5574824A (en) * 1994-04-11 1996-11-12 The United States Of America As Represented By The Secretary Of The Air Force Analysis/synthesis-based microphone array speech enhancer with variable signal distortion
US20080187152A1 (en) * 2007-02-07 2008-08-07 Samsung Electronics Co., Ltd. Apparatus and method for beamforming in consideration of actual noise environment character
CN101263734A (en) * 2005-09-02 2008-09-10 丰田自动车株式会社 Post-filter for microphone array
CN101447190A (en) * 2008-06-25 2009-06-03 北京大学深圳研究生院 Voice enhancement method employing combination of nesting-subarray-based post filtering and spectrum-subtraction
WO2015086377A1 (en) * 2013-12-11 2015-06-18 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Extraction of reverberant sound using microphone arrays
CN105244036A (en) * 2014-06-27 2016-01-13 中兴通讯股份有限公司 Microphone speech enhancement method and microphone speech enhancement device

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
US5574824A (en) * 1994-04-11 1996-11-12 The United States Of America As Represented By The Secretary Of The Air Force Analysis/synthesis-based microphone array speech enhancer with variable signal distortion
CN101263734A (en) * 2005-09-02 2008-09-10 丰田自动车株式会社 Post-filter for microphone array
US20080187152A1 (en) * 2007-02-07 2008-08-07 Samsung Electronics Co., Ltd. Apparatus and method for beamforming in consideration of actual noise environment character
CN101447190A (en) * 2008-06-25 2009-06-03 北京大学深圳研究生院 Voice enhancement method employing combination of nesting-subarray-based post filtering and spectrum-subtraction
WO2015086377A1 (en) * 2013-12-11 2015-06-18 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Extraction of reverberant sound using microphone arrays
CN105244036A (en) * 2014-06-27 2016-01-13 中兴通讯股份有限公司 Microphone speech enhancement method and microphone speech enhancement device

Non-Patent Citations (1)

Title
Ma Xiaohong et al., "Speech enhancement method based on signal phase difference and post-filtering", Acta Electronica Sinica *

Cited By (28)

Publication number Priority date Publication date Assignee Title
CN106448693A (en) * 2016-09-05 2017-02-22 华为技术有限公司 Speech signal processing method and apparatus
CN106448693B * 2016-09-05 2019-11-29 华为技术有限公司 Speech signal processing method and device
US9906859B1 (en) 2016-09-30 2018-02-27 Bose Corporation Noise estimation for dynamic sound adjustment
US10542346B2 (en) 2016-09-30 2020-01-21 Bose Corporation Noise estimation for dynamic sound adjustment
WO2018063504A1 (en) * 2016-09-30 2018-04-05 Bose Corporation Noise estimation for dynamic sound adjustment
US10158944B2 (en) 2016-09-30 2018-12-18 Bose Corporation Noise estimation for dynamic sound adjustment
US11308974B2 (en) 2017-10-23 2022-04-19 Iflytek Co., Ltd. Target voice detection method and apparatus
CN107785029A (en) * 2017-10-23 2018-03-09 科大讯飞股份有限公司 Target voice detection method and device
CN107785029B (en) * 2017-10-23 2021-01-29 科大讯飞股份有限公司 Target voice detection method and device
CN110636423A (en) * 2018-06-22 2019-12-31 西万拓私人有限公司 Method for enhancing signal directionality in a hearing device
CN110636423B (en) * 2018-06-22 2021-08-17 西万拓私人有限公司 Method for enhancing signal directionality in a hearing device
US11955107B2 (en) 2018-11-02 2024-04-09 Bose Corporation Ambient volume control in open audio device
US11295718B2 (en) 2018-11-02 2022-04-05 Bose Corporation Ambient volume control in open audio device
CN109712637A * 2018-12-21 2019-05-03 珠海慧联科技有限公司 Reverberation suppression system and method
CN109712637B (en) * 2018-12-21 2020-09-22 珠海慧联科技有限公司 Reverberation suppression system and method
CN109473118A * 2018-12-24 2019-03-15 苏州思必驰信息科技有限公司 Dual-channel speech enhancement method and device
CN109473118B (en) * 2018-12-24 2021-07-20 思必驰科技股份有限公司 Dual-channel speech enhancement method and device
CN109754803A (en) * 2019-01-23 2019-05-14 上海华镇电子科技有限公司 Vehicle multi-sound area voice interactive system and method
CN110415718A * 2019-09-05 2019-11-05 腾讯科技(深圳)有限公司 Artificial-intelligence-based signal generation method, speech recognition method and device
CN111341339A (en) * 2019-12-31 2020-06-26 深圳海岸语音技术有限公司 Target voice enhancement method based on acoustic vector sensor adaptive beam forming and deep neural network technology
CN111866439A (en) * 2020-07-21 2020-10-30 厦门亿联网络技术股份有限公司 Conference device and system for optimizing audio and video experience and operation method thereof
CN111866439B (en) * 2020-07-21 2022-07-05 厦门亿联网络技术股份有限公司 Conference device and system for optimizing audio and video experience and operation method thereof
CN114143668A (en) * 2020-09-04 2022-03-04 阿里巴巴集团控股有限公司 Audio signal processing, reverberation detection and conference method, apparatus and storage medium
TWI767696B (en) * 2020-09-08 2022-06-11 英屬開曼群島商意騰科技股份有限公司 Apparatus and method for own voice suppression
US11622208B2 (en) 2020-09-08 2023-04-04 British Cayman Islands Intelligo Technology Inc. Apparatus and method for own voice suppression
CN113362808A (en) * 2021-06-02 2021-09-07 云知声智能科技股份有限公司 Target direction voice extraction method and device, electronic equipment and storage medium
CN113362808B (en) * 2021-06-02 2023-03-21 云知声智能科技股份有限公司 Target direction voice extraction method and device, electronic equipment and storage medium
WO2023016032A1 (en) * 2021-08-12 2023-02-16 北京荣耀终端有限公司 Video processing method and electronic device

Also Published As

Publication number Publication date
CN105869651B (en) 2019-05-31

Similar Documents

Publication Publication Date Title
CN105869651B (en) Two-channel beamforming speech enhancement method based on noise mixed coherence
Yousefian et al. A dual-microphone speech enhancement algorithm based on the coherence function
Schwarz et al. Coherent-to-diffuse power ratio estimation for dereverberation
CN107479030B (en) Frequency division and improved generalized cross-correlation based binaural time delay estimation method
CN110085248B (en) Noise estimation at noise reduction and echo cancellation in personal communications
EP2393463B1 (en) Multiple microphone based directional sound filter
Huang et al. A multi-frame approach to the frequency-domain single-channel noise reduction problem
Ren et al. A Causal U-Net Based Neural Beamforming Network for Real-Time Multi-Channel Speech Enhancement.
CN108986832B (en) Binaural voice dereverberation method and device based on voice occurrence probability and consistency
CN102157156B (en) Single-channel voice enhancement method and system
CN102456351A (en) Voice enhancement system
US8682006B1 (en) Noise suppression based on null coherence
TW201142829A (en) Adaptive noise reduction using level cues
Sunohara et al. Low-latency real-time blind source separation for hearing aids based on time-domain implementation of online independent vector analysis with truncation of non-causal components
Löllmann et al. Low delay noise reduction and dereverberation for hearing aids
Yousefian et al. A coherence-based noise reduction algorithm for binaural hearing aids
Schwarz et al. A two-channel reverberation suppression scheme based on blind signal separation and Wiener filtering
Pasha et al. Spatial multi-channel linear prediction for dereverberation of ad-hoc microphones
Madhu et al. Localisation-based, situation-adaptive mask generation for source separation
Nordholm et al. Assistive listening headsets for high noise environments: Protection and communication
Akagi et al. Noise reduction using a small-scale microphone array in multi noise source environment
Bagekar et al. Dual channel coherence based speech enhancement with wavelet denoising
Martın-Donas et al. A postfiltering approach for dual-microphone smartphones
Unoki et al. Unified denoising and dereverberation method used in restoration of MTF-based power envelope
Schäfer Multi-channel audio-processing: enhancement, compression and evaluation of quality

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190531
