WO2014089914A1 - Dual-microphone-based speech reverberation reduction method and apparatus - Google Patents

Dual-microphone-based speech reverberation reduction method and apparatus

Info

Publication number
WO2014089914A1
Authority
WO
WIPO (PCT)
Prior art keywords
input signal
reverberation
microphone input
main microphone
spectrum
Prior art date
Application number
PCT/CN2013/001557
Other languages
English (en)
French (fr)
Inventor
楼厦厦
李波
黄秋晨
Original Assignee
歌尔声学股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 歌尔声学股份有限公司
Priority to US14/411,651 (patent US9414157B2)
Priority to KR1020147036443A (patent KR101502297B1)
Priority to DK13863250.0T (patent DK2858379T3)
Priority to EP13863250.0A (patent EP2858379B1)
Priority to JP2015524601A (patent JP5785674B2)
Publication of WO2014089914A1 (zh)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/02 Circuits for transducers, loudspeakers or microphones for preventing acoustic reaction, i.e. acoustic oscillatory feedback
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/002 Damping circuit arrangements for transducers, e.g. motional feedback circuits
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/005 Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/04 Circuits for transducers, loudspeakers or microphones for correcting frequency response
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02165 Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2225/00 Details of deaf aids covered by H04R25/00, not provided for in any of its subgroups
    • H04R2225/43 Signal processing in hearing aids to enhance the speech intelligibility
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2227/00 Details of public address [PA] systems covered by H04R27/00 but not provided for in any of its subgroups
    • H04R2227/009 Signal processing in [PA] systems to enhance the speech intelligibility
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2410/00 Microphones
    • H04R2410/05 Noise reduction with a separate noise microphone

Definitions

  • The present invention relates to the field of speech enhancement, and in particular to a dual-microphone-based speech reverberation reduction method and apparatus.
  • Sound reflected by a hard interface such as a wall or the floor reaches the microphone as non-direct sound, and these non-direct sound signals constitute the reverberation signal.
  • A sound signal that has undergone one or a few reflections is called an early reflection. Early reflections constitute the early reverberation signal, which can enhance the speech.
  • A sound signal that has undergone many reflections is called a late reflection. Late reflections constitute the late reverberation signal. When the late reverberation is strong, the clarity of the speech is reduced.
  • The signal received by a microphone includes the direct sound signal and the reverberation signal, and, as described above, the reverberation can be divided into early reverberation and late reverberation. Late reverberation is the main cause of reduced speech intelligibility, whereas early reverberation generally enhances the speech, so the key to improving clarity is to reduce the late reverberation signal.
  • Dual-microphone dereverberation by spectral subtraction has received much attention.
  • In one such approach, an adaptive beamforming structure (GSC) is used to obtain two signals: the first signal is the output of the delay-and-sum beamformer;
  • the second signal is the output of the blocking matrix.
  • The energy envelopes of the two signals are used to estimate the reverberation of the first signal through an adaptive filter, and spectral subtraction is used to remove the reverberation.
  • The present invention has been made to provide a dual-microphone-based speech reverberation reduction method and apparatus that overcome the above problems or at least partially solve them.
  • A dual-microphone-based speech reverberation reduction method, comprising:
  • the de-reverberated time-domain signal of the main microphone input signal is superimposed frame by frame,
  • and the continuous de-reverberated signal of the main microphone input signal is output.
  • A dual-microphone-based speech reverberation reduction apparatus, which processes the signals received by the main microphone and the auxiliary microphone frame by frame. The apparatus includes a reverberation spectrum estimation unit and a spectral subtraction unit, wherein:
  • the reverberation spectrum estimation unit is configured to receive the main microphone input signal and the auxiliary microphone input signal; calculate the transfer function h(t) from the auxiliary microphone to the main microphone according to the two input signals; obtain the tail part h_r(t) of the transfer function; judge the strength of the reverberation according to h(t) and calculate the adjustment factor of the gain function, which is output to the spectral subtraction unit; convolve the auxiliary microphone input signal with h_r(t) to obtain the late reverberation estimation signal of the main microphone input signal; and perform time-domain to frequency-domain conversion on that estimation signal to obtain the late reverberation spectrum of the main microphone input signal, which is output to the spectral subtraction unit;
  • the spectral subtraction unit is configured to receive the main microphone input signal together with the adjustment factor of the gain function and the late reverberation spectrum of the main microphone input signal output by the reverberation spectrum estimation unit; perform time-domain to frequency-domain conversion on the main microphone input signal to obtain its spectrum; calculate the gain function according to the spectrum of the main microphone input signal, the adjustment factor of the gain function, and the late reverberation spectrum of the main microphone input signal; multiply the spectrum of the main microphone input signal by the gain function to obtain the de-reverberated spectrum of the main microphone input signal; perform frequency-domain to time-domain conversion on the de-reverberated spectrum to obtain the de-reverberated time-domain signal; and superimpose the de-reverberated time-domain signal frame by frame to output the continuous de-reverberated signal of the main microphone input signal.
  • The present invention calculates the transfer function h(t) from the auxiliary microphone to the main microphone according to the main and auxiliary microphone input signals, obtains the tail part h_r(t) of the transfer function, and judges the strength of the reverberation according to h(t) to calculate the adjustment factor of the gain function. The auxiliary microphone input signal is then convolved with h_r(t) to obtain the late reverberation estimation signal of the main microphone input signal, and the gain function is calculated according to the spectrum of the main microphone input signal, the adjustment factor of the gain function, and the late reverberation spectrum of the main microphone input signal.
  • Multiplying the spectrum of the main microphone input signal by the gain function yields the de-reverberated spectrum; that is, the late reverberation estimation spectrum is subtracted from the spectrum of the main microphone input signal by spectral subtraction. The late reverberation is therefore effectively eliminated from the main microphone input signal while the early reverberation is retained, so that the processed sound is not thinned and the quality of the speech is improved.
  • In addition, the strength of the spectral subtraction is adjusted according to the strength of the reverberation.
  • FIG. 1 is a schematic diagram of the transfer function from an excitation signal to a microphone input signal according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of the transfer function from the auxiliary microphone to the main microphone according to an embodiment of the present invention.
  • FIG. 4 is a schematic overall flow diagram of a dual-microphone-based speech reverberation reduction method in still another embodiment of the present invention.
  • FIG. 5a is a schematic diagram of the transfer function from the auxiliary microphone to the main microphone when the distance from the sound source to the main microphone is 0.5 m in the embodiment of the present invention.
  • FIG. 5b is a schematic diagram of the transfer function from the auxiliary microphone to the main microphone when the distance from the sound source to the main microphone is 1 m in the embodiment of the present invention.
  • FIG. 5c is a schematic diagram of the transfer function from the auxiliary microphone to the main microphone when the distance from the sound source to the main microphone is 2 m in the embodiment of the present invention.
  • FIG. 5d is a schematic diagram of the transfer function from the auxiliary microphone to the main microphone when the distance from the sound source to the main microphone is 4 m in the embodiment of the present invention.
  • FIG. 6a is a schematic diagram of the amplitude-frequency characteristic of the frequency compensation filter when the spacing between the main and auxiliary microphones is 6 cm in the embodiment of the present invention.
  • FIG. 6b is a schematic diagram of the amplitude-frequency characteristic of the frequency compensation filter when the spacing between the main and auxiliary microphones is 18 cm in the embodiment of the present invention.
  • FIG. 7a is a time-domain diagram of the main microphone input signal in an embodiment of the present invention.
  • FIG. 7b is a time-domain diagram of the main microphone signal after de-reverberation in the embodiment of the present invention.
  • FIG. 7c is a spectrogram of the main microphone input signal in the embodiment of the present invention.
  • FIG. 7d is a spectrogram of the main microphone signal after de-reverberation in the embodiment of the present invention.
  • FIG. 8 is a structural diagram of a dual-microphone-based speech reverberation reduction apparatus according to an embodiment of the present invention.
  • FIG. 9 is a detailed structural diagram of a dual-microphone-based speech reverberation reduction apparatus, with its inputs and outputs, in a preferred embodiment of the present invention.
  • The invention proposes a dual-microphone de-reverberation scheme that exploits the approximate relationship between the reverberation and the spatial transfer function between the two microphones: the dual-microphone spatial transfer function is used both to estimate the late reverberation and to judge the reverberation strength that controls the spectral subtraction.
  • The spectral subtraction module thus adapts to a variety of reverberant environments, achieving near-optimal speech quality while maintaining clarity.
  • The solution of the present invention does not require separation of the direct sound or direction-of-arrival estimation, so it does not require matched microphones and relaxes the requirements on acoustic design.
  • The basic principle of the present invention is to estimate the late reverberation from the tail of the transfer function between the two microphones, so that the spectral subtraction preserves the direct sound and the early reverberation.
  • The energy difference between the head and the tail of the dual-microphone transfer function is further used to estimate the degree of room reverberation and to adjust the strength of the spectral subtraction.
  • When the reverberation is weak, the spectral subtraction is reduced or not applied at all, which protects the speech quality.
  • FIG. 1 is a schematic diagram of the transfer function from an excitation signal to a microphone input signal in an embodiment of the present invention.
  • In the transfer function from the excitation signal to the microphone input signal, the maximum peak corresponds to the direct sound.
  • A point at a given distance after the maximum peak is used as the boundary between the early reflections and the late reflections; the part from the maximum peak to the boundary point corresponds to the early reverberation,
  • and the part after the boundary point corresponds to the late reverberation.
  • The boundary point is taken as 50 ms.
  • The excitation signal is denoted s(t),
  • the microphone input signal is denoted x(t),
  • the transfer function from the excitation signal to the microphone input signal is denoted w(t),
  • the transfer function corresponding to the direct sound and the early reverberation is denoted w_d(t),
  • and the transfer function corresponding to the late reverberation is denoted w_r(t).
  • C50 is computed as $C_{50} = 10\log_{10}\left(\int_{0}^{50\,\mathrm{ms}} w^{2}(t)\,dt \Big/ \int_{50\,\mathrm{ms}}^{\infty} w^{2}(t)\,dt\right)$, where w(t) is the transfer function from the excitation signal to the microphone input signal; 0 to 50 ms corresponds to the direct sound and the early reverberation, and the part after 50 ms corresponds to the late reverberation. The stronger the reverberation, the smaller the value of C50. The improvement in C50 from before to after de-reverberation reflects the effect of the de-reverberation, so C50 can be used as an objective measure of dereverberation.
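The C50 measure described above can be illustrated with a short NumPy sketch. This is not code from the patent: the function name `clarity_c50`, the discrete sum standing in for the integral, and the choice to measure the 50 ms boundary from the maximum peak are all assumptions made for illustration.

```python
import numpy as np


def clarity_c50(w, fs, boundary_ms=50.0):
    """Return C50 in dB for an impulse response w sampled at fs Hz.

    Assumption: the early/late boundary is measured from the maximum
    peak of w, which is taken to be the direct sound.
    """
    w = np.asarray(w, dtype=float)
    peak = int(np.argmax(np.abs(w)))                     # direct sound
    split = peak + int(round(boundary_ms * 1e-3 * fs))   # early/late boundary
    early_energy = np.sum(w[:split] ** 2)                # direct sound + early part
    late_energy = np.sum(w[split:] ** 2)                 # late reverberation
    return 10.0 * np.log10(early_energy / late_energy)


# Toy impulse response: direct sound at t = 0 and one late reflection at 100 ms.
fs = 1000
w = np.zeros(200)
w[0] = 1.0
w[100] = 0.1
c50 = clarity_c50(w, fs)
```

In this toy case the late part carries one hundredth of the early energy, so C50 comes out at about 20 dB; a stronger tail would lower the value.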
  • The principle of reverberation estimation based on two microphones (a main microphone and an auxiliary microphone) in the present invention is as follows: the input signal of the main microphone is denoted x_2(t), the input signal of the auxiliary microphone is denoted x_1(t), and the transfer function from the auxiliary microphone to the main microphone is denoted h(t).
  • FIG. 2 is a schematic diagram of the transfer function h(t) from the auxiliary microphone to the main microphone in an embodiment of the present invention.
  • h(t) can be divided into two parts, a head part h_d(t) and a tail part h_r(t).
  • The tail part h_r(t) of h(t) reflects the multiple reflections of the signal in space, so the signal obtained by convolving h_r(t) with the auxiliary microphone input signal x_1(t) is similar to the late
  • reverberation component of the main microphone and can be used as an estimate of it. A point on h(t) is selected as the demarcation point between h_d(t) and h_r(t), and h(t) before the demarcation point is set to 0 to obtain h_r(t).
  • The distance from the demarcation point to the maximum peak of h(t) can be set to 30 ms to 80 ms (an empirical value).
  • In the following description, the demarcation point is taken as 50 ms as an example.
  • FIG. 3 is a schematic flow chart of a dual-microphone-based speech reverberation reduction method in an embodiment of the present invention. As shown in FIG. 3, the method mainly includes a reverberation estimation part and a spectral subtraction part; specifically, the signals are processed frame by frame as follows:
  • time-domain to frequency-domain conversion is performed on the main microphone input signal x_2(t) to obtain the spectrum X_2 of the main microphone input signal;
  • the de-reverberated time-domain signal of the main microphone input signal is superimposed frame by frame, and
  • the continuous de-reverberated signal of the main microphone input signal is output.
  • In this scheme, the auxiliary microphone input signal is convolved with the tail part h_r(t) of the transfer function to obtain the late reverberation estimation signal of the main microphone input signal, and the late reverberation estimation spectrum is then subtracted from the spectrum of the main microphone input signal by spectral subtraction.
  • This effectively eliminates the late reverberation from the main microphone input signal while retaining its early reverberation, which improves the quality of the speech.
  • When estimating the late reverberation, the scheme shown in FIG. 3 also adjusts the strength of the spectral subtraction according to the strength of the reverberation.
  • When the reverberation is weak, the spectral subtraction is reduced or not performed at all, which ensures that speech quality is not damaged when the reverberation is weak and speech intelligibility is already high. In addition, this scheme does not need an accurate estimate of the direction of the direct sound, so the microphones are not required to be highly consistent and there is no strict limitation on the acoustic design.
  • Compared with the true late reverberation component of the main microphone input signal, the late reverberation estimation signal is underestimated in its low-frequency part.
  • A low-pass filter is therefore designed according to the microphone spacing to apply corresponding frequency compensation to the late reverberation estimation signal. See the embodiment shown in Figure 4 for details.
  • FIG. 4 is a schematic overall flow chart of a dual-microphone-based speech reverberation reduction method in still another embodiment of the present invention.
  • The inputs to the entire system are the auxiliary microphone input signal x_1(t) and the main microphone input signal x_2(t), and the output is the de-reverberated signal.
  • The system consists of two parts: the reverberation spectrum estimation process and the spectral subtraction process.
  • Compared with the method flow shown in Figure 3, Figure 4 adds a frequency compensation step for the late reverberation estimation signal.
  • In Figure 4 the frequency compensation step for the late reverberation estimation signal is numbered step 1.45, and the time-domain to frequency-domain conversion step is still numbered step 1.5.
  • The method is described in detail below with reference to FIG. 4:
  • Input of the reverberation spectrum estimation process: the input signal x_1(t) of the auxiliary microphone and the input signal x_2(t) of the main microphone.
  • Output of the reverberation spectrum estimation process: the adjustment factor of the gain function and the late reverberation spectrum of the main microphone input signal (both are inputs to the spectral subtraction process).
  • The reverberation spectrum estimation process includes six steps: 1.1, 1.2, 1.3, 1.4, 1.45, and 1.5.
  • Input of the spectral subtraction process: the main microphone input signal x_2(t), the adjustment factor of the gain function, and the late reverberation spectrum of the main microphone input signal (the latter two are outputs of the reverberation spectrum estimation process).
  • Input of 1.1: the input signal x_1(t) of the auxiliary microphone and the input signal x_2(t) of the main microphone.
  • Output of 1.1: the transfer function h(t) from the auxiliary microphone to the main microphone (as input to 1.2).
  • The frequency-domain transfer function H is calculated from the cross-power spectrum of the auxiliary microphone input signal x_1(t) and the main microphone input signal x_2(t) and the power spectrum of the auxiliary microphone input signal x_1(t).
  • The frequency-domain transfer function H is inverse Fourier transformed to obtain the time-domain transfer function h(t).
  • Other methods, such as adaptive filtering, may also be used for this calculation; they are not described in detail here.
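Step 1.1 can be sketched as follows. The estimator below divides the frame-averaged cross-power spectrum by the frame-averaged power spectrum of the auxiliary signal, which is one common reading of the description; the patent's exact formula, windowing, and frame sizes are not recoverable from this text, so the function name, the Hann window, `frame_len`, `hop`, and `eps` are all assumptions.

```python
import numpy as np


def estimate_transfer_function(x1, x2, frame_len=1024, hop=512, eps=1e-12):
    """Estimate H (frequency domain) and h (time domain) from x1 to x2.

    Assumed estimator: H = sum(conj(X1) * X2) / sum(|X1|^2), accumulated
    over Hann-windowed frames. eps guards against division by zero.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    window = np.hanning(frame_len)
    cross_power = np.zeros(frame_len, dtype=complex)
    auto_power = np.zeros(frame_len)
    for start in range(0, len(x1) - frame_len + 1, hop):
        X1 = np.fft.fft(window * x1[start:start + frame_len])
        X2 = np.fft.fft(window * x2[start:start + frame_len])
        cross_power += np.conj(X1) * X2      # cross-power spectrum of x1 and x2
        auto_power += np.abs(X1) ** 2        # power spectrum of x1
    H = cross_power / (auto_power + eps)
    h = np.real(np.fft.ifft(H))              # time-domain transfer function
    return H, h


# Sanity check on a trivial "room": the main signal is the auxiliary signal halved.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(8192)
x2 = 0.5 * x1
H, h = estimate_transfer_function(x1, x2)
```

For this trivial case H is flat at 0.5 and h is a single tap of 0.5 at lag zero. With real recordings the frame length has to be long enough to cover the reverberation tail of interest.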
  • Input of 1.2: the transfer function h(t) from the auxiliary microphone to the main microphone (output of 1.1).
  • Output of 1.2: the tail part h_r(t) of the transfer function from the auxiliary microphone to the main microphone (as input to 1.4).
  • The demarcation point between the early reverberation and the late reverberation is taken on the time axis of the transfer function, and the values of h(t) before the demarcation point are set to 0, which gives the tail part h_r(t) of the transfer function.
  • For example, a point is selected on h(t) at a distance of 50 ms from its maximum peak, and the values of h(t) before that point are set to 0; the result is denoted h_r(t).
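The tail extraction of step 1.2 amounts to zeroing the transfer function up to a boundary placed after its maximum peak. A minimal sketch follows; the function name and the sample-rate handling are illustrative assumptions, not taken from the patent.

```python
import numpy as np


def transfer_function_tail(h, fs, boundary_ms=50.0):
    """Return h_r: a copy of h with every sample before the boundary set to 0.

    The boundary sits boundary_ms after the maximum peak of h.
    """
    h = np.asarray(h, dtype=float)
    peak = int(np.argmax(np.abs(h)))
    split = peak + int(round(boundary_ms * 1e-3 * fs))
    h_r = h.copy()
    h_r[:split] = 0.0    # discard direct sound and early reflections
    return h_r


# Toy example at 1 kHz with a 2 ms boundary: the peak is at index 1,
# so the boundary falls at index 3.
h_demo = np.array([0.0, 1.0, 0.5, 0.25, 0.125])
h_r_demo = transfer_function_tail(h_demo, fs=1000, boundary_ms=2.0)
```

Here `h_r_demo` keeps only the last two taps (0.25 and 0.125); everything up to and including the 0.5 tap is zeroed.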
  • Input of 1.3: the transfer function h(t) from the auxiliary microphone to the main microphone (output of 1.1).
  • Output of 1.3: the adjustment factor of the gain function (as an input to the spectral subtraction process).
  • The adjustment factor of the gain function is calculated by judging the reverberation strength.
  • The logarithm of the ratio of the head energy to the tail energy of the transfer function from the auxiliary microphone to the main microphone is denoted ρ.
  • T is the specified demarcation point on the time axis of h(t).
  • The demarcation point T is not necessarily the demarcation point between the early reverberation and the late reverberation, but the part before T must include the direct sound and may include some or all of the early reverberation.
  • FIG. 5a is a schematic diagram of the transfer function from the auxiliary microphone to the main microphone when the distance from the sound source to the main microphone is 0.5 m in the embodiment of the present invention.
  • FIG. 5b is a schematic diagram of the transfer function from the auxiliary microphone to the main microphone when the distance from the sound source to the main microphone is 1 m in the embodiment of the present invention.
  • FIG. 5c is a schematic diagram of the transfer function from the auxiliary microphone to the main microphone when the distance from the sound source to the main microphone is 2 m in the embodiment of the present invention.
  • The value of T ranges from 20 ms to 50 ms.
  • Here T is taken as 50 ms (that is, the time point 50 ms after the maximum peak of h(t)).
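The head-to-tail energy ratio ρ can be sketched as below. The decibel scaling (10·log10) and the small `eps` regulariser are assumptions; the patent's empirical mapping from ρ to the adjustment factor of the gain function is not reproduced here because its formula does not survive in this text.

```python
import numpy as np


def head_tail_energy_ratio_db(h, fs, t_ms=50.0, eps=1e-12):
    """Return rho: log ratio (assumed to be in dB) of head to tail energy of h.

    The demarcation point T lies t_ms after the maximum peak of h.
    eps keeps the ratio finite when the tail is empty.
    """
    h = np.asarray(h, dtype=float)
    peak = int(np.argmax(np.abs(h)))
    split = peak + int(round(t_ms * 1e-3 * fs))
    head_energy = np.sum(h[:split] ** 2)   # direct sound (+ early reverberation)
    tail_energy = np.sum(h[split:] ** 2)   # remaining reverberation
    return 10.0 * np.log10((head_energy + eps) / (tail_energy + eps))


# Weakly reverberant toy response: large head, small tail.
rho_weak = head_tail_energy_ratio_db([1.0, 0.0, 0.1, 0.0], fs=1000, t_ms=2.0)
# Strongly reverberant toy response: the tail carries as much energy as the head.
rho_strong = head_tail_energy_ratio_db([1.0, 0.0, 1.0, 0.0], fs=1000, t_ms=2.0)
```

A large ρ (about 20 dB in the first case) indicates weak reverberation, where the description calls for little or no spectral subtraction; a ρ near 0 dB indicates strong reverberation.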
  • FIG. 5d is a schematic diagram of the transfer function from the auxiliary microphone to the main microphone when the distance from the sound source to the main microphone is 4 m in the embodiment of the present invention.
  • Formula (6) is the empirical formula used for this calculation in the embodiment of the present invention:
  • The input signal x_1(t) of the auxiliary microphone is convolved with the tail part h_r(t) of the transfer function from the auxiliary microphone to the main microphone to obtain the late reverberation estimation signal of the main microphone input signal.
  • Input of 1.4: the input signal x_1(t) of the auxiliary microphone and the tail part h_r(t) of the transfer function from the auxiliary microphone to the main microphone (output of 1.2).
  • Output of 1.4: the late reverberation estimation signal of the main microphone input signal (as input to 1.45).
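Step 1.4 is a plain convolution. The sketch below processes a whole signal at once and truncates the result to the input length; a frame-by-frame implementation would additionally need to carry the convolution tail from one frame into the next, which is omitted here. The function name is an illustrative assumption.

```python
import numpy as np


def late_reverb_estimate(x1, h_r):
    """Convolve the auxiliary-microphone signal x1 with the tail h_r.

    The full convolution is truncated to len(x1) so the estimate stays
    time-aligned with the input signal.
    """
    x1 = np.asarray(x1, dtype=float)
    h_r = np.asarray(h_r, dtype=float)
    return np.convolve(x1, h_r)[:len(x1)]


# An impulse through a tail with a single tap at lag 2 comes out delayed and scaled.
r_demo = late_reverb_estimate([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.5, 0.0])
```

Because the head of the transfer function has been zeroed, the estimate contains no contribution at short lags, which is what lets the later subtraction leave the direct sound and early reflections alone.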
  • Input of 1.45: the late reverberation estimation signal of the main microphone input signal (output of 1.4).
  • Output of 1.45: the frequency-compensated late reverberation estimation signal of the main microphone input signal (as input to 1.5).
  • Compared with the true late reverberation component of the main microphone input signal, the late reverberation estimation signal is underestimated in its low-frequency part. Therefore, in the present invention, the late reverberation estimation signal of the main microphone input signal is subjected to frequency compensation.
  • The spacing between the main and auxiliary microphones affects the late reverberation estimation signal. Therefore, in the embodiment of the present invention, a low-pass filter is designed according to the microphone spacing to apply corresponding frequency compensation to the late reverberation estimation signal, giving the compensated late reverberation estimation signal.
  • FIG. 6a is a schematic diagram of the amplitude-frequency characteristic of the frequency compensation filter when the spacing between the main and auxiliary microphones is 6 cm in the embodiment of the present invention.
  • FIG. 6b is a schematic diagram of the amplitude-frequency characteristic of the frequency compensation filter when the spacing between the main and auxiliary microphones is 18 cm in the embodiment of the present invention. It can be seen that, in the embodiment of the present invention, the greater the distance between the main microphone and the auxiliary microphone, the smaller the degree of frequency compensation applied to the low-frequency part of the late reverberation estimation signal of the main microphone input signal.
  • Input of 1.5: the frequency-compensated late reverberation estimation signal of the main microphone input signal (output of 1.45).
  • Output of 1.5: the late reverberation spectrum of the main microphone input signal (as an input to the spectral subtraction process).
  • The input signal x_2(t) of the main microphone is transformed from the time domain to the frequency domain; the result is denoted X_2.
  • Input of 2.1: the input signal x_2(t) of the main microphone.
  • Output of 2.1: the spectrum X_2 of the main microphone input signal (as input to 2.2).
  • The specific formula is as follows:
  • The gain function G is calculated from the spectrum X_2 of the main microphone input signal and the estimated late reverberation spectrum of the main microphone, and the gain function is adjusted according to the adjustment factor.
  • Input of 2.2: the spectrum X_2 of the main microphone input signal (output of 2.1), the late reverberation spectrum of the main microphone (output of 1.5 in the reverberation spectrum estimation process), and the adjustment factor of the gain function (output of 1.3 in the reverberation spectrum estimation process).
  • The power spectral subtraction method is used to calculate the gain function G(l, k) according to the following formula:
  • where l is the frame index, k is the frequency-bin index, and the remaining terms are the adjustment factor of the gain function, the late reverberation spectrum of the main microphone input signal, and the spectrum of the main microphone input signal.
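The gain formula itself is missing from this text, so the sketch below uses a generic power spectral subtraction gain as a stand-in: the reverberation power, scaled by the adjustment factor, is subtracted from the signal power, floored at zero, and normalised by the signal power. This exact form, the symbol `beta` for the adjustment factor, and the `eps` regulariser are assumptions and may differ from the patent's formula.

```python
import numpy as np


def spectral_subtraction_gain(X2, R, beta, eps=1e-12):
    """Assumed power-spectral-subtraction gain for one frame.

    X2   : complex spectrum of the main microphone input signal
    R    : complex late reverberation spectrum estimate
    beta : adjustment factor of the gain function (0 disables subtraction)
    """
    power_x = np.abs(np.asarray(X2)) ** 2
    power_r = np.abs(np.asarray(R)) ** 2
    residual = np.maximum(power_x - beta * power_r, 0.0)   # floor at zero
    return np.sqrt(residual / (power_x + eps))


X2_demo = np.array([2.0 + 0.0j, 0.0 + 2.0j, 1.0 + 0.0j])
R_demo = np.array([1.0 + 0.0j, 1.0 + 0.0j, 2.0 + 0.0j])
G_full = spectral_subtraction_gain(X2_demo, R_demo, beta=1.0)
G_off = spectral_subtraction_gain(X2_demo, R_demo, beta=0.0)
```

With `beta=1.0` the first two bins are attenuated to sqrt(0.75) and the third, where the reverberation estimate exceeds the signal, is muted. With `beta=0.0` the gain is essentially 1 everywhere, matching the description's "reduced or not performed" behaviour under weak reverberation.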
  • Input of 2.3: the spectrum X_2 of the main microphone input signal (output of 2.1) and the gain function G (output of 2.2).
  • Output of 2.3: the de-reverberated spectrum of the main microphone input signal (as input to 2.4). Specifically, the de-reverberated spectrum D(l, k) of the main microphone input signal is calculated according to the following formula:
  • $D(l,k) = G(l,k)\cdot |X_2(l,k)| \cdot \exp\big(j\cdot \mathrm{phase}(l,k)\big)$ (11)
  • where l is the frame index, k is the frequency-bin index,
  • |X_2(l, k)| is the amplitude of the main microphone input signal, G(l, k) is the gain function, and phase(l, k) is the phase of the main microphone input signal.
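Step 2.3 scales the magnitude of the main-microphone spectrum by the gain while reusing its phase. The sketch below is a direct transcription of that idea into NumPy; the function name is an illustrative assumption.

```python
import numpy as np


def apply_gain(X2, G):
    """Scale the magnitude of X2 by the real gain G and keep the phase of X2."""
    X2 = np.asarray(X2, dtype=complex)
    magnitude = np.abs(X2)
    phase = np.angle(X2)
    return G * magnitude * np.exp(1j * phase)


X2_bins = np.array([1.0 + 1.0j, 0.0 - 2.0j])
G_bins = np.array([0.5, 1.0])
D_bins = apply_gain(X2_bins, G_bins)
```

Since the gain is real and non-negative, the result equals `G_bins * X2_bins`; writing it in magnitude and phase form simply makes explicit that only the amplitude is modified.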
  • Input of 2.4: the de-reverberated spectrum D of the main microphone input signal (output of 2.3).
  • Output of 2.4: the de-reverberated time-domain signal of the main microphone input signal (as input to 2.5).
  • Input of 2.5: the de-reverberated time-domain signal of the main microphone input signal (output of 2.4).
  • Output of 2.5: the continuous de-reverberated signal of the main microphone input signal (output of the entire system).
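The frame-by-frame superposition of step 2.5 is read here as conventional overlap-add synthesis, which is an interpretation rather than something the text states explicitly. In the sketch, the function name, the `hop` parameter, and the absence of a synthesis window are assumptions.

```python
import numpy as np


def overlap_add(frames, hop):
    """Sum equally sized time-domain frames at offsets of hop samples."""
    frames = np.asarray(frames, dtype=float)
    n_frames, frame_len = frames.shape
    out = np.zeros(hop * (n_frames - 1) + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame   # accumulate overlapping parts
    return out


# Two frames of four samples with 50 % overlap.
y_demo = overlap_add([[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]], hop=2)
```

The two middle samples, where the frames overlap, sum to 2. In practice the analysis window and hop are chosen so that the overlapping windows sum to a constant, which keeps the output level uniform.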
  • Figure 7a is a time-domain diagram of the main microphone input signal in the embodiment of the present invention.
  • Figure 7b is a time-domain diagram of the main microphone signal after de-reverberation in the embodiment of the present invention.
  • Figure 7c is a spectrogram of the main microphone input signal in the embodiment of the present invention.
  • Figure 7d is a spectrogram of the main microphone signal after de-reverberation in the embodiment of the present invention.
  • In this experiment the main and auxiliary microphones face the sound source, the perpendicular distance from the sound source to the two microphones is 2 m, and the spacing between the main and auxiliary microphones is 18 cm. The C50 of the main microphone input signal is measured before de-reverberation.
  • The C50 after de-reverberation with the scheme shown in Figure 4 is 10.5 dB, so C50 is improved by 3.7 dB by the scheme of the present invention (which implies a C50 of about 6.8 dB before de-reverberation).
  • Fig. 8 is a structural diagram showing the structure of a speech reverberation reduction apparatus based on dual microphones in an embodiment of the present invention.
  • the apparatus processes the signals received by the primary mic and the secondary mic on a frame-by-frame basis.
  • the apparatus includes: a reverberation estimation unit 700 and a speech reduction unit 800, wherein:
  • the reverberation spectrum estimating unit 700 is configured to receive the main mic input signal and the auxiliary mic input signal, calculate a transfer function of the auxiliary mic to the main mic according to the main mic input signal and the auxiliary mic input signal, and obtain a transfer function of the transfer function Wt)
  • the tail part /t), and according to the transfer function Wt) to judge the strength of the reverberation, calculate the adjustment factor of the gain function is output to the speech reduction unit 800, and the auxiliary microphone input signal is convolved with (t) to obtain the main microphone input.
  • the post-reverberation estimation signal of the signal performs time-domain to frequency-domain conversion on the late reverberation estimation signal of the main mic input signal, and obtains a late reverberation spectrum of the main mic input signal, and outputs the post-reverberation spectrum to the spectral subtraction unit 800.
  • the spectral subtraction unit 800 is configured to: receive the main microphone input signal, together with the adjustment factor of the gain function and the late reverberation spectrum of the main microphone input signal output by the reverberation spectrum estimation unit 700; convert the main microphone input signal from the time domain to the frequency domain to obtain its spectrum; compute the gain function from the spectrum of the main microphone input signal, the adjustment factor $\beta$ and the late reverberation spectrum; multiply the spectrum of the main microphone input signal by the gain function to obtain the dereverberated spectrum; convert the dereverberated spectrum from the frequency domain to the time domain to obtain the dereverberated time-domain signal; and overlap-add the dereverberated time-domain signal frame by frame to output the continuous dereverberated main microphone signal.
  • in one embodiment, after the reverberation spectrum estimation unit 700 has convolved the auxiliary microphone input signal with $h_r(t)$ to obtain the late reverberation estimation signal of the main microphone input signal, it first applies frequency compensation to that estimation signal, then converts the frequency-compensated signal from the time domain to the frequency domain to obtain the late reverberation spectrum of the main microphone input signal, and outputs it to the spectral subtraction unit 800.
  • referring to Fig. 9, the dual-microphone-based speech reverberation reduction apparatus includes a reverberation spectrum estimation unit 91 and a spectral subtraction unit 92.
  • the reverberation spectrum estimation unit 91 includes: a transfer function calculation unit 911, a transfer function tail calculation unit 912, a reverberation strength judging unit 913, a late reverberation estimation unit 914, a frequency compensation unit 915, and a first time-frequency conversion unit 916.
  • the spectral subtraction unit 92 includes: a second time-frequency conversion unit 921, a gain function calculation unit 922, a dereverberation unit 923, a frequency-time conversion unit 924, and an overlap-add unit 925.
  • the transfer function calculation unit 911 is configured to receive the main microphone input signal and the auxiliary microphone input signal, compute the transfer function $h(t)$ from the auxiliary microphone to the main microphone from these two signals, and output $h(t)$ to the transfer function tail calculation unit 912 and the reverberation strength judging unit 913.
  • the transfer function tail calculation unit 912 is configured to obtain the tail part $h_r(t)$ of the transfer function $h(t)$ and output it to the late reverberation estimation unit 914.
  • specifically, the transfer function tail calculation unit 912 takes the boundary point between early and late reverberation on the time axis of $h(t)$ and sets the values of $h(t)$ before that boundary point to 0, which yields the tail part $h_r(t)$.
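  • the reverberation strength judging unit 913 is configured to judge the strength of the reverberation from the transfer function $h(t)$ and to compute the adjustment factor $\beta$ of the gain function, which it outputs to the gain function calculation unit 922. Specifically, the unit first computes the parameter $\rho$ that indicates the reverberation strength: the logarithm of the ratio between the head energy and the tail energy of $h(t)$, split at a specified boundary point $T$ on the time axis of $h(t)$.
  • the reverberation strength judging unit 913 then computes the adjustment factor according to the aforementioned formula (6): $\beta = 0$ for $\rho > \rho_1$; $\beta = 2(\rho_1 - \rho)/(\rho_1 - \rho_2)$ for $\rho_2 \le \rho \le \rho_1$; $\beta = 2$ for $\rho < \rho_2$.
  • $\rho_1$ and $\rho_2$ are set values; for example, $\rho_1$ is 9 dB and $\rho_2$ is 2 dB (for a microphone spacing of 6 cm).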
  • the late reverberation estimation unit 914 is configured to receive the auxiliary microphone input signal, convolve it with $h_r(t)$ to obtain the late reverberation estimation signal of the main microphone input signal, and output that signal to the frequency compensation unit 915.
  • the frequency compensation unit 915 is configured to apply frequency compensation to the late reverberation estimation signal of the main microphone input signal and to output the compensated signal to the first time-frequency conversion unit 916. The greater the distance between the main and auxiliary microphones, the smaller the degree of frequency compensation applied by unit 915.
  • the first time-frequency conversion unit 916 is configured to convert the frequency-compensated late reverberation estimation signal from the time domain to the frequency domain, obtain the late reverberation spectrum of the main microphone input signal, and output it to the gain function calculation unit 922.
  • the second time-frequency conversion unit 921 is configured to receive the main microphone input signal, convert it from the time domain to the frequency domain to obtain its spectrum, and output the spectrum to the gain function calculation unit 922 and the dereverberation unit 923.
  • the gain function calculation unit 922 is configured to compute the gain function from the spectrum of the main microphone input signal output by the second time-frequency conversion unit 921, the adjustment factor $\beta$ output by the reverberation strength judging unit 913, and the late reverberation spectrum output by the first time-frequency conversion unit 916, and to output the gain function to the dereverberation unit 923.
  • the gain function calculation unit 922 can compute the gain function $G(l,k)$ according to the aforementioned formula (10), where $l$ is the frame index, $k$ is the frequency bin index, $\beta$ is the adjustment factor of the gain function, $R$ is the late reverberation spectrum of the main microphone input signal, and $X_2$ is the spectrum of the main microphone input signal.
  • the dereverberation unit 923 multiplies the spectrum of the main microphone input signal by the gain function to obtain the dereverberated spectrum of the main microphone input signal, and outputs it to the frequency-time conversion unit 924.
  • in this embodiment, the dereverberation unit 923 computes the dereverberated spectrum according to the aforementioned formula (11):
  • $D(l,k) = G(l,k) \cdot |X_2(l,k)| \cdot \exp(j \cdot \mathrm{phase}(l,k))$,
  • where $l$ is the frame index, $k$ is the frequency bin index, $|X_2(l,k)|$ is the amplitude of the main microphone input signal, $G(l,k)$ is the gain function, and $\mathrm{phase}(l,k)$ is the phase of the main microphone input signal.
  • the frequency-time conversion unit 924 is configured to convert the dereverberated spectrum of the main microphone input signal from the frequency domain to the time domain, obtain the dereverberated time-domain signal, and output it to the overlap-add unit 925.
  • the overlap-add unit 925 is configured to overlap-add, frame by frame, the time-domain signals output by the frequency-time conversion unit 924 to obtain the continuous dereverberated main microphone signal.
  • in summary, the dual-microphone-based speech reverberation reduction apparatus processes the signals received by the main and auxiliary microphones frame by frame.
  • the reverberation spectrum estimation unit of the apparatus receives the main microphone input signal $x_2(t)$ and the auxiliary microphone input signal $x_1(t)$, computes the transfer function $h(t)$ from the auxiliary microphone to the main microphone from $x_2(t)$ and $x_1(t)$, obtains the tail part $h_r(t)$ of $h(t)$, judges the reverberation strength from $h(t)$, computes the adjustment factor $\beta$ of the gain function, convolves $x_1(t)$ with $h_r(t)$ to obtain the late reverberation estimation signal of $x_2(t)$, and converts it to the frequency domain to obtain the late reverberation spectrum $R$ of $x_2(t)$.
  • the spectral subtraction unit of the apparatus converts $x_2(t)$ to the frequency domain to obtain its spectrum, computes the gain function from that spectrum, $\beta$ and $R$, multiplies the spectrum of $x_2(t)$ by the gain function to obtain the dereverberated spectrum, and converts that spectrum from the frequency domain to the time domain to obtain the dereverberated time-domain signal of $x_2(t)$.
  • because the auxiliary microphone input signal $x_1(t)$ is convolved with $h_r(t)$ to obtain the late reverberation estimation signal $\hat{r}(t)$ of the main microphone input signal $x_2(t)$, and the late reverberation estimation spectrum $R$ is then removed from the spectrum of $x_2(t)$ by spectral subtraction, the late reverberation is effectively eliminated from $x_2(t)$ while its early reverberation is retained, which improves speech quality.
  • while estimating the late reverberation, the invention adjusts the strength of the spectral subtraction according to the reverberation strength, subtracting less or not at all when the reverberation is weak. This ensures that speech is not damaged when the reverberation is weak and intelligibility is already high.
  • the technical solution of the present invention thus protects the speech while removing reverberation, automatically estimates the strength of the room reverberation, and selects appropriate processing in each environment to achieve near-optimal speech quality.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)

Abstract

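The present invention discloses a dual-microphone-based method and apparatus for reducing speech reverberation. In the method, the transfer function $h(t)$ from the auxiliary microphone to the main microphone is computed from the main microphone input signal $x_2(t)$ and the auxiliary microphone input signal $x_1(t)$; the tail part $h_r(t)$ of $h(t)$ is taken; the strength of the reverberation is judged from $h(t)$ and the adjustment factor $\beta$ of the gain function is computed; $x_1(t)$ is convolved with $h_r(t)$ to obtain the late reverberation estimation signal $\hat{r}(t)$ of $x_2(t)$; the gain function is computed from the spectrum of $x_2(t)$, $\beta$ and the spectrum of $\hat{r}(t)$; the spectrum of $x_2(t)$ is multiplied by the gain function to obtain the dereverberated spectrum of $x_2(t)$; and a frequency-to-time conversion yields the time-domain signal of $x_2(t)$ with its late reverberation removed. In this way late reverberation is removed from the main microphone input signal while early reverberation is retained, so the processed sound does not become thin and speech quality is improved. The strength of the spectral subtraction is also adjusted according to the reverberation strength, which ensures that speech is not damaged when reverberation is weak and intelligibility is already high. In addition, the direction of arrival of the direct sound does not need to be estimated accurately, so the microphones are not required to be closely matched.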

Description

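A Dual-Microphone-Based Method and Apparatus for Reducing Speech Reverberation

Technical Field

The present invention relates to the field of speech enhancement, and in particular to a dual-microphone-based method and apparatus for reducing speech reverberation.

Background of the Invention

When a sound signal propagates in a room, hard surfaces such as walls and floors reflect it. The sound that reaches a microphone therefore contains, besides the direct sound from the source, sound that has been reflected once or several times; these non-direct components make up the reverberation. Sound that has undergone one or a few reflections is called early reflection and forms the early reverberation, which can reinforce speech. Sound that has undergone many reflections is called late reflection and forms the late reverberation; strong late reverberation reduces speech intelligibility.
In some hands-free voice communication, the talker is far from the microphone, so room reverberation lowers speech intelligibility and degrades call quality. Techniques are therefore needed to reduce reverberation and raise intelligibility. The signal received by the microphone consists of the direct sound and the reverberation, and, as stated above, the reverberation divides into early and late reverberation. It is mainly the late reverberation that lowers intelligibility, whereas early reverberation generally reinforces speech. The key to improving intelligibility is therefore to reduce the late reverberation.
Among the various dereverberation techniques, dual-microphone spectral subtraction has attracted considerable attention. One existing dual-microphone spectral-subtraction method uses an adaptive beamforming (GSC) structure to obtain two signals: the first is the output of a delay-and-sum beamformer, and the second is the output of a blocking matrix. The energy envelopes of the two signals are passed through an adaptive filter to estimate the reverberation in the first signal, which is then removed by spectral subtraction. This method has several drawbacks:
1) It removes early reverberation, so the processed sound becomes thin.
2) It does not judge the reverberation strength and applies the same spectral subtraction under all reverberation conditions, so speech quality may be damaged when reverberation is weak and intelligibility is already high.
3) It needs an accurate estimate of the direction of arrival of the direct sound in order to separate the direct sound, so it requires closely matched microphones and places strict constraints on the acoustic design.

Summary of the Invention

In view of the above problems, the present invention is proposed to provide a dual-microphone-based method and apparatus for reducing speech reverberation that overcome, or at least partly solve, these problems.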
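According to one aspect of the present invention, a dual-microphone-based method for reducing speech reverberation is provided, the method comprising:
receiving a main microphone input signal and an auxiliary microphone input signal, and processing them frame by frame as follows:
computing the transfer function $h(t)$ from the auxiliary microphone to the main microphone from the main microphone input signal and the auxiliary microphone input signal;
obtaining the tail part $h_r(t)$ of the transfer function $h(t)$, judging the strength of the reverberation from $h(t)$, and computing the adjustment factor $\beta$ of the gain function;
convolving the auxiliary microphone input signal with $h_r(t)$ to obtain the late reverberation estimation signal of the main microphone input signal;
converting the late reverberation estimation signal from the time domain to the frequency domain to obtain the late reverberation spectrum of the main microphone input signal, and converting the main microphone input signal from the time domain to the frequency domain to obtain its spectrum;
computing the gain function from the spectrum of the main microphone input signal, the adjustment factor $\beta$ and the late reverberation spectrum of the main microphone input signal;
multiplying the spectrum of the main microphone input signal by the gain function to obtain the dereverberated spectrum of the main microphone input signal;
converting the dereverberated spectrum from the frequency domain to the time domain to obtain the dereverberated time-domain signal of the main microphone input signal;
overlap-adding the dereverberated time-domain signal frame by frame and outputting the continuous dereverberated main microphone signal.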
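According to another aspect of the present invention, a dual-microphone-based apparatus for reducing speech reverberation is provided, which processes the signals received by the main and auxiliary microphones frame by frame. The apparatus comprises a reverberation spectrum estimation unit and a spectral subtraction unit, wherein:
the reverberation spectrum estimation unit is configured to receive the main microphone input signal and the auxiliary microphone input signal, compute the transfer function $h(t)$ from the auxiliary microphone to the main microphone from these two signals, obtain the tail part $h_r(t)$ of $h(t)$, judge the strength of the reverberation from $h(t)$, compute the adjustment factor $\beta$ of the gain function and output it to the spectral subtraction unit, convolve the auxiliary microphone input signal with $h_r(t)$ to obtain the late reverberation estimation signal of the main microphone input signal, convert that estimation signal from the time domain to the frequency domain, and output the resulting late reverberation spectrum of the main microphone input signal to the spectral subtraction unit;
the spectral subtraction unit is configured to receive the main microphone input signal, together with the adjustment factor of the gain function and the late reverberation spectrum output by the reverberation spectrum estimation unit, convert the main microphone input signal from the time domain to the frequency domain to obtain its spectrum, compute the gain function from that spectrum, the adjustment factor $\beta$ and the late reverberation spectrum, multiply the spectrum of the main microphone input signal by the gain function to obtain the dereverberated spectrum, convert the dereverberated spectrum from the frequency domain to the time domain to obtain the dereverberated time-domain signal, overlap-add that signal frame by frame, and output the continuous dereverberated main microphone signal.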
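As can be seen from the above, the present invention computes the transfer function $h(t)$ from the auxiliary microphone to the main microphone from the two input signals, takes the tail part $h_r(t)$ of $h(t)$, judges the reverberation strength from $h(t)$ and computes the adjustment factor $\beta$ of the gain function. It then convolves the auxiliary microphone input signal with $h_r(t)$ to obtain the late reverberation estimation signal of the main microphone input signal, computes the gain function from the spectrum of the main microphone input signal, $\beta$ and the late reverberation spectrum, and multiplies the spectrum of the main microphone input signal by the gain function to obtain the dereverberated spectrum. In other words, the late reverberation estimation spectrum is removed from the spectrum of the main microphone input signal by spectral subtraction. Late reverberation is therefore effectively removed from the main microphone input signal while early reverberation is retained, so the processed sound does not become thin and speech quality is improved. While estimating the late reverberation, the strength of the spectral subtraction is adjusted according to the reverberation strength, with less or no subtraction when reverberation is weak, so that speech is not damaged when reverberation is weak and intelligibility is already high. Moreover, the scheme does not need an accurate estimate of the direction of arrival of the direct sound, so it does not require closely matched microphones and places no strict constraints on the acoustic design.

Brief Description of the Drawings

Fig. 1 is a schematic diagram of the transfer function from the excitation signal to the microphone input signal given in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the transfer function from the auxiliary microphone to the main microphone given in an embodiment of the present invention;
Fig. 3 is a flowchart of a dual-microphone-based speech reverberation reduction method in one embodiment of the present invention;
Fig. 4 is an overall flowchart of the dual-microphone-based speech reverberation reduction method in another embodiment of the present invention;
Fig. 5a is a schematic diagram of the transfer function from the auxiliary microphone to the main microphone when the distance from the sound source to the main microphone is 0.5 m;
Fig. 5b is a schematic diagram of the transfer function from the auxiliary microphone to the main microphone when the distance from the sound source to the main microphone is 1 m;
Fig. 5c is a schematic diagram of the transfer function from the auxiliary microphone to the main microphone when the distance from the sound source to the main microphone is 2 m;
Fig. 5d is a schematic diagram of the transfer function from the auxiliary microphone to the main microphone when the distance from the sound source to the main microphone is 4 m;
Fig. 6a is a schematic diagram of the magnitude-frequency response of the frequency compensation filter when the spacing between the main and auxiliary microphones is 6 cm;
Fig. 6b is a schematic diagram of the magnitude-frequency response of the frequency compensation filter when the spacing between the main and auxiliary microphones is 18 cm;
Fig. 7a is a time-domain plot of the main microphone input signal in an embodiment of the present invention;
Fig. 7b is a time-domain plot of the main microphone signal after dereverberation in an embodiment of the present invention;
Fig. 7c is a spectrogram of the main microphone input signal in an embodiment of the present invention;
Fig. 7d is a spectrogram of the main microphone signal after dereverberation in an embodiment of the present invention;
Fig. 8 is a structural diagram of a dual-microphone-based speech reverberation reduction apparatus in an embodiment of the present invention;
Fig. 9 is a schematic diagram of the detailed structure, inputs and outputs of the dual-microphone-based speech reverberation reduction apparatus in a preferred embodiment of the present invention.

Mode for Carrying Out the Invention

It should first be noted that, to keep the application concise, the original application abbreviates "microphone" (麦克风) to "mic" (麦克).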
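The analysis of the prior art shows that better reverberation reduction has to remove late reverberation while protecting the direct sound and the early reverberation, which calls for an accurate and stable late reverberation estimate and a judgment of the reverberation strength.
The present invention proposes a dual-microphone dereverberation scheme that makes full use of the approximate relationship between reverberation and the spatial transfer function between the two microphones. The spatial transfer function is used to estimate the late reverberation and to judge the reverberation strength; combined with the spectral subtraction module, this achieves near-optimal speech quality while meeting intelligibility requirements in a wide range of reverberant environments. In addition, the scheme neither separates the direct sound nor estimates the direction of arrival, so it does not require matched microphones and relaxes the requirements on the acoustic design.
The basic principle of the present invention is as follows. The late reverberation is estimated from the tail of the transfer function between the two microphones, so the direct sound and the early reverberation are well preserved in the spectral subtraction. While estimating the late reverberation, the energy difference between the head and the tail of the transfer function is further used to estimate the degree of room reverberation and to adjust the strength of the spectral subtraction, with less or no subtraction when reverberation is weak, which protects speech quality.
To make the technical solution clear, its technical principle is analysed below. Early reverberation reinforces speech, whereas late reverberation lowers intelligibility. Fig. 1 is a schematic diagram of the transfer function from the excitation signal to the microphone input signal given in an embodiment of the present invention. Referring to Fig. 1, the largest peak of this transfer function corresponds to the direct sound. A point at some distance from the largest peak is usually taken as the boundary between early and late reflections: the part from the largest peak to the boundary corresponds to early reverberation, and the part after the boundary corresponds to late reverberation. In Fig. 1 the boundary is at 50 ms.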
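Denote the excitation signal by $s(t)$, the microphone input signal by $x(t)$, the transfer function from the excitation signal to the microphone input signal by $tf(t)$, the part of the transfer function corresponding to the direct sound and early reverberation by $tf_d(t)$, and the part corresponding to late reverberation by $tf_r(t)$. The microphone input signal is then the convolution of the excitation signal with the transfer function, $x(t) = s(t) * tf(t)$; its direct and early reverberation component is $x_d(t) = s(t) * tf_d(t)$, and its late reverberation component is $x_r(t) = s(t) * tf_r(t)$. The microphone input signal can therefore also be written as $x(t) = s(t) * tf(t) = s(t) * (tf_d(t) + tf_r(t)) = x_d(t) + x_r(t)$.
Speech intelligibility can be expressed by $C_{50}$, which is computed as:

$$C_{50} = 10\log_{10}\frac{\int_{0}^{50\,\mathrm{ms}} w^2(t)\,dt}{\int_{50\,\mathrm{ms}}^{\infty} w^2(t)\,dt}\ \mathrm{dB} \qquad (1)$$

where $w(t)$ is the transfer function from the excitation signal to the microphone input signal. The interval 0 to 50 ms corresponds to the direct sound and early reverberation, and the part after 50 ms corresponds to late reverberation. The stronger the reverberation, the smaller the value of $C_{50}$. The improvement in $C_{50}$ from before to after dereverberation reflects the effect of the dereverberation, so $C_{50}$ can serve as an objective evaluation metric for dereverberation.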
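As an illustration of formula (1), the following minimal sketch computes $C_{50}$ from a sampled impulse response. The function name `c50_db`, the sampling rate and the toy exponentially decaying response are assumptions made for this example only; the sketch also assumes the response starts at the arrival of the direct sound.

```python
import math

def c50_db(w, fs, split_ms=50.0):
    """Clarity index C50 in dB for a sampled impulse response w.

    Assumes w[0] is the arrival of the direct sound and that the
    response contains energy after the split point.
    """
    n_split = int(round(fs * split_ms / 1000.0))
    early = sum(v * v for v in w[:n_split])   # direct sound + early reverberation
    late = sum(v * v for v in w[n_split:])    # late reverberation
    return 10.0 * math.log10(early / late)

# Toy example (hypothetical data): exponential decay, 0.3 s long at 8 kHz.
fs = 8000
slow = [math.exp(-n / (0.05 * fs)) for n in range(int(0.3 * fs))]
fast = [math.exp(-n / (0.02 * fs)) for n in range(int(0.3 * fs))]
print(c50_db(slow, fs), c50_db(fast, fs))
```

The slowly decaying response stands in for a more reverberant room and gives the smaller $C_{50}$, in line with the statement that stronger reverberation lowers the index.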
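The principle of the dual-microphone (main and auxiliary microphone) reverberation estimation in the present invention is as follows. Denote the main microphone input signal by $x_2(t)$, the auxiliary microphone input signal by $x_1(t)$, and the transfer function from the auxiliary microphone to the main microphone by $h(t)$, as shown in Fig. 2. Fig. 2 is a schematic diagram of the transfer function $h(t)$ from the auxiliary microphone to the main microphone given in an embodiment of the present invention.
The main microphone input signal $x_2(t)$ equals the convolution of the auxiliary microphone input signal $x_1(t)$ with the transfer function $h(t)$:

$$x_2(t) = x_1(t) * h(t) \qquad (2)$$

$h(t)$ can be split into a head and a tail:

$$h(t) = h_d(t) + h_r(t) \qquad (3)$$

where $h_d(t)$ denotes the head of $h(t)$ and $h_r(t)$ denotes its tail.
The tail part $h_r(t)$ of $h(t)$ reflects the multiple reflections of the signal in the room. The convolution $\hat{r}(t)$ of $h_r(t)$ with the auxiliary microphone input signal $x_1(t)$ is therefore close to the late reverberation component of the main microphone and can serve as an estimate of it. A point on $h(t)$ is chosen as the boundary between $h_d(t)$ and $h_r(t)$, and setting the values of $h(t)$ before that boundary to 0 gives $h_r(t)$. The distance from the boundary to the largest peak of $h(t)$ can be set in the range 30 ms to 80 ms (an empirical value). Experience shows that if this distance is 50 ms or more, the late reverberation estimation signal $\hat{r}(t)$ of the main microphone contains no residue of the direct sound or early reflections, which reduces damage to the speech. The embodiments of the present invention are therefore described with the boundary at 50 ms.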
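To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments are described in further detail below with reference to the drawings.
Fig. 3 is a flowchart of a dual-microphone-based speech reverberation reduction method in one embodiment of the present invention. As shown in Fig. 3, the method consists mainly of a reverberation estimation part and a spectral subtraction part, and performs the following processing frame by frame:
1.1, receive the main microphone input signal $x_2(t)$ and the auxiliary microphone input signal $x_1(t)$, and compute the transfer function $h(t)$ from the auxiliary microphone to the main microphone from these two signals;
1.2, obtain the tail part $h_r(t)$ of the transfer function $h(t)$;
1.3, judge the strength of the reverberation from the transfer function $h(t)$ and compute the adjustment factor $\beta$ of the gain function;
1.4, convolve the auxiliary microphone input signal with $h_r(t)$ to obtain the late reverberation estimation signal $\hat{r}(t)$ of the main microphone input signal;
1.5, convert the late reverberation estimation signal $\hat{r}(t)$ from the time domain to the frequency domain to obtain the late reverberation spectrum $R$ of the main microphone input signal;
2.1, convert the main microphone input signal $x_2(t)$ from the time domain to the frequency domain to obtain its spectrum $X_2$;
2.2, compute the gain function $G$ from the spectrum $X_2$, the adjustment factor $\beta$ and the late reverberation spectrum $R$;
2.3, multiply the spectrum $X_2$ of the main microphone input signal by the gain function $G$ to obtain the dereverberated spectrum $D$;
2.4, convert the dereverberated spectrum $D$ from the frequency domain to the time domain to obtain the dereverberated time-domain signal of the main microphone input signal;
2.5, overlap-add the dereverberated time-domain signal frame by frame and output the continuous dereverberated main microphone signal.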
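In the method shown in Fig. 3, the auxiliary microphone input signal is convolved with $h_r(t)$ to obtain the late reverberation estimation signal of the main microphone input signal, and the late reverberation estimation spectrum is then removed from the spectrum of the main microphone input signal by spectral subtraction. Late reverberation is thus effectively removed from the main microphone input signal while early reverberation is retained, which improves speech quality. In addition, the scheme of Fig. 3 adjusts the strength of the spectral subtraction according to the reverberation strength while estimating the late reverberation, with less or no subtraction when reverberation is weak, so speech quality is not damaged when reverberation is weak and intelligibility is already high. The scheme also does not need an accurate estimate of the direction of arrival of the direct sound, so it does not require closely matched microphones and places no strict constraints on the acoustic design.
In one embodiment of the present invention, building on the scheme of Fig. 3, it is further taken into account that the late reverberation estimation signal underestimates the low-frequency part compared with the true late reverberation component of the main microphone input signal. A low-pass filter is designed according to the microphone spacing and used to apply the corresponding frequency compensation to the late reverberation estimation signal. See the embodiment shown in Fig. 4.
Fig. 4 is an overall flowchart of the dual-microphone-based speech reverberation reduction method in another embodiment of the present invention. As shown in Fig. 4, the inputs of the whole system are the auxiliary microphone input signal $x_1(t)$ and the main microphone input signal $x_2(t)$, and the output is the dereverberated signal. The system consists of two main parts: the reverberation spectrum estimation process and the spectral subtraction process. Compared with the flow of Fig. 3, Fig. 4 adds a step that applies frequency compensation to the late reverberation estimation signal (in Fig. 4 this step is step 1.45, and the time-to-frequency conversion step is still labelled step 1.5). The method is described in detail below with reference to Fig. 4: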
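1. Reverberation spectrum estimation
Input: the auxiliary microphone input signal $x_1(t)$ and the main microphone input signal $x_2(t)$;
Output: the adjustment factor $\beta$ of the gain function (an input of the spectral subtraction process) and the late reverberation spectrum $R$ of the main microphone input signal (an input of the spectral subtraction process);
Reverberation spectrum estimation comprises the six steps 1.1, 1.2, 1.3, 1.4, 1.45 and 1.5.
2. Spectral subtraction
Input: the main microphone input signal $x_2(t)$, the adjustment factor $\beta$ of the gain function (output of the reverberation spectrum estimation process) and the late reverberation spectrum $R$ of the main microphone (output of the reverberation spectrum estimation process);
Output: the dereverberated main microphone signal (also the output of the whole system);
The spectral subtraction process comprises the five steps 2.1, 2.2, 2.3, 2.4 and 2.5.
The individual steps of the reverberation spectrum estimation process and the spectral subtraction process, and the relationships between them, are described in detail below.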
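1. Reverberation spectrum estimation process:
1.1 Compute the transfer function $h(t)$ from the auxiliary microphone to the main microphone.
Input of 1.1: the auxiliary microphone input signal $x_1(t)$ and the main microphone input signal $x_2(t)$.
Output of 1.1: the transfer function $h(t)$ from the auxiliary microphone to the main microphone (input of 1.2).
In one embodiment of the present invention, the frequency-domain transfer function $H$ is computed from the cross-power spectrum of the auxiliary microphone input signal $x_1(t)$ and the main microphone input signal $x_2(t)$ together with the power spectrum of the auxiliary microphone input signal $x_1(t)$ (the formula itself is not reproduced in this text). An inverse Fourier transform of the frequency-domain transfer function $H$ gives the time-domain transfer function $h(t)$. In other embodiments of the present invention, $h(t)$ can be computed by other methods, such as adaptive filtering, which are not detailed here.
1.2 Obtain the tail part $h_r(t)$ of the transfer function $h(t)$.
Input of 1.2: the transfer function $h(t)$ from the auxiliary microphone to the main microphone (output of 1.1).
Output of 1.2: the tail part $h_r(t)$ of the transfer function from the auxiliary microphone to the main microphone (input of 1.4).
In the embodiments of the present invention, the boundary point between early and late reverberation is taken on the time axis of the transfer function $h(t)$, and the values of $h(t)$ before that boundary point are set to 0, which gives the tail part $h_r(t)$. In a preferred embodiment, a point on $h(t)$ is chosen such that its distance to the largest peak of $h(t)$ is 50 ms, the values of $h(t)$ before that point are set to 0, and the result is denoted $h_r(t)$.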
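Step 1.1 can be sketched as follows, under the assumption that $H$ is the usual ratio of the averaged cross-power spectrum to the averaged power spectrum, since the patent's own formula is not reproduced in this text. The function name `estimate_transfer_function`, the Hann window, the frame length and the frame averaging are choices made for this example only.

```python
import numpy as np

def estimate_transfer_function(x1, x2, n_fft=1024, hop=512, eps=1e-12):
    """Estimate h(t) from the auxiliary mic x1 to the main mic x2.

    Assumed estimator: H = P12 / P11, with both spectra averaged over
    windowed frames, followed by an inverse FFT.
    """
    win = np.hanning(n_fft)
    p12 = np.zeros(n_fft // 2 + 1, dtype=complex)  # cross-power spectrum of x1, x2
    p11 = np.zeros(n_fft // 2 + 1)                 # power spectrum of x1
    for start in range(0, len(x1) - n_fft + 1, hop):
        f1 = np.fft.rfft(win * x1[start:start + n_fft])
        f2 = np.fft.rfft(win * x2[start:start + n_fft])
        p12 += np.conj(f1) * f2
        p11 += np.abs(f1) ** 2
    H = p12 / (p11 + eps)          # eps guards against empty bins
    return np.fft.irfft(H, n_fft)  # time-domain transfer function h(t)

# Hypothetical check signal: white noise through a short known filter.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(65536)
h_true = np.array([0.0, 0.0, 1.0, 0.5, 0.0, 0.25])
x2 = np.convolve(x1, h_true)[:len(x1)]
h_est = estimate_transfer_function(x1, x2)
print(np.round(h_est[:6], 2))
```

Averaging over frames is one way to stabilise the estimate; the adaptive-filter alternative mentioned in the text would replace this function entirely.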
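The tail extraction of step 1.2 amounts to zeroing everything before a boundary placed a fixed time after the largest peak. A minimal sketch, in which the function name `tail_of` and the sample values are hypothetical:

```python
def tail_of(h, fs, offset_ms=50.0):
    """Return h_r: a copy of h with all samples before the boundary set to 0.

    The boundary is placed offset_ms after the largest-magnitude peak of h,
    following the 50 ms choice of the preferred embodiment.
    """
    peak = max(range(len(h)), key=lambda i: abs(h[i]))
    split = peak + int(round(fs * offset_ms / 1000.0))
    split = min(split, len(h))
    return [0.0] * split + list(h[split:])

# Toy example: 1 kHz sampling and a 2 ms offset, so the boundary sits
# two samples after the peak at index 1.
h = [0.0, 1.0, 0.5, 0.2, 0.1]
print(tail_of(h, fs=1000, offset_ms=2.0))
```

The output keeps the length of $h(t)$, so it can be convolved with the auxiliary microphone signal without any extra alignment.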
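1.3 Judge the reverberation strength from the transfer function $h(t)$ from the auxiliary microphone to the main microphone, and derive the adjustment factor $\beta$ of the gain function.
Input of 1.3: the transfer function $h(t)$ from the auxiliary microphone to the main microphone (output of 1.1).
Output of 1.3: the adjustment factor $\beta$ of the gain function (an input of the spectral subtraction process).
To reduce the damage that dereverberation does to speech when reverberation is weak, step 1.3 computes the adjustment factor $\beta$ of the gain function by judging the reverberation strength. In the embodiments of the present invention, the logarithm of the ratio between the head energy and the tail energy of the transfer function from the auxiliary microphone to the main microphone is denoted $\rho$ (the original formula is an image; the form below is reconstructed from this definition, with the result in dB):

$$\rho = 10\log_{10}\frac{\int_{0}^{T} h^2(t)\,dt}{\int_{T}^{\infty} h^2(t)\,dt}$$

where $h(t)$ is the transfer function from the auxiliary microphone to the main microphone and $T$ is a specified boundary point on the time axis of $h(t)$. The boundary point $T$ is not necessarily the boundary between early and late reverberation, but the part before $T$ must contain the direct sound and may also contain some or all of the early reverberation.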
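Fig. 5a is a schematic diagram of the transfer function from the auxiliary microphone to the main microphone when the distance from the sound source to the main microphone is 0.5 m. With a source-to-main-microphone distance L = 0.5 m and $T$ chosen from the range 20 ms to 50 ms, here $T$ = 50 ms (that is, the boundary point $T$ is the time point 50 ms after the largest peak of $h(t)$), the speech intelligibility index is $C_{50}$ = 12.3 dB and $\rho$ = 9.4 dB.
Fig. 5b is a schematic diagram of the transfer function from the auxiliary microphone to the main microphone when the distance from the sound source to the main microphone is 1 m. With L = 1 m and $T$ chosen from the range 20 ms to 50 ms, here $T$ = 50 ms, the speech intelligibility index is $C_{50}$ = 8.1 dB and $\rho$ = 6.0 dB.
Fig. 5c is a schematic diagram of the transfer function from the auxiliary microphone to the main microphone when the distance from the sound source to the main microphone is 2 m. With L = 2 m and $T$ chosen from the range 20 ms to 50 ms, here $T$ = 50 ms, the speech intelligibility index is $C_{50}$ = 5.4 dB and $\rho$ = 3.7 dB.
Fig. 5d is a schematic diagram of the transfer function from the auxiliary microphone to the main microphone when the distance from the sound source to the main microphone is 4 m. With L = 4 m and $T$ chosen from the range 20 ms to 50 ms, here $T$ = 50 ms, the speech intelligibility index is $C_{50}$ = 4.5 dB and $\rho$ = 2.2 dB.
The farther the sound source is from the microphones, the stronger the reverberation. Figs. 5a to 5d show that as the reverberation increases, the head energy of the transfer function from the auxiliary microphone to the main microphone decreases and the tail energy increases, so the logarithm $\rho$ of their ratio reflects the reverberation strength: as the reverberation grows stronger, $\rho$ gradually decreases. The reverberation strength can therefore be judged from the value of $\rho$, and the adjustment factor $\beta$ of the gain function derived from it.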
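$\beta$ can be computed in several ways. Formula (6) is an empirical formula for $\beta$ used in the embodiments of the present invention:

$$\beta = \begin{cases} 0, & \rho > \rho_1 \\ 2(\rho_1 - \rho)/(\rho_1 - \rho_2), & \rho_2 \le \rho \le \rho_1 \\ 2, & \rho < \rho_2 \end{cases} \qquad (6)$$

$\rho_1$ and $\rho_2$ are set, empirical values. In the embodiments of the present invention, $\rho_1$ is 9 dB and $\rho_2$ is 2 dB (for a microphone spacing of 6 cm).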
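The two quantities of step 1.3 can be sketched as below. The $\rho$ computation assumes the $10\log_{10}$ energy-ratio form implied by the text (head over tail, in dB) with the boundary measured from the largest peak; the function names are hypothetical. The $\beta$ mapping follows formula (6) with the stated values $\rho_1$ = 9 dB and $\rho_2$ = 2 dB.

```python
import math

def reverb_strength_db(h, fs, t_ms=50.0):
    """rho: head-to-tail energy ratio of h in dB (assumed 10*log10 form).

    The boundary T is placed t_ms after the largest peak of h; the tail is
    assumed to contain some energy.
    """
    peak = max(range(len(h)), key=lambda i: abs(h[i]))
    split = peak + int(round(fs * t_ms / 1000.0))
    head = sum(v * v for v in h[:split])
    tail = sum(v * v for v in h[split:])
    return 10.0 * math.log10(head / tail)

def adjustment_factor(rho, rho1=9.0, rho2=2.0):
    """beta per formula (6): 0 for weak reverberation, 2 for strong."""
    if rho > rho1:
        return 0.0
    if rho < rho2:
        return 2.0
    return 2.0 * (rho1 - rho) / (rho1 - rho2)

# rho values reported for Figs. 5a to 5d (source at 0.5 m, 1 m, 2 m, 4 m).
for rho in (9.4, 6.0, 3.7, 2.2):
    print(rho, round(adjustment_factor(rho), 3))
```

For the four reported $\rho$ values the factor grows from 0 at 0.5 m toward nearly 2 at 4 m, so the subtraction is switched off for the closest source and close to full strength for the farthest one.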
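1.4 Convolve the auxiliary microphone input signal $x_1(t)$ with the tail part $h_r(t)$ of the transfer function from the auxiliary microphone to the main microphone to obtain the late reverberation estimation signal $\hat{r}(t)$ of the main microphone input signal.
Input of 1.4: the auxiliary microphone input signal $x_1(t)$ and the tail part $h_r(t)$ of the transfer function from the auxiliary microphone to the main microphone (output of 1.2).
Output of 1.4: the late reverberation estimation signal $\hat{r}(t)$ of the main microphone input signal (input of 1.45).
The formula is:

$$\hat{r}(t) = x_1(t) * h_r(t) \qquad (7)$$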
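Formula (7) is a plain convolution. The sketch below uses a direct time-domain loop for clarity; the function names are hypothetical, and truncating the result to the frame length is an assumption of this example (a real-time implementation would also need to carry the convolution tail over to the next frame).

```python
def convolve(x, h):
    """Full linear convolution of two sequences (direct form)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y

def late_reverb_estimate(x1_frame, h_r):
    """r_hat(t) = x1(t) * h_r(t), truncated to the frame length (assumption)."""
    return convolve(x1_frame, h_r)[:len(x1_frame)]

# Toy example: an impulse on the auxiliary mic reproduces the tail itself.
print(late_reverb_estimate([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.5, 0.25]))
```

Because the head of $h_r(t)$ is zero, the estimate contains no contribution from the first samples after the direct sound, which is what keeps the direct sound and early reflections out of the subtraction.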
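1.45 Apply frequency compensation to the late reverberation estimation signal $\hat{r}(t)$ of the main microphone input signal to obtain the compensated signal $\hat{r}_{EQ}(t)$.
Input of 1.45: the late reverberation estimation signal $\hat{r}(t)$ of the main microphone input signal (output of 1.4).
Output of 1.45: the frequency-compensated late reverberation estimation signal $\hat{r}_{EQ}(t)$ of the main microphone input signal (input of 1.5).
Compared with the true late reverberation component of the main microphone input signal, the late reverberation estimation signal $\hat{r}(t)$ underestimates the low-frequency part. The present invention therefore applies frequency compensation to $\hat{r}(t)$. The spacing between the main and auxiliary microphones affects the late reverberation estimation signal, so in the embodiments of the present invention a low-pass filter is designed according to the microphone spacing and used to apply the corresponding frequency compensation, giving the compensated late reverberation estimation signal $\hat{r}_{EQ}(t)$.
Fig. 6a is a schematic diagram of the magnitude-frequency response of the frequency compensation filter when the spacing between the main and auxiliary microphones is 6 cm. Fig. 6b is a schematic diagram of the magnitude-frequency response of the frequency compensation filter when the spacing is 18 cm. It can be seen that, in the embodiments of the present invention, the greater the distance between the main and auxiliary microphones, the smaller the degree of frequency compensation applied to the low-frequency part of the late reverberation estimation signal.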
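1.5 Convert the frequency-compensated late reverberation estimation signal $\hat{r}_{EQ}(t)$ of the main microphone input signal from the time domain to the frequency domain to obtain the late reverberation spectrum $R$ of the main microphone input signal.
Input of 1.5: the frequency-compensated late reverberation estimation signal $\hat{r}_{EQ}(t)$ of the main microphone input signal (output of 1.45).
Output of 1.5: the late reverberation spectrum $R$ of the main microphone input signal (an input of the spectral subtraction process).
Transforming the frequency-compensated late reverberation estimation signal $\hat{r}_{EQ}(t)$ of the main microphone to the frequency domain gives the late reverberation spectrum $R$ of the main microphone input signal:
[The formula appears as an image in the original publication and is not reproduced here.]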
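2. Spectral subtraction process
2.1 Transform the main microphone input signal $x_2(t)$ from the time domain to the frequency domain; the result is denoted $X_2$.
Input of 2.1: the main microphone input signal $x_2(t)$.
Output of 2.1: the spectrum $X_2$ of the main microphone input signal (input of 2.2). The formula is:
[The formula appears as an image in the original publication and is not reproduced here.]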
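2.2 Compute the gain function $G$ from the spectrum $X_2$ of the main microphone input signal and the estimated late reverberation spectrum $R$ of the main microphone, and adjust the gain function by the adjustment factor $\beta$.
Input of 2.2: the spectrum $X_2$ of the main microphone input signal (output of 2.1), the late reverberation spectrum $R$ of the main microphone (output of 1.5 in the reverberation spectrum estimation process) and the adjustment factor $\beta$ of the gain function (output of 1.3 in the reverberation spectrum estimation process).
Output of 2.2: the gain function $G$ (an input of 2.3).
In one embodiment of the present invention, power spectral subtraction is used, and the gain function $G(l,k)$ is computed according to formula (10):
[Formula (10) appears as an image in the original publication and is not reproduced here.]
where $l$ is the frame index, $k$ is the frequency bin index, $\beta$ is the adjustment factor of the gain function, $R$ is the late reverberation spectrum of the main microphone input signal, and $X_2$ is the spectrum of the main microphone input signal.
Formula (10) shows that the adjustment factor $\beta$ controls the magnitude of the gain function $G(l,k)$. Less spectral subtraction, or none at all, is therefore applied when reverberation is weak, which ensures that speech is not damaged when reverberation is weak and intelligibility is already high.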
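Because formula (10) is not available in this text, the following sketch uses a common textbook form of power spectral subtraction as a stand-in, $G = \sqrt{\max\left(1 - \beta |R|^2 / |X_2|^2,\ 0\right)}$. This exact form, the function name `gain_function` and the optional `floor` parameter are assumptions of the example, not the patent's formula.

```python
import math

def gain_function(X2, R, beta, floor=0.0):
    """Per-bin gain for one frame (assumed power-spectral-subtraction form).

    X2, R: sequences of complex spectrum values for the same frame.
    beta:  adjustment factor from step 1.3 (0 disables the subtraction).
    floor: lower limit on the gain, a common safeguard (hypothetical here).
    """
    gains = []
    for x, r in zip(X2, R):
        px = abs(x) ** 2
        if px == 0.0:
            gains.append(floor)
            continue
        g2 = (px - beta * abs(r) ** 2) / px   # remaining power fraction
        gains.append(max(math.sqrt(max(g2, 0.0)), floor))
    return gains

# Toy bin with |X2| = 2 and |R| = 1, for no, partial and full-strength subtraction.
for beta in (0.0, 1.0, 2.0):
    print(beta, gain_function([2 + 0j], [1 + 0j], beta))
```

With $\beta = 0$ the gain is 1 in every bin, which matches the stated behaviour that no spectral subtraction is performed when the reverberation is weak.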
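2.3 Multiply the amplitude spectrum $|X_2|$ of the main microphone input signal by the gain function $G$ and combine the result with the phase of the main microphone input signal to obtain the dereverberated spectrum $D$ of the main microphone input signal.
Input of 2.3: the spectrum $X_2$ of the main microphone input signal (output of 2.1) and the gain function $G$ (output of 2.2).
Output of 2.3: the dereverberated spectrum $D$ of the main microphone input signal (input of 2.4). Specifically, the dereverberated spectrum $D(l,k)$ is computed as:

$$D(l,k) = G(l,k) \cdot |X_2(l,k)| \cdot \exp(j \cdot \mathrm{phase}(l,k)) \qquad (11)$$

where $l$ is the frame index, $k$ is the frequency bin index, $|X_2(l,k)|$ is the amplitude spectrum of the main microphone input signal, $G(l,k)$ is the gain function, and $\mathrm{phase}(l,k)$ is the phase of the main microphone input signal.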
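2.4 Convert the dereverberated spectrum $D$ of the main microphone input signal to the time domain.
Input of 2.4: the dereverberated spectrum $D$ of the main microphone input signal (output of 2.3).
Output of 2.4: the dereverberated time-domain signal of the main microphone input signal (input of 2.5).
[The formula appears as an image in the original publication and is not reproduced here.]
2.5 Overlap-add the dereverberated time-domain signal of the main microphone input signal frame by frame to obtain the continuous dereverberated main microphone signal.
Input of 2.5: the dereverberated time-domain signal of the main microphone input signal (output of 2.4).
Output of 2.5: the continuous dereverberated main microphone signal (the output of the whole system).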
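Steps 2.1 and 2.3 to 2.5 together form a standard analysis, modification and overlap-add loop. The sketch below is one possible arrangement: the square-root Hann window with 50% overlap, the frame length and the `gains_for_frame` callback (which stands in for step 2.2) are all assumptions of this example.

```python
import numpy as np

def dereverb_frames(x2, gains_for_frame, n_fft=512, hop=256):
    """Apply a per-frame gain to x2 and rebuild a continuous signal.

    gains_for_frame(l, X2) is a hypothetical callback returning the gain
    G(l, k) for frame l as an array with one value per frequency bin.
    """
    # Periodic sqrt-Hann: analysis window times synthesis window sums to 1
    # across frames at 50% overlap.
    win = np.sqrt(np.hanning(n_fft + 1)[:-1])
    out = np.zeros(len(x2))
    for l, start in enumerate(range(0, len(x2) - n_fft + 1, hop)):
        X2 = np.fft.rfft(win * x2[start:start + n_fft])      # step 2.1
        G = gains_for_frame(l, X2)
        D = G * np.abs(X2) * np.exp(1j * np.angle(X2))        # step 2.3, formula (11)
        d = np.fft.irfft(D, n_fft)                            # step 2.4
        out[start:start + n_fft] += win * d                   # step 2.5, overlap-add
    return out

# With unit gain the interior of the signal is reconstructed unchanged.
rng = np.random.default_rng(1)
x = rng.standard_normal(4096)
y = dereverb_frames(x, lambda l, X2: np.ones(len(X2)))
print(np.max(np.abs(y[512:-512] - x[512:-512])))
```

Reusing the phase of $X_2$ means only the magnitude is modified, so a unit gain leaves the signal untouched apart from the first and last half-frames, which are covered by a single window.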
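Fig. 7a is a time-domain plot of the main microphone input signal in an embodiment of the present invention; Fig. 7b is a time-domain plot of the main microphone signal after dereverberation; Fig. 7c is a spectrogram of the main microphone input signal; Fig. 7d is a spectrogram of the main microphone signal after dereverberation.
Referring to Figs. 7a to 7d, in this embodiment the main and auxiliary microphones face the sound source, the perpendicular distance from the sound source to the two microphones is 2 m, and the spacing between the main and auxiliary microphones is 18 cm. The $C_{50}$ of the main microphone input signal is 6.8 dB before dereverberation and 10.5 dB after dereverberation with the scheme shown in Fig. 4, so the scheme of the present invention improves $C_{50}$ by 3.7 dB.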
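Fig. 8 is a structural diagram of a dual-microphone-based speech reverberation reduction apparatus in an embodiment of the present invention. The apparatus processes the signals received by the main and auxiliary microphones frame by frame. Referring to Fig. 8, the apparatus includes a reverberation spectrum estimation unit 700 and a spectral subtraction unit 800, wherein:
The reverberation spectrum estimation unit 700 is configured to receive the main microphone input signal and the auxiliary microphone input signal, compute the transfer function $h(t)$ from the auxiliary microphone to the main microphone from these two signals, obtain the tail part $h_r(t)$ of $h(t)$, judge the strength of the reverberation from $h(t)$, compute the adjustment factor $\beta$ of the gain function and output it to the spectral subtraction unit 800, convolve the auxiliary microphone input signal with $h_r(t)$ to obtain the late reverberation estimation signal of the main microphone input signal, convert that estimation signal from the time domain to the frequency domain, and output the resulting late reverberation spectrum of the main microphone input signal to the spectral subtraction unit 800.
The spectral subtraction unit 800 is configured to receive the main microphone input signal, together with the adjustment factor of the gain function and the late reverberation spectrum output by the reverberation spectrum estimation unit 700, convert the main microphone input signal from the time domain to the frequency domain to obtain its spectrum, compute the gain function from that spectrum, the adjustment factor $\beta$ and the late reverberation spectrum, multiply the spectrum of the main microphone input signal by the gain function to obtain the dereverberated spectrum, convert the dereverberated spectrum from the frequency domain to the time domain to obtain the dereverberated time-domain signal, overlap-add that signal frame by frame, and output the continuous dereverberated main microphone signal.
In one embodiment of the present invention, after convolving the auxiliary microphone input signal with $h_r(t)$ to obtain the late reverberation estimation signal of the main microphone input signal, the reverberation spectrum estimation unit 700 first applies frequency compensation to that estimation signal, then converts the compensated signal from the time domain to the frequency domain to obtain the late reverberation spectrum of the main microphone input signal, and outputs it to the spectral subtraction unit 800.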
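Fig. 9 is a schematic diagram of the detailed structure, inputs and outputs of the dual-microphone-based speech reverberation reduction apparatus in a preferred embodiment of the present invention. Referring to Fig. 9, the apparatus includes a reverberation spectrum estimation unit 91 and a spectral subtraction unit 92. The reverberation spectrum estimation unit 91 includes a transfer function calculation unit 911, a transfer function tail calculation unit 912, a reverberation strength judging unit 913, a late reverberation estimation unit 914, a frequency compensation unit 915 and a first time-frequency conversion unit 916. The spectral subtraction unit 92 includes a second time-frequency conversion unit 921, a gain function calculation unit 922, a dereverberation unit 923, a frequency-time conversion unit 924 and an overlap-add unit 925.
The transfer function calculation unit 911 is configured to receive the main microphone input signal and the auxiliary microphone input signal, compute the transfer function $h(t)$ from the auxiliary microphone to the main microphone from these two signals, and output $h(t)$ to the transfer function tail calculation unit 912 and the reverberation strength judging unit 913.
The transfer function tail calculation unit 912 is configured to obtain the tail part $h_r(t)$ of the transfer function $h(t)$ and output it to the late reverberation estimation unit 914. Specifically, unit 912 takes the boundary point between early and late reverberation on the time axis of $h(t)$ and sets the values of $h(t)$ before that boundary point to 0, which gives the tail part $h_r(t)$.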
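The reverberation strength judging unit 913 is configured to judge the strength of the reverberation from the transfer function $h(t)$, compute the adjustment factor $\beta$ of the gain function and output it to the gain function calculation unit. Specifically, the reverberation strength judging unit 913 first computes the parameter $\rho$ that indicates the reverberation strength according to the aforementioned formula (the original formula is an image; the form below is reconstructed from the definition of $\rho$ as the logarithm of the head-to-tail energy ratio, in dB):

$$\rho = 10\log_{10}\frac{\int_{0}^{T} h^2(t)\,dt}{\int_{T}^{\infty} h^2(t)\,dt}$$

where $h(t)$ is the transfer function from the auxiliary microphone to the main microphone and $T$ is a specified boundary point on the time axis of $h(t)$.
The reverberation strength judging unit 913 then computes the adjustment factor $\beta$ of the gain function according to the aforementioned formula (6), that is:

$$\beta = \begin{cases} 0, & \rho > \rho_1 \\ 2(\rho_1 - \rho)/(\rho_1 - \rho_2), & \rho_2 \le \rho \le \rho_1 \\ 2, & \rho < \rho_2 \end{cases}$$

where $\rho_1$ and $\rho_2$ are set values. For example, $\rho_1$ is 9 dB and $\rho_2$ is 2 dB (for a microphone spacing of 6 cm).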
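The late reverberation estimation unit 914 is configured to receive the auxiliary microphone input signal, convolve it with $h_r(t)$ to obtain the late reverberation estimation signal of the main microphone input signal, and output that signal to the frequency compensation unit 915.
The frequency compensation unit 915 is configured to apply frequency compensation to the late reverberation estimation signal of the main microphone input signal and to output the compensated signal to the first time-frequency conversion unit 916. The greater the distance between the main and auxiliary microphones, the smaller the degree of frequency compensation that unit 915 applies to the late reverberation estimation signal.
The first time-frequency conversion unit 916 is configured to convert the frequency-compensated late reverberation estimation signal of the main microphone input signal from the time domain to the frequency domain, and to output the resulting late reverberation spectrum to the gain function calculation unit 922.
The second time-frequency conversion unit 921 is configured to receive the main microphone input signal, convert it from the time domain to the frequency domain to obtain its spectrum, and output the spectrum to the gain function calculation unit 922 and the dereverberation unit 923.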
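The gain function calculation unit 922 is configured to compute the gain function from the spectrum of the main microphone input signal output by the second time-frequency conversion unit 921, the adjustment factor $\beta$ output by the reverberation strength judging unit 913 and the late reverberation spectrum of the main microphone input signal output by the first time-frequency conversion unit 916, and to output the gain function to the dereverberation unit 923. The gain function calculation unit 922 can compute the gain function $G(l,k)$ according to the aforementioned formula (10):
[Formula (10) appears as an image in the original publication and is not reproduced here.]
where $l$ is the frame index, $k$ is the frequency bin index, $\beta$ is the adjustment factor of the gain function, $R$ is the late reverberation spectrum of the main microphone input signal, and $X_2$ is the spectrum of the main microphone input signal.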
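The dereverberation unit 923 multiplies the spectrum of the main microphone input signal by the gain function to obtain the dereverberated spectrum of the main microphone input signal, and outputs it to the frequency-time conversion unit 924. In this embodiment, the dereverberation unit 923 computes the dereverberated spectrum according to the aforementioned formula (11):

$$D(l,k) = G(l,k) \cdot |X_2(l,k)| \cdot \exp(j \cdot \mathrm{phase}(l,k))$$

where $l$ is the frame index, $k$ is the frequency bin index, $|X_2(l,k)|$ is the amplitude of the main microphone input signal, $G(l,k)$ is the gain function, and $\mathrm{phase}(l,k)$ is the phase of the main microphone input signal.
The frequency-time conversion unit 924 is configured to convert the dereverberated spectrum of the main microphone input signal from the frequency domain to the time domain, obtain the dereverberated time-domain signal and output it to the overlap-add unit 925.
The overlap-add unit 925 is configured to overlap-add, frame by frame, the time-domain signals output by the frequency-time conversion unit 924 to obtain the continuous dereverberated main microphone signal.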
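In summary, the dual-microphone-based speech reverberation reduction apparatus of the embodiments of the present invention processes the signals received by the main and auxiliary microphones frame by frame. The reverberation spectrum estimation unit of the apparatus receives the main microphone input signal $x_2(t)$ and the auxiliary microphone input signal $x_1(t)$, computes the transfer function $h(t)$ from the auxiliary microphone to the main microphone from $x_2(t)$ and $x_1(t)$, obtains the tail part $h_r(t)$ of $h(t)$, judges the reverberation strength from $h(t)$, computes the adjustment factor $\beta$ of the gain function and outputs it to the spectral subtraction unit, convolves $x_1(t)$ with $h_r(t)$ to obtain the late reverberation estimation signal $\hat{r}(t)$ of $x_2(t)$, converts $\hat{r}(t)$ from the time domain to the frequency domain, and outputs the resulting late reverberation spectrum $R$ of $x_2(t)$ to the spectral subtraction unit. The spectral subtraction unit converts $x_2(t)$ from the time domain to the frequency domain to obtain its spectrum, computes the gain function from that spectrum, $\beta$ and $R$, multiplies the spectrum of $x_2(t)$ by the gain function to obtain the dereverberated spectrum, and converts it from the frequency domain to the time domain to obtain the dereverberated time-domain signal of $x_2(t)$. In this scheme, the auxiliary microphone input signal $x_1(t)$ is convolved with $h_r(t)$ to obtain the late reverberation estimation signal $\hat{r}(t)$ of the main microphone input signal $x_2(t)$, and the late reverberation estimation spectrum $R$ is then removed from the spectrum of $x_2(t)$ by spectral subtraction. Late reverberation is therefore effectively removed from $x_2(t)$ while early reverberation is retained, which improves speech quality. In addition, while estimating the late reverberation, the present invention adjusts the strength of the spectral subtraction according to the reverberation strength, with less or no subtraction when reverberation is weak, so speech is not damaged when reverberation is weak and intelligibility is already high. The scheme also does not need an accurate estimate of the direction of arrival of the direct sound, so it does not require closely matched microphones and places no strict constraints on the acoustic design.
The technical solution of the present invention thus protects the speech effectively while removing reverberation, automatically estimates the strength of the room reverberation, and selects appropriate processing in each environment to reach near-optimal speech quality. It places no strict constraints on microphone matching or acoustic design, which makes it more flexible and convenient to apply.
The above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention falls within its scope of protection.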

Claims

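Claims
1. A dual-microphone-based method for reducing speech reverberation, characterized in that the method comprises: receiving a main microphone input signal and an auxiliary microphone input signal, and processing them frame by frame as follows:
computing the transfer function $h(t)$ from the auxiliary microphone to the main microphone from the main microphone input signal and the auxiliary microphone input signal; obtaining the tail part $h_r(t)$ of the transfer function $h(t)$, judging the strength of the reverberation from $h(t)$, and computing the adjustment factor $\beta$ of the gain function;
convolving the auxiliary microphone input signal with $h_r(t)$ to obtain the late reverberation estimation signal of the main microphone input signal;
converting the late reverberation estimation signal of the main microphone input signal from the time domain to the frequency domain to obtain the late reverberation spectrum of the main microphone input signal; converting the main microphone input signal from the time domain to the frequency domain to obtain the spectrum of the main microphone input signal;
computing the gain function from the spectrum of the main microphone input signal, the adjustment factor $\beta$ of the gain function and the late reverberation spectrum of the main microphone input signal;
multiplying the spectrum of the main microphone input signal by the gain function to obtain the dereverberated spectrum of the main microphone input signal;
converting the dereverberated spectrum of the main microphone input signal from the frequency domain to the time domain to obtain the dereverberated time-domain signal of the main microphone input signal;
overlap-adding the dereverberated time-domain signal of the main microphone input signal frame by frame, and outputting the continuous dereverberated main microphone signal.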
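2. The method according to claim 1, characterized in that, after the late reverberation estimation signal of the main microphone input signal is obtained and before the conversion from the time domain to the frequency domain, the method further comprises:
applying frequency compensation to the late reverberation estimation signal of the main microphone input signal, wherein the greater the distance between the main microphone and the auxiliary microphone, the smaller the degree of frequency compensation applied to the late reverberation estimation signal of the main microphone input signal;
converting the frequency-compensated signal from the time domain to the frequency domain to obtain the late reverberation spectrum of the main microphone input signal.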
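3. The method according to claim 1, characterized in that judging the strength of the reverberation from the transfer function $h(t)$ specifically comprises computing the parameter $\rho$ indicating the reverberation strength according to the following formula (an image in the original; reconstructed here from the description):

$$\rho = 10\log_{10}\frac{\int_{0}^{T} h^2(t)\,dt}{\int_{T}^{\infty} h^2(t)\,dt}$$

where $h(t)$ is the transfer function from the auxiliary microphone to the main microphone and $T$ is a specified boundary point on the time axis of $h(t)$;
and computing the adjustment factor $\beta$ of the gain function specifically comprises computing it according to the following formula:

$$\beta = \begin{cases} 0, & \rho > \rho_1 \\ 2(\rho_1 - \rho)/(\rho_1 - \rho_2), & \rho_2 \le \rho \le \rho_1 \\ 2, & \rho < \rho_2 \end{cases}$$

where $\rho_1$ and $\rho_2$ are set values.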
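4. The method according to claim 1, characterized in that computing the gain function from the spectrum of the main microphone input signal, the adjustment factor $\beta$ of the gain function and the late reverberation spectrum of the main microphone input signal specifically comprises computing the gain function $G(l,k)$ according to the following formula:
[The formula appears as an image in the original publication and is not reproduced here.]
where $l$ is the frame index, $k$ is the frequency bin index, $\beta$ is the adjustment factor of the gain function, $R$ is the late reverberation spectrum of the main microphone input signal, and $X_2$ is the spectrum of the main microphone input signal.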
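5. The method according to claim 1, characterized in that obtaining the tail part $h_r(t)$ of the transfer function $h(t)$ comprises:
taking the boundary point between early reverberation and late reverberation on the time axis of the transfer function $h(t)$, and setting the values of $h(t)$ before that boundary point to 0 to obtain the tail part $h_r(t)$ of the transfer function $h(t)$.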
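6. A dual-microphone-based apparatus for reducing speech reverberation, characterized in that the apparatus processes the signals received by the main microphone and the auxiliary microphone frame by frame, and comprises a reverberation spectrum estimation unit and a spectral subtraction unit, wherein:
the reverberation spectrum estimation unit is configured to receive the main microphone input signal and the auxiliary microphone input signal, compute the transfer function $h(t)$ from the auxiliary microphone to the main microphone from these two signals, obtain the tail part $h_r(t)$ of $h(t)$, judge the strength of the reverberation from $h(t)$, compute the adjustment factor $\beta$ of the gain function and output it to the spectral subtraction unit, convolve the auxiliary microphone input signal with $h_r(t)$ to obtain the late reverberation estimation signal of the main microphone input signal, convert that estimation signal from the time domain to the frequency domain, and output the resulting late reverberation spectrum of the main microphone input signal to the spectral subtraction unit;
the spectral subtraction unit is configured to receive the main microphone input signal, together with the adjustment factor of the gain function and the late reverberation spectrum of the main microphone input signal output by the reverberation spectrum estimation unit, convert the main microphone input signal from the time domain to the frequency domain to obtain its spectrum, compute the gain function from that spectrum, the adjustment factor $\beta$ and the late reverberation spectrum, multiply the spectrum of the main microphone input signal by the gain function to obtain the dereverberated spectrum, convert the dereverberated spectrum from the frequency domain to the time domain to obtain the dereverberated time-domain signal, overlap-add that signal frame by frame, and output the continuous dereverberated main microphone signal.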
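7. The apparatus according to claim 6, characterized in that the reverberation spectrum estimation unit comprises a transfer function calculation unit, a transfer function tail calculation unit, a reverberation strength judging unit, a late reverberation estimation unit and a first time-frequency conversion unit, and further comprises a frequency compensation unit; the spectral subtraction unit comprises a second time-frequency conversion unit, a gain function calculation unit, a dereverberation unit, a frequency-time conversion unit and an overlap-add unit; wherein:
the transfer function calculation unit is configured to receive the main microphone input signal and the auxiliary microphone input signal, compute the transfer function $h(t)$ from the auxiliary microphone to the main microphone from these two signals, and output $h(t)$ to the transfer function tail calculation unit and the reverberation strength judging unit;
the transfer function tail calculation unit is configured to obtain the tail part $h_r(t)$ of the transfer function $h(t)$ and output it to the late reverberation estimation unit;
the reverberation strength judging unit is configured to judge the strength of the reverberation from the transfer function $h(t)$, compute the adjustment factor $\beta$ of the gain function and output it to the gain function calculation unit;
the late reverberation estimation unit is configured to receive the auxiliary microphone input signal, convolve it with $h_r(t)$ to obtain the late reverberation estimation signal of the main microphone input signal, and output it to the frequency compensation unit;
the frequency compensation unit is configured to apply frequency compensation to the late reverberation estimation signal of the main microphone input signal and output the result to the first time-frequency conversion unit, wherein the greater the distance between the main microphone and the auxiliary microphone, the smaller the degree of frequency compensation applied;
the first time-frequency conversion unit is configured to convert the frequency-compensated late reverberation estimation signal of the main microphone input signal from the time domain to the frequency domain and output the resulting late reverberation spectrum to the gain function calculation unit;
the second time-frequency conversion unit is configured to receive the main microphone input signal, convert it from the time domain to the frequency domain to obtain its spectrum, and output the spectrum to the gain function calculation unit;
the gain function calculation unit is configured to compute the gain function from the spectrum of the main microphone input signal output by the second time-frequency conversion unit, the adjustment factor $\beta$ output by the reverberation strength judging unit and the late reverberation spectrum output by the first time-frequency conversion unit, and to output the gain function to the dereverberation unit;
the dereverberation unit multiplies the spectrum of the main microphone input signal by the gain function to obtain the dereverberated spectrum of the main microphone input signal, and outputs it to the frequency-time conversion unit;
the frequency-time conversion unit is configured to convert the dereverberated spectrum of the main microphone input signal from the frequency domain to the time domain, obtain the dereverberated time-domain signal and output it to the overlap-add unit;
the overlap-add unit is configured to overlap-add the dereverberated time-domain signal of the main microphone input signal frame by frame and output the continuous dereverberated main microphone signal.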
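8. The apparatus according to claim 7, characterized in that
the reverberation strength judging unit is configured to compute the parameter $\rho$ indicating the reverberation strength according to the following formula (an image in the original; reconstructed here from the description):

$$\rho = 10\log_{10}\frac{\int_{0}^{T} h^2(t)\,dt}{\int_{T}^{\infty} h^2(t)\,dt}$$

where $h(t)$ is the transfer function from the auxiliary microphone to the main microphone and $T$ is a specified boundary point on the time axis of $h(t)$;
and then to compute the adjustment factor $\beta$ of the gain function according to the following formula:

$$\beta = \begin{cases} 0, & \rho > \rho_1 \\ 2(\rho_1 - \rho)/(\rho_1 - \rho_2), & \rho_2 \le \rho \le \rho_1 \\ 2, & \rho < \rho_2 \end{cases}$$

where $\rho_1$ and $\rho_2$ are set values.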
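9. The apparatus according to claim 7, characterized in that
the gain function calculation unit is configured to compute the gain function $G(l,k)$ according to the following formula:
[The formula appears as an image in the original publication and is not reproduced here.]
where $l$ is the frame index, $k$ is the frequency bin index, $\beta$ is the adjustment factor of the gain function, $R$ is the late reverberation spectrum of the main microphone input signal, and $X_2$ is the spectrum of the main microphone input signal.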
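10. The apparatus according to claim 7, characterized in that
the transfer function tail calculation unit is specifically configured to take the boundary point between early reverberation and late reverberation on the time axis of the transfer function $h(t)$, and to set the values of $h(t)$ before that boundary point to 0 to obtain the tail part $h_r(t)$ of the transfer function $h(t)$.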
PCT/CN2013/001557 2012-12-12 2013-12-12 一种基于双麦克的语音混响消减方法和装置 WO2014089914A1 (zh)

Priority Applications (5)

Application Number Priority Date Filing Date Title
US14/411,651 US9414157B2 (en) 2012-12-12 2013-12-12 Method and device for reducing voice reverberation based on double microphones
KR1020147036443A KR101502297B1 (ko) 2012-12-12 2013-12-12 이중 마이크로폰에 기초해서 음성 잔향을 저감하기 위한 방법 및 장치
DK13863250.0T DK2858379T3 (en) 2012-12-12 2013-12-12 Method and device for reducing voice echo based on dual microphones
EP13863250.0A EP2858379B1 (en) 2012-12-12 2013-12-12 Voice reverberation reduction method and device based on dual microphones
JP2015524601A JP5785674B2 (ja) 2012-12-12 2013-12-12 デュアルマイクに基づく音声残響低減方法及びその装置

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210536578.X 2012-12-12
CN201210536578.XA CN103067821B (zh) 2012-12-12 2012-12-12 一种基于双麦克的语音混响消减方法和装置

Publications (1)

Publication Number Publication Date
WO2014089914A1 true WO2014089914A1 (zh) 2014-06-19

Family

ID=48110252

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/001557 WO2014089914A1 (zh) 2012-12-12 2013-12-12 一种基于双麦克的语音混响消减方法和装置

Country Status (7)

Country Link
US (1) US9414157B2 (zh)
EP (1) EP2858379B1 (zh)
JP (1) JP5785674B2 (zh)
KR (1) KR101502297B1 (zh)
CN (1) CN103067821B (zh)
DK (1) DK2858379T3 (zh)
WO (1) WO2014089914A1 (zh)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103067821B (zh) * 2012-12-12 2015-03-11 歌尔声学股份有限公司 一种基于双麦克的语音混响消减方法和装置
CN104156553B (zh) * 2014-05-09 2018-08-17 哈尔滨工业大学深圳研究生院 无需信源数估计的相干信号波达方向估计方法及系统
CN105848052B (zh) * 2015-01-16 2019-10-11 宇龙计算机通信科技(深圳)有限公司 一种麦克切换方法及终端
GB2549103B (en) * 2016-04-04 2021-05-05 Toshiba Res Europe Limited A speech processing system and speech processing method
CN110660404B (zh) * 2019-09-19 2021-12-07 北京声加科技有限公司 基于零陷滤波预处理的语音通信和交互应用系统、方法
EP3809726A1 (en) * 2019-10-17 2021-04-21 Bang & Olufsen A/S Echo based room estimation
CN111179958A (zh) * 2020-01-08 2020-05-19 厦门亿联网络技术股份有限公司 一种语音晚期混响抑制方法及系统
CN112053698A (zh) * 2020-07-31 2020-12-08 出门问问信息科技有限公司 语音转换方法及装置
CN113542980B (zh) * 2021-07-21 2023-03-31 深圳市悦尔声学有限公司 一种抑制扬声器串扰的方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101976565A (zh) * 2010-07-09 2011-02-16 瑞声声学科技(深圳)有限公司 基于双麦克风语音增强装置及方法
CN102347028A (zh) * 2011-07-14 2012-02-08 瑞声声学科技(深圳)有限公司 双麦克风语音增强装置及方法
JP2012048133A (ja) * 2010-08-30 2012-03-08 Nippon Telegr & Teleph Corp <Ntt> 残響除去方法とその装置とプログラム
CN103067821A (zh) * 2012-12-12 2013-04-24 歌尔声学股份有限公司 一种基于双麦克的语音混响消减方法和装置
CN203243506U (zh) * 2012-12-12 2013-10-16 歌尔声学股份有限公司 一种基于双麦克的语音混响消减装置

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3649847B2 (ja) * 1996-03-25 2005-05-18 日本電信電話株式会社 残響除去方法及び装置
WO1997045995A1 (en) * 1996-05-31 1997-12-04 Philips Electronics N.V. Arrangement for suppressing an interfering component of an input signal
US6549586B2 (en) * 1999-04-12 2003-04-15 Telefonaktiebolaget L M Ericsson System and method for dual microphone signal noise reduction using spectral subtraction
EP1879181B1 (en) * 2006-07-11 2014-05-21 Nuance Communications, Inc. Method for compensation audio signal components in a vehicle communication system and system therefor
JP5166117B2 (ja) * 2008-05-20 2013-03-21 株式会社船井電機新応用技術研究所 音声入力装置及びその製造方法、並びに、情報処理システム
EP2211564B1 (en) * 2009-01-23 2014-09-10 Harman Becker Automotive Systems GmbH Passenger compartment communication system
US8233352B2 (en) * 2009-08-17 2012-07-31 Broadcom Corporation Audio source localization system and method
US8897455B2 (en) * 2010-02-18 2014-11-25 Qualcomm Incorporated Microphone array subset selection for robust noise reduction
JP5594133B2 (ja) * 2010-12-28 2014-09-24 ソニー株式会社 音声信号処理装置、音声信号処理方法及びプログラム
CN103087821B (zh) 2013-01-15 2016-09-07 武汉工业学院 一种保留谷维素的米糠油精炼方法
US9100762B2 (en) * 2013-05-22 2015-08-04 Gn Resound A/S Hearing aid with improved localization

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
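CN101976565A (zh) * 2010-07-09 2011-02-16 瑞声声学科技(深圳)有限公司 Dual-microphone-based speech enhancement device and method
JP2012048133A (ja) * 2010-08-30 2012-03-08 Nippon Telegr & Teleph Corp <Ntt> Reverberation removal method, device therefor, and program
CN102347028A (zh) * 2011-07-14 2012-02-08 瑞声声学科技(深圳)有限公司 Dual-microphone speech enhancement device and method
CN103067821A (zh) * 2012-12-12 2013-04-24 歌尔声学股份有限公司 Dual-microphone-based speech reverberation reduction method and device
CN203243506U (zh) * 2012-12-12 2013-10-16 歌尔声学股份有限公司 Dual-microphone-based speech reverberation reduction device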

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2858379A4 *

Also Published As

Publication number Publication date
DK2858379T3 (en) 2019-01-21
EP2858379A4 (en) 2015-11-11
KR101502297B1 (ko) 2015-03-12
CN103067821B (zh) 2015-03-11
EP2858379B1 (en) 2018-10-31
EP2858379A1 (en) 2015-04-08
JP5785674B2 (ja) 2015-09-30
US20150189431A1 (en) 2015-07-02
CN103067821A (zh) 2013-04-24
KR20150008925A (ko) 2015-01-23
JP2015523609A (ja) 2015-08-13
US9414157B2 (en) 2016-08-09

Similar Documents

Publication Publication Date Title
WO2014089914A1 (zh) Dual-microphone-based speech reverberation reduction method and device
JP5479655B2 (ja) Method and device for suppressing residual echo
US11245976B2 (en) Earphone signal processing method and system, and earphone
US8194880B2 (en) System and method for utilizing omni-directional microphones for speech enhancement
CN103428385B (zh) Method for processing an audio signal and circuit arrangement for processing an audio signal
US9532149B2 (en) Method of signal processing in a hearing aid system and a hearing aid system
JP5671147B2 (ja) Echo suppression including modeling of late reverberation components
US8638952B2 (en) Signal processing apparatus and signal processing method
US20140025374A1 (en) Speech enhancement to improve speech intelligibility and automatic speech recognition
JP4957810B2 (ja) Sound processing device, sound processing method, and sound processing program
JP2011515897A (ja) Speech enhancement using multiple microphones on multiple devices
WO2007081916A2 (en) System and method for utilizing inter-microphone level differences for speech enhancement
CN203243506U (zh) Dual-microphone-based speech reverberation reduction device
TW200921645A (en) Voice enhancer for hands-free devices
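JP2005514668A (ja) Speech enhancement system having a spectral power ratio dependent processor
JP6854967B1 (ja) Noise suppression device, noise suppression method, and noise suppression program
JP4209348B2 (ja) Echo suppression method, device implementing the method, program, and recording medium
TW202331701A (zh) Dual-microphone-array echo cancellation method, dual-microphone-array echo cancellation device, electronic apparatus, and non-volatile computer-readable storage medium
JP2021150959A (ja) Hearing device and methods related to the hearing device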

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 13863250; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2015524601; Country of ref document: JP; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 20147036443; Country of ref document: KR; Kind code of ref document: A)
WWE WIPO information: entry into national phase (Ref document number: 14411651; Country of ref document: US)
WWE WIPO information: entry into national phase (Ref document number: 2013863250; Country of ref document: EP)
NENP Non-entry into the national phase (Ref country code: DE)