WO2015196729A1 - Microphone array speech enhancement method and device - Google Patents

Publication number
WO2015196729A1
Authority
WO
WIPO (PCT)
Prior art keywords
array
voice
speech
signal
power spectrum
Application number
PCT/CN2014/092217
Inventor
范泛
付中华
黎家力
Original Assignee
中兴通讯股份有限公司
Application filed by 中兴通讯股份有限公司

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation

Definitions

  • the present invention relates to voice processing, and in particular, to a microphone array voice enhancement method and apparatus.
  • Embodiments of the present invention provide a microphone array voice enhancement method and apparatus.
  • the method and apparatus are capable of processing original speech of an array of speech acquisition devices having more array elements and larger spacing.
  • a microphone array speech enhancement method includes the following steps:
  • the minimum variance adaptive beam optimization model of the first array of speech signals includes a spatial steering vector of the target sound source to the multi-channel digital speech acquisition device.
  • before acquiring the first array voice signal input by the multi-channel digital voice collection device, the method further includes:
  • the optimal super-directional beam coefficient $A(k) = [a_1(k), \ldots, a_N(k)]^T$ is used to perform frequency-domain optimal super-directional beam processing on the time-frequency representation signal, obtaining the first array of speech signals
  • $n$ is the discrete time variable
  • $N$ is the number of array elements
  • $k$ is the frequency bin index
  • $l$ is the short-time frame index.
  • the optimal super-directional beam coefficient is set according to the arrangement of the multi-channel digital voice collection device.
  • the minimum variance adaptive beam optimization model of the first array of speech signals is:
  • $w^H(k)$ is the conjugate transpose of $w(k)$; the other two quantities in the model are the noise coherence matrix estimated from the first array of speech signals and the spatial steering vector from the target sound source to the digital speech acquisition device.
  • the spatial steering vector of the target sound source to the digital voice collection device is calculated according to the following formula:
  • $d_1, \ldots, d_N$ are the distances from the first to the $N$-th digital speech collection devices to the center of the digital speech collection device array
  • $c$ is the speed of sound
  • $f_s$ is the sampling frequency
  • $\theta$ is the direction angle of the target sound source relative to the digital speech acquisition device
  • $a_i^*(k)$ is the complex conjugate of the element $a_i(k)$ of the optimal super-directional beam coefficient $A(k) = [a_1(k), \ldots, a_N(k)]^T$.
  • the method further includes:
  • the step of performing noise power spectrum estimation on the noise signal array according to the result of the voice activity detection VAD includes:
  • a compromise is made between the noise power spectrum in the speech state and the noise power spectrum in the non-speech state to obtain a noise power spectrum estimate.
  • the step of calculating the noise power spectrum in the speech state, the non-speech state, the speech-start state, and the speech-end state includes:
  • in the non-speech state, the power spectrum of the noise signal array is estimated using the following formula:
  • in the speech-start state and the speech state, the power spectrum of the noise signal array is estimated using the following formula:
  • in the speech-end state, two-pole regression smoothing is applied to the power spectrum of the noise signal array using the following formula:
  • $a_1$ is the noise spectrum update parameter
  • $a_a$ and $a_d$ are the smoothing coefficients.
  • the power spectrum estimation value of the optimal beam output signal is calculated by using the following formula:
  • the two quantities in the formula are the power spectrum estimate of the optimal beam output signal and the optimal beam output signal itself; $a_0$ is a noise spectrum update parameter.
  • a microphone array voice enhancement device includes:
  • a first acquiring module configured to: acquire a first array voice signal that is input through a multi-channel digital voice collecting device;
  • An optimal beam output signal calculation module is configured to calculate an optimal beam output signal synthesized by the first array voice signal by using the first array voice signal according to the minimum variance adaptive beam optimization model of the first array voice signal;
  • a first enhancement module configured to: perform a single channel speech enhancement process by using a power spectrum estimation value of the optimal beam output signal
  • the minimum variance adaptive beam optimization model of the first array of speech signals includes a spatial steering vector of the target sound source to the multi-channel digital speech acquisition device.
  • the device further includes:
  • an original signal acquisition module configured to: collect the original voice array signals $y_1(n), \ldots, y_N(n)$ through the multi-channel digital voice collection device;
  • an original signal transformation module configured to: perform a short-time Fourier transform on the original speech signals to obtain the time-frequency representation signals $y_1(k, l), \ldots, y_N(k, l)$ of the original speech array signals;
  • an optimal super-directional beam processing module configured to: perform frequency-domain optimal super-directional beam processing on the time-frequency representation signals using the optimal super-directional beam coefficient $A(k) = [a_1(k), \ldots, a_N(k)]^T$ to obtain the first array of speech signals
  • $n$ is the discrete time variable
  • $N$ is the number of array elements
  • $k$ is the frequency bin index
  • $l$ is the short-time frame index.
  • the optimal super-directional beam coefficient is set according to the arrangement of the multi-channel digital voice collection device.
  • the optimal beam output signal calculation module is configured to use the following formula, according to the minimum variance adaptive beam optimization model of the first array of speech signals, to calculate the optimal beam output signal synthesized from the first array of speech signals:
  • the minimum variance adaptive beam optimization model of the first array of speech signals is:
  • $w^H(k)$ is the conjugate transpose of $w(k)$; the other two quantities in the model are the noise coherence matrix estimated from the first array of speech signals and the spatial steering vector from the target sound source to the digital speech acquisition device.
  • the optimal beam output signal calculation module calculates the optimal beam output signal of the first array voice signal
  • the spatial steering vector of the target sound source to the digital voice acquisition device is calculated according to the following formula:
  • $d_1, \ldots, d_N$ are the distances from the first to the $N$-th digital speech collection devices to the center of the digital speech collection device array
  • $c$ is the speed of sound
  • $f_s$ is the sampling frequency
  • $\theta$ is the direction angle of the target sound source relative to the digital speech acquisition device
  • $a_i^*(k)$ is the complex conjugate of the element $a_i(k)$ of the optimal super-directional beam coefficient $A(k) = [a_1(k), \ldots, a_N(k)]^T$.
  • it also includes:
  • a VAD module configured to: perform voice activity detection VAD on an array of noise signals in the array voice input signals of the plurality of channels;
  • a noise power spectrum estimation module configured to: perform noise power spectrum estimation on the noise signal array according to the result of the voice activity detection VAD;
  • a second enhancement module configured to: perform a second enhancement on the optimal beam output signal according to the power spectrum estimate of the optimal beam output signal and the noise power spectrum estimate.
  • the noise power spectrum estimation module includes:
  • a first noise power spectrum calculation unit configured to: calculate the noise power spectrum in the speech state, the non-speech state, the speech-start state, and the speech-end state;
  • a second noise power spectrum calculation unit configured to: make a compromise between the noise power spectrum in the speech state and the noise power spectrum in the non-speech state to obtain a noise power spectrum estimate.
  • the first noise power spectrum calculation unit includes:
  • the no-speech state calculation sub-unit is set to: when in the non-speech state, estimate the power spectrum of the noise signal array using the following formula:
  • the voice start and voice state calculation subunit is set to: when in the speech-start state and the speech state, estimate the power spectrum of the noise signal array using the following formula:
  • the no-speech state calculation sub-unit is set to: when in the speech-end state, apply two-pole regression smoothing to the power spectrum of the noise signal array using the following formula:
  • $a_1$ is the noise spectrum update parameter
  • $a_a$ and $a_d$ are the smoothing coefficients.
  • the power spectrum estimation value of the optimal beam output signal is calculated by using the following formula:
  • the two quantities in the formula are the power spectrum estimate of the optimal beam output signal and the optimal beam output signal itself;
  • $a_0$ is the noise spectrum update parameter.
  • Embodiments of the present invention also provide a computer program comprising program instructions that, when executed by a computer, cause the computer to perform the above method.
  • Embodiments of the present invention also provide a computer readable storage medium carrying the computer program.
  • the microphone array voice enhancement method and apparatus use the minimum variance adaptive beam optimization model to process the first array voice signal collected and input by the multi-channel digital voice signal acquisition device.
  • the minimum variance adaptive beam optimization model includes a spatial steering vector from the target sound source to the multi-channel digital speech acquisition device, so speech enhancement can be performed on microphone arrays with larger array element spacing, and high-quality pickup can be achieved.
  • the microphone array speech enhancement method and apparatus provided by the embodiments of the present invention estimate the power spectrum of the noise signal array at different stages of the speech according to the result of the voice activity detection, which gives higher noise estimation accuracy and thereby improves the voice enhancement effect.
  • FIG. 1 is a schematic flowchart of a microphone array voice enhancement method according to an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of an original voice collection processing process according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of a noise power spectrum estimation process according to an embodiment of the present invention.
  • FIG. 4 is a schematic flow chart showing a detailed calculation of noise power spectrum according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a microphone array voice enhancement apparatus according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of voice signal processing according to an embodiment of the present invention.
  • beamforming techniques related to embodiments of the present invention include both fixed beam and adaptive beam.
  • Fixed beam means that the parameters of the array signal processing system do not change with the pickup signal, but are determined by the array topology and the preset noise field model, including the time domain fixed beam and the frequency domain fixed beam.
  • the directivity of a fixed beam degrades at low and medium frequencies, yet speech is a wideband signal. Improving the low- and mid-frequency directivity reduces the robustness of the array, so a fixed beam is rarely used alone in practical small microphone array applications.
  • the adaptive beam dynamically generates the optimal beam parameters according to the optimization conditions by automatically estimating the sound field and the transfer function from the sound source to the microphones.
  • because the transfer function from the sound source to each microphone is difficult to estimate, the adaptive beam is often combined with multi-channel noise suppression or with post-filtering after beam processing, which requires accurately estimating the statistical characteristics of the noise and finding the best balance between target signal distortion and noise suppression.
  • Embodiments of the present invention provide a microphone array voice enhancement method, including the steps shown in FIG. 1:
  • Step 101: Acquire a first array voice signal that is input by a multi-channel digital voice collection device.
  • Step 102: Calculate an optimal beam output signal synthesized from the first array voice signal, using the first array voice signal according to the minimum variance adaptive beam optimization model of the first array voice signal.
  • Step 103: Perform single-channel speech enhancement processing using a power spectrum estimate of the optimal beam output signal.
  • the minimum variance adaptive beam optimization model of the first array of speech signals includes a spatial steering vector of the target sound source to the multi-channel digital speech acquisition device.
  • the microphone array voice enhancement method processes the first array voice signal collected and input by the multi-channel digital voice signal acquisition device using the minimum variance adaptive beam optimization model. The model includes a spatial steering vector from the target sound source to the multi-channel digital speech acquisition device, so speech enhancement can be performed on microphone arrays with larger array element spacing, and high-quality pickup can be achieved.
  • the log-MMSE method is applied to process the optimal beam output signal.
  • before Step 101, the steps shown in FIG. 2 are also included:
  • Step 201: Acquire the original voice array signals $y_1(n), \ldots, y_N(n)$ through a multi-channel digital voice collecting device;
  • Step 202: Perform a short-time Fourier transform on the original speech signals to obtain the time-frequency representation signals $y_1(k, l), \ldots, y_N(k, l)$ of the original speech array signals;
  • $n$ is the discrete time variable
  • $N$ is the number of array elements
  • $k$ is the frequency bin index
  • $l$ is the short-time frame index.
  • the original voice array signals collected by the multi-channel digital voice collecting device are $y_1(n), \ldots, y_N(n)$. The signal of each channel is windowed and truncated with a time window of length $L_{wnd}$, with adjacent windows overlapping by $L_{ovlp}$.
  • the windowing uses a Hann window with an overlap of 3/4 of the window length.
  • the windowed signal of each channel is subjected to a short-time Fourier transform to obtain the time-frequency representation of the original speech array signal: $y_1(k, l), \ldots, y_N(k, l)$.
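The windowing and transform described above can be sketched for a single channel as follows. This is a minimal illustration, not the patent's implementation: the function names, the window length of 8 samples, and the naive DFT are assumptions chosen to keep the example self-contained; only the Hann window and the 3/4 overlap come from the description.

```python
import cmath
import math

def hann(length):
    # Periodic Hann window (assumed variant of the "Hanning window" in the text).
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / length) for n in range(length)]

def dft(frame):
    # Naive O(L^2) DFT; a real system would use an FFT routine.
    size = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size))
            for k in range(size)]

def stft(signal, l_wnd=8):
    # Hop of a quarter window, i.e. adjacent windows overlap by L_ovlp = 3/4 * L_wnd.
    hop = l_wnd // 4
    window = hann(l_wnd)
    frames = []
    for start in range(0, len(signal) - l_wnd + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(l_wnd)]
        frames.append(dft(segment))
    return frames  # frames[l][k] plays the role of y_i(k, l)

# One channel carrying a sinusoid that falls exactly on frequency bin 1.
y1 = [math.sin(2 * math.pi * n / 8) for n in range(32)]
spectrum = stft(y1)
```

Each entry `spectrum[l][k]` is one time-frequency sample of the channel; repeating the call per microphone gives the N-channel representation.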
  • the time-frequency representation signal $y_i(k, l)$ of the original speech array signal, the noise signal $v_i(k, l)$, and the target speech signal $x(k, l)$ emitted by the target sound source satisfy the following relationship:
  • $y_i(k, l) = v_i(k, l) + x(k, l)$.
  • the optimal super-directional beam coefficient $A(k) = [a_1(k), \ldots, a_N(k)]^T$ is used to perform frequency-domain optimal super-directional beam processing on the time-frequency representation signal, obtaining the first array of speech signals.
  • the optimal super-directional beam coefficients are determined according to the array topology of the multi-channel digital speech acquisition device in conjunction with the sound source direction.
  • the array element spacing of the multi-channel digital acquisition device is allowed to be larger than that of the multi-channel speech collection devices in the related art.
  • when the first array of speech signals is used to calculate the optimal beam output signal synthesized from the first array of speech signals, the following formula is used:
  • the minimum variance adaptive beam optimization model of the first array of speech signals is:
  • $w^H(k)$ is the conjugate transpose of $w(k)$; the other two quantities in the model are the noise coherence matrix estimated from the first array of speech signals and the spatial steering vector from the target sound source to the digital speech acquisition device.
  • the complex conjugate of the adaptive filter parameters, calculated from the noise signal column vector, the optimal super-directional beam coefficient, and the spatial steering vector from the target sound source to the digital speech acquisition device, is:
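The formulas themselves did not survive in this text. As a hedged illustration, the sketch below uses the textbook minimum-variance distortionless-response (MVDR) formulation, which matches the quantities named above: minimize $w^H \Phi w$ subject to $w^H D = 1$, giving $w = \Phi^{-1} D / (D^H \Phi^{-1} D)$. The symbols $\Phi$ and $D$, the function names, and the two-microphone numbers are assumptions for illustration, not taken from the patent.

```python
def solve(matrix, rhs):
    # Gauss-Jordan elimination with partial pivoting for a small complex system.
    n = len(rhs)
    aug = [list(matrix[r]) + [rhs[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[r][n] / aug[r][r] for r in range(n)]

def mvdr_weights(phi, steer):
    # Textbook MVDR solution: w = phi^{-1} d / (d^H phi^{-1} d).
    phi_inv_d = solve(phi, steer)
    denom = sum(d.conjugate() * x for d, x in zip(steer, phi_inv_d))
    return [x / denom for x in phi_inv_d]

def beam_output(weights, frame):
    # Beamformer output for one frequency bin and frame: z = w^H y.
    return sum(w.conjugate() * y for w, y in zip(weights, frame))

# Hypothetical 2-microphone example at a single frequency bin.
phi = [[1.0 + 0j, 0.3 + 0.1j], [0.3 - 0.1j, 1.0 + 0j]]  # noise coherence matrix
steer = [1 + 0j, 0 + 1j]                                 # steering vector
w = mvdr_weights(phi, steer)
response = beam_output(w, steer)  # distortionless constraint: equals 1
```

The constraint keeps the target direction undistorted while the minimization suppresses noise described by the coherence matrix.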
  • the spatial steering vector from the target sound source to the digital voice collecting device is calculated by the following formula:
  • $d_1, \ldots, d_N$ are the distances from the first to the $N$-th digital voice collecting devices to the center of the array; $c$ is the speed of sound; $f_s$ is the sampling frequency; $\theta$ is the direction angle of the target sound source relative to the digital voice collecting device. Since the signal is first processed by the frequency-domain super-directional beam, the spatial steering vector from the target sound source to the digital speech acquisition device after the frequency-domain super-directional processing becomes:
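Because the formula is missing here, the following sketch assumes a far-field source and a linear array, with $d_i$ treated as signed positions relative to the array center and an FFT length `n_fft`. Under those assumptions each element is $\exp(-j 2\pi f d_i \cos\theta / c)$ with $f = k f_s / n_{fft}$. Multiplying by the conjugated coefficients $a_i^*(k)$ after the super-directional stage is likewise an interpretation of the text, and all names are hypothetical.

```python
import cmath
import math

def steering_vector(distances, theta, k, n_fft, fs, c=343.0):
    # Assumed far-field, linear-array model; distances are signed positions
    # (in metres) relative to the array centre.
    freq = k * fs / n_fft  # centre frequency of bin k in Hz
    return [cmath.exp(-2j * math.pi * freq * d * math.cos(theta) / c)
            for d in distances]

def after_superdirective(steer, coeffs):
    # Assumed form: each element is scaled by the conjugate of a_i(k).
    return [a.conjugate() * d for a, d in zip(coeffs, steer)]

positions = [-0.05, 0.0, 0.05]
broadside = steering_vector(positions, math.pi / 2, k=10, n_fft=256, fs=16000.0)
endfire = steering_vector(positions, 0.0, k=10, n_fft=256, fs=16000.0)
scaled = after_superdirective(endfire, [0.5 + 0j, 1 + 0j, 0.5 + 0j])
```

For a source at broadside all elements are in phase, while an endfire source produces opposite phase shifts on microphones placed symmetrically about the center.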
  • a noise signal is estimated from the first array voice signal; correspondingly, the noise coherence matrix is the expectation of the product of the noise signal column vector and its conjugate transpose, where $E$ denotes the expectation operator.
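In practice the expectation has to be approximated from data. One common choice, shown below as an assumption rather than the patent's stated procedure, is an exponentially weighted running average of the outer product $v v^H$; the forgetting factor `alpha` and the function names are hypothetical.

```python
def outer(v):
    # Outer product v v^H of a complex column vector.
    return [[vi * vj.conjugate() for vj in v] for vi in v]

def update_coherence(phi, v, alpha=0.9):
    # Running approximation of E[v v^H]; alpha is a hypothetical forgetting factor.
    vv = outer(v)
    n = len(v)
    return [[alpha * phi[i][j] + (1 - alpha) * vv[i][j] for j in range(n)]
            for i in range(n)]

# Start from zeros and fold in one noise frame for a single frequency bin.
phi = [[0j, 0j], [0j, 0j]]
phi = update_coherence(phi, [1 + 0j, 0 + 1j], alpha=0.5)
```

The update preserves the Hermitian structure of the matrix, which the minimum variance solution relies on.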
  • the method further comprises the steps shown in Figure 3:
  • Step 301: Perform voice activity detection (VAD) on the noise signal array in the array voice input signals of the multiple channels;
  • Step 302: Perform noise power spectrum estimation on the noise signal array according to the result of the voice activity detection.
  • Step 303: Perform a second enhancement on the optimal beam output signal according to the power spectrum estimate of the optimal beam output signal and the noise power spectrum estimate.
  • the above embodiment can perform dynamic time-varying estimation of the noise signal in the first array speech signal to prepare for secondary enhancement of the sound.
  • noise can be estimated using the following formula when there is no speech:
  • the noise can be estimated by the following formula:
  • $a_R$ is the smoothing factor
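The two formulas are not reproduced in this text. A common form for such an update, given here only as an assumption, is first-order recursive smoothing with factor $a_R$ while no speech is detected, and holding the previous estimate otherwise. The function name and the hold behaviour are illustrative choices.

```python
def update_noise_psd(prev, frame, speech_present, a_r=0.9):
    # prev: previous noise power estimate per frequency bin.
    # frame: complex noise-reference spectrum of the current frame.
    if speech_present:
        # Assumed behaviour: keep the previous estimate while speech is active.
        return list(prev)
    # Assumed first-order recursion with smoothing factor a_R.
    return [a_r * p + (1 - a_r) * abs(v) ** 2 for p, v in zip(prev, frame)]

quiet = update_noise_psd([1.0, 1.0], [2 + 0j, 0j], speech_present=False, a_r=0.5)
held = update_noise_psd([1.0, 1.0], [2 + 0j, 0j], speech_present=True, a_r=0.5)
```

A value of `a_r` close to 1 gives a slowly varying, low-variance estimate; smaller values track changing noise faster.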
  • the step of performing noise power spectrum estimation on the noise signal array according to the result of the voice activity detection VAD may include the process shown in FIG. 4:
  • Step 401: Calculate the noise power spectrum in the speech state, the non-speech state, the speech-start state, and the speech-end state;
  • Step 402: Make a compromise between the noise power spectrum in the speech state and the noise power spectrum in the non-speech state to obtain a noise power spectrum estimate.
  • the step of power spectrum estimation of the noise signal array based on the result of the voice activity detection VAD comprises:
  • $L_1$ is the number of frequency points.
  • the noise peak is then calculated using the power spectrum estimate of the noise signal array at the beginning of the speech:
  • $a_1$ is a noise spectrum update parameter; $a_a$ and $a_d$ are smoothing coefficients; $a_0$ is a noise spectrum update parameter. The remaining quantities in the formulas are, respectively, the fast smoothed estimate of the power spectrum of the noise signal array, the two-pole regression smoothed estimate of that power spectrum, the power spectrum estimate of the optimal beam output signal used in the single-channel enhancement process, and the estimate of the noise power threshold of the noise signal array.
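The four states can be derived from consecutive VAD decisions, as sketched below. The smoothing function is one plausible reading of the two coefficients, with $a_a$ used when the power rises and $a_d$ when it falls; that interpretation, the state labels, and the function names are assumptions, since the original formulas are not available here.

```python
def vad_state(prev_active, active):
    # Map two consecutive VAD decisions onto the four states named in the text.
    if active and not prev_active:
        return "speech_start"
    if active and prev_active:
        return "speech"
    if prev_active and not active:
        return "speech_end"
    return "no_speech"

def two_rate_smooth(prev, current, a_a=0.5, a_d=0.95):
    # Assumed asymmetric smoothing: a_a applies to rising power,
    # a_d to falling power.
    coeff = a_a if current > prev else a_d
    return coeff * prev + (1 - coeff) * current

decisions = [False, True, True, False, False]
states = [vad_state(p, a) for p, a in zip(decisions, decisions[1:])]
rising = two_rate_smooth(1.0, 3.0, a_a=0.5, a_d=0.9)
falling = two_rate_smooth(3.0, 1.0, a_a=0.5, a_d=0.9)
```

Using separate coefficients lets the estimate react quickly in one direction while changing slowly in the other, which is a typical motivation for two-coefficient smoothers.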
  • the power spectrum estimate of the optimal beam output signal is calculated using the following formula:
  • the two quantities in the formula are the power spectrum estimate of the optimal beam output signal and the optimal beam output signal itself; $a_0$ is a noise spectrum update parameter.
  • the compromised noise power spectrum estimate and the power spectrum estimate of the optimal beam output signal are input to the post filter for processing.
  • a schematic diagram of the speech signal processing process is shown in FIG. 6.
  • an inverse FFT is applied to the signal processed by the post filter, and the enhanced time-domain signal stream is then reconstructed by the overlap-add method.
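The reconstruction step can be sketched as an inverse DFT per frame followed by overlap-add. The example assumes a periodic Hann analysis window with a hop of a quarter window, for which the overlapped windows sum to 2, so dividing by that gain restores the signal in the fully overlapped region. The window length, names, and naive DFT are illustrative assumptions, and the spectra are passed through unmodified in place of a post filter.

```python
import cmath
import math

def dft(frame):
    size = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size))
            for k in range(size)]

def idft_real(spectrum):
    # Inverse DFT, keeping the real part (the enhanced frames are real signals).
    size = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / size)
                for k in range(size)).real / size for n in range(size)]

def overlap_add(frames, hop, window_gain=2.0):
    # window_gain = 2 is the sum of periodic Hann windows at 3/4 overlap.
    l_wnd = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + l_wnd)
    for index, spectrum in enumerate(frames):
        segment = idft_real(spectrum)
        for n in range(l_wnd):
            out[index * hop + n] += segment[n] / window_gain
    return out

L_WND, HOP = 8, 2
window = [0.5 - 0.5 * math.cos(2 * math.pi * n / L_WND) for n in range(L_WND)]
signal = [float(n % 5) for n in range(32)]
frames = [dft([signal[s + n] * window[n] for n in range(L_WND)])
          for s in range(0, len(signal) - L_WND + 1, HOP)]
rebuilt = overlap_add(frames, HOP)
```

Samples near the start and end of the stream are covered by fewer windows, so only the interior is reconstructed exactly in this sketch.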
  • the frequency-domain optimal super-directional beam is designed according to the array topology and the sound source direction. The original speech array signal is transformed by a short-time Fourier transform and then processed with the optimal super-directional beam parameters, so that the speech signal is enhanced.
  • the noise coherence matrix is estimated from the processed array signal, and the noise correlation matrix is estimated dynamically to update the optimal adaptive filter parameters.
  • the post filter is used to further improve the signal quality.
  • high-quality long-distance voice pickup can be achieved with only a small number of microphones; complex noise outside the beam is clearly suppressed, and speech distortion is barely audible.
  • the microphone array voice enhancement method provided by the embodiment of the present invention can accurately calculate the noise signal in the original voice signal input by the voice collection device, so that the noise signal can be effectively suppressed when the voice is enhanced.
  • the embodiment of the invention further provides a microphone array voice enhancement device, which has the structure shown in FIG. 5 and includes:
  • a first acquiring module configured to: acquire a first array voice signal that is input through a multi-channel digital voice collecting device;
  • An optimal beam output signal calculation module is configured to calculate an optimal beam output signal synthesized by the first array voice signal by using the first array voice signal according to the minimum variance adaptive beam optimization model of the first array voice signal;
  • a first enhancement module configured to: perform single-channel speech enhancement processing using a power spectrum estimate of the optimal beam output signal;
  • the minimum variance adaptive beam optimization model of the first array of speech signals includes a spatial steering vector of the target sound source to the multi-channel digital speech acquisition device.
  • the microphone array voice enhancement device uses the optimal beam output signal calculation module to process the first array voice signal collected by the multi-channel digital voice collection device, applying the minimum variance adaptive beam optimization model to calculate the optimal beam output signal of the first array of speech signals; the device can therefore handle microphone arrays with larger array element spacing.
  • the apparatus further includes:
  • an original signal acquisition module configured to: collect the original voice array signals $y_1(n), \ldots, y_N(n)$ through the multi-channel digital voice collection device;
  • an original signal transformation module configured to: perform a short-time Fourier transform on the original speech signals to obtain the time-frequency representation signals $y_1(k, l), \ldots, y_N(k, l)$ of the original speech array signals;
  • $n$ is the discrete time variable
  • $N$ is the number of array elements
  • $k$ is the frequency bin index
  • $l$ is the short-time frame index.
  • the optimal super-directional beam coefficients are set according to the arrangement of the multi-channel digital voice collection device.
  • when the optimal beam output signal calculation module calculates the optimal beam output signal synthesized from the first array voice signal according to the minimum variance adaptive beam optimization model of the first array voice signal, the following formula is used:
  • the minimum variance adaptive beam optimization model of the first array of speech signals is:
  • $w^H(k)$ is the conjugate transpose of $w(k)$; the other two quantities in the model are the noise coherence matrix estimated from the first array of speech signals and the spatial steering vector from the target sound source to the digital speech acquisition device.
  • the optimal beam output signal calculation module calculates the optimal beam output signal of the first array of speech signals
  • the spatial steering vector from the target sound source to the digital speech acquisition device is calculated according to the following formula:
  • $d_1, \ldots, d_N$ are the distances from the first to the $N$-th digital speech collection devices to the center of the digital speech collection device array
  • $c$ is the speed of sound
  • $f_s$ is the sampling frequency
  • $\theta$ is the direction angle of the target sound source relative to the digital speech acquisition device
  • $a_i^*(k)$ is the complex conjugate of the element $a_i(k)$ of the optimal super-directional beam coefficient $A(k) = [a_1(k), \ldots, a_N(k)]^T$.
  • the apparatus further includes:
  • a VAD module configured to: perform voice activity detection VAD on an array of noise signals in the array voice input signals of the plurality of channels;
  • a noise power spectrum estimation module configured to: perform noise power spectrum estimation on the noise signal array according to the result of the voice activity detection VAD;
  • a second enhancement module configured to: perform the second enhancement on the optimal beam output signal according to the optimal power spectrum estimation value of the optimal beam output signal and the noise power spectrum estimation value.
  • the noise power spectrum estimation module includes:
  • a first noise power spectrum calculation unit configured to: calculate the noise power spectrum in the speech state, the non-speech state, the speech-start state, and the speech-end state;
  • a second noise power spectrum calculation unit configured to: make a compromise between the noise power spectrum in the speech state and the noise power spectrum in the non-speech state to obtain a noise power spectrum estimate.
  • the first noise power spectrum calculation unit includes:
  • the no-speech state calculation sub-unit is set to: when in the non-speech state, estimate the power spectrum of the noise signal array using the following formula:
  • the voice start and voice state calculation subunit is set to: when in the speech-start state and the speech state, estimate the power spectrum of the noise signal array using the following formula:
  • the no-speech state calculation sub-unit is set to: when in the speech-end state, apply two-pole regression smoothing to the power spectrum of the noise signal array using the following formula:
  • $a_1$ is the noise spectrum update parameter
  • $a_a$ and $a_d$ are the smoothing coefficients.
  • the power spectrum estimate of the optimal beam output signal is calculated using the following formula:
  • the two quantities in the formula are the power spectrum estimate of the optimal beam output signal and the optimal beam output signal itself; $a_0$ is a noise spectrum update parameter.
  • the compromised noise power spectrum estimate and the power spectrum estimate of the optimal beam output signal are input to the post filter for processing.
  • an inverse FFT is applied to the signal processed by the post filter, and the enhanced time-domain signal stream is then reconstructed by the overlap-add method.
  • the microphone array voice enhancement device provided by the embodiment of the present invention can effectively estimate and process the noise signal in the first array voice signal collected by the multi-channel digital voice collection device, so that the noise signal is effectively filtered out during subsequent speech enhancement and the speech enhancement effect is improved.
  • all or part of the steps of the above embodiments may also be implemented using integrated circuits. These steps may be fabricated as individual integrated circuit modules, or multiple modules or steps may be fabricated into a single integrated circuit module. Thus, the invention is not limited to any specific combination of hardware and software.
  • each device/function module/functional unit in the above embodiments may be implemented using a general-purpose computing device; the units can be centralized on a single computing device or distributed across a network of multiple computing devices.
  • when each device/function module/functional unit in the above embodiment is implemented in the form of a software function module and sold or used as a stand-alone product, it can be stored in a computer readable storage medium.
  • the above mentioned computer readable storage medium may be a read only memory, a magnetic disk or an optical disk or the like.
  • the embodiment of the invention can perform voice enhancement processing on the microphone array with larger spacing of the array elements, and can realize high quality pickup.

Abstract

A microphone array speech enhancement method and a corresponding device, the method comprising the following steps: first array speech signals collected and inputted via a multi-path digital speech collection device are obtained (101); according to a minimum variance adaptive beam optimisation model of the first array speech signals, the first array speech signals are used to calculate an optimal beam output signal synthesised by the first array speech signals (102); a power spectrum estimation value of the optimal beam output signal is used to perform single-channel speech enhancement processing (103). The minimum variance adaptive beam optimisation model of the first array speech signals comprises a spatial steering vector of a target sound source to the multi-path digital speech collection device.

Description

Microphone array speech enhancement method and device
Technical field
The present invention relates to speech processing, and in particular to a microphone array speech enhancement method and device.
Background
With the development of hands-free calling, conference systems, smart homes, and smart appliances, high-quality far-field speech pickup has become one of the key factors affecting the performance of speech acquisition and processing systems. Single-microphone technology can hardly cope with complex acoustic environments, so microphone arrays with multiple speech acquisition channels are increasingly becoming mainstream; the most commonly used techniques are the various beamforming and speech enhancement techniques. Speech enhancement must extract a target speech signal that is as clean as possible from the raw speech signal captured by the acquisition devices. Beamforming adjusts parameters to raise the sensitivity of the microphone array to sound from a given direction, which improves the speech enhancement result. However, most speech enhancement techniques in the related art can only process raw speech captured by arrays with few elements and small spacing, so the performance of conventional array speech enhancement is often very limited.
Summary of the invention
Embodiments of the present invention provide a microphone array speech enhancement method and device capable of processing the raw speech of a speech acquisition device array that has more elements and larger spacing.
A microphone array speech enhancement method includes the following steps:
acquiring a first array speech signal captured and input by a multi-channel digital speech acquisition device;
calculating, according to a minimum variance adaptive beam optimization model of the first array speech signal, the optimal beam output signal synthesized from the first array speech signal;
performing single-channel speech enhancement processing using a power spectrum estimate of the optimal beam output signal;
wherein the minimum variance adaptive beam optimization model of the first array speech signal includes a spatial steering vector from the target sound source to the multi-channel digital speech acquisition device.
Optionally, before acquiring the first array speech signal captured and input by the multi-channel digital speech acquisition device, the method further includes:
capturing raw speech array signals y_1(n), ..., y_N(n) with the multi-channel digital speech acquisition device;
performing a short-time Fourier transform on the raw speech array signals to obtain their time-frequency representations y_1(k, λ), ..., y_N(k, λ);
applying frequency-domain optimal super-directive beam processing to the time-frequency representations using optimal super-directive beam coefficients A(k) = [a_1(k), ..., a_N(k)]^T to obtain the first array speech signal
Figure PCTCN2014092217-appb-000001
where n is the discrete time variable, N is the number of array elements, k is the frequency bin index, and λ is the short-time frame index.
Optionally, the optimal super-directive beam coefficients are set according to how the multi-channel digital speech acquisition device is arranged.
Optionally, when the optimal beam output signal synthesized from the first array speech signal is calculated according to the minimum variance adaptive beam optimization model of the first array speech signal, the following formula is used:
Figure PCTCN2014092217-appb-000002
where
Figure PCTCN2014092217-appb-000003
is the optimal beam output signal;
Figure PCTCN2014092217-appb-000004
is the adaptive filter parameter calculated from the noise signal column vector, the optimal super-directive beam coefficients, and the spatial steering vector from the target sound source to the digital speech acquisition device;
Figure PCTCN2014092217-appb-000005
is the complex conjugate of element a_i of the optimal super-directive beam coefficients A(k) = [a_1(k), ..., a_N(k)]^T; and y_i(k, λ) is the first array speech signal.
Optionally, the minimum variance adaptive beam optimization model of the first array speech signal is:
Figure PCTCN2014092217-appb-000006
subject to
Figure PCTCN2014092217-appb-000007
where the elements of w(k) and
Figure PCTCN2014092217-appb-000008
are complex conjugates of each other; w^H(k) is the conjugate transpose of w(k);
Figure PCTCN2014092217-appb-000009
is the noise coherence matrix estimated from the first array speech signal; and
Figure PCTCN2014092217-appb-000010
is the spatial steering vector from the target sound source to the digital speech acquisition device.
Optionally, the spatial steering vector from the target sound source to the digital speech acquisition device is calculated according to the following formula:
Figure PCTCN2014092217-appb-000011
where d_1, ..., d_N are the distances from the 1st to N-th digital speech acquisition devices to the center of the digital speech acquisition device array, c is the speed of sound, f_s is the sampling frequency, θ is the azimuth of the target sound source relative to the digital speech acquisition device, and
Figure PCTCN2014092217-appb-000012
is the complex conjugate of element a_i of the optimal super-directive beam coefficients A(k) = [a_1(k), ..., a_N(k)]^T.
Optionally, the method further includes:
performing voice activity detection (VAD) on the noise signal array in the multi-channel array speech input signals;
estimating the noise power spectrum of the noise signal array according to the VAD result;
performing a second enhancement of the optimal beam output signal according to the optimal power spectrum estimate of the optimal beam output signal and the noise power spectrum estimate.
Optionally, the step of estimating the noise power spectrum of the noise signal array according to the VAD result includes:
calculating the noise power spectrum in the speech-present state, the speech-absent state, the speech-start state, and the speech-end state;
combining, as a trade-off, the noise power spectrum of the speech-present state and that of the speech-absent state to obtain the noise power spectrum estimate.
Optionally, the step of calculating the noise power spectrum in the speech-present, speech-absent, speech-start, and speech-end states includes:
in the speech-absent state, estimating the power spectrum of the noise signal array with the following formula:
Figure PCTCN2014092217-appb-000013
in the speech-start state and the speech-present state, estimating the power spectrum of the noise signal array with the following formula:
Figure PCTCN2014092217-appb-000014
in the speech-end state, performing a two-pole recursive smoothing estimate of the power spectrum of the noise signal array with the following formula:
Figure PCTCN2014092217-appb-000015
In the above formulas,
Figure PCTCN2014092217-appb-000016
Figure PCTCN2014092217-appb-000017
Figure PCTCN2014092217-appb-000018
where a_1 is the noise spectrum update parameter, and a_a and a_d are smoothing coefficients.
Optionally, the power spectrum estimate of the optimal beam output signal is calculated with the following formula:
Figure PCTCN2014092217-appb-000019
where
Figure PCTCN2014092217-appb-000020
is the power spectrum estimate of the optimal beam output signal;
Figure PCTCN2014092217-appb-000021
is the optimal beam output signal; and a_0 is the noise spectrum update parameter.
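The formula image for the power spectrum estimate is not reproduced in this text. As an illustrative sketch only, and assuming the estimate is a first-order recursive average of the squared magnitude of the beam output controlled by a_0, the update for one frame could look like the following. The function name, the default value of a0, and the recursion itself are assumptions, not the formula of the filing.

```python
def update_power_spectrum(prev_power, beam_bins, a0=0.9):
    """One frame of an assumed first-order recursive power spectrum estimate.

    prev_power: estimate from the previous frame, one float per frequency bin.
    beam_bins:  complex beam output for the current frame, one value per bin.
    a0:         smoothing (update) parameter in [0, 1); value chosen for illustration.
    """
    return [a0 * p + (1.0 - a0) * abs(y) ** 2 for p, y in zip(prev_power, beam_bins)]
```

A larger a0 tracks slowly and gives a smoother estimate; a smaller a0 follows the instantaneous power more closely.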
A microphone array speech enhancement device includes:
a first acquisition module, configured to acquire a first array speech signal captured and input by a multi-channel digital speech acquisition device;
an optimal beam output signal calculation module, configured to calculate, according to a minimum variance adaptive beam optimization model of the first array speech signal, the optimal beam output signal synthesized from the first array speech signal;
a first enhancement module, configured to perform single-channel speech enhancement processing using a power spectrum estimate of the optimal beam output signal;
wherein the minimum variance adaptive beam optimization model of the first array speech signal includes a spatial steering vector from the target sound source to the multi-channel digital speech acquisition device.
Optionally, the device further includes:
a raw signal acquisition module, configured to capture raw speech array signals y_1(n), ..., y_N(n) with the multi-channel digital speech acquisition device;
a raw signal transformation module, configured to perform a short-time Fourier transform on the raw speech array signals to obtain their time-frequency representations y_1(k, λ), ..., y_N(k, λ);
an optimal super-directive beam processing module, configured to apply frequency-domain optimal super-directive beam processing to the time-frequency representations using optimal super-directive beam coefficients A(k) = [a_1(k), ..., a_N(k)]^T to obtain the first array speech signal
Figure PCTCN2014092217-appb-000022
where n is the discrete time variable, N is the number of array elements, k is the frequency bin index, and λ is the short-time frame index.
Optionally, the optimal super-directive beam coefficients are set according to how the multi-channel digital speech acquisition device is arranged.
Optionally, the optimal beam output signal calculation module is configured to calculate, according to the minimum variance adaptive beam optimization model of the first array speech signal, the optimal beam output signal synthesized from the first array speech signal with the following formula:
Figure PCTCN2014092217-appb-000023
where
Figure PCTCN2014092217-appb-000024
is the optimal beam output signal;
Figure PCTCN2014092217-appb-000025
is the adaptive filter parameter calculated from the noise signal column vector, the optimal super-directive beam coefficients, and the spatial steering vector from the target sound source to the digital speech acquisition device;
Figure PCTCN2014092217-appb-000026
is the complex conjugate of element a_i of the optimal super-directive beam coefficients A(k) = [a_1(k), ..., a_N(k)]^T; and y_i(k, λ) is the first array speech signal.
Optionally, the minimum variance adaptive beam optimization model of the first array speech signal is:
Figure PCTCN2014092217-appb-000027
subject to
Figure PCTCN2014092217-appb-000028
where the elements of w(k) and
Figure PCTCN2014092217-appb-000029
are complex conjugates of each other; w^H(k) is the conjugate transpose of w(k);
Figure PCTCN2014092217-appb-000030
is the noise coherence matrix estimated from the first array speech signal; and
Figure PCTCN2014092217-appb-000031
is the spatial steering vector from the target sound source to the digital speech acquisition device.
Optionally, when the optimal beam output signal calculation module calculates the optimal beam output signal synthesized from the first array speech signal, the spatial steering vector from the target sound source to the digital speech acquisition device is calculated according to the following formula:
Figure PCTCN2014092217-appb-000032
where d_1, ..., d_N are the distances from the 1st to N-th digital speech acquisition devices to the center of the digital speech acquisition device array, c is the speed of sound, f_s is the sampling frequency, θ is the azimuth of the target sound source relative to the digital speech acquisition device, and
Figure PCTCN2014092217-appb-000033
is the complex conjugate of element a_i of the optimal super-directive beam coefficients A(k) = [a_1(k), ..., a_N(k)]^T.
Optionally, the device further includes:
a VAD module, configured to perform voice activity detection (VAD) on the noise signal array in the multi-channel array speech input signals;
a noise power spectrum estimation module, configured to estimate the noise power spectrum of the noise signal array according to the VAD result;
a second enhancement module, configured to perform a second enhancement of the optimal beam output signal according to the optimal power spectrum estimate of the optimal beam output signal and the noise power spectrum estimate.
Optionally, the noise power spectrum estimation module includes:
a first noise power spectrum calculation unit, configured to calculate the noise power spectrum in the speech-present state, the speech-absent state, the speech-start state, and the speech-end state;
a second noise power spectrum calculation unit, configured to combine, as a trade-off, the noise power spectrum of the speech-present state and that of the speech-absent state to obtain the noise power spectrum estimate.
Optionally, the first noise power spectrum calculation unit includes:
a speech-absent state calculation subunit, configured to estimate, in the speech-absent state, the power spectrum of the noise signal array with the following formula:
Figure PCTCN2014092217-appb-000034
a speech-start and speech-present state calculation subunit, configured to estimate, in the speech-start state and the speech-present state, the power spectrum of the noise signal array with the following formula:
Figure PCTCN2014092217-appb-000035
a speech-end state calculation subunit, configured to perform, in the speech-end state, a two-pole recursive smoothing estimate of the power spectrum of the noise signal array with the following formula:
Figure PCTCN2014092217-appb-000036
In the above formulas,
Figure PCTCN2014092217-appb-000037
Figure PCTCN2014092217-appb-000038
Figure PCTCN2014092217-appb-000039
where a_1 is the noise spectrum update parameter, and a_a and a_d are smoothing coefficients.
Optionally, the power spectrum estimate of the optimal beam output signal is calculated with the following formula:
Figure PCTCN2014092217-appb-000040
where
Figure PCTCN2014092217-appb-000041
is the power spectrum estimate of the optimal beam output signal;
Figure PCTCN2014092217-appb-000042
is the optimal beam output signal; and a_0 is the noise spectrum update parameter.
Embodiments of the present invention further provide a computer program including program instructions which, when executed by a computer, enable the computer to perform the above method.
Embodiments of the present invention further provide a computer-readable storage medium carrying the computer program.
As can be seen from the above, the microphone array speech enhancement method and device provided by the embodiments of the present invention apply the minimum variance adaptive beam optimization model to the first array speech signal captured and input by the multi-channel digital speech acquisition device, and the model includes the spatial steering vector from the target sound source to the multi-channel digital speech acquisition device. Speech enhancement can therefore be performed on microphone arrays with larger element spacing, and high-quality sound pickup can be achieved. In addition, the method and device estimate the power spectrum of the noise signal array at different stages of speech according to the voice activity detection result, which gives higher noise estimation accuracy and can thereby improve the speech enhancement result.
Brief description of the drawings
FIG. 1 is a schematic flowchart of a microphone array speech enhancement method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of raw speech acquisition and processing according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of noise power spectrum estimation according to an embodiment of the present invention;
FIG. 4 is a more detailed schematic flowchart of noise power spectrum estimation according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a microphone array speech enhancement device according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of speech signal processing according to an embodiment of the present invention.
Preferred embodiments of the invention
Embodiments of the present invention are described below with reference to the accompanying drawings. Where no conflict arises, the embodiments of the present invention and the features in them may be combined with one another arbitrarily.
First, the beamforming techniques related to the embodiments of the present invention fall into two classes: fixed beams and adaptive beams.
A fixed beam is one whose array signal processing parameters do not change with the picked-up signal but are determined by the array topology and a preset noise field model; it includes time-domain and frequency-domain fixed beams. The directivity of a fixed beam degrades at low and mid frequencies, yet speech is a wideband signal, and improving low- and mid-frequency directivity degrades the robustness of the array. Fixed beams are therefore rarely used alone in practical small microphone array applications.
An adaptive beam dynamically generates optimal beam parameters under an optimization criterion by automatically estimating the sound field and the transfer function from the sound source to the microphones. In practice, the transfer function from the source to each microphone is hard to estimate, so adaptive beams are often combined with multi-channel noise suppression, or post-filtering is added after beam processing. Both require accurate estimation of the noise statistics and a good balance between target signal distortion and the degree of noise suppression.
An embodiment of the present invention provides a microphone array speech enhancement method including the steps shown in FIG. 1:
Step 101: acquire a first array speech signal captured and input by a multi-channel digital speech acquisition device;
Step 102: calculate, according to a minimum variance adaptive beam optimization model of the first array speech signal, the optimal beam output signal synthesized from the first array speech signal;
Step 103: perform single-channel speech enhancement processing using a power spectrum estimate of the optimal beam output signal.
The minimum variance adaptive beam optimization model of the first array speech signal includes a spatial steering vector from the target sound source to the multi-channel digital speech acquisition device.
As can be seen from the above, the microphone array speech enhancement method provided by this embodiment applies the minimum variance adaptive beam optimization model to the first array speech signal captured and input by the multi-channel digital speech acquisition device, and the model includes the spatial steering vector from the target sound source to the multi-channel digital speech acquisition device. Speech enhancement can therefore be performed on microphone arrays with larger element spacing, and high-quality sound pickup can be achieved.
In some embodiments of the present invention, in the step of performing single-channel speech enhancement processing using the power spectrum estimate of the optimal beam output signal, the logMMSE method is applied to the optimal beam output signal.
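For orientation, a minimal sketch of the per-bin spectral gain commonly associated with the logMMSE (log-spectral amplitude) estimator is given below. The source only names the method; the gain formula G = ξ/(1+ξ)·exp(0.5·E1(v)) with v = ξγ/(1+ξ), the symbols ξ (a priori SNR) and γ (a posteriori SNR), and all function names are taken from the general literature or chosen for illustration, and are not details stated in this document.

```python
import math

def expint_e1(x):
    """Exponential integral E1(x) = integral from x to infinity of exp(-t)/t dt, x > 0."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    if x <= 1.0:
        # Power series: E1(x) = -gamma - ln(x) - sum_{n>=1} (-x)^n / (n * n!)
        euler_gamma = 0.5772156649015329
        total, term = 0.0, 1.0
        for n in range(1, 60):
            term *= -x / n           # term is now (-x)^n / n!
            total -= term / n
        return -euler_gamma - math.log(x) + total
    # Continued fraction (modified Lentz method) for x > 1
    b = x + 1.0
    c = 1.0e300
    d = 1.0 / b
    h = d
    for i in range(1, 200):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * math.exp(-x)

def log_mmse_gain(xi, gamma):
    """Gain applied to the noisy magnitude of one time-frequency bin.

    xi:    a priori SNR estimate (assumed to be supplied by the caller).
    gamma: a posteriori SNR, i.e. noisy power divided by estimated noise power.
    """
    v = xi * gamma / (1.0 + xi)
    return xi / (1.0 + xi) * math.exp(0.5 * expint_e1(v))
```

In such a scheme the enhanced bin would be the gain multiplied by the beam output bin, with the noise power taken from the noise power spectrum estimate.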
In some embodiments of the present invention, before the first array speech signal captured and input by the multi-channel digital speech acquisition device is acquired, the method further includes the steps shown in FIG. 2:
Step 201: capture raw speech array signals y_1(n), ..., y_N(n) with the multi-channel digital speech acquisition device;
Step 202: perform a short-time Fourier transform on the raw speech array signals to obtain their time-frequency representations y_1(k, λ), ..., y_N(k, λ);
Step 203: apply frequency-domain optimal super-directive beam processing to the time-frequency representations using optimal super-directive beam coefficients A(k) = [a_1(k), ..., a_N(k)]^T to obtain the first array speech signal
Figure PCTCN2014092217-appb-000043
i = 1, ..., N.
Here n is the discrete time variable, N is the number of array elements, k is the frequency bin index, and λ is the short-time frame index.
The raw speech array signals captured by the multi-channel digital speech acquisition device are y_1(n), ..., y_N(n). Each channel is windowed and truncated with time window length L_wnd and overlap L_ovlp between adjacent windows; a Hanning window with an overlap of 3/4 of the window length is used. The windowed signal of each channel is then transformed with a short-time Fourier transform to obtain the time-frequency representations y_1(k, λ), ..., y_N(k, λ) of the raw speech array signals. In theory, the time-frequency representation y_i(k, λ), the noise signal v_i(k, λ), and the target speech signal x(k, λ) emitted by the target sound source satisfy the following relation:
y_i(k, λ) = v_i(k, λ) + x(k, λ).
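The windowing and transform step described above can be sketched as follows for a single channel. This is a minimal illustration using a periodic Hanning window, a hop of one quarter of the window length (3/4 overlap), and a naive DFT in place of an FFT; the function names and the use of only the non-negative frequency bins are assumptions made for the example.

```python
import cmath
import math

def hann_window(length):
    """Periodic Hanning window of the given length."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]

def stft_frames(signal, win_len, overlap_ratio=0.75):
    """Window one channel and transform each frame.

    Returns a list indexed by frame number lam; each entry holds the
    complex bins y(k, lam) for k = 0 .. win_len // 2.
    """
    hop = max(1, int(round(win_len * (1.0 - overlap_ratio))))  # 3/4 overlap -> win_len / 4
    window = hann_window(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(win_len)]
        bins = [
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / win_len) for n in range(win_len))
            for k in range(win_len // 2 + 1)
        ]
        frames.append(bins)
    return frames
```

Applying `stft_frames` to each of the N channels gives the per-channel time-frequency representations used in the subsequent beam processing.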
The optimal super-directive beam coefficients A(k) = [a_1(k), ..., a_N(k)]^T are applied to the time-frequency representations for frequency-domain optimal super-directive beam processing, yielding the first array speech signal
Figure PCTCN2014092217-appb-000044
where
Figure PCTCN2014092217-appb-000045
where
Figure PCTCN2014092217-appb-000046
is the complex conjugate of a(k);
Figure PCTCN2014092217-appb-000047
denotes the frequency-domain-weighted target speech signal;
Figure PCTCN2014092217-appb-000048
denotes the frequency-domain-weighted noise signal; and i = 1, ..., N.
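The formula images for this step are not reproduced in this text. Read together with the definitions given here, and assuming each channel is simply scaled by the complex conjugate of its coefficient, the weighting would take the following form. The tilde notation and the per-channel decomposition are a reconstruction, not a transcription of the filing.

```latex
\tilde{y}_i(k,\lambda) = a_i^{*}(k)\, y_i(k,\lambda)
                       = \tilde{x}_i(k,\lambda) + \tilde{v}_i(k,\lambda),
\qquad i = 1,\dots,N,
\quad\text{with}\quad
\tilde{x}_i(k,\lambda) = a_i^{*}(k)\, x(k,\lambda),\;
\tilde{v}_i(k,\lambda) = a_i^{*}(k)\, v_i(k,\lambda).
```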
In some embodiments, the optimal super-directive beam coefficients are determined from the array topology of the multi-channel digital speech acquisition device together with the sound source direction.
Because optimal super-directive beam processing is used, the element spacing of the multi-channel digital acquisition device may be larger than that of multi-channel speech acquisition devices in the related art.
In some embodiments, when the optimal beam output signal synthesized from the first array speech signal is calculated according to the minimum variance adaptive beam optimization model of the first array speech signal, the following formula is used:
Figure PCTCN2014092217-appb-000049
where
Figure PCTCN2014092217-appb-000050
is the optimal beam output signal;
Figure PCTCN2014092217-appb-000051
is the adaptive filter parameter calculated from the noise signal column vector, the optimal super-directive beam coefficients, and the spatial steering vector from the target sound source to the digital speech acquisition device;
Figure PCTCN2014092217-appb-000052
is the complex conjugate of the optimal super-directive beam coefficients A(k) = [a_1(k), ..., a_N(k)]^T; and y_i(k, λ) is the first array speech signal.
In some embodiments, the minimum variance adaptive beam optimization model of the first array speech signal is:
Figure PCTCN2014092217-appb-000053
subject to
Figure PCTCN2014092217-appb-000054
where the elements of w(k) and
Figure PCTCN2014092217-appb-000055
are complex conjugates of each other; w^H(k) is the conjugate transpose of w(k);
Figure PCTCN2014092217-appb-000056
is the noise coherence matrix estimated from the first array speech signal; and
Figure PCTCN2014092217-appb-000057
is the spatial steering vector from the target sound source to the digital speech acquisition device.
According to the above model, the complex conjugate of the adaptive filter parameters, calculated from the noise signal column vector, the optimal super-directive beam coefficients, and the spatial steering vector from the target sound source to the digital speech acquisition devices, is:
Figure PCTCN2014092217-appb-000058
where
Figure PCTCN2014092217-appb-000059
is the conjugate transpose of
Figure PCTCN2014092217-appb-000060
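For readers unfamiliar with minimum variance beamforming, the sketch below computes the textbook closed-form solution w = Φ⁻¹d / (dᴴΦ⁻¹d) for a noise coherence matrix Φ and steering vector d. This is the standard result for a "minimize wᴴΦw subject to wᴴd = 1" problem and is offered only as an assumption about the general shape of the solution; the patent's own expression is in the figure above and may differ (for example, by including the super-directive coefficients). The helper names `solve` and `mvdr_weights` are hypothetical.

```python
def solve(mat, rhs):
    """Solve mat @ x = rhs (complex entries) by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(mat)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("noise coherence matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for i in reversed(range(n)):  # back substitution
        acc = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = acc / aug[i][i]
    return x


def mvdr_weights(phi, d):
    """Textbook minimum variance solution: w = phi^-1 d / (d^H phi^-1 d)."""
    phi_inv_d = solve(phi, d)
    denom = sum(di.conjugate() * vi for di, vi in zip(d, phi_inv_d))
    return [vi / denom for vi in phi_inv_d]


# Toy two-element example with a Hermitian, positive-definite noise matrix.
phi = [[1.0 + 0j, 0.2 + 0.1j], [0.2 - 0.1j, 2.0 + 0j]]
d = [1 + 0j, 0 + 1j]
w = mvdr_weights(phi, d)
constraint = sum(wi.conjugate() * di for wi, di in zip(w, d))  # w^H d
```

The value `constraint` evaluates the distortionless condition wᴴd, which the closed form satisfies by construction for a Hermitian Φ.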
The spatial steering vector from the target sound source to the digital speech acquisition devices is calculated by the following formula:
Figure PCTCN2014092217-appb-000061
where d_1, ..., d_N are the distances from the 1st to the N-th digital speech acquisition devices to the center of the digital speech acquisition device array; c is the speed of sound; f_s is the sampling frequency; and θ is the azimuth of the target sound source relative to the digital speech acquisition devices. Because the signal first undergoes frequency-domain super-directive beam processing, the spatial steering vector from the target sound source to the digital speech acquisition devices after that processing becomes:
Figure PCTCN2014092217-appb-000062
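The steering vector formulas above survive only as figures. As a hedged illustration of how d_i, c, f_s, and θ typically enter such a vector, the sketch below uses the common far-field model for a linear array, with phase −2π·f_k·d_i·cos(θ)/c at bin frequency f_k = k·f_s/L. The far-field model, the use of signed positions relative to the array center, the FFT length `fft_len`, and the function name are all assumptions; the sketch also omits the additional super-directive weighting described in the text.

```python
import cmath
import math


def steering_vector(k, positions, theta, fs=16000.0, c=340.0, fft_len=512):
    """Far-field steering vector for frequency bin k (assumed model).

    positions: signed offsets of each element from the array center, in metres
    theta:     azimuth of the target source, in radians
    """
    freq_hz = k * fs / fft_len  # centre frequency of bin k
    return [
        cmath.exp(-2j * math.pi * freq_hz * pos * math.cos(theta) / c)
        for pos in positions
    ]


# Broadside source (theta = 90 degrees): every element sees the same phase.
sv_broadside = steering_vector(k=64, positions=[-0.05, 0.0, 0.05], theta=math.pi / 2)
# Endfire source (theta = 0): phases differ, magnitudes stay at 1.
sv_endfire = steering_vector(k=64, positions=[-0.05, 0.0, 0.05], theta=0.0)
```

Each entry has unit magnitude, so the vector encodes only the relative delays between elements.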
The noise signal estimated from the first array speech signal is
Figure PCTCN2014092217-appb-000063
Correspondingly, the noise coherence matrix estimated from the first array speech signal is:
Figure PCTCN2014092217-appb-000064
where E denotes the expectation operator, and
Figure PCTCN2014092217-appb-000065
is the conjugate transpose of
Figure PCTCN2014092217-appb-000066
where w(k) = [w_1(k), ..., w_N(k)]^T.
In some embodiments of the present invention, the method further includes the steps shown in FIG. 3:
Step 301: perform voice activity detection (VAD) on the noise signal array in the array speech input signals of the multiple channels;
Step 302: perform noise power spectrum estimation on the noise signal array according to the VAD result;
Step 303: perform a second enhancement on the optimal beam output signal according to the optimal power spectrum estimate of the optimal beam output signal and the noise power spectrum estimate.
The above embodiment can produce a dynamic, time-varying estimate of the noise signal in the first array speech signal, in preparation for the second enhancement of the speech.
In general, when no speech is present, the noise can be estimated using the following formula:
Figure PCTCN2014092217-appb-000067
When speech is present, the noise can be estimated using the following formula:
Figure PCTCN2014092217-appb-000068
where a_R is the smoothing coefficient.
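The two update formulas above are available only as figures. One common way to realize a VAD-gated noise estimate with a smoothing coefficient a_R is sketched below: during speech-free frames the noise coherence matrix is recursively averaged with the outer product y·yᴴ, and during speech the previous estimate is held. Both branches are assumptions about the general technique, not a transcription of the patent's formulas, and the function name is hypothetical.

```python
def update_noise_coherence(phi_prev, y, speech_present, a_r=0.95):
    """Return the next noise coherence estimate for one frequency bin.

    phi_prev:       previous N x N estimate (list of lists of complex)
    y:              current channel values for this bin (list of complex)
    speech_present: VAD decision for the current frame
    a_r:            smoothing coefficient (0.95 is the reference value in the text)
    """
    if speech_present:
        # Assumed behaviour: freeze the estimate while speech is active.
        return [row[:] for row in phi_prev]
    n = len(y)
    # Assumed behaviour: first-order recursive average of y y^H.
    return [
        [a_r * phi_prev[i][j] + (1.0 - a_r) * y[i] * y[j].conjugate() for j in range(n)]
        for i in range(n)
    ]


identity = [[1 + 0j, 0j], [0j, 1 + 0j]]
phi_noise = update_noise_coherence(identity, [1 + 0j, 1j], speech_present=False)
phi_held = update_noise_coherence(identity, [1 + 0j, 1j], speech_present=True)
```

A value of a_R close to 1 makes the estimate change slowly, which suits noise that is roughly stationary between speech segments.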
The step of performing noise power spectrum estimation on the noise signal array according to the VAD result may include the process shown in FIG. 4:
Step 401: calculate the noise power spectrum in the speech state, the no-speech state, the speech start state, and the speech end state;
Step 402: take a compromise between the noise power spectrum of the speech state and the noise power spectrum of the no-speech state to obtain the noise power spectrum estimate.
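The text names four states but does not say how they are derived from the VAD output. A minimal sketch, assuming the state is determined by comparing the current frame's VAD flag with the previous frame's flag, could look like this (the function name and state labels are hypothetical):

```python
def vad_states(flags):
    """Map per-frame VAD flags (True = speech) to one of four state labels."""
    states = []
    prev = False  # assumption: silence before the first frame
    for curr in flags:
        if curr and not prev:
            states.append("speech_start")
        elif curr:
            states.append("speech")
        elif prev:
            states.append("speech_end")
        else:
            states.append("no_speech")
        prev = curr
    return states


states = vad_states([False, True, True, False, False])
```

Each label would then select which of the per-state noise power spectrum formulas to apply for that frame.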
In some embodiments, the step of performing power spectrum estimation on the noise signal array according to the VAD result includes the following.
In the no-speech state, a fast smoothed estimate of the noise signal array power spectrum is computed using the following formula:
Figure PCTCN2014092217-appb-000069
The noise power spectrum threshold is then calculated according to the following formula:
Figure PCTCN2014092217-appb-000070
where L_1 is the number of frequency bins.
In the speech start state, a two-pole recursive smoothed estimate of the noise signal array power spectrum is first computed using the following formula:
Figure PCTCN2014092217-appb-000071
The noise peak is then calculated from the noise signal array power spectrum estimate of the speech start state:
Figure PCTCN2014092217-appb-000072
In the speech end state, a two-pole recursive smoothed estimate of the noise signal array power spectrum is computed using the following formula:
Figure PCTCN2014092217-appb-000073
The compromise noise power spectrum estimate is then calculated:
Figure PCTCN2014092217-appb-000074
where a_1 is a noise spectrum update parameter; a_a and a_d are smoothing coefficients; a_0 is a noise spectrum update parameter;
Figure PCTCN2014092217-appb-000075
is the fast smoothed estimate of the noise signal array power spectrum;
Figure PCTCN2014092217-appb-000076
is the two-pole recursive smoothed estimate of the noise signal array power spectrum;
Figure PCTCN2014092217-appb-000077
is the power spectrum estimate of the optimal beam output signal used for the single-channel enhancement processing; and
Figure PCTCN2014092217-appb-000078
is the noise power threshold estimate of the noise signal array.
In some embodiments, the power spectrum estimate of the optimal beam output signal is calculated using the following formula:
Figure PCTCN2014092217-appb-000079
where
Figure PCTCN2014092217-appb-000080
is the power spectrum estimate of the optimal beam output signal;
Figure PCTCN2014092217-appb-000081
is the optimal beam output signal; and a_0 is the noise spectrum update parameter.
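Since the formula is available only as a figure, the following sketch shows one plausible reading of it: a first-order recursive average of the squared magnitude of the beam output, weighted by a_0. The recursion P(k, λ) = a_0·P(k, λ−1) + (1 − a_0)·|Z(k, λ)|² and the function name are assumptions.

```python
def update_power_spectrum(p_prev, z, a0=0.8):
    """Recursively smooth |Z(k, lambda)|^2 across frames, one value per frequency bin.

    p_prev: previous power spectrum estimate (list of float)
    z:      current optimal beam output per bin (list of complex)
    a0:     update parameter (0.8 is the reference value given in the text)
    """
    return [a0 * p + (1.0 - a0) * abs(zk) ** 2 for p, zk in zip(p_prev, z)]


p_next = update_power_spectrum([1.0, 0.0], [1 + 0j, 3 + 4j])
```

With a_0 = 0.8, each new frame contributes 20% of the estimate, so the spectrum tracks speech onsets within a few frames.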
Optionally, the compromise noise power spectrum estimate
Figure PCTCN2014092217-appb-000082
and the power spectrum estimate of the optimal beam output signal
Figure PCTCN2014092217-appb-000083
may also be input to a post filter for processing; a schematic diagram of the speech signal processing in this embodiment is shown in FIG. 6. An inverse FFT is applied to the post-filtered signal, and the enhanced time-domain signal stream is then reconstructed by the overlap-add method.
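The overlap-add reconstruction mentioned above can be sketched as follows. The sketch starts from time-domain frames (the output of the inverse FFT) and sums them at their original offsets; the function name and the choice to leave windowing to the caller are assumptions.

```python
def overlap_add(frames, hop):
    """Rebuild a time-domain stream from equal-length frames spaced `hop` samples apart."""
    if not frames:
        return []
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for idx, frame in enumerate(frames):
        start = idx * hop
        for n, sample in enumerate(frame):
            out[start + n] += sample  # overlapping regions accumulate
    return out


# Three frames of four samples with a hop of two samples.
stream = overlap_add([[1.0] * 4, [1.0] * 4, [1.0] * 4], hop=2)
```

In practice the frames would be multiplied by a synthesis window chosen so that the overlapped windows sum to a constant; otherwise the overlapped regions are louder than the edges, as the toy output shows.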
For a sound signal sampling system with a sampling frequency of 16 kHz, the parameters in the embodiments of the present invention may take the following reference values:
N = 6; L_wnd = 32 ms; L_ovlp = 24 ms; c = 340 m/s; f_s = 16000 Hz; a_0 = 0.8; a_R = 0.95; a_1 = 0.85; a_a = 0.995; a_d = 0.85; L_1 = 7.
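To make the framing parameters concrete, the short calculation below converts the window length and overlap from milliseconds to samples at the stated sampling frequency. The variable names are illustrative only.

```python
FS_HZ = 16000    # sampling frequency f_s
L_WND_MS = 32    # analysis window length L_wnd
L_OVLP_MS = 24   # overlap between consecutive windows L_ovlp

frame_len = FS_HZ * L_WND_MS // 1000   # samples per frame
overlap = FS_HZ * L_OVLP_MS // 1000    # samples shared with the previous frame
hop = frame_len - overlap              # samples between consecutive frame starts
hop_ms = hop * 1000 // FS_HZ           # frame advance in milliseconds
```

These values give 512-sample frames that advance by 128 samples (8 ms), that is, 75% overlap between consecutive frames.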
In one embodiment of the present invention, a frequency-domain optimal super-directive beam is first designed according to the array topology and the sound source direction. A short-time Fourier transform is then applied to the original speech array signal, and the noise coherence matrix is estimated from the original speech array signal. The optimal super-directive beam parameters are applied to the transformed original speech array signal so that the speech signal is enhanced, while the noise correlation matrix is dynamically estimated to update the optimal adaptive filter parameters. Finally, a post filter further improves the signal quality. The embodiment achieves high-quality long-distance speech pickup with only a small number of microphones, clearly suppresses complex noise outside the beam, and introduces speech distortion that is almost inaudible.
As can be seen from the above, the microphone array speech enhancement method provided by the embodiments of the present invention can accurately calculate the noise signal in the original speech signal collected and input by the speech acquisition devices, so that the noise signal is effectively suppressed during speech enhancement.
An embodiment of the present invention further provides a microphone array speech enhancement apparatus, whose structure is shown in FIG. 5, including:
a first acquisition module, configured to acquire the first array speech signal collected and input by a multi-channel digital speech acquisition device;
an optimal beam output signal calculation module, configured to calculate, from the first array speech signal and according to the minimum variance adaptive beam optimization model of the first array speech signal, the optimal beam output signal synthesized from the first array speech signal;
a first enhancement module, configured to perform single-channel speech enhancement processing using the power spectrum estimate of the optimal beam output signal;
wherein the minimum variance adaptive beam optimization model of the first array speech signal includes a spatial steering vector from the target sound source to the multi-channel digital speech acquisition device.
As can be seen from the above, the microphone array speech enhancement apparatus provided by the embodiments of the present invention uses the optimal beam output signal calculation module to process the first array speech signal collected and input by the multi-channel digital speech acquisition device, and applies the minimum variance adaptive beam optimization model to calculate the optimal beam output signal of the first array speech signal. It can therefore process microphone arrays that have more array elements and larger element spacing.
Still referring to FIG. 5, in some embodiments, the apparatus further includes:
an original signal acquisition module, configured to collect original speech array signals y_1(n), ..., y_N(n) through the multi-channel digital speech acquisition device;
an original signal transformation module, configured to perform a short-time Fourier transform on the original speech array signals to obtain their time-frequency representation signals y_1(k, λ), ..., y_N(k, λ);
an optimal super-directive beam processing module, configured to apply frequency-domain optimal super-directive beam processing to the time-frequency representation signals using the optimal super-directive beam coefficients A(k) = [a_1(k), ..., a_N(k)]^T, obtaining the first array speech signal
Figure PCTCN2014092217-appb-000084
i = 1, ..., N;
where n is the discrete time variable; N is the number of array elements; k is the frequency bin index; and λ is the short-time frame index.
In some embodiments, the optimal super-directive beam coefficients are set according to the arrangement of the multi-channel digital speech acquisition device.
In some embodiments, when the optimal beam output signal calculation module calculates the optimal beam output signal synthesized from the first array speech signal according to the minimum variance adaptive beam optimization model of the first array speech signal, the following formula is used:
Figure PCTCN2014092217-appb-000085
where
Figure PCTCN2014092217-appb-000086
is the optimal beam output signal;
Figure PCTCN2014092217-appb-000087
is the adaptive filter parameter calculated from the noise signal column vector, the optimal super-directive beam coefficients, and the spatial steering vector from the target sound source to the digital speech acquisition devices;
Figure PCTCN2014092217-appb-000088
is the complex conjugate of element a_i of the optimal super-directive beam coefficients A(k) = [a_1(k), ..., a_N(k)]^T; and y_i(k, λ) is the first array speech signal.
In some embodiments, the minimum variance adaptive beam optimization model of the first array speech signal is:
Figure PCTCN2014092217-appb-000089
subject to
Figure PCTCN2014092217-appb-000090
where the elements of w(k) and
Figure PCTCN2014092217-appb-000091
are complex conjugates of each other; w^H(k) is the conjugate transpose of w(k);
Figure PCTCN2014092217-appb-000092
is the noise coherence matrix estimated from the first array speech signal; and
Figure PCTCN2014092217-appb-000093
is the spatial steering vector from the target sound source to the digital speech acquisition devices.
In some embodiments, when the optimal beam output signal calculation module calculates the optimal beam output signal synthesized from the first array speech signal, the spatial steering vector from the target sound source to the digital speech acquisition devices is calculated according to the following formula:
Figure PCTCN2014092217-appb-000094
where d_1, ..., d_N are the distances from the 1st to the N-th digital speech acquisition devices to the center of the digital speech acquisition device array; c is the speed of sound; f_s is the sampling frequency; θ is the azimuth of the target sound source relative to the digital speech acquisition devices; and
Figure PCTCN2014092217-appb-000095
is the complex conjugate of element a_i of the optimal super-directive beam coefficients A(k) = [a_1(k), ..., a_N(k)]^T.
Still referring to FIG. 5, in some embodiments, the apparatus further includes:
a VAD module, configured to perform voice activity detection (VAD) on the noise signal array in the array speech input signals of the multiple channels;
a noise power spectrum estimation module, configured to perform noise power spectrum estimation on the noise signal array according to the VAD result;
a second enhancement module, configured to perform a second enhancement on the optimal beam output signal according to the optimal power spectrum estimate of the optimal beam output signal and the noise power spectrum estimate.
Still referring to FIG. 5, in some embodiments, the noise power spectrum estimation module includes:
a first noise power spectrum calculation unit, configured to calculate the noise power spectrum in the speech state, the no-speech state, the speech start state, and the speech end state;
a second noise power spectrum calculation unit, configured to take a compromise between the noise power spectrum of the speech state and the noise power spectrum of the no-speech state to obtain the noise power spectrum estimate.
In some embodiments, the first noise power spectrum calculation unit includes:
a no-speech state calculation subunit, configured to estimate the noise signal array power spectrum in the no-speech state using the following formula:
Figure PCTCN2014092217-appb-000096
a speech start and speech state calculation subunit, configured to estimate the noise signal array power spectrum in the speech start state and the speech state using the following formula:
Figure PCTCN2014092217-appb-000097
a speech end state calculation subunit, configured to compute a two-pole recursive smoothed estimate of the noise signal array power spectrum in the speech end state using the following formula:
Figure PCTCN2014092217-appb-000098
In the above formulas,
Figure PCTCN2014092217-appb-000099
Figure PCTCN2014092217-appb-000100
Figure PCTCN2014092217-appb-000101
where a_1 is the noise spectrum update parameter, and a_a and a_d are smoothing coefficients.
In some embodiments, the power spectrum estimate of the optimal beam output signal is calculated using the following formula:
Figure PCTCN2014092217-appb-000102
where
Figure PCTCN2014092217-appb-000103
is the power spectrum estimate of the optimal beam output signal;
Figure PCTCN2014092217-appb-000104
is the optimal beam output signal; and a_0 is the noise spectrum update parameter.
Optionally, the compromise noise power spectrum estimate
Figure PCTCN2014092217-appb-000105
and the power spectrum estimate of the optimal beam output signal
Figure PCTCN2014092217-appb-000106
are input to a post filter for processing. An inverse FFT is applied to the post-filtered signal, and the enhanced time-domain signal stream is then reconstructed by the overlap-add method.
As can be seen from the above, the microphone array speech enhancement apparatus provided by the embodiments of the present invention can effectively estimate and process the noise signal in the first array speech signal collected and input by the multi-channel digital speech acquisition device, which helps to filter out the noise signal in subsequent speech enhancement and improves the enhancement effect.
It should be understood that the embodiments described in this specification are intended only to illustrate and explain the present invention, not to limit it. Where no conflict arises, the embodiments in the present application and the features in the embodiments may be combined with each other.
Those of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented as a computer program flow. The computer program may be stored in a computer-readable storage medium and executed on a corresponding hardware platform (such as a system, equipment, apparatus, or device); when executed, it performs one of the steps of the method embodiments or a combination thereof.
Optionally, all or some of the steps of the above embodiments may also be implemented using integrated circuits. The steps may each be fabricated as a separate integrated circuit module, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The apparatuses, functional modules, and functional units in the above embodiments may be implemented using general-purpose computing devices; they may be concentrated on a single computing device or distributed over a network of multiple computing devices.
When the apparatuses, functional modules, and functional units in the above embodiments are implemented as software functional modules and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. The computer-readable storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.
Any variation or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the protection scope of the claims.
Industrial Applicability
The embodiments of the present invention can perform speech enhancement processing on microphone arrays with larger array element spacing and can achieve high-quality sound pickup.

Claims (22)

  1. A microphone array speech enhancement method, including:
    acquiring a first array speech signal collected and input by a multi-channel digital speech acquisition device;
    calculating, from the first array speech signal and according to a minimum variance adaptive beam optimization model of the first array speech signal, an optimal beam output signal synthesized from the first array speech signal;
    performing single-channel speech enhancement processing using a power spectrum estimate of the optimal beam output signal;
    wherein the minimum variance adaptive beam optimization model of the first array speech signal includes a spatial steering vector from a target sound source to the multi-channel digital speech acquisition device.
  2. The method according to claim 1, wherein, before acquiring the first array speech signal collected and input by the multi-channel digital speech acquisition device, the method further includes:
    collecting original speech array signals y_1(n), ..., y_N(n) through the multi-channel digital speech acquisition device;
    performing a short-time Fourier transform on the original speech array signals to obtain their time-frequency representation signals y_1(k, λ), ..., y_N(k, λ);
    applying frequency-domain optimal super-directive beam processing to the time-frequency representation signals using optimal super-directive beam coefficients A(k) = [a_1(k), ..., a_N(k)]^T, obtaining the first array speech signal
    Figure PCTCN2014092217-appb-100001
    i = 1, ..., N;
    where n is the discrete time variable; N is the number of array elements; k is the frequency bin index; and λ is the short-time frame index.
  3. The method according to claim 2, wherein the optimal super-directive beam coefficients are set according to the arrangement of the multi-channel digital speech acquisition device.
  4. The method according to claim 1, wherein, in the step of calculating, from the first array speech signal and according to the minimum variance adaptive beam optimization model of the first array speech signal, the optimal beam output signal synthesized from the first array speech signal, the optimal beam output signal is calculated using the following formula:
    Figure PCTCN2014092217-appb-100002
    where
    Figure PCTCN2014092217-appb-100003
    is the optimal beam output signal;
    Figure PCTCN2014092217-appb-100004
    is the adaptive filter parameter calculated from the noise signal column vector, the optimal super-directive beam coefficients, and the spatial steering vector from the target sound source to the digital speech acquisition devices;
    Figure PCTCN2014092217-appb-100005
    is the complex conjugate of element a_i of the optimal super-directive beam coefficients A(k) = [a_1(k), ..., a_N(k)]^T; and y_i(k, λ) is the first array speech signal.
  5. The method according to claim 3, wherein the minimum variance adaptive beam optimization model of the first array speech signal is:
    Figure PCTCN2014092217-appb-100006
    subject to
    Figure PCTCN2014092217-appb-100007
    where the elements of w(k) and
    Figure PCTCN2014092217-appb-100008
    are complex conjugates of each other; w^H(k) is the conjugate transpose of w(k);
    Figure PCTCN2014092217-appb-100009
    is the noise coherence matrix estimated from the first array speech signal; and
    Figure PCTCN2014092217-appb-100010
    is the spatial steering vector from the target sound source to the digital speech acquisition devices.
  6. The method according to claim 5, wherein the spatial steering vector from the target sound source to the digital speech acquisition devices is calculated according to the following formula:
    Figure PCTCN2014092217-appb-100011
    where d_1, ..., d_N are the distances from the 1st to the N-th digital speech acquisition devices to the center of the digital speech acquisition device array; c is the speed of sound; f_s is the sampling frequency; θ is the azimuth of the target sound source relative to the digital speech acquisition devices; and
    Figure PCTCN2014092217-appb-100012
    is the complex conjugate of element a_i of the optimal super-directive beam coefficients A(k) = [a_1(k), ..., a_N(k)]^T.
  7. The method according to claim 1, wherein the method further includes:
    performing voice activity detection (VAD) on the noise signal array in the array speech input signals of the multiple channels;
    performing noise power spectrum estimation on the noise signal array according to the VAD result;
    performing a second enhancement on the optimal beam output signal according to the optimal power spectrum estimate of the optimal beam output signal and the noise power spectrum estimate.
  8. The method according to claim 7, wherein the step of performing noise power spectrum estimation on the noise signal array according to the VAD result includes:
    calculating the noise power spectrum in the speech state, the no-speech state, the speech start state, and the speech end state;
    taking a compromise between the noise power spectrum of the speech state and the noise power spectrum of the no-speech state to obtain the noise power spectrum estimate.
  9. The method of claim 8, wherein the step of calculating the noise power spectrum in the speech-present state, the no-speech state, the speech-start state, and the speech-end state comprises:
    in the no-speech state, estimating the power spectrum of the noise signal array using the following formula:
    Figure PCTCN2014092217-appb-100013
    in the speech-start state and the speech-present state, estimating the power spectrum of the noise signal array using the following formula:
    Figure PCTCN2014092217-appb-100014
    in the speech-end state, performing a two-pole regression smoothing estimation of the power spectrum of the noise signal array using the following formula:
    Figure PCTCN2014092217-appb-100015
    in the above formulas,
    Figure PCTCN2014092217-appb-100016
    Figure PCTCN2014092217-appb-100017
    Figure PCTCN2014092217-appb-100018
    where a_1 is the noise spectrum update parameter, and a_a and a_d are smoothing coefficients.
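The three update formulas are available only as figure references, so the sketch below is an assumption about their general shape, not a reproduction. It assumes a first-order recursive update with a_1 in the no-speech state, holding the previous estimate while speech is starting or present, and, in the speech-end state, a smoother that selects a_a when the frame power rises above the estimate and a_d when it falls. The state constants, default values, and function name are hypothetical.

```python
NO_SPEECH, SPEECH_START, SPEECH, SPEECH_END = range(4)


def update_noise_psd(prev_noise, frame_power, state, a1=0.9, a_a=0.5, a_d=0.95):
    """Update one bin of the noise power spectrum according to the VAD state."""
    if state == NO_SPEECH:
        # Assumed first-order recursive average of the noise-only frame power.
        return a1 * prev_noise + (1.0 - a1) * frame_power
    if state in (SPEECH_START, SPEECH):
        # Assumed: freeze the estimate so speech does not leak into it.
        return prev_noise
    # SPEECH_END: assumed smoothing with separate rise (a_a) and fall (a_d) coefficients.
    coeff = a_a if frame_power > prev_noise else a_d
    return coeff * prev_noise + (1.0 - coeff) * frame_power
```

Freezing the estimate during speech is one common design choice; the claimed formulas may instead continue to adapt slowly in those states.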
  10. The method of claim 1, wherein the power spectrum estimate of the optimal beam output signal is calculated using the following formula:
    Figure PCTCN2014092217-appb-100019
    where
    Figure PCTCN2014092217-appb-100020
    is the power spectrum estimate of the optimal beam output signal;
    Figure PCTCN2014092217-appb-100021
    is the optimal beam output signal; and a_0 is the noise spectrum update parameter.
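Because the formula is only referenced as a figure, the following is a sketch that assumes the usual first-order recursive form: the new estimate is a_0 times the previous estimate plus (1 − a_0) times the squared magnitude of the current beam output, per frequency bin. The function name and default value of `a0` are hypothetical.

```python
def smooth_power_spectrum(prev_psd, beam_output, a0=0.8):
    """Recursively smoothed power spectrum of the beam output (assumed form).

    prev_psd:    estimate from the previous frame, one value per bin
    beam_output: complex beam output for the current frame, one value per bin
    """
    return [a0 * p + (1.0 - a0) * abs(z) ** 2 for p, z in zip(prev_psd, beam_output)]
```

A larger `a0` gives a smoother but slower-reacting estimate.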
  11. A microphone array speech enhancement apparatus, comprising:
    a first acquisition module, configured to acquire a first array speech signal collected and input through a multi-channel digital speech acquisition device;
    an optimal beam output signal calculation module, configured to calculate, from the first array speech signal and according to a minimum variance adaptive beam optimization model of the first array speech signal, the optimal beam output signal synthesized from the first array speech signal; and
    a first enhancement module, configured to perform single-channel speech enhancement processing using the power spectrum estimate of the optimal beam output signal;
    wherein the minimum variance adaptive beam optimization model of the first array speech signal includes a spatial steering vector from the target sound source to the multi-channel digital speech acquisition device.
  12. The apparatus of claim 11, further comprising:
    an original signal acquisition module, configured to collect original speech array signals y_1(n), ..., y_N(n) through the multi-channel digital speech acquisition device;
    an original signal transformation module, configured to perform a short-time Fourier transform on the original speech array signals to obtain their time-frequency representations y_1(k, λ), ..., y_N(k, λ); and
    an optimal superdirective beam processing module, configured to apply frequency-domain optimal superdirective beam processing to the time-frequency representations using the optimal superdirective beam coefficients A(k) = [a_1(k), ..., a_N(k)]^T, to obtain the first array speech signal
    Figure PCTCN2014092217-appb-100022
    i = 1, ..., N;
    where n is the discrete time variable; N is the number of array elements; k is the frequency bin index; and λ is the short-time frame index.
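The expression for the first array speech signal is only a figure reference. As a hedged sketch, the code below assumes that each channel's time-frequency bin is multiplied by the complex conjugate of its coefficient a_i(k), which is consistent with the conjugate mentioned elsewhere in the claims but is not confirmed by the text. The function name and the nested-list layout are hypothetical.

```python
def apply_fixed_beam(stft_frame, coeffs):
    """Weight one STFT frame channel by channel (assumed form).

    stft_frame[i][k]: time-frequency value of channel i at bin k
    coeffs[i][k]:     fixed-beam coefficient a_i(k)
    Returns conj(a_i(k)) * y_i(k, frame) for every channel and bin.
    """
    return [
        [a.conjugate() * y for a, y in zip(ch_coeffs, ch_bins)]
        for ch_coeffs, ch_bins in zip(coeffs, stft_frame)
    ]
```

Keeping the channels separate at this stage leaves the summation to the later adaptive stage.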
  13. The apparatus of claim 12, wherein the optimal superdirective beam coefficients are set according to the arrangement of the multi-channel digital speech acquisition device.
  14. The apparatus of claim 11, wherein the optimal beam output signal calculation module is configured to calculate the optimal beam output signal synthesized from the first array speech signal using the following formula:
    Figure PCTCN2014092217-appb-100023
    where
    Figure PCTCN2014092217-appb-100024
    is the optimal beam output signal;
    Figure PCTCN2014092217-appb-100025
    is the adaptive filter parameter calculated from the noise signal column vector, the optimal superdirective beam coefficients, and the spatial steering vector from the target sound source to the digital speech acquisition device;
    Figure PCTCN2014092217-appb-100026
    is the complex conjugate of the element a_i of the optimal superdirective beam coefficients A(k) = [a_1(k), ..., a_N(k)]^T; and y_i(k, λ) is the first array speech signal.
  15. The apparatus of claim 13, wherein the minimum variance adaptive beam optimization model of the first array speech signal is:
    Figure PCTCN2014092217-appb-100027
    subject to
    Figure PCTCN2014092217-appb-100028
    where the elements of w(k) and
    Figure PCTCN2014092217-appb-100029
    are complex conjugates of each other; w^H(k) is the conjugate transpose of w(k);
    Figure PCTCN2014092217-appb-100030
    is the noise coherence matrix estimated from the first array speech signal; and
    Figure PCTCN2014092217-appb-100031
    is the spatial steering vector from the target sound source to the digital speech acquisition device.
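For orientation, the textbook minimum variance distortionless response problem and its closed-form solution are shown below. This is the standard form and not necessarily the exact expression in the figures; the symbols \boldsymbol{\Phi}_{nn}(k) for the noise coherence matrix and \mathbf{d}(k) for the steering vector are assumed names.

```latex
\min_{\mathbf{w}(k)} \; \mathbf{w}^{H}(k)\,\boldsymbol{\Phi}_{nn}(k)\,\mathbf{w}(k)
\quad \text{subject to} \quad \mathbf{w}^{H}(k)\,\mathbf{d}(k) = 1,
\qquad
\mathbf{w}_{\mathrm{opt}}(k) =
\frac{\boldsymbol{\Phi}_{nn}^{-1}(k)\,\mathbf{d}(k)}
     {\mathbf{d}^{H}(k)\,\boldsymbol{\Phi}_{nn}^{-1}(k)\,\mathbf{d}(k)}
```

The constraint keeps unit gain toward the target direction while the objective minimizes the noise power at the output.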
  16. The apparatus of claim 15, wherein the optimal beam output signal calculation module is configured to calculate the spatial steering vector from the target sound source to the digital speech acquisition device according to the following formula:
    Figure PCTCN2014092217-appb-100032
    where d_1, ..., d_N are the distances from the 1st to the Nth digital speech acquisition devices to the center of the digital speech acquisition device array; c is the speed of sound; f_s is the sampling frequency; θ is the azimuth of the target sound source relative to the digital speech acquisition device; and
    Figure PCTCN2014092217-appb-100033
    is the complex conjugate of the element a_i of the optimal superdirective beam coefficients A(k) = [a_1(k), ..., a_N(k)]^T.
  17. The apparatus of claim 11, further comprising:
    a VAD module, configured to perform voice activity detection (VAD) on the noise signal array in the array speech input signals of the plurality of channels;
    a noise power spectrum estimation module, configured to perform noise power spectrum estimation on the noise signal array according to the result of the VAD; and
    a second enhancement module, configured to perform a second enhancement of the optimal beam output signal according to the optimal power spectrum estimate of the optimal beam output signal and the noise power spectrum estimate.
  18. The apparatus of claim 17, wherein the noise power spectrum estimation module comprises:
    a first noise power spectrum calculation unit, configured to calculate the noise power spectrum in the speech-present state, the no-speech state, the speech-start state, and the speech-end state; and
    a second noise power spectrum calculation unit, configured to combine, as a trade-off, the noise power spectrum in the speech-present state and the noise power spectrum in the no-speech state to obtain the noise power spectrum estimate.
  19. The apparatus of claim 18, wherein the first noise power spectrum calculation unit comprises:
    a no-speech state calculation subunit, configured to estimate, in the no-speech state, the power spectrum of the noise signal array using the following formula:
    Figure PCTCN2014092217-appb-100034
    a speech-start and speech-present state calculation subunit, configured to estimate, in the speech-start state and the speech-present state, the power spectrum of the noise signal array using the following formula:
    Figure PCTCN2014092217-appb-100035
    a no-speech state calculation subunit, configured to perform, in the speech-end state, a two-pole regression smoothing estimation of the power spectrum of the noise signal array using the following formula:
    Figure PCTCN2014092217-appb-100036
    in the above formulas,
    Figure PCTCN2014092217-appb-100037
    Figure PCTCN2014092217-appb-100038
    Figure PCTCN2014092217-appb-100039
    where a_1 is the noise spectrum update parameter, and a_a and a_d are smoothing coefficients.
  20. The apparatus of claim 11, wherein the power spectrum estimate of the optimal beam output signal is calculated using the following formula:
    Figure PCTCN2014092217-appb-100040
    where
    Figure PCTCN2014092217-appb-100041
    is the power spectrum estimate of the optimal beam output signal;
    Figure PCTCN2014092217-appb-100042
    is the optimal beam output signal; and a_0 is the noise spectrum update parameter.
  21. A computer program comprising program instructions that, when executed by a computer, cause the computer to perform the method of any one of claims 1-11.
  22. A computer-readable storage medium carrying the computer program of claim 21.
PCT/CN2014/092217 2014-06-27 2014-11-25 Microphone array speech enhancement method and device WO2015196729A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410305776.4 2014-06-27
CN201410305776.4A CN105244036A (en) 2014-06-27 2014-06-27 Microphone speech enhancement method and microphone speech enhancement device

Publications (1)

Publication Number Publication Date
WO2015196729A1 (en) 2015-12-30

Family

ID=54936653

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/092217 WO2015196729A1 (en) 2014-06-27 2014-11-25 Microphone array speech enhancement method and device

Country Status (2)

Country Link
CN (1) CN105244036A (en)
WO (1) WO2015196729A1 (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106125056A (en) * 2016-06-13 2016-11-16 西安电子科技大学 Minimum variance Power estimation method based on modifying factor
CN106371079A (en) * 2016-08-19 2017-02-01 西安电子科技大学 Spectrum sharpening based multi-signal classification spectrum estimation method
CN109389991A (en) * 2018-10-24 2019-02-26 中国科学院上海微系统与信息技术研究所 A kind of signal enhancing method based on microphone array
CN109884591A (en) * 2019-02-25 2019-06-14 南京理工大学 A kind of multi-rotor unmanned aerial vehicle acoustical signal Enhancement Method based on microphone array
CN110415720A (en) * 2019-07-11 2019-11-05 湖北工业大学 The constant Beamforming Method of the super directional frequency of quaternary difference microphone array
CN110444220A (en) * 2019-08-01 2019-11-12 浙江大学 A kind of multi-modal remote speech cognitive method and device
CN111866665A (en) * 2020-07-22 2020-10-30 海尔优家智能科技(北京)有限公司 Microphone array beam forming method and device
CN111880146A (en) * 2020-06-30 2020-11-03 海尔优家智能科技(北京)有限公司 Sound source orientation method and device and storage medium
CN112216295A (en) * 2019-06-25 2021-01-12 大众问问(北京)信息科技有限公司 Sound source positioning method, device and equipment
CN112712818A (en) * 2020-12-29 2021-04-27 苏州科达科技股份有限公司 Voice enhancement method, device and equipment
CN113030862A (en) * 2021-03-12 2021-06-25 中国科学院声学研究所 Multi-channel speech enhancement method and device
CN113053406A (en) * 2021-05-08 2021-06-29 北京小米移动软件有限公司 Sound signal identification method and device
CN113223552A (en) * 2021-04-28 2021-08-06 锐迪科微电子(上海)有限公司 Speech enhancement method, speech enhancement device, speech enhancement apparatus, storage medium, and program
CN113329288A (en) * 2021-04-29 2021-08-31 开放智能技术(南京)有限公司 Bluetooth headset noise reduction method based on notch technology
CN113628634A (en) * 2021-08-20 2021-11-09 随锐科技集团股份有限公司 Real-time voice separation method and device guided by pointing information
CN114913868A (en) * 2022-05-17 2022-08-16 电子科技大学 FPGA-based acoustic array directional pickup method
CN118376980A (en) * 2024-06-21 2024-07-23 之江实验室 Multi-sound source positioning method and device based on subarray signal enhancement

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107018470B (en) * 2016-01-28 2019-02-26 讯飞智元信息科技有限公司 A kind of voice recording method and system based on annular microphone array
CN105869651B (en) * 2016-03-23 2019-05-31 北京大学深圳研究生院 Binary channels Wave beam forming sound enhancement method based on noise mixing coherence
CN106448693B (en) * 2016-09-05 2019-11-29 华为技术有限公司 A kind of audio signal processing method and device
CN107785029B (en) 2017-10-23 2021-01-29 科大讯飞股份有限公司 Target voice detection method and device
CN108417208B (en) * 2018-03-26 2020-09-11 宇龙计算机通信科技(深圳)有限公司 Voice input method and device
CN110890100B (en) * 2018-09-10 2022-11-18 杭州海康威视数字技术股份有限公司 Voice enhancement method, multimedia data acquisition method, multimedia data playing method, device and monitoring system
CN109346100A (en) * 2018-10-25 2019-02-15 烟台市奥境数字科技有限公司 A kind of network transfer method of Digital Media interactive instructional system
CN110970046B (en) * 2019-11-29 2022-03-11 北京搜狗科技发展有限公司 Audio data processing method and device, electronic equipment and storage medium
CN113645542B (en) * 2020-05-11 2023-05-02 阿里巴巴集团控股有限公司 Voice signal processing method and system and audio and video communication equipment

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1523573A (en) * 2003-09-12 2004-08-25 中国科学院声学研究所 A multichannel speech enhancement method using postfilter
CN101238511A (en) * 2005-08-11 2008-08-06 旭化成株式会社 Sound source separating device, speech recognizing device, portable telephone, and sound source separating method, and program
CN101447190A (en) * 2008-06-25 2009-06-03 北京大学深圳研究生院 Voice enhancement method employing combination of nesting-subarray-based post filtering and spectrum-subtraction
WO2009151578A2 (en) * 2008-06-09 2009-12-17 The Board Of Trustees Of The University Of Illinois Method and apparatus for blind signal recovery in noisy, reverberant environments
CN101763858A (en) * 2009-10-19 2010-06-30 瑞声声学科技(深圳)有限公司 Method for processing double-microphone signal
US7783478B2 (en) * 2007-01-03 2010-08-24 Alexander Goldin Two stage frequency subband decomposition
CN102509552A (en) * 2011-10-21 2012-06-20 浙江大学 Method for enhancing microphone array voice based on combined inhibition
CN103235959A (en) * 2013-04-01 2013-08-07 深圳市远望谷信息技术股份有限公司 Method for outputting antenna array in reader-writer to form digital wave beams
JP2013201525A (en) * 2012-03-23 2013-10-03 Mitsubishi Electric Corp Beam forming processing unit

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004004297A2 (en) * 2002-07-01 2004-01-08 Koninklijke Philips Electronics N.V. Stationary spectral power dependent audio enhancement system
CN101778322B (en) * 2009-12-07 2013-09-25 中国科学院自动化研究所 Microphone array postfiltering sound enhancement method based on multi-models and hearing characteristic
EP2444967A1 (en) * 2010-10-25 2012-04-25 Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung e.V. Echo suppression comprising modeling of late reverberation components
CN102930870B (en) * 2012-09-27 2014-04-09 福州大学 Bird voice recognition method using anti-noise power normalization cepstrum coefficients (APNCC)
CN103856866B (en) * 2012-12-04 2019-11-05 西北工业大学 Low noise differential microphone array
CN103308889B (en) * 2013-05-13 2014-07-02 辽宁工业大学 Passive sound source two-dimensional DOA (direction of arrival) estimation method under complex environment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YAN, ZHAOLI ET AL.: "The Modified Post-Filter Beamforming for Speech Enhancement", JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, vol. 28, no. 12, 31 December 2006 (2006-12-31), pages 2269 - 2272, ISSN: 1009-5896 *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106125056B (en) * 2016-06-13 2018-07-06 西安电子科技大学 Minimum variance Power estimation method based on modifying factor
CN106125056A (en) * 2016-06-13 2016-11-16 西安电子科技大学 Minimum variance Power estimation method based on modifying factor
CN106371079A (en) * 2016-08-19 2017-02-01 西安电子科技大学 Spectrum sharpening based multi-signal classification spectrum estimation method
CN109389991A (en) * 2018-10-24 2019-02-26 中国科学院上海微系统与信息技术研究所 A kind of signal enhancing method based on microphone array
CN109884591A (en) * 2019-02-25 2019-06-14 南京理工大学 A kind of multi-rotor unmanned aerial vehicle acoustical signal Enhancement Method based on microphone array
CN112216295A (en) * 2019-06-25 2021-01-12 大众问问(北京)信息科技有限公司 Sound source positioning method, device and equipment
CN112216295B (en) * 2019-06-25 2024-04-26 大众问问(北京)信息科技有限公司 Sound source positioning method, device and equipment
CN110415720A (en) * 2019-07-11 2019-11-05 湖北工业大学 The constant Beamforming Method of the super directional frequency of quaternary difference microphone array
CN110415720B (en) * 2019-07-11 2020-05-12 湖北工业大学 Quaternary differential microphone array super-directivity frequency-invariant beam forming method
CN110444220B (en) * 2019-08-01 2023-02-10 浙江大学 Multi-mode remote voice perception method and device
CN110444220A (en) * 2019-08-01 2019-11-12 浙江大学 A kind of multi-modal remote speech cognitive method and device
CN111880146A (en) * 2020-06-30 2020-11-03 海尔优家智能科技(北京)有限公司 Sound source orientation method and device and storage medium
CN111880146B (en) * 2020-06-30 2023-08-18 海尔优家智能科技(北京)有限公司 Sound source orientation method and device and storage medium
CN111866665A (en) * 2020-07-22 2020-10-30 海尔优家智能科技(北京)有限公司 Microphone array beam forming method and device
CN112712818A (en) * 2020-12-29 2021-04-27 苏州科达科技股份有限公司 Voice enhancement method, device and equipment
CN113030862A (en) * 2021-03-12 2021-06-25 中国科学院声学研究所 Multi-channel speech enhancement method and device
CN113223552A (en) * 2021-04-28 2021-08-06 锐迪科微电子(上海)有限公司 Speech enhancement method, speech enhancement device, speech enhancement apparatus, storage medium, and program
CN113329288A (en) * 2021-04-29 2021-08-31 开放智能技术(南京)有限公司 Bluetooth headset noise reduction method based on notch technology
CN113053406A (en) * 2021-05-08 2021-06-29 北京小米移动软件有限公司 Sound signal identification method and device
CN113628634A (en) * 2021-08-20 2021-11-09 随锐科技集团股份有限公司 Real-time voice separation method and device guided by pointing information
CN113628634B (en) * 2021-08-20 2023-10-03 随锐科技集团股份有限公司 Real-time voice separation method and device guided by directional information
CN114913868A (en) * 2022-05-17 2022-08-16 电子科技大学 FPGA-based acoustic array directional pickup method
CN118376980A (en) * 2024-06-21 2024-07-23 之江实验室 Multi-sound source positioning method and device based on subarray signal enhancement

Also Published As

Publication number Publication date
CN105244036A (en) 2016-01-13

Similar Documents

Publication Publication Date Title
WO2015196729A1 (en) Microphone array speech enhancement method and device
CN106782590B (en) Microphone array beam forming method based on reverberation environment
CN101593522B (en) Method and equipment for full frequency domain digital hearing aid
US8654990B2 (en) Multiple microphone based directional sound filter
Kjems et al. Maximum likelihood based noise covariance matrix estimation for multi-microphone speech enhancement
CN109285557B (en) Directional pickup method and device and electronic equipment
JP2019191558A (en) Method and apparatus for amplifying speech
CN110610718B (en) Method and device for extracting expected sound source voice signal
JP2009522942A (en) System and method using level differences between microphones for speech improvement
JP6604331B2 (en) Audio processing apparatus and method, and program
JP6225245B2 (en) Signal processing apparatus, method and program
JP6840302B2 (en) Information processing equipment, programs and information processing methods
CN112363112A (en) Sound source positioning method and device based on linear microphone array
CN110111802B (en) Kalman filtering-based adaptive dereverberation method
Jin et al. Multi-channel noise reduction for hands-free voice communication on mobile phones
CN112802490B (en) Beam forming method and device based on microphone array
CN109901114A (en) A kind of delay time estimation method suitable for auditory localization
US20130253923A1 (en) Multichannel enhancement system for preserving spatial cues
Zhu et al. Modified complementary joint sparse representations: a novel post-filtering to MVDR beamforming
Miyazaki et al. Theoretical analysis of parametric blind spatial subtraction array and its application to speech recognition performance prediction
CN113948101A (en) Noise suppression method and device based on spatial discrimination detection
CN111863017B (en) In-vehicle directional pickup method based on double microphone arrays and related device
JP6263890B2 (en) Audio signal processing apparatus and program
Kako et al. Wiener filter design by estimating sensitivities between distributed asynchronous microphones and sound sources
JP6221463B2 (en) Audio signal processing apparatus and program

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 14895821; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 14895821; Country of ref document: EP; Kind code of ref document: A1)