TWI465121B

TWI465121B - System and method for utilizing omni-directional microphones for speech enhancement

Info

Publication number: TWI465121B
Application number: TW096146144A
Authority: TW
Inventors: Carlos Avendano
Original assignee: Audience Inc
Priority date: 2007-01-29
Filing date: 2007-12-04
Publication date: 2014-12-11
Also published as: TW200835374A

Description

System and method for utilizing omni-directional microphones for speech enhancement

The present invention relates generally to audio processing and, more particularly, to speech enhancement using inter-microphone level differences.

Currently, there are many methods for reducing background noise and enhancing speech in adverse environments. One such method is to use two or more microphones on an audio device. These microphones are at prescribed positions and allow the audio device to determine a level difference between the microphone signals. For example, owing to the spatial separation of the microphones, the difference in the arrival time of a signal from a speech source at the microphones can be used to localize the speech source. Once the speech source is localized, the signals can be spatially filtered to suppress noise originating from other directions.

In order to exploit the level difference between two omnidirectional microphones, the speech source needs to be closer to one of the microphones. That is, to obtain a significant level difference, the distance from the source to the first microphone must be shorter than the distance from the source to the second microphone. Likewise, the speech source must remain relatively close to the microphones, especially when the microphones are very close to each other, as mobile telephone applications may require.
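As a rough worked example of this distance constraint, the sketch below assumes an idealized free-field point source whose amplitude falls off as 1/r. The decay model, the function name `level_difference_db`, and the example distances are illustrative assumptions and do not come from the source.

```python
import math


def level_difference_db(d_primary_m: float, d_secondary_m: float) -> float:
    """Level difference in dB between two omnidirectional microphones.

    Assumes an ideal point source in free field (amplitude ~ 1/r), so the
    primary/secondary amplitude ratio equals d_secondary / d_primary.
    """
    if d_primary_m <= 0 or d_secondary_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(d_secondary_m / d_primary_m)


# Hypothetical handset-style geometry: source 5 cm and 15 cm from the mics.
near_db = level_difference_db(0.05, 0.15)
# Hypothetical far-talk geometry: same 10 cm spacing, source about 1 m away.
far_db = level_difference_db(1.00, 1.10)
```

Under these assumptions the close source yields a level difference of several dB, while the distant source yields well under 1 dB.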

A solution to the distance constraint can be obtained by using directional microphones. Directional microphones allow a user to extend the effective level difference between the two microphones over a larger range with a narrow inter-microphone level difference (ILD) beam. This may be desirable for applications such as push-to-talk (PTT) or videophone, where the speech source is not as close to the microphones as it is in, for example, a telephone application.

Disadvantageously, directional microphones have many physical drawbacks. Directional microphones are typically large and are not well suited to small telephones or cellular telephones. In addition, directional microphones are difficult to mount because they require ports that let sound arrive from several directions. Slight variations in manufacturing can cause mismatch, which leads to higher manufacturing and production costs.

Therefore, there is a need to obtain the characteristics of directional microphones in a speech enhancement system without the disadvantages of using directional microphones themselves.

Embodiments of the present invention overcome or substantially alleviate prior problems associated with noise suppression and speech enhancement. In general, systems and methods are provided for attenuating noise and enhancing speech using an inter-microphone level difference (ILD). In exemplary embodiments, the ILD is based on the energy level difference of a pair of omnidirectional microphones.

Exemplary embodiments of the present invention use nonlinear processing to combine components of the acoustic signals from the pair of omnidirectional microphones in order to obtain the ILD. In exemplary embodiments, a primary acoustic signal is received by a primary microphone and a secondary acoustic signal is received by a secondary microphone (e.g., omnidirectional microphones). The primary and secondary acoustic signals are converted into a primary electrical signal and a secondary electrical signal for processing.

A differential microphone array (DMA) module processes the primary and secondary electrical signals to determine a cardioid primary signal and a cardioid secondary signal. In exemplary embodiments, the primary and secondary electrical signals are delayed by delay nodes. The cardioid primary signal is then determined by taking the difference between the primary electrical signal and the delayed secondary electrical signal, while the cardioid secondary signal is determined by taking the difference between the secondary electrical signal and the delayed primary electrical signal. In various embodiments, the delayed primary electrical signal and the delayed secondary electrical signal are adjusted by a gain. The gain may be the ratio between the magnitude of the primary acoustic signal and the magnitude of the secondary acoustic signal.

The cardioid signals are filtered by a frequency analysis module, which takes the signals and mimics the frequency analysis of the cochlea (i.e., the cochlear domain), simulated in this embodiment by a filter bank. Alternatively, other filters may be used for the frequency analysis and synthesis, such as a short-time Fourier transform (STFT), sub-band filter banks, modulated complex lapped transforms, cochlear models, wavelets, and the like. Energy levels associated with the cardioid primary signal and the cardioid secondary signal are then computed (e.g., as power estimates), and the results are processed by an ILD module using a nonlinear combination to obtain the ILD. In exemplary embodiments, the nonlinear combination comprises dividing the power estimate associated with the cardioid primary signal by the power estimate associated with the cardioid secondary signal. The ILD can then be used as a spatial discrimination cue in a noise reduction system to suppress unwanted sound sources and enhance the speech.

The present invention provides exemplary systems and methods for using the inter-microphone level difference (ILD) of at least two microphones to identify frequency ranges dominated by speech, in order to enhance the speech and attenuate background noise and far-field interference. Embodiments of the present invention may be practiced on any audio device configured to receive sound, such as, but not limited to, cellular telephones, telephone handsets, headsets, and conferencing systems. Advantageously, exemplary embodiments are configured to provide improved noise suppression on small devices or in applications where the main audio source is far from the device. Although some embodiments of the present invention will be described with reference to operation on a cellular telephone, the present invention may be practiced on any audio device.

Referring to FIGS. 1a and 1b, environments in which embodiments of the present invention may be practiced are shown. A user provides an audio (speech) source 102 to an audio device 104. The exemplary audio device 104 comprises two microphones: a primary microphone 106 relative to the audio source 102 and a secondary microphone 108 located a distance d away from the primary microphone 106. In exemplary embodiments, the microphones 106 and 108 are omnidirectional microphones.

While the microphones 106 and 108 receive sound (i.e., acoustic signals) from the audio source 102, they also pick up noise 110. Although the noise 110 is shown coming from a single location in FIGS. 1a and 1b, the noise 110 may comprise any sounds from one or more locations different from the audio source 102 and may include reverberation and echoes.

Embodiments of the present invention exploit level differences (e.g., energy differences) between the acoustic signals received by the two microphones 106 and 108, independent of how the level differences are obtained. In FIG. 1a, because the primary microphone 106 is closer to the audio source 102 than the secondary microphone 108, the intensity level is higher at the primary microphone 106, resulting in, for example, a larger energy level during a speech/voice segment. In FIG. 1b, because the directional response of the primary microphone 106 is highest in the direction of the audio source 102 and the directional response of the secondary microphone 108 is lower in that direction, the level difference is highest in the direction of the audio source 102 and lower elsewhere.

The level difference can then be used to discriminate speech from noise in the time-frequency domain. Other embodiments may use a combination of energy level differences and time delays to discriminate speech. Based on binaural cue decoding, speech signal extraction or speech enhancement may be performed.

Referring now to FIG. 2, the exemplary audio device 104 is shown in more detail. In exemplary embodiments, the audio device 104 is an audio receiving device that comprises a processor 202, the primary microphone 106, the secondary microphone 108, an audio processing engine 204, and an output device 206. The audio device 104 may comprise further components necessary for its operation. The audio processing engine 204 is discussed in more detail in connection with FIG. 3.

As previously discussed, the primary microphone 106 and the secondary microphone 108 are spaced a distance apart in order to allow for an energy level difference between them. Upon reception by the microphones 106 and 108, the acoustic signals are converted into electrical signals (i.e., a primary electrical signal and a secondary electrical signal). According to some embodiments, the electrical signals may themselves be converted by an analog-to-digital converter (not shown) into digital signals for processing. To distinguish the acoustic signals, the acoustic signal received by the primary microphone 106 is referred to herein as the primary acoustic signal, and the acoustic signal received by the secondary microphone 108 is referred to herein as the secondary acoustic signal.

The output device 206 is any device that provides an audio output to the user. For example, the output device 206 may be the earpiece of a headset or handset, or a speaker on a conferencing device.

FIG. 3 is a detailed block diagram of the exemplary audio processing engine 204 according to one embodiment of the present invention. In exemplary embodiments, the audio processing engine 204 is embodied within a memory device. In operation, the acoustic signals received from the primary microphone 106 and the secondary microphone 108 (i.e., $X_1$ and $X_2$) are converted to electrical signals and processed through a differential microphone array (DMA) module 302. The DMA module 302 is configured to use DMA theory to create directional patterns for the closely spaced microphones 106 and 108. By delaying and subtracting the acoustic signals captured by the microphones 106 and 108, the DMA module 302 can determine sounds and signals in a front cardioid region and a back cardioid region around the audio device 104. Signals (i.e., sounds) received from these cardioid regions are hereinafter referred to as cardioid signals. In one example, sound from an audio source 102 within a cardioid region is transmitted by the primary microphone 106 as the cardioid primary signal. Sound from the same audio source 102 is transmitted by the secondary microphone 108 as the cardioid secondary signal.

For a system with two microphones, the DMA module 302 can create two different directional patterns around the audio device 104. Each directional pattern is a region around the audio device 104 within which sound produced by an audio source 102 can be received by the microphones 106 and 108 with little attenuation. Sound produced by an audio source 102 outside the directional pattern may be attenuated.

In one example, one directional pattern created by the DMA module 302 allows sound produced by an audio source 102 within a front cardioid region around the audio device 104 to be received, and a second pattern allows sound from a second audio source 102 within a back cardioid region around the audio device 104 to be received. Sound from audio sources 102 outside these regions may also be received, but it may be attenuated.

The cardioid signals from the DMA module 302 are then processed by a frequency analysis module 304. In one embodiment, the frequency analysis module 304 takes the cardioid signals and mimics the frequency analysis of the cochlea (i.e., the cochlear domain), simulated by a filter bank. In one example, the frequency analysis module 304 separates the cardioid signals into frequency bands. Alternatively, other filters may be used for the frequency analysis and synthesis, such as a short-time Fourier transform (STFT), sub-band filter banks, modulated complex lapped transforms, cochlear models, wavelets, and the like. Because most sounds (e.g., acoustic signals) are complex and comprise more than one frequency, a sub-band analysis of the acoustic signal determines which individual frequencies are present in the complex acoustic signal during a frame (e.g., a predetermined period of time). In one embodiment, the frame is 8 ms long.
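The framing and per-frame frequency analysis described above can be sketched with a plain discrete Fourier transform, used here as a stand-in for the STFT alternative that the text mentions rather than for the cochlear filter bank. The 8 kHz sampling rate, the rectangular non-overlapping frames, and all names in the block are assumptions made for illustration.

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000                      # assumed sampling rate
FRAME_LEN = SAMPLE_RATE_HZ * 8 // 1000     # 8 ms frame -> 64 samples


def split_frames(signal, frame_len):
    """Split a sequence into consecutive, non-overlapping frames."""
    last_start = len(signal) - frame_len
    return [signal[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def dft(frame):
    """Naive O(n^2) DFT of one frame; returns one complex value per bin."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# A 1 kHz tone lasting two frames; bin k corresponds to k * 8000 / 64 Hz.
tone = [math.cos(2 * math.pi * 1000 * n / SAMPLE_RATE_HZ)
        for n in range(2 * FRAME_LEN)]
frames = split_frames(tone, FRAME_LEN)
spectrum = dft(frames[0])
magnitudes = [abs(v) for v in spectrum[:FRAME_LEN // 2 + 1]]
peak_bin = magnitudes.index(max(magnitudes))
```

A practical implementation would normally use an FFT with windowing and overlap; the naive transform is kept here only so the sketch has no dependencies.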

Once the frequencies are determined, the signals are forwarded to an energy module 306, which computes energy level estimates (i.e., power estimates) during an interval of time. The power estimate may be based on the bandwidth of the cochlea channel and the cardioid signal. The power estimates are then used by an inter-microphone level difference (ILD) module 308 to determine the ILD.

In various embodiments, the DMA module 302 sends the cardioid signals to the energy module 306. The energy module 306 computes the power estimates before the cardioid signals are analyzed by the frequency analysis module 304.

Referring to FIG. 4a, one embodiment of the DMA module 302, the frequency analysis module 304, the energy module 306, and the ILD module 308 is provided. In this embodiment, the acoustic signals received by the microphones 106 and 108 are processed by the DMA module 302. The exemplary DMA module 302 delays the primary acoustic signal $X_1$ via a delay node 402, $z^{-\tau_1}$. Similarly, the DMA module 302 delays the secondary acoustic signal $X_2$ via a second delay node 404, $z^{-\tau_2}$.

In exemplary embodiments, the cardioid primary signal ($C_f$) is mathematically determined in the frequency domain (Z-transform) as $C_f = X_1 - z^{-\tau_1} g X_2$

while the cardioid secondary signal ($C_b$) is mathematically determined as $C_b = g X_2 - z^{-\tau_2} X_1$. The gain factor g is computed by a gain module 406 to equalize the signal levels. Prior art systems can suffer a loss of performance when the microphone signals have different levels. The gain module is discussed further herein.
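The two delay-and-subtract expressions above can be checked numerically at a single frequency. The sketch below is illustrative only: it assumes a unit-amplitude plane wave, a speed of sound of 343 m/s, evaluation of $z^{-\tau}$ as $e^{-j\omega\tau}$, and the choice $\tau_1 = \tau_2 = d/c$; none of these values, nor the function names, come from the source.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def mic_spectra(theta_rad, freq_hz, spacing_m):
    """Return (X1, X2) for a unit plane wave at one frequency.

    theta = 0 means the wave reaches the primary microphone first, so the
    secondary microphone sees it delayed by (spacing / c) * cos(theta).
    """
    omega = 2 * math.pi * freq_hz
    tau = spacing_m / SPEED_OF_SOUND
    x1 = 1 + 0j
    x2 = cmath.exp(-1j * omega * tau * math.cos(theta_rad))
    return x1, x2


def cardioids(x1, x2, freq_hz, tau1, tau2, g=1.0):
    """Cf = X1 - z^-tau1 * g * X2 and Cb = g * X2 - z^-tau2 * X1."""
    omega = 2 * math.pi * freq_hz
    c_f = x1 - cmath.exp(-1j * omega * tau1) * g * x2
    c_b = g * x2 - cmath.exp(-1j * omega * tau2) * x1
    return c_f, c_b


freq, spacing = 1000.0, 0.02           # hypothetical 1 kHz tone, 2 cm spacing
tau = spacing / SPEED_OF_SOUND
front = cardioids(*mic_spectra(0.0, freq, spacing), freq, tau, tau)
back = cardioids(*mic_spectra(math.pi, freq, spacing), freq, tau, tau)
```

With these assumptions a source directly in front is cancelled in $C_b$ but not in $C_f$, and a source directly behind is cancelled in $C_f$ but not in $C_b$, which is the front/back cardioid behavior the text describes.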

In various embodiments, the cardioid signals may be processed through the frequency analysis module 304. Filter coefficients may be applied to each microphone signal. As a result, the output of the frequency analysis module 304 may comprise a filtered cardioid primary signal $\alpha C_f(t,\omega)$ and a filtered cardioid secondary signal $\beta C_b(t,\omega)$, where t is the time index (t = 0, 1, …, N) and ω is the frequency index (ω = 0, 1, …, K).

The energy module 306 takes the signals from the frequency analysis module 304 and computes power estimates associated with the cardioid primary signal ($C_f$) and the cardioid secondary signal ($C_b$). In exemplary embodiments, the power estimates may be mathematically determined by squaring and integrating the absolute value of the output of the frequency analysis module 304. The power estimates of the cardioid primary signal and the cardioid secondary signal are referred to herein as components. For example, the energy level associated with the primary microphone signal may be determined by $E_f(t,\omega) = \int_{\text{frame}} |C_f(t',\omega)|^2 \, dt'$, and the energy level associated with the secondary microphone signal may be determined by $E_b(t,\omega) = \int_{\text{frame}} |C_b(t',\omega)|^2 \, dt'$.

Given the computed energy levels, the ILD can be determined by the ILD module 308. In exemplary embodiments, the ILD is determined in a nonlinear manner by taking the ratio of the energy levels, such as $ILD(t,\omega) = E_f(t,\omega) / E_b(t,\omega)$. Applying the determined energy levels to this ILD equation results in $ILD(t,\omega) = \int_{\text{frame}} |C_f(t',\omega)|^2 \, dt' \,/\, \int_{\text{frame}} |C_b(t',\omega)|^2 \, dt'$.
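In discrete time, the energy computation and the ratio-based ILD can be sketched as a sum of squared magnitudes per band followed by a division. The function names and the small `eps` guard against division by zero are assumptions added for illustration.

```python
def band_energy(band_samples):
    """Sum of squared magnitudes of one band over one frame (discrete integral)."""
    return sum(abs(s) ** 2 for s in band_samples)


def ild(e_front, e_back, eps=1e-12):
    """Nonlinear combination: ratio of front to back cardioid energy.

    eps is an assumed safeguard so that an all-zero back band does not
    cause a division by zero.
    """
    return e_front / (e_back + eps)


# Hypothetical samples of one frequency band within one frame.
c_f_band = [1 + 1j, 2.0]
c_b_band = [1.0, 1j]
e_f = band_energy(c_f_band)
e_b = band_energy(c_b_band)
ild_value = ild(e_f, e_b)
```

A ratio well above 1 indicates energy arriving mainly from the front cardioid region, and a ratio near or below 1 indicates energy from elsewhere.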

By nonlinearly combining the energy level (i.e., component) of the cardioid primary signal with the energy level (i.e., component) of the cardioid secondary signal, sound from an audio source 102 within a front-to-back cardioid region around the audio device 104 (depicted in FIG. 6) can be effectively received. The spatial range from which signals can be extracted is specified and controlled by the selected ILD region. In contrast, if the cardioid primary signal and the cardioid secondary signal are combined linearly (e.g., by subtracting the signals), sound from an audio source 102 within a supercardioid region is effectively received. The supercardioid region may be larger (wider) than the selected front-to-back cardioid ILD region, so the nonlinear combination via the ILD can produce a narrower, more spatially selective beam.

Once the ILD is determined, the signals are processed through a noise reduction system 310. Referring back to FIG. 3, in exemplary embodiments the noise reduction system 310 comprises a noise estimation module 312, a filter module 314, a filter smoothing module 316, a masking module 318, and a frequency synthesis module 320.

According to an exemplary embodiment of the present invention, a Wiener filter is used to suppress noise and enhance speech. In order to derive a Wiener filter estimate, however, specific inputs are needed. These inputs comprise the power spectral density of the noise and the power spectral density of the primary acoustic signal.

In exemplary embodiments, the noise estimate is based only on the acoustic signal from the primary microphone 106. According to one embodiment of the present invention, the exemplary noise estimation module 312 is a component that can be approximated mathematically by $N(t,\omega) = \lambda_1(t,\omega) E_1(t,\omega) + (1 - \lambda_1(t,\omega)) \min[N(t-1,\omega), E_1(t,\omega)]$. As shown, the noise estimate in this embodiment is based on minimum statistics of the current energy estimate of the primary acoustic signal, $E_1(t,\omega)$, and the noise estimate of the previous time frame, $N(t-1,\omega)$. As a result, the noise estimation is performed efficiently and with low latency.

$\lambda_1(t,\omega)$ in the above equation is derived from the ILD approximated by the ILD module 308.

That is, when the ILD at the primary microphone 106 is smaller than a threshold value (e.g., threshold = 0.5), above which speech is expected, $\lambda_1$ is small, and thus the noise estimate follows the noise closely. When the ILD starts to rise (e.g., because speech is present within the large ILD region), $\lambda_1$ increases. As a result, the noise estimation module 312 slows down the noise estimation process, and the speech energy does not contribute significantly to the final noise estimate. Exemplary embodiments of the present invention may therefore use a combination of minimum statistics and voice activity detection to determine the noise estimate.
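The recursion for $N(t,\omega)$ given earlier can be sketched as a one-line update per time-frequency cell. Here $\lambda_1$ is passed in as an argument, because the source's mapping from the ILD to $\lambda_1$ is not reproduced; the function name and the example numbers are assumptions.

```python
def update_noise(prev_noise, energy, lam):
    """One step of N(t) = lam * E1(t) + (1 - lam) * min(N(t-1), E1(t)).

    prev_noise : noise estimate N(t-1, w) from the previous frame
    energy     : current primary energy estimate E1(t, w)
    lam        : lambda_1(t, w) in [0, 1], supplied by the caller
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * energy + (1.0 - lam) * min(prev_noise, energy)


# Hypothetical values for a single frequency band.
rising = update_noise(prev_noise=1.0, energy=4.0, lam=0.25)
falling = update_noise(prev_noise=4.0, energy=1.0, lam=0.25)
```

When the current energy drops below the previous estimate, both terms use the current energy, so the estimate falls to it immediately; when the energy rises, only the $\lambda_1$-weighted term lets the estimate grow.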

The filter module 314 then derives a filter estimate based on the noise estimate. In one embodiment, the filter is a Wiener filter. Alternative embodiments may contemplate other filters. Accordingly, the Wiener filter may be approximated, according to one embodiment, as $W = \left( \frac{P_s}{P_s + P_n} \right)^{\phi}$

where $P_s$ is the power spectral density of the speech and $P_n$ is the power spectral density of the noise. According to one embodiment, $P_n$ is the noise estimate $N(t,\omega)$, which is calculated by the noise estimation module 312. In an exemplary embodiment, $P_s = E_1(t,\omega) - \gamma N(t,\omega)$, where $E_1(t,\omega)$ is the energy estimate associated with the primary acoustic signal (e.g., the cardioid primary signal) calculated by the energy module 306, and $N(t,\omega)$ is the noise estimate provided by the noise estimation module 312. Because the noise estimate changes with each frame, the filter estimate also changes with each frame.

γ is an over-subtraction term, which is a function of the ILD. γ compensates for the bias of the minimum statistics of the noise estimation module 312 and forms a perceptual weighting. Because the time constants differ, the bias differs between portions of pure noise and portions of noise and speech. Therefore, in some embodiments, compensation for this bias may be necessary. In exemplary embodiments, γ is determined empirically (e.g., 2-3 dB at a large ILD and 6-9 dB at a low ILD).

φ in the above exemplary Wiener filter equation is a factor that further limits the noise estimate. φ can be any positive value. In one embodiment, a nonlinear expansion may be obtained by setting φ to 2. According to exemplary embodiments, φ is determined empirically and is applied when the body of the Wiener filter estimate W falls below a prescribed value (e.g., 12 dB down from the maximum possible value of W, which is unity).
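A minimal sketch of the filter estimate follows. It assumes the common Wiener form $W = (P_s / (P_s + P_n))^{\phi}$, with $P_s = E_1 - \gamma N$ and $P_n = N$ as described above. Treating γ as a linear factor rather than a value in dB and clamping $P_s$ at zero are both simplifying assumptions, and the function name is hypothetical.

```python
def wiener_gain(e1, noise, gamma=1.0, phi=1.0):
    """Per-band gain W = (Ps / (Ps + Pn)) ** phi.

    e1    : primary energy estimate E1(t, w)
    noise : noise estimate N(t, w), used as Pn
    gamma : over-subtraction factor (linear here, by assumption)
    phi   : positive exponent; phi = 2 gives a nonlinear expansion
    """
    if phi <= 0:
        raise ValueError("phi must be positive")
    p_n = noise
    # Assumed safeguard: never let the speech power estimate go negative.
    p_s = max(e1 - gamma * noise, 0.0)
    if p_s + p_n <= 0.0:
        return 0.0
    return (p_s / (p_s + p_n)) ** phi


# Hypothetical band energies.
speech_band = wiener_gain(e1=10.0, noise=2.0, gamma=1.0, phi=1.0)
expanded = wiener_gain(e1=10.0, noise=2.0, gamma=1.0, phi=2.0)
noise_band = wiener_gain(e1=1.0, noise=2.0, gamma=1.0, phi=1.0)
```

Raising φ pushes intermediate gains toward zero while leaving a gain of unity unchanged, which matches the description of φ as a factor that further limits the noise.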

Because the Wiener filter estimate may change quickly (e.g., from one frame to the next) and the noise and speech estimates can vary greatly between frames, applying the Wiener filter estimate as is may result in artifacts (e.g., discontinuities, blips, transients, etc.). Therefore, an optional filter smoothing module 316 is provided to smooth the Wiener filter estimate applied to the acoustic signals as a function of time. In one embodiment, the filter smoothing module 316 may be mathematically approximated as $M(t,\omega) = \lambda_s(t,\omega) W(t,\omega) + (1 - \lambda_s(t,\omega)) M(t-1,\omega)$, where $\lambda_s$ is a function of the Wiener filter estimate and the primary microphone energy $E_1$.

As shown, at time (t) the filter smoothing module 316 smooths the Wiener filter estimate using the value of the smoothed Wiener filter estimate from the previous frame at time (t-1). In order to allow a quick response to a rapidly changing acoustic signal, the filter smoothing module 316 performs less smoothing on quickly changing signals and more smoothing on slowly changing signals. This is accomplished by varying the value of $\lambda_s$ according to a weighted first-order derivative of $E_1$ with respect to time. If the first-order derivative is large and the energy change is large, $\lambda_s$ is set to a large value. If the derivative is small, $\lambda_s$ is set to a smaller value.
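The smoothing recursion and the idea of an energy-dependent $\lambda_s$ can be sketched as below. The recursion follows the expression for $M(t,\omega)$; the mapping from the change in $E_1$ to $\lambda_s$ (a clamped relative frame-to-frame difference) is purely a hypothetical stand-in for the weighted first-order derivative, and all names and constants are assumptions.

```python
def smooth_filter(w, m_prev, lam_s):
    """M(t) = lam_s * W(t) + (1 - lam_s) * M(t-1)."""
    return lam_s * w + (1.0 - lam_s) * m_prev


def lambda_s_from_energy(e_now, e_prev, scale=1.0, lo=0.1, hi=0.9):
    """Hypothetical mapping: larger relative energy change -> larger lam_s.

    A large lam_s means less smoothing (fast response); a small lam_s
    means more smoothing. lo, hi and scale are illustrative constants.
    """
    change = abs(e_now - e_prev) / max(e_prev, 1e-12)
    return min(hi, max(lo, scale * change))


# Steady energy -> heavy smoothing; sudden onset -> light smoothing.
lam_steady = lambda_s_from_energy(1.0, 1.0)
lam_onset = lambda_s_from_energy(10.0, 1.0)
m_steady = smooth_filter(w=1.0, m_prev=0.0, lam_s=lam_steady)
m_onset = smooth_filter(w=1.0, m_prev=0.0, lam_s=lam_onset)
```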

After smoothing by the filter smoothing module 316, the primary acoustic signal is multiplied by the smoothed Wiener filter estimate to estimate the speech. In the above Wiener filter embodiment, the speech estimate is approximated by $S(t,\omega) = C_f(t,\omega) \cdot M(t,\omega)$, where $C_f(t,\omega)$ is the cardioid primary signal. In exemplary embodiments, the speech estimation occurs in the masking module 318.
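Applying the smoothed filter is a per-band multiplication, as the following sketch shows for one frame. The function name and sample values are assumptions.

```python
def apply_mask(c_f_bands, mask):
    """S(t, w) = Cf(t, w) * M(t, w) for every frequency band of one frame."""
    if len(c_f_bands) != len(mask):
        raise ValueError("one mask value is required per band")
    return [c * m for c, m in zip(c_f_bands, mask)]


# Hypothetical two-band frame: keep half of band 0, suppress band 1.
speech_estimate = apply_mask([1 + 1j, 2j], [0.5, 0.0])
```

Because the mask values are real, the multiplication scales each band's magnitude and leaves its phase unchanged.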

Next, the speech estimate is converted back from the cochlear domain into the time domain. The conversion comprises taking the speech estimate $S(t,\omega)$ and adding together the phase-shifted signals of the cochlea channels in the frequency synthesis module 320. Once the conversion is complete, the signal is output to the user.

It should be noted that the system architecture of the audio processing engine 204 of FIG. 3 is exemplary. Alternative embodiments may comprise more components, fewer components, or equivalent components and still fall within the scope of embodiments of the present invention. Various modules of the audio processing engine 204 may be combined into a single module. For example, the functionalities of the frequency analysis module 304 and the energy module 306 may be combined into a single module. Furthermore, the functions of the ILD module 308 may be combined with the functions of the energy module 306 alone, or with those of the energy module 306 together with the frequency analysis module 304. As a further example, the functionality of the filter module 314 may be combined with the functionality of the filter smoothing module 316.

Referring now to FIG. 4b, a practical implementation of the DMA module 302 according to one embodiment of the present invention is shown. In exemplary embodiments, microphone differences are compensated by using a filter 412, F(z), that equalizes the microphones 106 and 108. In some embodiments, because the filter 412 is a non-causal filter, a delay is applied to the primary microphone signal by a delay node 414, D(z). Applying the delay node 414 aligns the two channels.

To implement the fractional delays, all-pass filters 416 and 418 (e.g., $A_1(z)$ and $A_2(z)$) are applied to the signals. The application of the all-pass filters 416 and 418, however, introduces a delay. As a result, two further delay nodes 420 and 422 (e.g., $D_1(z)$ and $D_2(z)$) are needed.

The magnitude of the secondary acoustic signal can be modified to match the magnitude of the primary acoustic signal by applying the gain computed by the gain module 406. The gain module 406 computes the magnitudes of the two signals (e.g., $X_1$ and $X_2$) and derives the gain g, which is the ratio between the magnitude of the primary acoustic signal and the magnitude of the secondary acoustic signal. The gain can then be used to compute the cardioid primary signal and the cardioid secondary signal.
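The gain computation can be sketched as a ratio of two magnitude estimates. Using the root-mean-square value of a block of samples as the "magnitude" is an assumption, as are the function names and the `eps` guard; the source does not specify how the magnitudes are measured.

```python
import math


def rms(samples):
    """Root-mean-square magnitude of a non-empty block of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(abs(s) ** 2 for s in samples) / len(samples))


def equalizing_gain(primary, secondary, eps=1e-12):
    """g = |X1| / |X2|, so that g * X2 matches the level of X1."""
    return rms(primary) / (rms(secondary) + eps)


# Hypothetical blocks: the secondary microphone is half as sensitive.
x1_block = [2.0, -2.0, 2.0, -2.0]
x2_block = [1.0, -1.0, 1.0, -1.0]
g = equalizing_gain(x1_block, x2_block)
x2_equalized = [g * s for s in x2_block]
```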

Because the all-pass filters 416 and 418 produce the desired fractional delay only up to one-half of the Nyquist frequency, the processing is applied at twice the system sampling rate.

As a result, sample rate conversion (SRC) nodes 424 and 426 are provided. The outputs of the SRC nodes 424 and 426 are the cardioid primary signal C_f and the cardioid secondary signal C_b.

Figure 5 is a block diagram of an alternative embodiment of the present invention. In this embodiment, the acoustic signals from the microphones 106 and 108 are processed by the frequency analysis module 304 before being processed by the DMA module 302. According to the present embodiment, the frequency analysis module 304 takes the acoustic signals (i.e., X₁ and X₂) and mimics a cochlear implementation using a filter bank (such as a fast Fourier transform). Alternatively, other filters may be used for the frequency analysis and synthesis, such as a short-time Fourier transform (STFT), sub-band filter banks, modulated complex lapped transforms, cochlear models, wavelets, and the like. The output of the frequency analysis module 304 may comprise a plurality of signals (e.g., one signal per sub-band or tap).

The magnitude of the secondary acoustic signal is modified to match the magnitude of the primary acoustic signal by computing the magnitudes of the secondary and primary acoustic signals and deriving a gain g, which is the ratio between the magnitude of the primary acoustic signal and the magnitude of the secondary acoustic signal. The signals may then be processed by the DMA module 302. In the present embodiment, a phase shift of the signals is used (e.g., using ) to achieve the fractional delay of the signals.
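In a frequency-domain embodiment of this kind, a delay can be realized as a phase rotation of each sub-band sample. The sketch below shows one common way to do this and is an assumption, not the expression used in the source: a narrowband complex sample centred at `center_hz` is multiplied by exp(−j2π·f·τ). The function name `delay_subband` and its parameters are hypothetical.

```python
import cmath
import math


def delay_subband(value, center_hz, delay_s):
    """Delay a complex sub-band sample by rotating its phase.

    Assumes the sub-band is narrow enough that one centre frequency
    represents it; the delay may be any fraction of a sample period.
    """
    return value * cmath.exp(-2j * math.pi * center_hz * delay_s)


# A 1 kHz component delayed by a quarter period (0.25 ms) rotates by -90 degrees.
shifted = delay_subband(1.0 + 0.0j, center_hz=1000.0, delay_s=0.00025)
```

Because the delay enters only through the phase term, it is not restricted to whole sample periods, which is why no separate all-pass filters are needed in this sketch.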

The remainder of the processing through the energy module 306 and the ILD module 308 is similar to that described in connection with Figure 4a, but is performed on a per sub-band or per tap basis.

Figure 6 shows a polar plot and an ILD diagram of a front-to-back cardioid orientation pattern 602 produced according to an exemplary embodiment of the present invention. The cardioid orientation pattern 602 illustrates the range over which acoustic signals may be received. As shown, by using the non-linear combination processing and the delay nodes (e.g., 420 and 422), the range of the cardioid orientation pattern 602 may be extended in the forward and backward directions (i.e., along the x-axis). The extension in the forward and backward directions allows significant ILD cues to be obtained from sound sources farther away from the microphones 106 and 108. As a result, the omnidirectional microphones 106 and 108 can achieve acoustic characteristics that mimic those of directional microphones.

Referring now to Figure 7, a flowchart 700 of an exemplary method for suppressing noise and enhancing speech using the ILD of omnidirectional microphones is shown. In step 702, acoustic signals are received by the primary microphone 106 and the secondary microphone 108. In exemplary embodiments, the microphones are omnidirectional microphones. In some embodiments, the acoustic signals are converted by the microphones into electrical signals (i.e., a primary electrical signal and a secondary electrical signal) for processing.

Next, in step 704, differential array analysis is performed on the acoustic signals by the DMA module 302. In exemplary embodiments, the DMA module 302 is configured to determine the cardioid primary signal and the cardioid secondary signal by delaying and subtracting the acoustic signals captured by the microphones 106 and 108 and by applying a gain factor to them. Specifically, the DMA module 302 determines the cardioid primary signal by taking the difference between the primary electrical signal and the delayed secondary electrical signal. Similarly, the DMA module 302 determines the cardioid secondary signal by taking the difference between the secondary electrical signal and the delayed primary electrical signal.
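The delay-and-subtract operation of step 704 can be sketched in the time domain. For simplicity the example assumes a whole-sample delay equal to the acoustic travel time between the microphones, whereas the embodiments describe fractional delays; the helper names `delay` and `cardioid_pair` are hypothetical.

```python
def delay(signal, n):
    """Delay a signal by n whole samples, zero-padding the start."""
    if n <= 0:
        return list(signal)
    return [0.0] * n + list(signal[:len(signal) - n])


def cardioid_pair(primary, secondary, delay_samples, gain=1.0):
    """Return (c_f, c_b): forward- and backward-facing cardioid signals.

    c_f = primary - delayed, gain-adjusted secondary
    c_b = gain-adjusted secondary - delayed primary
    """
    secondary_eq = [gain * s for s in secondary]
    primary_delayed = delay(primary, delay_samples)
    secondary_delayed = delay(secondary_eq, delay_samples)
    c_f = [p - s for p, s in zip(primary, secondary_delayed)]
    c_b = [s - p for s, p in zip(secondary_eq, primary_delayed)]
    return c_f, c_b


# A source in front reaches the primary microphone one sample before the secondary.
pulse = [0.0, 1.0, 0.5, -0.5, 0.0, 0.0]
primary = pulse
secondary = delay(pulse, 1)
c_f, c_b = cardioid_pair(primary, secondary, delay_samples=1)
```

In this idealized case the backward-facing signal cancels the frontal source completely while the forward-facing signal retains it, which is the asymmetry the later level-difference computation relies on.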

In step 706, the frequency analysis module 304 performs frequency analysis on the cardioid primary signal and the cardioid secondary signal. According to one embodiment, the frequency analysis module 304 uses a filter bank to determine the individual frequencies present in the composite cardioid primary signal and cardioid secondary signal.

In step 708, energy estimates are computed for the cardioid primary signal and the cardioid secondary signal. In one embodiment, the energy estimates are determined by the energy module 306. The exemplary energy module 306 uses a current cardioid signal and a previously calculated energy estimate to determine the current energy estimate of the current cardioid signal.
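An estimate that depends on the current signal and on the previous estimate can be sketched as a first-order recursive average. The recursion form and the `smoothing` constant below are assumptions; the source does not give the update formula. The function name `update_energy` is hypothetical.

```python
def update_energy(prev_energy, frame, smoothing=0.5):
    """Blend the mean power of the current frame with the previous estimate.

    E_new = smoothing * mean(|x|^2) + (1 - smoothing) * E_prev
    """
    frame_energy = sum(abs(s) ** 2 for s in frame) / len(frame)
    return smoothing * frame_energy + (1.0 - smoothing) * prev_energy


# Two consecutive unit-power frames, starting from an estimate of zero.
e1 = update_energy(0.0, [1.0, -1.0])
e2 = update_energy(e1, [1.0, -1.0])
```

The same update would be run separately for the cardioid primary signal and the cardioid secondary signal, and per sub-band in a frequency-domain embodiment.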

Once the energy estimates are calculated, the inter-microphone level difference (ILD) is calculated in step 710. In one embodiment, the ILD is calculated based on a non-linear combination of the energy estimates of the cardioid primary signal and the cardioid secondary signal. In exemplary embodiments, the ILD is calculated by the ILD module 308.
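The claims mention division of the cardioid primary component by the cardioid secondary component as one non-linear combination. A minimal sketch of that ratio follows; expressing it in decibels and guarding it with a small `floor` are assumptions added for the example, and `ild_db` is a hypothetical name.

```python
import math


def ild_db(primary_energy, secondary_energy, floor=1e-12):
    """Inter-microphone level difference as an energy ratio in dB."""
    ratio = max(primary_energy, floor) / max(secondary_energy, floor)
    return 10.0 * math.log10(ratio)


near_source = ild_db(1.0, 0.01)  # strong forward energy: large positive ILD
diffuse = ild_db(0.5, 0.5)       # equal energies: ILD of zero
```

A large positive value suggests a source in front of the primary microphone, while a value near zero suggests energy arriving about equally in both cardioid signals.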

Once the ILD is determined, the cardioid primary signal and the cardioid secondary signal are processed through the noise reduction system in step 712. Step 712 is discussed in more detail in connection with Figure 8. The result of the noise reduction processing is then output to the user in step 714. In some embodiments, the electrical signal is converted to an analog signal for output. The output may be delivered via a speaker, an earpiece, or another similar device.

Referring now to Figure 8, a flowchart of the exemplary noise reduction processing (step 712) is provided. Based on the calculated ILD, noise is estimated in step 802. According to embodiments of the present invention, the noise estimate is based only on the acoustic signal received at the primary microphone 106. The noise estimate may be based on the current energy estimate of the acoustic signal from the primary microphone 106 and a previously calculated noise estimate. In determining the noise estimate, according to exemplary embodiments of the present invention, the noise estimation is stopped or slowed down when the ILD increases.
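The behaviour described for step 802 can be sketched as a recursive noise tracker whose adaptation rate shrinks as the ILD rises and reaches zero at a chosen level. The linear rate schedule, the 6 dB stopping point, and the names `update_noise`, `max_rate`, and `ild_stop_db` are all assumptions made for illustration.

```python
def update_noise(prev_noise, primary_energy, ild_db, max_rate=0.2, ild_stop_db=6.0):
    """Update the noise estimate, adapting more slowly as the ILD increases.

    At or below 0 dB the estimate adapts at max_rate; at or above
    ild_stop_db the estimate is frozen (speech is assumed dominant).
    """
    closeness = min(max(ild_db / ild_stop_db, 0.0), 1.0)
    rate = max_rate * (1.0 - closeness)
    return (1.0 - rate) * prev_noise + rate * primary_energy


noise_only = update_noise(1.0, 2.0, ild_db=0.0)   # full-rate adaptation
uncertain = update_noise(1.0, 2.0, ild_db=3.0)    # slowed adaptation
speech = update_noise(1.0, 2.0, ild_db=6.0)       # adaptation stopped
```

Slowing or stopping the update when the ILD is high keeps speech energy from leaking into the noise estimate.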

In step 804, a filter estimate is computed by the filter module 314. In one embodiment, the filter used in the audio processing engine 204 is a Wiener filter. Once the filter estimate is determined, the filter estimate may be smoothed in step 806. Smoothing prevents rapid fluctuations that may cause audio artifacts. In step 808, the smoothed filter estimate is applied to the acoustic signal from the primary microphone 106 to produce a speech estimate.
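Steps 804 through 808 can be sketched per sub-band as a Wiener-style gain, a first-order smoother, and a multiplication. The particular gain formula (estimated speech energy divided by total energy), the smoothing constant `alpha`, and the function names are assumptions; the source names the Wiener filter but does not give its form, and this sketch does not use the ILD directly.

```python
def wiener_gain(primary_energy, noise_estimate, floor=1e-12):
    """Wiener-style gain: estimated speech energy over total energy, in [0, 1]."""
    speech_energy = max(primary_energy - noise_estimate, 0.0)
    return speech_energy / max(primary_energy, floor)


def smooth_gain(prev_gain, new_gain, alpha=0.5):
    """First-order smoothing of the gain across frames."""
    return alpha * prev_gain + (1.0 - alpha) * new_gain


def apply_gain(gain, subband_sample):
    """Apply the (smoothed) gain to a primary sub-band sample."""
    return gain * subband_sample


raw = wiener_gain(primary_energy=4.0, noise_estimate=1.0)
smoothed = smooth_gain(prev_gain=0.25, new_gain=raw)
speech_estimate = apply_gain(smoothed, 2.0)
```

Smoothing limits how far the gain can move from one frame to the next, which is the property the text associates with avoiding audible artifacts.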

In step 810, the speech estimate is converted back to the time domain. An exemplary conversion technique applies an inverse frequency of the cochlear channel to the speech estimate. Once the speech estimate is converted, the audio signal may be output to the user.

The above-described modules may be comprised of instructions stored on a storage medium. The instructions may be retrieved and executed by the processor 202. Some examples of instructions include software, program code, and firmware. Some examples of storage media include memory devices and integrated circuits. The instructions, when executed by the processor 202, are operable to direct the processor 202 to operate in accordance with embodiments of the present invention. Those skilled in the art are familiar with instructions, processors, and storage media.

The present invention is described above with reference to exemplary embodiments. It will be apparent to those skilled in the art that various modifications may be made and other embodiments may be used without departing from the broader scope of the present invention. Accordingly, the present invention is intended to cover these and other variations upon the exemplary embodiments.

102‧‧‧Audio source/acoustic source

104‧‧‧Audio device

106‧‧‧Primary microphone

108‧‧‧Secondary microphone

110‧‧‧Noise

202‧‧‧Processor

204‧‧‧Audio processing engine

206‧‧‧Output device

302‧‧‧Differential microphone array (DMA) module

304‧‧‧Frequency analysis module

306‧‧‧Energy module

308‧‧‧Inter-microphone level difference (ILD) module

310‧‧‧Noise reduction system

312‧‧‧Noise estimation module

314‧‧‧Filter module

316‧‧‧Filter smoothing module

318‧‧‧Masking module

320‧‧‧Frequency synthesis module

402‧‧‧Delay node

404‧‧‧Delay node

406‧‧‧Gain module

412‧‧‧Filter

414‧‧‧Delay node

416‧‧‧All-pass filter

418‧‧‧All-pass filter

420‧‧‧Delay node

422‧‧‧Delay node

424‧‧‧Sample rate conversion (SRC) node

426‧‧‧Sample rate conversion (SRC) node

602‧‧‧Front-to-back cardioid orientation pattern

g‧‧‧Gain factor

X₁‧‧‧Acoustic signal

X₂‧‧‧Acoustic signal

Figures 1a and 1b are diagrams of two environments in which embodiments of the present invention may be practiced.

Figure 2 is a block diagram of an exemplary audio device implementing embodiments of the present invention.

Figure 3 is a block diagram of an exemplary audio processing engine.

Figure 4a illustrates an exemplary embodiment of the DMA module, the frequency analysis module, the energy module, and the ILD module.

Figure 4b is an exemplary embodiment of the DMA module.

Figure 5 is a block diagram of an alternative embodiment of the present invention.

Figure 6 shows a polar plot and an ILD diagram of a front-to-back cardioid orientation pattern produced according to an embodiment of the present invention.

Figure 7 is a flowchart of an exemplary method for enhancing speech using the ILD of omnidirectional microphones.

Figure 8 is a flowchart of an exemplary noise reduction process.

Claims

A system for enhancing speech, comprising: a primary microphone and a secondary microphone, the primary microphone and the secondary microphone configured to receive a primary acoustic signal and a secondary acoustic signal; a differential microphone array (DMA) module configured to determine a cardioid primary signal and a cardioid secondary signal based on a primary electrical signal converted from the primary acoustic signal and a secondary electrical signal converted from the secondary acoustic signal, the DMA module further configured to determine the cardioid primary signal based at least in part on delaying at least one of the primary electrical signal and the secondary electrical signal; and an inter-microphone level difference module configured to non-linearly combine components of the cardioid primary signal and components of the cardioid secondary signal to obtain an inter-microphone level difference.

The system of claim 1, wherein the DMA module is configured to determine the cardioid primary signal by taking a difference between a delayed primary electrical signal and a delayed and level-equalized secondary electrical signal.

The system of claim 1, wherein the DMA module is configured to determine the cardioid primary signal by determining a gain and taking a difference between the primary electrical signal and a delayed secondary electrical signal adjusted by the gain.

The system of claim 3, wherein the gain is a ratio between a magnitude of the primary acoustic signal and a magnitude of the secondary acoustic signal.

The system of claim 1, wherein the DMA module is configured to determine the cardioid secondary signal by taking a difference between the secondary electrical signal and a delayed primary electrical signal.

The system of claim 1, further comprising a frequency analysis module configured to determine frequencies of the cardioid primary signal and the cardioid secondary signal.

The system of claim 1, further comprising an energy module configured to determine energy estimates for a frame of the cardioid primary signal and the cardioid secondary signal.

The system of claim 1, further comprising a noise estimation module configured to determine a noise estimate for the primary acoustic signal based on an energy estimate of the cardioid primary signal and the inter-microphone level difference.

The system of claim 1, further comprising a filter module configured to determine a filter estimate to be applied to the primary acoustic signal.

The system of claim 9, further comprising a filter smoothing module configured to smooth the filter estimate prior to applying the filter estimate to the primary acoustic signal.

The system of claim 1, further comprising a masking module configured to determine a speech estimate.

The system of claim 11, further comprising a frequency synthesis module configured to convert the speech estimate to a time domain for output.

The system of claim 1, wherein the DMA module determines the cardioid primary signal and a cardioid secondary signal of the primary electrical signal.

The system of claim 1, wherein the DMA module is configured to determine the cardioid secondary signal by taking a difference between a level-equalized secondary electrical signal and a delayed primary electrical signal.

A method for enhancing speech, comprising: receiving a primary acoustic signal at a primary microphone and receiving a secondary acoustic signal at a secondary microphone; determining a cardioid primary signal and a cardioid secondary signal based on a primary electrical signal converted from the primary acoustic signal and a secondary electrical signal converted from the secondary acoustic signal, the cardioid primary signal being further determined based at least in part on delaying at least one of the primary electrical signal and the secondary electrical signal; and non-linearly combining components of the cardioid primary signal and components of the cardioid secondary signal to obtain an inter-microphone level difference.

The method of claim 15, wherein determining the cardioid primary signal comprises taking a difference between a delayed primary electrical signal and a delayed secondary electrical signal.

The method of claim 15, wherein determining the cardioid primary signal comprises determining a gain and taking a difference between the primary electrical signal and a delayed secondary electrical signal adjusted by the gain.

The method of claim 17, wherein the gain is a ratio between a magnitude of the primary acoustic signal and a magnitude of the secondary acoustic signal.

The method of claim 15, wherein determining the cardioid secondary signal comprises taking a difference between the secondary electrical signal and a delayed primary electrical signal.

The method of claim 15, wherein the non-linear combining comprises dividing the components of the cardioid primary signal by the components of the cardioid secondary signal.

The method of claim 15, further comprising determining an energy estimate for each of the acoustic signals during a frame.

The method of claim 15, further comprising determining a noise estimate based on an energy estimate of the primary acoustic signal and the inter-microphone level difference.

The method of claim 22, further comprising determining a filter estimate based on the noise estimate of the primary acoustic signal, the energy estimate of the primary acoustic signal, and the inter-microphone level difference.

The method of claim 23, further comprising generating a speech estimate by applying the filter estimate to the primary acoustic signal.

The method of claim 23, further comprising smoothing the filter estimate.

The method of claim 15, wherein the cardioid primary signal and the cardioid secondary signal are each a sub-band of the primary electrical signal.

The method of claim 15, wherein determining the cardioid primary signal comprises taking a difference between a delayed primary electrical signal and a level-equalized secondary electrical signal.

A non-transitory computer readable storage medium having a program thereon, the program being executable by a processor to perform a method for enhancing speech, the method comprising: receiving a primary acoustic signal at a primary microphone and receiving a secondary acoustic signal at a secondary microphone; determining a cardioid primary signal and a cardioid secondary signal based on a primary electrical signal converted from the primary acoustic signal and a secondary electrical signal converted from the secondary acoustic signal, the cardioid primary signal being further determined based at least in part on delaying at least one of the primary electrical signal and the secondary electrical signal; and non-linearly combining components of the cardioid primary signal and components of the cardioid secondary signal to obtain an inter-microphone level difference.