CZ304330B6

CZ304330B6 - Method of suppressing noise and accentuation of speech signal for cellular phone with two or more microphones

Info

Publication number: CZ304330B6
Application number: CZ2012-831A
Authority: CZ
Inventors: Zbyněk Koldovský; Petr Tichavský
Original assignee: Technická univerzita v Liberci; Ústav teorie informace a automatizace AV ČR, v.v.i.
Priority date: 2012-11-23
Filing date: 2012-11-23
Publication date: 2014-03-05
Also published as: CZ2012831A3

Abstract

The invention relates to a method of suppressing noise and enhancing a speech signal in a mobile phone with two or more microphones, characterized in that a bank (20) of speech-suppressing filters is created for the given phone during its manufacture or calibration. Each filter in the bank is designed for a specific position of the speaker relative to the phone, such that its output signal separates the noise from the speech as much as possible. The speaker positions for which the filters in the bank are derived are chosen to cover the most probable positions of the speaker relative to the phone. To suppress noise and enhance the speech signal during a call, the microphone signals are filtered in parallel by all filters in the bank. At each instant, the output of the filter with the minimal output variance is selected as the noise reference signal. The speech signal is then enhanced by a focuser (40), while the noise is suppressed by an adaptive filter. For a given position of the speaker relative to the phone, the speech-suppressing filter is derived from a noise-free recording of the speaker: the filter is designed so that the variance of its output is minimal when that recording is its input.

Description

The present invention relates to a method of suppressing noise and interfering sounds (hereinafter "noise") and enhancing a speech signal in a mobile phone with two or more microphones. The method is based on a pre-measured system of speech-suppressing filters.

BACKGROUND OF THE INVENTION

The standard method of suppressing unwanted noise with microphone arrays is focusing, referred to in the English literature as beamforming, which seeks the linear combination of the microphone array outputs that maximizes the ratio of the useful-signal energy to the noise energy. Beamforming is effective when the geometry of the system formed by the microphone array, the signal source and the noise source does not change over time, or at least does not change quickly, and when the interference sources are point-like. (The alternative to interference from point sources is diffuse noise, which appears to arrive from all directions at once.)

An example of suppressing unwanted noise by focusing is the method described in US 2012245933 by Flaks et al., January 2010, which combines a number of techniques and options. It resembles the solution proposed here in claim 17, which mentions using pre-measured focusing parameters for a number of relative positions of the useful signal source (the speaker's mouth) and the microphone array.

Another standard method of suppressing unwanted noise is a Wiener filter in the time-frequency domain. This method is suited to removing diffuse noise and requires an estimate of the instantaneous spectrum of the unwanted noise. It exists in many variants; it appears, for example, in S. F. Boll, "Suppression of Acoustic Noise in Speech Using Spectral Subtraction", IEEE Trans. Acoustics, Speech and Signal Processing, vol. 27, pp. 113-120, 1979. A variant called double spectral subtraction appears in the patent of H. Gustafsson, I. Claesson and S. Nordholm, US 6,549,586, April 1999. Another variant of spectral subtraction is the PLD (Power Level Differences) algorithm proposed by M. Jeub et al., "Noise Reduction for Dual-Microphone Mobile Phones Exploiting Power Level Differences", Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1693-1696, Kyoto, Japan, March 2012.

The last two works assume that the acoustic signal is captured by two microphones, one facing the speaker's mouth and the other facing the surrounding space. The PLD method assumes that the first microphone captures mainly the speech signal and the second mainly the unwanted noise.

A limitation of this method is that the speaker's speech is also partly present at the more distant microphone; although it is typically about 10 dB weaker there, it degrades the accuracy of the noise spectrum estimate and hence the effectiveness of separating the useful signal.

A similar concept of a mobile phone with two microphones appears in a more recent patent by H. J. W. Belt et al., US 2007/0230712 A1 of 11 August 2005. It acknowledges that the effectiveness of spectral subtraction (the Wiener filter) depends on the position of the phone relative to the speaker's mouth, and therefore proposes adding a position detector to the device.

The patent and scholarly literature describe further methods of noise suppression and speech enhancement for dual-microphone systems, but the patents and works cited above are conceptually closest to the solution proposed here. Other such works include, for example, the patent by Hao Deng et al., US 2007/0165879 of 13 January 2007 (a method based on adaptive filtering), and the article by Kai Li et al., "Two-Microphone Noise Reduction Using Spatial Information-Based Spectral Amplitude Estimation", IEICE Trans. on Information and Systems, vol. E95-D, pp. 1454-1464, 2012.

It is an object of the invention to provide a method of suppressing unwanted noise that is particularly suitable when the noise is diffuse, or when its source is point-like but moves rapidly, that does not need a phone position detector, and that achieves better quality than previously known solutions.

SUMMARY OF THE INVENTION

The essence of the method of suppressing noise and enhancing the speech signal in a mobile phone with two or more microphones is that, for a given mobile phone (hereinafter "the phone"), a bank of 10 to 256 speech-suppressing filters of length 100 to 1000 (hereinafter "the bank") is created during manufacture or before normal use. Each filter in the bank takes the microphone signals as input and produces a single output signal. Each filter is designed for a specific position, or group of positions, of the speaker relative to the phone, so that its output signal contains only noise and suppresses the speaker's voice as much as possible. The speaker positions for which the filters are derived are chosen so that together they cover the immediate surroundings of the phone and the most probable speaker positions in normal use. The position and shape of the speaker's head, the position and shape of the hand holding the phone, and the typical noise of the environment in which the phone is used may also be taken into account.
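Each filter in the bank is a two-input, single-output system. A minimal time-domain sketch, assuming two microphones and FIR filters applied by direct convolution; the function names `convolve` and `apply_filter_pair` are hypothetical. The toy case shows a filter pair cancelling "speech" that reaches the right microphone one sample after the left one:

```python
# Illustrative sketch; function names are hypothetical, not from the patent.
def convolve(x, g):
    """Full linear convolution {g * x}(n) of two finite sequences."""
    out = [0.0] * (len(x) + len(g) - 1)
    for n, xn in enumerate(x):
        for i, gi in enumerate(g):
            out[n + i] += gi * xn
    return out


def apply_filter_pair(x_left, x_right, g_left, g_right):
    """Output of one bank filter: z(n) = {g_L * x_L}(n) + {g_R * x_R}(n)."""
    a = convolve(x_left, g_left)
    b = convolve(x_right, g_right)
    length = max(len(a), len(b))
    a += [0.0] * (length - len(a))  # zero-pad the shorter result
    b += [0.0] * (length - len(b))
    return [ai + bi for ai, bi in zip(a, b)]


# Toy "speech": the right microphone hears the left signal one sample later.
speech_left = [1.0, 2.0, 3.0]
speech_right = [0.0, 1.0, 2.0, 3.0]
# Delay the left channel by one sample and subtract the right channel.
z = apply_filter_pair(speech_left, speech_right, [0.0, 1.0], [-1.0])
print(z)  # [0.0, 0.0, 0.0, 0.0]
```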

The bank of speech-suppressing filters is created by a learning process once a phone prototype has been built. The user may further adapt the bank before normal use, to calibrate the method to the user's voice, head shape and way of holding the phone, and possibly also to the most common ambient noise. For a given position or group of positions of the speaker relative to the phone, the position and shape of the speaker's head and the position and shape of the speaker's hand (hereinafter "the situation"), the filter is derived by using the phone's microphones to make one or more recordings of the speaker or speakers in that situation. The recordings must contain only a negligible amount of noise. The filter is designed so that the variance of its output is minimal when these recordings are its input, and optionally also so that the noise it passes has a spectrum altered as little as possible.

To suppress noise and enhance the speech signal during a call, the microphone signals are filtered in parallel by all filters in the bank and the variance of each filter output is measured. At any given instant, the output of the filter with the minimal output variance is selected as the noise reference signal. Noise suppression and speech enhancement in the signal of a given microphone, or in a signal that is a mixture of the microphone signals (hereinafter "the noisy signal"), is performed by subtracting the noise from the output of the focuser using an adaptive Wiener filter. Another variant of the subtraction is the double spectral subtraction proposed in the aforementioned patent US 6,549,586. A third subtraction method is the PLD algorithm proposed by Jeub et al., ICASSP 2012, also cited above. Finally, for all of these spectral subtraction methods, listening quality can be improved by smoothing (averaging) the spectrogram at higher frequencies, as described in T. Esch and P. Vary, "Efficient Musical Noise Suppression for Speech Enhancement Systems", Proc. IEEE Int. Conference on Acoustics, Speech and Signal Processing (ICASSP), Taipei, Taiwan, 2009, pp. 4409-4412.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram of the method of suppressing noise and enhancing a speech signal in a mobile phone, which may be used in any embodiment of the invention.

Fig. 2 shows typical positions from which clean recordings of the speaker are made in order to compute the bank of speech-suppressing filters.


DETAILED DESCRIPTION OF THE INVENTION

Example 1: Creating the bank of speech-suppressing filters

The bank of speech-suppressing filters is created by a learning process after a phone prototype has been built. First, the phone is used to record a speaking person from various positions relative to the phone. There are 50 recordings, each 5 seconds long, and each contains the recorded person speaking the words "one, two, three, four" (in Czech). The speaker positions relative to the phone are chosen so that together they cover the immediate surroundings of the phone and the most probable speaker positions in normal use (see Fig. 2). The recordings are made in a quiet environment with a noise level below 40 dB.

In addition, a 5-second recording is made of noise typical of the environment in which the phone is used.

For each speaker recording, a speech-suppressing filter is computed. For the $p$-th recording it is given by the vectors $g_{p,L}$ and $g_{p,R}$, which contain the filter coefficients and each have length 300, according to

$$[g_{p,L}, g_{p,R}] = \arg\min_{g_L,\, g_R} \sum_n \Big( \big|\{g_L * x_{p,L}\}(n) + \{g_R * x_{p,R}\}(n)\big|^2 + \varepsilon\, \big|\{g_L * y_L\}(n) + \{g_R * y_R\}(n) - y_L(n-D)\big|^2 \Big),$$

where $x_{p,L}(n)$ and $x_{p,R}(n)$ denote samples of the $p$-th clean-speech recording, $y_L(n)$ and $y_R(n)$ denote samples of the noise recording mentioned above, $*$ denotes convolution, $\varepsilon$ is a regularization constant equal to 0.1 and $D$ is an integer noise-delay constant equal to 20. The purpose of the second term is to keep the speech-suppressing filter from changing the spectrum of the noise it passes too much. The minimization reduces to a system of linear equations with a block Toeplitz matrix, which is solved quickly and economically by the block Levinson-Durbin algorithm derived by H. Akaike, "Block Toeplitz Matrix Inversion", SIAM J. Appl. Math. 24 (2): 234-241, 1973.
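The Example 1 cost is an ordinary linear least-squares problem in the stacked coefficients $[g_L, g_R]$. The sketch below is an illustration under stated assumptions, not the patent's implementation: it builds the least-squares rows explicitly and solves the normal equations by Gaussian elimination, swapped in for the block Levinson-Durbin recursion (which exploits the block Toeplitz structure and is what makes filter length 300 practical). The toy values (filter length 4, delay 1, 200 samples) and all function names are hypothetical.

```python
# Illustrative sketch; names and toy sizes are hypothetical.
# Gaussian elimination stands in for the block Levinson-Durbin recursion.
import random


def delayed(x, d, n):
    """x(n - d), treating the recording as zero outside its support."""
    m = n - d
    return x[m] if 0 <= m < len(x) else 0.0


def build_rows(x_l, x_r, y_l, y_r, length, delay, eps):
    """Least-squares rows for theta = [g_L(0..length-1), g_R(0..length-1)].

    Speech rows have target 0; noise rows are scaled by sqrt(eps) and
    target sqrt(eps) * y_L(n - delay).
    """
    w = eps ** 0.5
    rows, targets = [], []
    for n in range(len(x_l) + length - 1):
        rows.append([delayed(x_l, i, n) for i in range(length)]
                    + [delayed(x_r, i, n) for i in range(length)])
        targets.append(0.0)
    for n in range(len(y_l) + length - 1):
        rows.append([w * delayed(y_l, i, n) for i in range(length)]
                    + [w * delayed(y_r, i, n) for i in range(length)])
        targets.append(w * delayed(y_l, delay, n))
    return rows, targets


def cost(theta, rows, targets):
    """Value of the Example 1 objective for stacked coefficients theta."""
    return sum((sum(a * t for a, t in zip(r, theta)) - b) ** 2
               for r, b in zip(rows, targets))


def gauss_solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def design_regularized(x_l, x_r, y_l, y_r, length, delay, eps):
    """Minimize the Example 1 cost via the normal equations A^T A theta = A^T b."""
    rows, targets = build_rows(x_l, x_r, y_l, y_r, length, delay, eps)
    size = 2 * length
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(size)]
           for i in range(size)]
    atb = [sum(r[i] * b for r, b in zip(rows, targets)) for i in range(size)]
    theta = gauss_solve(ata, atb)
    return theta[:length], theta[length:]


random.seed(1)
speech = [random.gauss(0.0, 1.0) for _ in range(200)]
x_left, x_right = speech, [0.0] + speech[:-1]  # right mic lags by one sample
y_left = [random.gauss(0.0, 1.0) for _ in range(200)]
y_right = [random.gauss(0.0, 1.0) for _ in range(200)]
g_left, g_right = design_regularized(x_left, x_right, y_left, y_right,
                                     length=4, delay=1, eps=0.1)
```

The all-zero filter would cancel the speech perfectly but cost exactly $\varepsilon$ times the noise energy; the regularization term is what rules out that trivial solution, and the optimum can never cost more than it.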

The computed speech-suppressing filters, or more precisely the FFTs of the filters zero-padded to the block length, form the filter bank, which is stored in the phone's memory.

Example 2: A simpler variant of creating the bank of speech-suppressing filters

The procedure is the same as in Example 1, except that the noise recording $y_L(n)$, $y_R(n)$ is not needed and the speech-suppressing filter for a given speaker recording is computed as

$$g_{p,L} = \arg\min_{g_L} \sum_n \big|\{g_L * x_{p,L}\}(n) - x_{p,R}(n-D)\big|^2.$$

In this case the coefficients of $g_{p,R}$ are all zero except the $D$-th, which equals $-1$, so $g_{p,R}$ is the same for all speech-suppressing filters (for all $p$). The minimization reduces to a system of linear equations, which is solved by the Levinson-Durbin algorithm.
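When the sums run over the whole zero-padded support, the normal equations of Example 2 have a symmetric Toeplitz matrix (the autocorrelation of $x_{p,L}$) and a right-hand side equal to the cross-correlation with the delayed $x_{p,R}$. A sketch of solving them with the Levinson recursion for a general right-hand side, in $O(L^2)$ operations; the function names and toy signals are hypothetical. In the toy case the right microphone hears the speech 2 samples late, so with $D = 3$ the optimal $g_{p,L}$ should be a pure delay of 5 samples:

```python
# Illustrative sketch; function names and toy data are hypothetical.
import random


def levinson_solve(t, y):
    """Solve T x = y for a symmetric Toeplitz T with first column t."""
    f = [1.0 / t[0]]  # forward vector:  T_k f = e_first
    b = [1.0 / t[0]]  # backward vector: T_k b = e_last
    x = [y[0] / t[0]]
    for k in range(1, len(y)):
        ef = sum(t[k - i] * f[i] for i in range(k))  # error of [f; 0]
        eb = sum(t[i + 1] * b[i] for i in range(k))  # error of [0; b]
        denom = 1.0 - ef * eb
        f_ext, b_ext = f + [0.0], [0.0] + b
        f = [(fi - ef * bi) / denom for fi, bi in zip(f_ext, b_ext)]
        b = [(bi - eb * fi) / denom for fi, bi in zip(f_ext, b_ext)]
        ex = sum(t[k - i] * x[i] for i in range(k))  # last entry of T [x; 0]
        x = [xi + (y[k] - ex) * bi for xi, bi in zip(x + [0.0], b)]
    return x


def design_simple(x_l, x_r, length, delay):
    """g_L minimizing sum_n |{g_L * x_L}(n) - x_R(n - delay)|^2."""
    # Autocorrelation t(k) = sum_m x_L(m) x_L(m + k): exact Toeplitz matrix.
    t = [sum(x_l[m] * x_l[m + k] for m in range(len(x_l) - k))
         for k in range(length)]
    # Cross-correlation r(i) = sum_m x_L(m) x_R(m + i - delay).
    r = [sum(x_l[m] * x_r[m + i - delay] for m in range(len(x_l))
             if 0 <= m + i - delay < len(x_r))
         for i in range(length)]
    return levinson_solve(t, r)


random.seed(0)
speech = [random.gauss(0.0, 1.0) for _ in range(62)] + [0.0, 0.0]
x_left = speech
x_right = [0.0, 0.0] + speech[:-2]  # right mic hears the speech 2 samples late
g_left = design_simple(x_left, x_right, length=8, delay=3)
# g_left is close to a unit impulse at index 5 (delay 2 + D = 3).
```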

This simpler variant of computing the speech-suppressing filters is less computationally demanding and needs less memory (only the coefficients $g_{p,L}$ need to be stored). However, it does not allow adaptation to the type of noise being removed, and it suppresses the speaker's signal less strongly.


Example 3: Carrying out the method of suppressing noise and enhancing a speech signal

Fig. 1 is a block diagram of the method of suppressing noise and enhancing a speech signal in a mobile phone, which may be used in any embodiment of the invention.

The microphone signals $x_L(n)$ and $x_R(n)$, sampled at 16 kHz, where $n$ is the sample index, are first transformed by a windowed fast Fourier transform (windowed FFT) 10 with a window length of 1024 samples and 50 % window overlap. One block (window) of the transformed signals is denoted $X_L(k)$ and $X_R(k)$, where $k$ is the frequency-bin index.
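A sketch of the windowed FFT block 10 with the stated parameters (16 kHz, 1024-sample window, 50 % overlap). The periodic Hann window is an assumption, since the window shape is not specified, and the recursive radix-2 FFT is written out only to keep the example dependency-free; `fft` and `stft` are hypothetical names. A 500 Hz tone falls exactly on bin 500 · 1024 / 16000 = 32:

```python
# Illustrative sketch; the Hann window and function names are assumptions.
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw
    return out


def stft(x, win_len=1024):
    """Blocks X(k) of a windowed FFT with 50 % overlap (hop = win_len / 2)."""
    hop = win_len // 2
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / win_len)  # periodic Hann
              for n in range(win_len)]
    return [fft([w * s for w, s in zip(window, x[start:start + win_len])])
            for start in range(0, len(x) - win_len + 1, hop)]


fs = 16000
tone = [math.cos(2 * math.pi * 500 * n / fs) for n in range(4096)]
blocks = stft(tone)  # one list of 1024 bins per block
peak = max(range(513), key=lambda k: abs(blocks[0][k]))
print(len(blocks), peak, peak * fs / 1024)  # 7 32 500.0
```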

$X_L(k)$ and $X_R(k)$ are the input to the bank 20 of speech-suppressing filters, where they are filtered in parallel by all filters in the bank. The filter coefficients are loaded from the phone's memory. The output of the $p$-th filter is computed as

$$Z_p(k) = G_{p,L}(k)\,X_L(k) + G_{p,R}(k)\,X_R(k),$$

where $G_{p,L}(k)$ and $G_{p,R}(k)$ are the FFT coefficients of the $p$-th filter.
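In code, the bank 20 reduces to one complex multiply-add per bin and filter. A sketch, assuming each stored filter is a pair of per-bin coefficient lists `(G_L, G_R)`; the function name `bank_outputs` and the two-bin toy data are hypothetical:

```python
# Illustrative sketch; the function name and data layout are hypothetical.
def bank_outputs(x_left, x_right, bank):
    """Z_p(k) = G_{p,L}(k) X_L(k) + G_{p,R}(k) X_R(k) for every filter p.

    `bank` is a list of (G_L, G_R) pairs of per-bin complex coefficients.
    """
    return [[gl * xl + gr * xr
             for gl, gr, xl, xr in zip(g_l, g_r, x_left, x_right)]
            for g_l, g_r in bank]


X_L = [1 + 0j, 2 + 0j]
X_R = [1 + 0j, 1j]
bank = [([1, 1], [-1, -1]), ([0.5, 0.5], [0.5, 0.5])]
Z = bank_outputs(X_L, X_R, bank)  # one output spectrum per filter
```

Bin-wise multiplication within a single block corresponds to circular convolution of that block with the filter.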

The filter selector 30 evaluates the variances of the outputs of the speech-suppressing filters. The variance of the $p$-th filter output is computed as

$$\sigma_p^2 = \sum_k |Z_p(k)|^2.$$

The output of the filter selector 30, denoted $Z(k)$, is the output of the filter whose variance is smallest: if this is the $p$-th filter, then $Z(k) = Z_p(k)$.
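A sketch of the filter selector 30, assuming the variance of each output is measured over the current block as $\sum_k |Z_p(k)|^2$ (proportional to the block's time-domain energy by Parseval's theorem); `select_noise_reference` is a hypothetical name:

```python
# Illustrative sketch; the function name is hypothetical.
def select_noise_reference(outputs):
    """Return (p, Z_p) for the bank output with the smallest sum_k |Z_p(k)|^2."""
    variances = [sum(abs(z) ** 2 for z in z_p) for z_p in outputs]
    best = min(range(len(outputs)), key=variances.__getitem__)
    return best, outputs[best]


outputs = [[1 + 1j, 2], [0.1, 0.2j], [3, 0]]  # variances 6, 0.05 and 9
best, Z = select_noise_reference(outputs)
```

The winning filter is the one whose assumed speaker position best matches the current one, since it removes the most speech energy and leaves mainly noise.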

In parallel with the computation of $Z(k)$, the focuser 40, whose inputs are $X_L(k)$ and $X_R(k)$, computes the signal $X(k)$, which is the input to the noise subtractor 50. The aim is to increase the speech-to-noise ratio; the focuser 40 can be chosen according to how the microphones are placed on the phone. If both microphones are on the front, a known focuser 40 such as a delay-and-sum beamformer can be used, or $X(k)$ can be set equal to the signal of the microphone with the higher variance. If one microphone is on the front (signal $X_L(k)$) and the other on the back (signal $X_R(k)$), then $X(k) = X_L(k)$.
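A sketch of the focuser 40 covering the three options described. The layout strings, the `delay` parameter (a hypothetical integer steering delay in samples, applied to the left channel) and the equal 0.5 weights of the delay-and-sum branch are illustrative assumptions; the delay is applied as a per-bin phase shift, which is circular within a block:

```python
# Illustrative sketch; layout strings, `delay` and weights are assumptions.
import cmath
import math


def energy(spectrum):
    """Block energy sum_k |X(k)|^2, used as the variance measure."""
    return sum(abs(v) ** 2 for v in spectrum)


def focus(x_left, x_right, layout, delay=0):
    """Focuser output X(k) for one block of N bins (k = 0..N-1)."""
    if layout == "front-back":
        return list(x_left)  # front microphone only
    if delay:
        n = len(x_left)
        # Delay-and-sum: a delay of d samples multiplies bin k by
        # exp(-j 2 pi k d / N); then average the aligned channels.
        return [0.5 * (xl * cmath.exp(-2j * math.pi * k * delay / n) + xr)
                for k, (xl, xr) in enumerate(zip(x_left, x_right))]
    # Both microphones in front, no steering: take the louder one.
    return list(x_left) if energy(x_left) >= energy(x_right) else list(x_right)


X_L = [1 + 0j, 2j, 3 + 0j, 4 + 0j]
# Right channel = left channel delayed by 2 samples (circularly).
X_R = [X_L[k] * cmath.exp(-2j * math.pi * k * 2 / 4) for k in range(4)]
X = focus(X_L, X_R, "front-front", delay=2)  # aligned average equals X_R
```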

The noise subtractor 50 subtracts the signal $Z(k)$ from the signal $X(k)$ and outputs the signal $S(k)$. The subtraction method used is an adaptive Wiener filter, implemented by the formula

$$S(k) = \frac{|X(k)|^2}{|X(k)|^2 + \tau\,|Z(k)|^2}\,X(k),$$

where $\tau$ is a selectable parameter, here equal to 10, that controls the amount of noise suppression. To preserve the listening quality of the signal, the formula is applied only for indices $k$ from 0 to $K$, where $K$ corresponds to a frequency of 3 kHz; for $k > K$, $S(k) = X(k)$.
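A sketch of the noise subtractor 50 with $\tau = 10$ and the 3 kHz limit; for a 1024-point FFT at 16 kHz, $K = 3000 \cdot 1024 / 16000 = 192$. Treating bin $k$ and its mirror $N - k$ alike (so a full, conjugate-symmetric spectrum stays symmetric) and passing $X(k)$ unchanged when both powers are zero are assumptions of this sketch; `wiener_subtract` is a hypothetical name:

```python
# Illustrative sketch; mirror-bin handling and zero-power guard are assumptions.
def wiener_subtract(x, z, tau=10.0, fs=16000, f_max=3000.0):
    """S(k) = |X|^2 / (|X|^2 + tau |Z|^2) * X(k) for bins up to f_max."""
    n_fft = len(x)
    k_max = int(f_max * n_fft / fs)  # K = 192 for N = 1024, fs = 16 kHz
    out = []
    for k, (xk, zk) in enumerate(zip(x, z)):
        if min(k, n_fft - k) > k_max:  # bin k and its mirror share a frequency
            out.append(xk)  # above 3 kHz: pass the focuser output unchanged
            continue
        px, pz = abs(xk) ** 2, abs(zk) ** 2
        denom = px + tau * pz
        out.append(xk * px / denom if denom > 0 else xk)
    return out


S = wiener_subtract([1 + 0j] * 1024, [1 + 0j] * 1024)
# Where |X| = |Z| the gain is 1 / (1 + tau) = 1/11; above 3 kHz it is 1.
```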

The inverse FFT block 60 converts the signal $S(k)$ to the time-domain signal $s(n)$ using the inverse FFT and the overlap-add (OLA) method, described, for example, in B. Porat, "A Course in Digital Signal Processing", John Wiley & Sons, Inc., 1997.
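A sketch of block 60, assuming full $N$-bin spectra and a periodic Hann analysis window at 50 % overlap: those windows sum to one, so adding the real parts of the inverse FFTs reconstructs every sample covered by two blocks without a synthesis window. The window choice, the toy 16-sample blocks and the function names are assumptions:

```python
# Illustrative sketch; window, block size and function names are assumptions.
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw
    return out


def ifft(spectrum):
    """Inverse FFT via conjugation: ifft(X) = conj(fft(conj(X))) / N."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spectrum])]


def overlap_add(blocks, hop):
    """s(n): inverse-transform each block and add the real parts at hop offsets."""
    n_fft = len(blocks[0])
    out = [0.0] * (hop * (len(blocks) - 1) + n_fft)
    for b, spectrum in enumerate(blocks):
        for i, v in enumerate(ifft(spectrum)):
            out[b * hop + i] += v.real
    return out


# Round trip with a 16-sample periodic Hann window and hop 8.
window = [0.5 - 0.5 * math.cos(2 * math.pi * n / 16) for n in range(16)]
signal = [math.sin(0.3 * n) for n in range(64)]
blocks = [fft([w * s for w, s in zip(window, signal[i:i + 16])])
          for i in range(0, 49, 8)]
restored = overlap_add(blocks, hop=8)  # equals signal for n in 8..55
```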

Industrial applicability

The invention is intended for implementation in mobile phones that have two or more microphones capable of capturing sound. It is intended to make telephoning in noisy environments easier by suppressing ambient noise and enhancing the caller's speech signal in the transmitted call.

Claims

1. A method of suppressing noise and enhancing a speech signal for a mobile phone with two or more microphones, characterized in that the noise component is estimated by means of a bank (20) of speech-suppressing filters and a filter selector (30), wherein the filter selector (30) always selects the output signal of the filter whose output variance is minimal, and the speech signal itself is estimated in the noise subtractor (50) by subtracting the estimated noise component from the output signal of the focuser (40) using one of the known spectral subtraction methods, such as the Wiener filter, or methods derived from them, such as Power Level Differences.

2. The method of claim 1, wherein the bank (20) of speech-suppressing filters is created for an existing mobile phone prototype from a set of recordings of a speaker holding the phone in various positions, as expected in normal use of the phone, by minimizing the expression

$$g_{p,L} = \arg\min_{g_L} \sum_n \big|\{g_L * x_{p,L}\}(n) - x_{p,R}(n-D)\big|^2,$$

where $x_{p,L}(n)$ and $x_{p,R}(n)$ denote samples of the $p$-th recording of clean speech from the two microphones, $n$ is the time index, $*$ is the convolution operator and $D$ is a delay parameter.

3. The method of claim 1, wherein the bank (20) of speech-suppressing filters is created for an existing mobile phone prototype from a set of recordings of a speaker holding the phone prototype in various positions, as expected in normal use of the phone, and from an audio recording of ambient noise assumed to be typical for use of the phone, by minimizing the expression

$$[g_{p,L}, g_{p,R}] = \arg\min_{g_L,\, g_R} \sum_n \Big( \big|\{g_L * x_{p,L}\}(n) + \{g_R * x_{p,R}\}(n)\big|^2 + \varepsilon\, \big|\{g_L * y_L\}(n) + \{g_R * y_R\}(n) - y_L(n-D)\big|^2 \Big),$$

where $x_{p,L}(n)$ and $x_{p,R}(n)$ denote samples of the $p$-th recording of clean speech from the two microphones, $y_L(n)$ and $y_R(n)$ denote samples of the noise recording, $n$ is the time index, $*$ is the convolution operator, $D$ is the delay parameter and $\varepsilon$ is the regularization constant.