FI110726B

FI110726B - Detection of voice activity

Info

Publication number: FI110726B
Application number: FI904410A
Authority: FI
Inventors: Daniel Kenneth Freeman; Ivan Boyd
Original assignee: British Telecomm
Priority date: 1988-03-11
Filing date: 1990-09-07
Publication date: 2003-03-14
Also published as: EP0335521B1; DK215690D0; NO982568L; DE68929442T2; WO1989008910A1; EP0548054B1; ES2047664T3; NZ228290A; HK135896A; AU3355489A; JPH03504283A; ES2188588T3; DK175478B1; KR0161258B1; NO903936D0; DE68910859T2; NO903936L; FI115328B; KR900700993A; CA1335003C

Abstract

The first aspect provides voice activity detection apparatus that receives an input signal, estimates the noise signal component of the input signal and continually forms a measure M of the spectral similarity between a portion of the input signal and the noise signal. A circuit compares a parameter derived from the measure M with a threshold value T to produce an output indicating the presence or absence of speech, depending on whether or not that value is exceeded. A second aspect covers voice activity detection apparatus which continually forms a spectral distortion measure and carries out a comparison.

Description

Voice activity detection

A voice activity detector is a device that is supplied with a signal in order to detect periods of speech or periods containing only noise. Although the present invention is not limited to this, one particularly interesting application of such detectors is in mobile radiotelephone systems, where the speech coder can use knowledge of the presence or absence of speech to improve utilisation of the radio spectrum, and where the noise level (from a vehicle-mounted unit) is also likely to be high.

The essence of voice activity detection is to find a measure that differs appreciably between speech periods and non-speech periods. In apparatus that includes a speech coder, a number of parameters are readily available from the various stages of the coder, and it is therefore desirable to reduce the processing needed by making use of some such parameter. In many environments the main noise sources occur in known, defined regions of the frequency spectrum. For example, in a moving car much of the noise (e.g. engine noise) is concentrated in the low-frequency regions of the spectrum. Where such knowledge of the spectral position of the noise is available, it is advantageous to base the decision on whether speech is present or absent on measurements taken in the part of the spectrum that contains relatively little noise. It would of course be possible in practice to pre-filter the signal before the analysis performed to detect speech activity, but where the voice activity detector follows the output of the speech coder, pre-filtering would distort the voice signal to be coded.


The invention accordingly relates to voice activity detection apparatus according to the preamble of claim 1, characterised in that the apparatus includes analysis means for producing the coefficients of a filter whose spectral response is the inverse of the frequency spectrum of one of said two signals, and in that the measure-forming means are operable to form a measure M that is proportional to the zero-order autocorrelation (R'0) of the signal obtained by filtering the other of said two signals with a filter having said coefficients.

Preferably, the measure is the Itakura-Saito distortion measure.


The invention also relates to a method of detecting voice activity, to apparatus for coding speech signals, and to mobile telephone apparatus. The characterising features of these inventions are set out in independent claims 15, 16 and 17.


Other aspects of the present invention are as defined in the claims.

Some embodiments of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:

Figure 1 is a block diagram of a first embodiment of the invention.

Figure 2 shows a second embodiment of the invention.

Figure 3 shows a third, preferred embodiment of the invention.

The general principle underlying the first voice activity detector, according to the first embodiment of the invention, is as follows.

A frame of n signal samples $(s_0, s_1, s_2, s_3, s_4, \ldots, s_{n-1})$, when passed through a notional fourth-order finite impulse response (FIR) digital filter with impulse response $(1, h_0, h_1, h_2, h_3)$, gives a filtered signal (ignoring samples from previous frames)

$$
\begin{aligned}
s' = {} & (s_0), \\
& (s_1 + h_0 s_0), \\
& (s_2 + h_0 s_1 + h_1 s_0), \\
& (s_3 + h_0 s_2 + h_1 s_1 + h_2 s_0), \\
& (s_4 + h_0 s_3 + h_1 s_2 + h_2 s_1 + h_3 s_0), \\
& (s_5 + h_0 s_4 + h_1 s_3 + h_2 s_2 + h_3 s_1), \\
& (s_6 + h_0 s_5 + h_1 s_4 + h_2 s_3 + h_3 s_2), \\
& (s_7 + \ldots), \ \ldots
\end{aligned}
$$

The zero-order autocorrelation coefficient is the sum of the squares of the terms, which may be normalised, i.e. divided by the total number of terms (for frames of constant length it is simplest to omit the division). For the filtered signal it is thus

$$
R'_0 = \sum_{i=0}^{n-1} (s'_i)^2
$$

and this is therefore a measure of the power of the notional filtered signal s', in other words of that part of the signal s which falls within the passband of the notional filter.

Expanding this expression, and neglecting the first four terms, gives

$$
\begin{aligned}
R'_0 = {} & (s_4 + h_0 s_3 + h_1 s_2 + h_2 s_1 + h_3 s_0)^2 + (s_5 + h_0 s_4 + h_1 s_3 + h_2 s_2 + h_3 s_1)^2 + \ldots \\
= {} & s_4^2 + h_0 s_4 s_3 + h_1 s_4 s_2 + h_2 s_4 s_1 + h_3 s_4 s_0 \\
& + h_0 s_4 s_3 + h_0^2 s_3^2 + h_0 h_1 s_3 s_2 + h_0 h_2 s_3 s_1 + h_0 h_3 s_3 s_0 \\
& + h_1 s_4 s_2 + h_0 h_1 s_3 s_2 + h_1^2 s_2^2 + h_1 h_2 s_2 s_1 + h_1 h_3 s_2 s_0 \\
& + h_2 s_4 s_1 + h_0 h_2 s_3 s_1 + h_1 h_2 s_2 s_1 + h_2^2 s_1^2 + h_2 h_3 s_1 s_0 \\
& + h_3 s_4 s_0 + h_0 h_3 s_3 s_0 + h_1 h_3 s_2 s_0 + h_2 h_3 s_1 s_0 + h_3^2 s_0^2 + \ldots \\
= {} & R_0 (1 + h_0^2 + h_1^2 + h_2^2 + h_3^2) \\
& + R_1 (2 h_0 + 2 h_0 h_1 + 2 h_1 h_2 + 2 h_2 h_3) \\
& + R_2 (2 h_1 + 2 h_1 h_3 + 2 h_0 h_2) \\
& + R_3 (2 h_2 + 2 h_0 h_3) \\
& + R_4 (2 h_3)
\end{aligned}
$$

R'_0 can thus be obtained from a combination of the autocorrelation coefficients R_i of the signal, weighted by the bracketed constants, which determine the frequency band to which the value of R'_0 is responsive. The bracketed terms are in fact the autocorrelation coefficients of the impulse response of the notional filter, so the expression above can be simplified to

$$
R'_0 = R_0 H_0 + 2 \sum_{i=1}^{N} R_i H_i \qquad (1)
$$

where N is the filter order and H_i are the (un-normalised) autocorrelation coefficients of the impulse response of the filter.

In other words, the effect that filtering a signal has on its autocorrelation coefficients can be simulated by forming a weighted sum of the autocorrelation coefficients of the (unfiltered) signal, using the impulse response that the required filter would have had.
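The equivalence described above can be checked numerically. The following Python sketch is illustrative only: the function names (`autocorr`, `filtered_power_direct`, `filtered_power_eq1`) and the sample values are assumptions introduced here and do not appear in the patent. It compares the power obtained by actually filtering a frame with the power obtained from the weighted sum of autocorrelation coefficients in equation 1.

```python
def autocorr(x, max_lag):
    """Un-normalised autocorrelation coefficients for lags 0..max_lag."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n))
            for k in range(max_lag + 1)]


def filtered_power_direct(s, g):
    """Filter frame s with FIR impulse response g and sum the squared outputs.

    The first len(g) - 1 outputs are skipped, as in the text, because they
    would need samples from the previous frame.
    """
    order = len(g) - 1
    total = 0.0
    for t in range(order, len(s)):
        y = sum(g[k] * s[t - k] for k in range(order + 1))
        total += y * y
    return total


def filtered_power_eq1(s, g):
    """Estimate the same power from autocorrelation coefficients only."""
    order = len(g) - 1
    R = autocorr(s, order)   # signal autocorrelation R_0..R_N
    H = autocorr(g, order)   # impulse-response autocorrelation H_0..H_N
    return R[0] * H[0] + 2.0 * sum(R[i] * H[i] for i in range(1, order + 1))


# Hypothetical frame and first-order filter. The zero samples at both ends
# remove the edge effects, so the two results agree exactly in this case.
frame = [0, 1, 2, 0]
taps = [1.0, 0.5]                            # impulse response (1, h_0)
print(filtered_power_direct(frame, taps))    # 8.25
print(filtered_power_eq1(frame, taps))       # 8.25
```

For a frame that is not zero at its ends the two values differ slightly, because a few edge terms are neglected; the weighted sum is then an approximation to the directly filtered power.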

Thus a relatively simple algorithm, involving only a few multiplications, can simulate the effect of a digital filter that would typically require a hundred times that number of multiplications.

Alternatively, the filtering operation can be viewed as a form of spectral comparison in which the signal spectrum is compared with a reference spectrum (the inverse of the response of the notional filter). Since in this application the notional filter is chosen to approximate the inverse of the noise spectrum, the operation can be regarded as a spectral comparison between the speech and noise spectra, and the zero-order autocorrelation coefficient thus generated (i.e. the energy of the inverse-filtered signal) as a measure of the dissimilarity between the spectra. The Itakura-Saito distortion measure is used in linear predictive coding (LPC) to assess the match between the predictor filter and the input spectrum, and in one form it is expressed as

$$
M = R_0 A_0 + 2 \sum_{i=1}^{N} R_i A_i
$$

where A_0 etc. are the autocorrelation coefficients of the LPC parameter set. It will be seen that this is closely similar to the relationship derived above. Bearing in mind that the LPC coefficients are the taps of an FIR filter having the inverse spectral response of the input signal, so that the LPC coefficient set is the impulse response of the inverse LPC filter, it is apparent that the Itakura-Saito distortion measure is in fact merely a form of equation 1 in which the filter response H is the inverse of the spectral shape of an all-pole model of the input signal.
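A minimal sketch of this measure follows, assuming that the LPC coefficients are obtained with the Levinson-Durbin recursion; the patent does not say how the analysis is carried out, and the names `lpc_inverse_filter` and `distortion_measure`, the filter order and the test signals are all hypothetical. It illustrates that when the inverse filter matches the signal it was derived from, M equals the LPC residual energy, which is the smallest value any inverse filter of that order can give.

```python
import random


def autocorr(x, max_lag):
    """Un-normalised autocorrelation coefficients for lags 0..max_lag."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n))
            for k in range(max_lag + 1)]


def lpc_inverse_filter(R, order):
    """Levinson-Durbin recursion (requires R[0] > 0).

    Returns the inverse (prediction-error) filter taps [1, a_1, ..., a_N]
    and the residual energy for the autocorrelation vector R.
    """
    a = [1.0] + [0.0] * order
    err = R[0]
    for i in range(1, order + 1):
        acc = R[i] + sum(a[j] * R[i - j] for j in range(1, i))
        k = -acc / err                       # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def distortion_measure(R, taps):
    """M = R_0*A_0 + 2*sum(R_i*A_i), with A the autocorrelation of the taps."""
    order = len(taps) - 1
    A = autocorr(taps, order)
    return R[0] * A[0] + 2.0 * sum(R[i] * A[i] for i in range(1, order + 1))


ORDER = 4
rng = random.Random(0)                       # hypothetical test data
white = [rng.gauss(0.0, 1.0) for _ in range(160)]
# Crude low-pass "noise": sum of two neighbouring white samples.
coloured = [white[t] + white[t - 1] for t in range(1, 160)]

R_white = autocorr(white, ORDER)
R_col = autocorr(coloured, ORDER)
taps_col, err_col = lpc_inverse_filter(R_col, ORDER)

# Matched case: the measure equals the LPC residual energy.
print(distortion_measure(R_col, taps_col), err_col)
# Mismatched case: a different spectrum seen through the same inverse filter.
print(distortion_measure(R_white, taps_col))
```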

In fact, it is also possible to transpose the spectra, using the LPC coefficients of the test spectrum and the autocorrelation coefficients of the reference spectrum, to obtain a different measure of spectral similarity.

[...] Vector Quantisation", IEEE Trans. on ASSP, Vol. ASSP-28, No. 5, October 1980.

Since the signal frames have only a finite length, and a number of terms (N, where N is the filter order) are neglected, the above result is only an approximation. It nevertheless gives a surprisingly good indication of the presence or absence of speech, and can thus be used as the measure M in speech detection. In an environment where the noise spectrum is well known and stationary, it is quite possible simply to use fixed coefficients h_0, h_1, etc. to model the inverse noise filter.

However, apparatus that can adapt to different noise environments is more widely applicable.

Referring to Figure 1, in a first embodiment a signal from a microphone (not shown) is received at input 1 and converted to digital samples s at a suitable sampling rate by an analogue-to-digital converter 2. An LPC analysis unit 3 (part of an LPC coder of known type) then derives, for successive frames of n (e.g. 160) samples, a set of N (e.g. 8 or 12) LPC filter coefficients L_i, which are transmitted to represent the input speech. The speech signal s is also fed to a correlator unit 4 (normally forming part of the LPC coder 3, since the autocorrelation vector R_i of the speech is produced as one step of the LPC analysis, although a separate correlator could clearly also be used). The correlator 4 produces the autocorrelation vector R_i, comprising the zero-order correlation coefficient R_0 and at least two further autocorrelation coefficients R_1, R_2, R_3. These are then fed to a multiplier unit 5.


A second input 11 is connected to a second microphone located far from the speaker, so that this microphone receives only background noise. The input from this microphone is converted to a digital input sample train by an A/D converter 12 and LPC-analysed by a second LPC analyser 13. The "noise" LPC coefficients produced by analyser 13 are passed to a correlator unit 14, and the autocorrelation vector thus produced is multiplied term by term with the autocorrelation coefficients R_i of the input signal from the speech microphone in multiplier 5. The weighted coefficients thus produced are combined in adder 6 according to equation 1, so as to apply a filter whose shape is the inverse of the noise spectrum at the noise-only microphone (which in practice is the same as the shape of the noise spectrum at the microphone receiving signal plus noise), and which therefore filters out most of the noise. The resulting measure M is compared with a threshold in threshold circuit 7 to produce a logic output 8 indicating the presence or absence of speech. If M is high, speech is deemed to be present.
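The signal path of Figure 1 can be sketched in Python as below. This is an illustration under stated assumptions rather than the patented implementation: the Levinson-Durbin recursion stands in for the unspecified LPC analyser, and the function name `detect_speech`, the filter order, the threshold rule and the synthetic signals are all hypothetical.

```python
import random


def autocorr(x, max_lag):
    """Un-normalised autocorrelation coefficients for lags 0..max_lag."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n))
            for k in range(max_lag + 1)]


def lpc_inverse_filter(R, order):
    """Levinson-Durbin recursion; returns taps [1, a_1..a_N] and residual energy.

    Requires R[0] > 0, i.e. a frame that is not all zeros.
    """
    a = [1.0] + [0.0] * order
    err = R[0]
    for i in range(1, order + 1):
        acc = R[i] + sum(a[j] * R[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def detect_speech(main_frame, noise_frame, threshold, order=8):
    """Return (speech_present, M) for one pair of frames."""
    R = autocorr(main_frame, order)                           # correlator 4
    taps, _ = lpc_inverse_filter(autocorr(noise_frame, order), order)  # analyser 13
    A = autocorr(taps, order)                                 # correlator 14
    # Multiplier 5 and adder 6: weighted sum of equation 1.
    M = R[0] * A[0] + 2.0 * sum(R[i] * A[i] for i in range(1, order + 1))
    return M > threshold, M                                   # threshold circuit 7


# Hypothetical data: white noise, and the same noise plus a high-frequency tone
# standing in for speech.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(160)]
with_tone = [x + 2.0 * (-1) ** t for t, x in enumerate(noise)]

# Assumed threshold rule: twice the measure seen on a noise-only frame.
_, m_noise = detect_speech(noise, noise, threshold=0.0, order=4)
threshold = 2.0 * m_noise
print(detect_speech(noise, noise, threshold, order=4))
print(detect_speech(with_tone, noise, threshold, order=4))
```

Because M is not normalised here, it scales with the energy of the main-microphone frame, so a fixed threshold of this kind only makes sense for a roughly constant noise level.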

However, this embodiment requires two microphones and two LPC analysers, which adds to the cost and complexity of the equipment needed.

Alternatively, a second embodiment uses a corresponding measure formed from the autocorrelations obtained from the noise microphone 11 and the LPC coefficients obtained from the main microphone 1, so that an extra autocorrelator is needed instead of an extra LPC analyser.

These embodiments can therefore operate in different environments having noise at different frequencies, or where the noise spectrum changes within a given environment.

Referring to Figure 2, a preferred embodiment of the invention has a buffer 15 which stores a set of LPC coefficients (or the autocorrelation vector of the set) derived from the microphone input 1 during a period identified as a "non-speech" (i.e. noise-only) period. These coefficients are then used to derive a measure using equation 1, which of course also corresponds to the Itakura-Saito distortion measure, except that a single stored frame of LPC coefficients, corresponding to an approximation of the inverse noise spectrum, is used rather than the current frame of LPC coefficients.

The LPC coefficient vector output by analyser 3 is also routed to a correlator 14, which produces the autocorrelation vector of the LPC coefficient vector. The speech/non-speech output of the threshold circuit 7 controls the buffer memory 15 in such a way that during "speech" frames the buffer retains the "noise" autocorrelation coefficients, whereas during "noise" frames a new set of LPC coefficients may be used to update the buffer, for example by means of a multiple switch 16 through which the outputs of correlator 14, each carrying an autocorrelation coefficient, are connected to buffer 15. Correlator 14 could clearly be placed after buffer 15 instead. Further, the speech/non-speech decision used for updating the coefficients need not be taken from output 8, but could be (and preferably is) derived in some other way.

Since non-speech periods occur frequently, the LPC coefficients stored in the buffer are updated from time to time, so that the apparatus is able to track changes in the noise spectrum. Such updating of the buffer may be necessary only occasionally, or may occur only once at the start of operation of the detector, if (as is often the case) the noise spectrum is relatively stationary over time; in a mobile radio environment, however, frequent updating is preferable.

In a modification of this embodiment, the system initially uses equation 1 with coefficient terms corresponding to a simple fixed high-pass filter, and then starts to adapt by switching over to the "noise period" LPC coefficients. If speech detection fails for some reason, the system can revert to using the simple high-pass filter.
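The buffered arrangement of Figure 2, together with the fixed high-pass start-up filter, might be sketched as follows. Everything specific here is an assumption made for illustration: the class name `BufferedNoiseVAD`, the first-difference filter (1, -1) as the "simple high-pass filter", the fixed threshold, the use of the detector's own output to gate the update (which the text allows but does not prefer), and the Levinson-Durbin recursion as the LPC analysis.

```python
import random


def autocorr(x, max_lag):
    """Un-normalised autocorrelation coefficients for lags 0..max_lag."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n))
            for k in range(max_lag + 1)]


def lpc_inverse_filter(R, order):
    """Levinson-Durbin recursion; returns taps [1, a_1..a_N] and residual energy."""
    a = [1.0] + [0.0] * order
    err = R[0]
    for i in range(1, order + 1):
        acc = R[i] + sum(a[j] * R[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


class BufferedNoiseVAD:
    """Stores the autocorrelation of a noise-period inverse filter (buffer 15)."""

    def __init__(self, order, threshold):
        self.order = order
        self.threshold = threshold
        # Start-up: first-difference high-pass filter, an assumed choice.
        startup_taps = [1.0, -1.0] + [0.0] * (order - 1)
        self.buffer = autocorr(startup_taps, order)

    def measure(self, frame):
        """Equation 1 with the buffered coefficients as weights."""
        R = autocorr(frame, self.order)
        A = self.buffer
        return R[0] * A[0] + 2.0 * sum(R[i] * A[i]
                                       for i in range(1, self.order + 1))

    def process(self, frame):
        """Classify one frame; refresh the buffer only on non-speech frames."""
        speech = self.measure(frame) > self.threshold      # threshold circuit 7
        if not speech:                                     # switch 16 closed
            R = autocorr(frame, self.order)
            if R[0] > 0.0:                                 # skip all-zero frames
                taps, _ = lpc_inverse_filter(R, self.order)
                self.buffer = autocorr(taps, self.order)
        return speech


rng = random.Random(0)                                     # hypothetical data
noise_frames = [[rng.gauss(0.0, 1.0) for _ in range(160)] for _ in range(3)]
vad = BufferedNoiseVAD(order=4, threshold=1000.0)          # assumed threshold
print([vad.process(f) for f in noise_frames])
```

A fallback to the start-up filter after a detection failure could be added by resetting `buffer` to its initial value; how such a failure would be recognised is not specified in the text and is left out of the sketch.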

The above measure can be normalised by dividing through by R_0, so that the expression compared with the threshold has the form

$$
M = A_0 + 2 \sum_{i=1}^{N} \frac{R_i A_i}{R_0}
$$

This measure is independent of the total signal energy in a frame and is thus compensated for changes in overall signal level, but it gives a weaker contrast between "noise" and "speech" levels and is therefore preferably not used in very noisy environments.
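The level independence of the normalised measure can be seen in a short Python sketch. The function name `normalised_measure`, the inverse-filter taps and the sample values are hypothetical and chosen only for illustration.

```python
import math


def autocorr(x, max_lag):
    """Un-normalised autocorrelation coefficients for lags 0..max_lag."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n))
            for k in range(max_lag + 1)]


def normalised_measure(frame, inverse_taps):
    """M = A_0 + 2 * sum(R_i * A_i) / R_0 (requires a non-silent frame)."""
    order = len(inverse_taps) - 1
    R = autocorr(frame, order)
    A = autocorr(inverse_taps, order)
    return A[0] + 2.0 * sum(R[i] * A[i] for i in range(1, order + 1)) / R[0]


taps = [1.0, -0.9]                        # assumed inverse noise filter
quiet = [1.0, 2.0, -1.0, 0.5, 3.0]        # hypothetical frame
loud = [4.0 * x for x in quiet]           # same frame, 4 times the amplitude

m_quiet = normalised_measure(quiet, taps)
m_loud = normalised_measure(loud, taps)
print(m_quiet, m_loud)
print(math.isclose(m_quiet, m_loud))      # True: overall level has no effect
```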

Instead of using LPC analysis to derive the inverse filter coefficients of the noise signal (either from a noise microphone or from noise-only periods, as in the various examples described above), it is possible to model the inverse noise spectrum using an adaptive filter of known type. Since the noise spectrum changes only slowly (as explained below), the relatively slow coefficient adaptation rate that is usual in such filters is acceptable. In one embodiment, corresponding to Figure 1, LPC analysis unit 13 is simply replaced by an adaptive filter (for example a transversal FIR filter or a lattice filter), connected so as to whiten the incoming noise by modelling the inverse filter, and its coefficients are supplied as before to autocorrelator 14.

In another embodiment, corresponding to that of Figure 2, LPC analysis means 3 is replaced by such an adaptive filter and buffer means 15 is omitted, but switch 16 operates to prevent the adaptive filter from adapting its coefficients during speech periods.
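As an illustration of the adaptive-filter variant, the sketch below uses a normalised LMS linear predictor as the whitening filter. The text only calls for "an adaptive filter of known type", so the choice of normalised LMS, the step size `mu`, the function name `adapt_inverse_filter` and the `speech_flags` argument (standing in for switch 16) are all assumptions.

```python
def adapt_inverse_filter(samples, order, speech_flags=None, mu=0.5, eps=1e-9):
    """Adapt a linear predictor sample by sample with normalised LMS.

    Returns the inverse (whitening) filter taps [1, -w_1, ..., -w_N].
    Where speech_flags[t] is true, adaptation is inhibited for sample t.
    """
    w = [0.0] * order
    for t in range(order, len(samples)):
        past = [samples[t - 1 - k] for k in range(order)]
        # Prediction error, i.e. the whitened output sample.
        err = samples[t] - sum(wk * pk for wk, pk in zip(w, past))
        if speech_flags is not None and speech_flags[t]:
            continue                                   # coefficients frozen
        norm = sum(p * p for p in past) + eps          # input power estimate
        w = [wk + mu * err * pk / norm for wk, pk in zip(w, past)]
    return [1.0] + [-wk for wk in w]


# Hypothetical input: a constant (purely low-frequency) "noise" signal.
constant = [1.0] * 100
taps = adapt_inverse_filter(constant, order=2)
print(taps, sum(taps))   # the taps sum to about zero: a null at zero frequency

# With adaptation inhibited throughout, the predictor never leaves its start state.
frozen = adapt_inverse_filter(constant, order=2, speech_flags=[True] * 100)
print(frozen)
```

The resulting taps play the same role as the LPC coefficient vector in the other embodiments: their autocorrelation supplies the weights of equation 1.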

A second voice activity detector, intended for use with another embodiment of the invention, will now be described.

From the foregoing it is apparent that the LPC coefficient vector is simply the impulse response of an FIR filter whose response approximates the inverse spectral shape of the input signal. When the Itakura-Saito distortion measure between adjacent frames is formed, it is in fact equal to the power of the signal as filtered by the LPC filter of the previous frame. So if the spectra of adjacent frames differ little, a correspondingly small amount of the spectral power of the frame escapes filtering and the measure is low. Correspondingly, a large spectral difference between frames produces a high Itakura-Saito distortion measure, so that the measure reflects the spectral similarity of adjacent frames.

In a speech coder it is desirable to minimise the data rate, so the frame length is made as long as possible. In other words, if the frame length is long enough, a speech signal should show a significant spectral change from frame to frame (if it does not, the coding is redundant). Noise, on the other hand, has a spectral shape that varies slowly from frame to frame, and so during a period in which speech is absent from the signal the Itakura-Saito distortion measure is correspondingly low, since applying the inverse LPC filter of the previous frame "filters out" most of the noise power.
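The interframe measure can be sketched as follows, again assuming the Levinson-Durbin recursion for the LPC analysis; the function name `interframe_measures`, the filter order and the random test frames are hypothetical. Each frame's autocorrelation vector is weighted with the autocorrelation of the previous frame's inverse LPC filter.

```python
import random


def autocorr(x, max_lag):
    """Un-normalised autocorrelation coefficients for lags 0..max_lag."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n))
            for k in range(max_lag + 1)]


def lpc_inverse_filter(R, order):
    """Levinson-Durbin recursion; returns taps [1, a_1..a_N] and residual energy."""
    a = [1.0] + [0.0] * order
    err = R[0]
    for i in range(1, order + 1):
        acc = R[i] + sum(a[j] * R[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def interframe_measures(frames, order):
    """Measure M for each frame after the first, using the inverse LPC
    filter of the frame before it."""
    measures = []
    prev_taps = None
    for frame in frames:
        R = autocorr(frame, order)
        if prev_taps is not None:
            A = autocorr(prev_taps, order)
            measures.append(R[0] * A[0] +
                            2.0 * sum(R[i] * A[i] for i in range(1, order + 1)))
        prev_taps, _ = lpc_inverse_filter(R, order)   # kept for the next frame
    return measures


rng = random.Random(0)                                # hypothetical noise frames
frames = [[rng.gauss(0.0, 1.0) for _ in range(160)] for _ in range(4)]
print(interframe_measures(frames, order=4))
```

When two successive frames are identical, the measure equals the LPC residual energy of the frame, which is its lower bound; any spectral change between frames raises it above that bound.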

The Itakura-Saito distortion measure between adjacent frames of a noisy signal containing intermittent speech is typically higher during periods of speech than during periods of noise. The degree of variation (as illustrated by the standard deviation) is also higher, and less intermittently variable.

It should be noted that the standard deviation of the standard deviation of the measure M is also a reliable measure. The effect of taking each standard deviation is essentially to smooth the measure.

In this second form of voice activity detector, the measured parameter used to decide whether speech is present is preferably the standard deviation of the Itakura-Saito distortion measure, but other measures of variation and other spectral distortion measures (based, for example, on FFT analysis) could be used.
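A minimal sketch of this second form is given below for illustration. The window length, the decision threshold and the use of the population standard deviation are assumptions of the sketch, not values taken from the specification; the input is taken to be a sequence of frame-to-frame distortion measures.

```python
from collections import deque
import statistics

def speech_flags_from_variation(measures, window=8, threshold=0.1):
    """Flag frames where the spread of recent distortion measures is large."""
    recent = deque(maxlen=window)  # most recent measures only
    flags = []
    for m in measures:
        recent.append(m)
        # The spread is undefined for a single value, so treat it as zero.
        spread = statistics.pstdev(recent) if len(recent) > 1 else 0.0
        flags.append(spread > threshold)
    return flags
```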

The use of an adaptive threshold in voice activity detection has been found advantageous. Such thresholds must not be adjusted during speech periods, or the speech signal will be thresholded out. The threshold adapter must therefore be controlled using a speech/non-speech control signal, and this control signal should preferably be independent of the output of the threshold adapter.

The threshold T is adaptively adjusted so that the threshold level is kept just above the level of the measure M when only noise is present. Since the measure generally varies randomly when noise is present, the threshold is varied by determining an average level over a number of blocks and setting the threshold at a level proportional to this average. In a noisy environment this is not usually sufficient, however, and so an assessment of the degree of variation of the parameter over several blocks is also taken into account.

The threshold value T is therefore preferably calculated according to

$$T = M' + K \cdot d$$

where M' is the average value of the measure over a number of consecutive frames, d is the standard deviation of the measure over those frames, and K is a constant (which may typically be 2).

In practice, it is preferred not to resume adaptation immediately after speech is indicated to be absent, but to wait to ensure that the fall is stable (to avoid rapid repeated switching between the adapting and non-adapting states).
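The threshold calculation and the delayed resumption of adaptation might be sketched as follows. This is an illustrative sketch only: the class name, the window length, the number of settling frames, the initial threshold and the choice of population standard deviation are all assumptions and do not come from the specification.

```python
from collections import deque
import statistics

class ThresholdAdapter:
    """Tracks T = M' + K*d over recent frames flagged as noise only."""

    def __init__(self, k=2.0, window=16, settle_frames=5, initial=1.0):
        self.k = k
        self.history = deque(maxlen=window)  # measures from noise-only frames
        self.settle_frames = settle_frames   # hypothetical waiting period
        self.quiet_run = 0                   # consecutive non-speech frames
        self.threshold = initial

    def update(self, measure, speech_flag):
        if speech_flag:
            self.quiet_run = 0               # freeze adaptation during speech
            return self.threshold
        self.quiet_run += 1
        if self.quiet_run <= self.settle_frames:
            return self.threshold            # wait until the fall is stable
        self.history.append(measure)
        if len(self.history) >= 2:
            mean = statistics.fmean(self.history)
            dev = statistics.pstdev(self.history)
            self.threshold = mean + self.k * dev
        return self.threshold
```

Here `speech_flag` stands for the independent speech/non-speech control signal; the threshold is held whenever that signal indicates speech, and for a few frames afterwards.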

As shown in Figure 3, in a preferred embodiment of the invention incorporating the above features, an input 1 receives a signal which is sampled and digitised by an analogue-to-digital converter (ADC) 2 and supplied to the input of an inverse filter analyser 3, which in practice forms part of the speech coder with which the voice activity detector is to work, and which generates the coefficients Li (typically 8) of a filter corresponding to the inverse of the input signal spectrum. The digitised signal is also supplied to an autocorrelator 4 (which forms part of the analyser 3), which generates the autocorrelation vector Ri of the input signal (or at least as many low-order terms as there are LPC coefficients). The operation of these parts of the apparatus is as described with reference to Figures 1 and 2. Preferably, the autocorrelation coefficients Ri are then averaged over several successive speech frames (typically 5-20 ms long) to improve their reliability. This may be achieved by storing each set of autocorrelation coefficients output by the autocorrelator 4 in a buffer 4a, and using an averager 4b to form a weighted sum of the current autocorrelation coefficients Ri and those of previous frames stored in and supplied from the buffer 4a. The averaged autocorrelation coefficients Rai thus derived are supplied to weighting and adding means 5, 6, which also receive, from an autocorrelator 14 via a buffer 15, the autocorrelation vector Ai of the stored noise-period inverse filter coefficients Li, and which form from Rai and Ai a measure M, preferably defined as:

$$M = A_0 + 2\sum_{i \geq 1} \frac{Ra_i A_i}{R_0}$$

This measure is then compared with a threshold level in a thresholder 7, and the logical result provides an indication of the presence or absence of speech at output 8.
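For illustration, the formation of the measure from averaged autocorrelation coefficients and the autocorrelation of the stored noise-period filter coefficients might be sketched as below. The function names and the averaging weights are assumptions of the sketch; the specification does not give the weights.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence (zero outside it)."""
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k))
            for k in range(max_lag + 1)]

def weighted_average(frames, weights):
    """Weighted sum of per-frame autocorrelation vectors (newest first)."""
    return [sum(w * f[i] for w, f in zip(weights, frames))
            for i in range(len(frames[0]))]

def similarity_measure(r_avg, noise_taps):
    """M = A0 + 2 * sum(R_i * A_i) / R_0 for the stored noise filter taps."""
    p = len(noise_taps) - 1
    big_a = autocorr(noise_taps, p)  # A_i: autocorrelation of the filter taps
    cross = sum(r_avg[i] * big_a[i] for i in range(1, p + 1))
    return big_a[0] + 2.0 * cross / r_avg[0]

def speech_present(measure, threshold):
    """Thresholding step: speech is indicated when M exceeds T."""
    return measure > threshold
```

A low value of the measure indicates that the stored noise filter removes most of the frame's power, that is, that the frame resembles the noise estimate.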

In order that the inverse filter coefficients Li correspond to a reasonable estimate of the inverse of the noise spectrum, it is desirable to update these coefficients during noise periods (and, of course, not to update them during speech periods). It is preferable, however, that the speech/non-speech decision on which the updating is based does not depend on the result of the updating, since otherwise a single wrongly identified frame of signal may cause the voice activity detector subsequently to go "out of lock" and wrongly identify the following frames. It is therefore preferable to use a control signal generating circuit 20, which is in effect a separate voice activity detector, and which forms an independent control signal indicating the presence or absence of speech to control the inverse filter analyser 3 (or the buffer 8), so that the inverse filter autocorrelation coefficients Ai used to form the measure M are updated only during "noise" periods. The control signal generating circuit 20 includes an LPC analyser 21 (which again may form part of a speech coder and, specifically, may be performed by the analyser 3), which generates a set of LPC coefficients Mi corresponding to the input signal, and an autocorrelator 21a (which may be performed by the autocorrelator 3a), which derives the autocorrelation coefficients Bi of the coefficients Mi. If the analyser 21 is performed by the analyser 3, then Mi = Li and Bi = Ai. These autocorrelation coefficients are then supplied to weighting and adding means 22, 23 (equivalent to the elements 5, 6), which also receive the autocorrelation vector Ri of the input signal from the autocorrelator 4. A measure of the spectral similarity between the incoming speech frame and the preceding speech frame is thus calculated.

This measure may be the Itakura-Saito distortion measure between the coefficients Ri of the current frame and the coefficients Bi of the preceding frame, as described above, or it may instead be derived by calculating the Itakura-Saito distortion measure for the coefficients Ri and Bi of the current frame and subtracting (in a subtractor 25) the corresponding earlier measure stored in a buffer 24, to generate a spectral difference signal (in either case the measure is energy-normalised by dividing by R0). The buffer 24 is then, of course, updated. This spectral difference signal, when thresholded by a thresholder 26, is, as discussed above, an indicator of the presence or absence of speech. We have found, however, that although this measure is excellent for distinguishing noise from unvoiced speech (a task of which known systems are generally incapable), it is in general rather less able to distinguish noise from voiced speech. Accordingly, the circuit 20 preferably further includes a voiced speech detection circuit comprising a pitch analyser 27 (which in practice may operate as part of the speech coder and, in particular, may measure the long-term predictor lag value produced in a multipulse LPC coder). The pitch analyser 27 produces a logic signal which is "true" when voiced speech is detected, and this signal, together with the thresholded measure derived from the thresholder 26 (which will generally be "true" when unvoiced speech is present), is supplied to the inputs of a NOR gate 28 to generate a signal which is "false" when speech is present and "true" when noise is present.
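The logic of the control signal generating circuit can be sketched as follows, purely for illustration. The class name and the threshold are hypothetical, and taking the absolute value of the difference between successive measures is an assumption of the sketch; the specification speaks only of subtracting the stored measure. The voiced-speech flag is assumed to be supplied by a separate pitch analyser.

```python
class ControlSignalGenerator:
    """Independent speech/noise decision used to gate the updating."""

    def __init__(self, diff_threshold):
        self.diff_threshold = diff_threshold
        self.prev_measure = None  # plays the role of the stored earlier measure

    def update(self, measure, voiced):
        """Return True when the frame is judged to be noise only."""
        if self.prev_measure is None:
            diff = 0.0
        else:
            # Assumption: magnitude of the change between successive measures.
            diff = abs(measure - self.prev_measure)
        self.prev_measure = measure  # update the stored measure
        spectral_change = diff > self.diff_threshold
        # NOR combination: "true" only if neither input indicates speech.
        return not (spectral_change or voiced)
```

The returned flag would then permit updating of the stored noise-period data only when it is true.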

This signal is supplied to the buffer 8 (or to the inverse filter analyser 3), so that the inverse filter coefficients Li are updated only during noise periods.


The threshold adapter 29 is also connected to receive the non-speech signal control output of the control signal generating circuit 20. The output of the threshold adapter 29 is supplied to the thresholder 7. The threshold adapter increases or decreases the threshold in steps which are proportional to the current threshold value, until the threshold approximates the noise power level (which may conveniently be derived from, for example, the weighting and adding circuits 22, 23).

When the input signal is very low, it may be desirable for the threshold to be set automatically to a fixed low level, since at low signal levels the signal quantisation effect produced by the analogue-to-digital converter 2 can give unreliable results.
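A minimal sketch of such a stepwise threshold adapter, including the fixed low level for very weak input, is given below. The step fraction, the low-power limit, the fixed floor value and the clamping of each step at the noise level are assumptions of the sketch and are not specified in the text.

```python
def adapt_threshold(threshold, noise_level, noise_only, input_power,
                    step_fraction=0.05, low_power=1e-4, floor=0.01):
    """One adaptation step of the threshold towards the noise power level."""
    if input_power < low_power:
        return floor                 # very weak input: use a fixed low level
    if not noise_only:
        return threshold             # never adapt while speech is indicated
    step = step_fraction * threshold  # step proportional to current threshold
    if threshold < noise_level:
        return min(threshold + step, noise_level)  # clamp to avoid overshoot
    if threshold > noise_level:
        return max(threshold - step, noise_level)
    return threshold
```

Calling the function once per frame moves the threshold geometrically towards the noise level while leaving it untouched during speech.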

There may further be provided "hangover" generating means 30, which measure the duration of the indications of speech after the thresholder 7; when the presence of speech has been indicated for longer than a predetermined time constant, the output is held high for a short "hangover" period. In this way, clipping of the middle of low-level speech bursts is avoided, and appropriate selection of the time constant prevents the hangover generator 30 from being triggered by short noise spikes which are falsely indicated as speech.
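The hangover behaviour might be sketched as follows. The class name and the frame counts are hypothetical; the specification gives no values for the time constant or the hangover period.

```python
class Hangover:
    """Holds the speech indication high briefly after sustained speech."""

    def __init__(self, min_speech_frames=3, hangover_frames=5):
        self.min_speech_frames = min_speech_frames  # hypothetical time constant
        self.hangover_frames = hangover_frames      # hypothetical hold period
        self.run = 0         # consecutive frames indicated as speech
        self.remaining = 0   # hangover frames still to be output

    def update(self, speech_detected):
        if speech_detected:
            self.run += 1
            if self.run > self.min_speech_frames:
                self.remaining = self.hangover_frames  # arm the hangover
            return True
        self.run = 0
        if self.remaining > 0:
            self.remaining -= 1
            return True      # still within the hangover period
        return False
```

A short isolated indication never arms the hangover in this sketch, so a brief noise spike does not extend the output.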

It will of course be appreciated that all the above functions may be performed by a single suitably programmed digital processing means, such as a digital signal processing (DSP) chip, as part of an LPC codec thus implemented (this is the preferred implementation), or by a suitably programmed microcomputer or microcontroller chip with associated memory devices.

As described above, the voice detection apparatus may conveniently be implemented as part of an LPC codec. Alternatively, where the autocorrelation coefficients of the signal or related measures (partial correlation, or "parcor", coefficients) are transmitted to a distant station, voice detection may take place remotely from the codec.

Claims

1. Voice activity detection apparatus comprising: (i) means (1) for receiving a first, input, signal; (ii) means (14, 15) for periodically and adaptively generating a second signal representing an estimate of the noise signal component of the first signal; (iii) means (4, 5, 6) for periodically forming from the first and second signals a measure M of the spectral similarity between a portion of the input signal and said estimated noise signal component; and (iv) means (7) for comparing the measure M with a threshold value T to generate an output signal indicating the presence or absence of speech; characterized in that (v) the apparatus comprises analysing means (13, 3) for generating coefficients of a filter having a spectral response which is the inverse of the frequency spectrum of one of said two signals; and (vi) the measure forming means (4, 5, 6) produce the measure M, which is proportional to the zero-order autocorrelation (R'0) of a signal obtained by filtering the other of said two signals with a filter having said coefficients.

2. Apparatus according to claim 1, characterized in that the analysing means (13, 3) include an adaptive filter.

3. Apparatus according to claim 1, characterized in that the generating means (14, 15) are operable to calculate the autocorrelation coefficients Ai, and the means (5, 6) are connected to receive the values Ri and Ai and to calculate the measure M from them.

4. Apparatus according to claim 2, characterized in that the means (4) for calculating the autocorrelation coefficients Ri of said other signal are arranged (4a, 4b) to make them dependent on the autocorrelation coefficients of several successive portions of the signal.

5. Apparatus according to claim 3 or 4, characterized in that $M = R_0 A_0 + 2\sum_{i \geq 1} R_i A_i$, where $A_i$ represents the i-th autocorrelation coefficient of the impulse response of said filter.

6. Apparatus according to claim 3 or 4, characterized in that $M = A_0 + 2\sum_{i \geq 1} \frac{R_i A_i}{R_0}$, where $A_i$ represents the i-th autocorrelation coefficient of the impulse response of said filter.

7. Apparatus according to any one of claims 1 to 6, characterized in that said one signal is the second, noise-representing, signal and said other signal is the first, input, signal.

8. Apparatus according to claim 7, characterized in that it additionally comprises an input (11) arranged to receive a second input signal, similarly subject to noise, from which speech is absent, the generating means including LPC analysing means (13) for deriving the Ai values from the second input signal.

9. Apparatus according to any one of claims 1 to 7, characterized in that it comprises means for storing data derived from the signal by LPC analysing means (3), the apparatus being so connected and controlled that the measure M is calculated using said stored data, and said stored data are updated only from periods in which the absence of speech is indicated.

10. Apparatus according to claim 9, characterized in that it further comprises means (20) for indicating the absence of speech in order to control the updating of the stored data, the means (20) for indicating the absence of speech being a second voice activity detection means (20).

11. Apparatus according to any one of the preceding claims, characterized in that it further comprises means (29) for adjusting the threshold value T during periods in which the absence of speech is indicated.

12. Apparatus according to claim 11, characterized in that it further comprises a second voice activity detection means (20) arranged to prevent adjustment of the threshold when speech is present.

13. Apparatus according to claim 10, characterized in that it further comprises means (29) for adjusting said threshold value T during periods in which the absence of speech is indicated, said second voice activity detection means (20) being arranged to prevent adjustment of the threshold when speech is present.

14. Apparatus according to claim 11, 12 or 13, characterized in that the threshold value T, when adjusted, is set equal to the mean of the measure plus a term which is a fraction of the standard deviation of the measure.

15. A method of detecting voice activity in a first, input, signal, comprising the steps of: (a) periodically and adaptively generating a second signal representing an estimated noise signal component of the first signal; (b) periodically forming from the first and second signals a measure M of the spectral similarity between a portion of the input signal and said estimated noise signal component; and (c) comparing the measure M with a threshold value T to produce an output signal indicating the presence or absence of speech; characterized by: (d) the step of generating the coefficients of a filter having a spectral response which is the inverse of the frequency spectrum of one of said two signals; and in that (e) the measure M is proportional to the zero-order autocorrelation R'0 of a signal obtained by filtering the other of said two signals with a filter having said coefficients.

16. An apparatus for encoding speech signals, characterized in that it contains an apparatus according to any of claims 1 to 14.

17. A mobile telephone device, characterized in that it contains an apparatus according to any of claims 1 to 14.