JP3321156B2 - Voice operation characteristics detection - Google Patents

Voice operation characteristics detection

Info

Publication number
JP3321156B2
JP3321156B2 (application JP50377289A)
Authority
JP
Japan
Prior art keywords
means
value
signal
input signal
filter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP50377289A
Other languages
Japanese (ja)
Other versions
JPH03504283A (en)
Inventor
Freeman, Daniel Kenneth
Boyd, Ivan
Original Assignee
British Telecommunications public limited company
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to GB888805795A priority Critical patent/GB8805795D0/en
Priority to GB8805795 priority
Priority to GB888813346A priority patent/GB8813346D0/en
Priority to GB8813346.7 priority
Priority to GB8820105.8 priority
Priority to GB888820105A priority patent/GB8820105D0/en
Application filed by British Telecommunications public limited company
Publication of JPH03504283A publication Critical patent/JPH03504283A/ja
Application granted granted Critical
Publication of JP3321156B2 publication Critical patent/JP3321156B2/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals

Abstract

The first aspect provides a voice activity detection apparatus which receives an input signal, estimates the noise signal component of the input signal, and continually forms a measure M of the spectral similarity between a portion of the input signal and the noise signal. A circuit compares a parameter derived from the measure M with a threshold value T to produce an output indicating the presence or absence of speech, depending on whether that value is exceeded. A second aspect covers voice activity detection apparatus which continually forms a spectral distortion measure and carries out a comparison.

Description

DETAILED DESCRIPTION OF THE INVENTION

A voice activity detector is a device which is supplied with a signal, and whose purpose is to detect periods of speech, or periods containing only noise. Although the invention is not limited to such applications, one field to which such a detector is particularly suited is mobile radiotelephone systems, in which speech is coded by a speech coder to improve the efficient use of the radio spectrum; in those systems the noise levels (from in-vehicle units) are generally high.

The essence of voice activity detection is to look for quantities that differ between speech and non-speech periods. In an apparatus that includes a speech coder, many parameters are readily available from the coder or from another stage, and using such parameters economically simplifies the necessary processing. In many situations, the dominant noise occupies a limited region of the frequency spectrum; for example, the noise of a moving car (e.g., engine noise) lies in the low-frequency part of the spectrum. If knowledge of the location of the noise spectrum is available, it is desirable to base the decision on whether speech is present on a measure obtained from the relatively noise-free portions of the spectrum. Of course, it would be possible to filter the signal before the voice activity detection and analysis; but where the voice activity detector depends on the output of a speech coder, such pre-filtering would interfere with the coded speech signal.

According to the present invention there is provided a voice activity detector comprising means for receiving an input signal, means for adaptively and periodically estimating a noise signal component of the input signal, means for periodically forming a measure M of the spectral similarity between the input signal and the noise signal component, means for comparing a parameter derived from the measure M with a threshold value T, and means for generating an output indicating whether speech is present depending on whether that value is exceeded.

The measure is desirably an Itakura-Saito distortion measure.

 Other aspects of the invention are within the scope of the claims.

Some embodiments of the present invention will now be described with reference to the accompanying drawings.

 FIG. 1 is a block diagram showing a first embodiment of the present invention; FIG. 2 shows a second embodiment of the present invention; FIG. 3 shows a preferred third embodiment of the present invention.

The general principle underlying the first embodiment of the voice activity detector according to the present invention is as follows.

Suppose n signal samples (s0, s1, s2, s3, s4, ..., sn-1) are passed through a fourth-order finite impulse response (FIR) digital filter with impulse response (h0, h1, h2, h3), giving a filtered signal s' (ignoring samples carried over from previous frames). The zero-order autocorrelation coefficient of the filtered signal is the sum of the squares of its terms,

R'0 = s'0^2 + s'1^2 + ... + s'n-1^2,

normalized, i.e., divided by the number of terms (for a fixed frame length it is easy to omit the division). This is the power of the notionally filtered signal s', i.e., the power of that portion of the signal s lying within the passband of the conceptual filter.

Expanding, and ignoring the first few edge terms, R'0 is obtained as a combination of the autocorrelation coefficients Ri of the unfiltered signal, each weighted by a constant that determines the frequency band to which the measure responds. In fact, those weighting constants are the autocorrelation coefficients of the impulse response of the notional filter, so the above expression can be written simply as

M = R0·H0 + 2·(R1·H1 + R2·H2 + ... + RN·HN)     (Equation 1)

where N is the filter order and Hi is the (unnormalized) i-th autocorrelation coefficient of the filter's impulse response.

That is, the effect of filtering on the zero-order autocorrelation coefficient of a signal can be simulated by forming a weighted sum of the autocorrelation coefficients of the (unfiltered) signal, using the autocorrelation of the impulse response of the required filter as the weights.

Thus, a relatively simple calculation involving a small number of multiplication operations can simulate a digital filtering operation that would otherwise typically require on the order of a hundred multiplication operations.
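The weighted-autocorrelation trick described above can be sketched as follows. This is a minimal illustration of Equation 1, not the patent's implementation; the function names are my own.

```python
import numpy as np

def autocorr(x, nlags):
    """Unnormalized autocorrelation coefficients r[0..nlags] of a frame."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[: len(x) - i], x[i:]) for i in range(nlags + 1)])

def filtered_power(signal, h):
    """Equation 1: power of the notionally filtered signal, computed from
    autocorrelations alone: M = R0*H0 + 2*(R1*H1 + ... + RN*HN)."""
    N = len(h) - 1                  # filter order
    R = autocorr(signal, N)         # signal autocorrelation coefficients
    H = autocorr(h, N)              # autocorrelation of the filter impulse response
    return R[0] * H[0] + 2.0 * float(np.dot(R[1:], H[1:]))
```

For a full (untruncated) convolution the identity is exact; the approximation noted in the text arises only when frame edges are ignored.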

This filtering operation can alternatively be viewed as a form of spectral comparison, in which the signal spectrum is matched against a reference spectrum (the inverse of the frequency response of the notional filter). Since the notional filter in this application is chosen to approximate the inverse of the noise spectrum, the operation can be seen as a spectral comparison between the speech and noise spectra, the generated zero-order autocorrelation coefficient (i.e., the energy of the notionally filtered signal) serving as a value that indicates the dissimilarity between the spectra. The Itakura-Saito distortion measure, used in LPC coding to evaluate the match between a predictor filter and the input spectrum, has one form as follows.

Here Ai denotes the i-th autocorrelation coefficient of the LPC parameter set. This turns out to be very similar to the relationship obtained above: the LPC coefficients are the taps of an FIR filter whose spectral response is the inverse of that of the input signal, so the LPC coefficient set can be considered the impulse response of an inverse LPC filter. In fact, the Itakura-Saito distortion measure is simply a form of Equation 1 in which the filter response H is derived from an all-pole model of the input signal.

In fact, by interchanging the roles of the two signals, using the LPC coefficients of the test spectrum and the autocorrelation coefficients of the reference spectrum, a different measure of spectral similarity is obtained.

The IS distortion measure is described in "Speech Coding based upon Vector Quantization" by A. Buzo, A. H. Gray, R. M. Gray and J. D. Markel, IEEE Trans. on ASSP, Vol. ASSP-28, No. 5, October 1980.

Since a frame of the signal has only a finite length and the number of terms is limited (N being the filter order), the above results are only approximate. Nevertheless, the measure indicates very well whether speech is present, and it is therefore used as the voice activity measure M. If the noise spectrum is known and the noise is stationary, it is quite possible to apply fixed coefficients h0, h1, etc. corresponding to an inverse noise filter.

However, a device that can adapt to different noise situations is even more beneficial.

FIG. 1 shows a first embodiment of the present invention, in which a signal s from a microphone (not shown) is received at an input 1 and converted by an analog-to-digital converter 2 into digital samples at a suitable sampling rate. An LPC analysis unit 3 (a conventional LPC coder) derives, for successive frames of n (e.g., 160) samples, a set of N (e.g., 8 or 12) LPC filter coefficients Li representing the input speech. The speech signal s is also supplied to a correlator unit 4; usually this is part of the LPC coder 3, since the autocorrelation vector is normally produced as a step in LPC analysis, so that no separate correlator need be provided. Correlator 4 produces the autocorrelation vector Ri of the input signal, comprising the zero-order coefficient R0 and at least two further autocorrelation coefficients R1, R2, R3. These are supplied to a multiplier unit 5.

A second input 11 is connected to a second microphone located away from the speaker, so that only background noise is received. The input from this microphone is converted into a sequence of digital samples by an A/D converter 12 and analyzed by an LPC analyzer 13. The "noise" LPC coefficients produced by analyzer 13 pass through a correlation unit 14; the autocorrelation vector generated there is multiplied term by term, in multiplier 5, with the autocorrelation coefficients Ri of the input signal from the speech microphone, and the weighted terms are summed by adder 6 according to Equation 1. This simulates a filter having the inverse of the shape of the noise spectrum from the noise-only microphone (which is essentially the shape of the noise component in the speech microphone signal), and thus filters out most of the noise. The resulting measure M is compared with a threshold value by a thresholder 7 to generate a logic output 8 indicating whether speech is present: when M is large, speech is considered to be present.
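The FIG. 1 signal path (LPC analysis of the noise channel, Equation-1 weighting of the speech channel's autocorrelation, thresholding) can be sketched as follows. This is a simplified illustration under my own naming; the Levinson-Durbin recursion stands in for the patent's unspecified LPC analysis units 3 and 13.

```python
import numpy as np

def autocorr(x, nlags):
    """Unnormalized autocorrelation coefficients r[0..nlags]."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[: len(x) - i], x[i:]) for i in range(nlags + 1)])

def levinson_durbin(r, order):
    """LPC inverse-filter coefficients a (a[0] == 1) from autocorrelations r,
    plus the residual prediction error."""
    a = [1.0]
    err = float(r[0])
    for i in range(1, order + 1):
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err                  # reflection coefficient
        new_a = a + [0.0]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return np.array(a), err

def noise_inverse_measure(speech_frame, noise_frame, order=8):
    """FIG. 1 sketch: LPC-analyze the noise channel, use the autocorrelation
    Ai of the LPC (inverse-filter) coefficients as Equation-1 weights on the
    speech channel's autocorrelation Ri; a large M suggests speech."""
    rn = autocorr(noise_frame, order)
    lpc, _ = levinson_durbin(rn, order)
    A = autocorr(lpc, order)
    R = autocorr(speech_frame, order)
    return R[0] * A[0] + 2.0 * float(np.dot(R[1:], A[1:]))
```

A decision is then simply `noise_inverse_measure(frame, noise) > T` for some threshold T.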

This embodiment uses two microphones and two LPC analyzers, which adds cost and complexity.

Alternatively, another embodiment uses the autocorrelation coefficients from the noise microphone 11 and the corresponding values formed using the LPC coefficients from the main microphone 1; in that case a second autocorrelator is required instead of the second LPC analyzer.

Thus, these embodiments can work in different situations with different noise spectra, or in a given situation where the noise spectrum changes.

In the embodiment of FIG. 2, a buffer 15 is provided which stores a set of LPC coefficients (or a set of autocorrelation vectors); these values are obtained from the microphone input 1 during "non-speech" (i.e., noise-only) periods. The measure given by Equation 1 then corresponds to the Itakura-Saito distortion measure, with the difference that the stored LPC coefficients used are not those of the current frame but those of a frame matching the estimate of the inverse noise spectrum.

The LPC coefficient vector Li output by analyzer 3 is also directed to correlator 14, which generates the autocorrelation vector of the LPC coefficient vector. The buffer memory 15 is controlled by the speech/non-speech output of the thresholder 7: during "speech" frames the buffer retains the stored "noise" autocorrelation coefficients, while during "noise" frames the stored set is renewed. For example, a switch 16, through which the output of correlator 14 is connected to buffer 15, can be used to update the buffer with each new set of autocorrelation coefficients. Correlator 14 may alternatively be arranged after the buffer 15. Further, the speech/non-speech decision used for coefficient updating need not be taken from output 8, and can (preferably) be obtained in other ways.

As periods of silence occur frequently, the LPC coefficients stored in the buffer are updated from time to time, allowing the device to follow changes in the noise spectrum. If the noise spectrum is relatively stable over time (as is often the case), such buffer updates may be needed only rarely, or only at the initial operation of the detector; but in situations such as a moving vehicle radiotelephone, frequent updating is desirable.

As a variation on this embodiment, the system initially applies Equation 1 with coefficient terms corresponding to a simple fixed high-pass filter, and then begins adapting by switching to the LPC coefficients obtained during "noise" periods. If speech detection fails for some reason, the system can revert to the simple high-pass filter.

The above measure can be normalized by dividing by R0, so that the expression compared with the threshold becomes M/R0. This value is independent of the total signal power of the frame and therefore compensates for changes in overall signal level; however, it does not provide as great a contrast between "noise" and "speech" levels, and is therefore not preferred in noisy environments.

Since the noise spectrum changes only gradually (as discussed below), instead of using LPC analysis to obtain the inverse-filter coefficients of the noise signal (obtained from the noise microphone, or from noise-only periods, in the various embodiments described above), a model of the inverse noise spectrum can be generated using a conventional adaptive filter, with the relatively slow adaptation rate common to such filters. In an embodiment corresponding to FIG. 1, the LPC analysis unit 13 can simply be replaced by an adaptive filter (e.g., a transversal FIR or a lattice filter) connected so as to whiten the noise input, thereby forming a model of the inverse filter, whose coefficients are supplied to autocorrelator 14 as described above.

In the second embodiment shown in FIG. 2, the LPC analysis means 3 is likewise replaced by such an adaptive filter and the buffer means 15 is omitted; however, switch 16 operates to prevent the adaptive filter from adapting its coefficients during speech periods.

A second voice activity detector used in another embodiment of the present invention will now be described.

From the foregoing it is clear that the LPC coefficient vector is simply the impulse response of an FIR filter whose spectral response is the inverse of that of the input signal. When an Itakura-Saito distortion measure is formed between adjacent frames, the value is in effect the power of the current frame after filtering by the inverse LPC filter of the previous frame. Thus, if there is little difference between the spectra of adjacent frames, correspondingly little of the frame's spectral power escapes the filtering, and the value is small; conversely, a large spectral difference between frames produces a large Itakura-Saito distortion value. The value therefore reflects the similarity of the spectra of adjacent frames. For a speech coder it is desirable to maximize the frame length so as to minimize the data rate; that is, if the frame length is long enough, the speech signal shows significant spectral changes from frame to frame (if it did not, the coding would be redundant). Noise, on the other hand, has a spectral shape that changes only gradually from frame to frame, so during periods when no speech is present in the signal, the previous frame's inverse LPC filter removes ("filters out") most of the noise power, and the Itakura-Saito distortion values are correspondingly small.

For a noisy signal containing intermittent speech, the inter-frame Itakura-Saito distortion value is therefore generally greater during speech periods than during noise periods, and its variation (as indicated by its standard deviation) is large during speech and small during noise.

The standard deviation of M is itself a reliable measure; the effect of taking the standard deviation is essentially to smooth the value.

In this second form of the voice activity detector, the parameter used to determine whether speech is present is preferably the standard deviation of the Itakura-Saito distortion value, but other ways of measuring the variation, and other methods of measuring spectral distortion (for example, based on FFT analysis), can be applied.
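The second detector's inter-frame measure and its variation can be sketched as follows. The window length and the use of a plain standard deviation over recent frames are my assumptions; the patent specifies only that the standard deviation of the distortion is the preferred parameter.

```python
import numpy as np

def _ac(x, nlags):
    """Unnormalized autocorrelation coefficients r[0..nlags]."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[: len(x) - i], x[i:]) for i in range(nlags + 1)])

def interframe_distortion(prev_inverse_filter, frame, order):
    """Itakura-Saito-style distortion between adjacent frames: energy of the
    current frame after filtering by the previous frame's inverse (LPC)
    filter, normalized by the frame energy R0."""
    R = _ac(frame, order)
    A = _ac(prev_inverse_filter, order)
    return (R[0] * A[0] + 2.0 * float(np.dot(R[1:], A[1:]))) / R[0]

def variation(measures, window=10):
    """Standard deviation of recent distortion values; a large value suggests
    speech (rapid spectral change), a small value suggests noise."""
    return float(np.std(measures[-window:]))
```

With `prev_inverse_filter = [1.0]` (no filtering) the distortion reduces to 1, i.e., the normalized frame energy.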

There are advantages in using an adaptive threshold for voice activity detection. Such a threshold should not be adjusted during periods of speech, lest the speech signal itself be thresholded out. It is therefore necessary to control the threshold adapter using a speech/non-speech control signal, which is preferably independent of the detector's own output. The threshold T is adjusted so that, when only noise is present, it is maintained at or above the level of the measure M. Since the measure generally varies randomly in the presence of noise, the threshold can be varied by determining the average level over many blocks and setting the threshold to a level proportional to this average. However, this is generally not sufficient in noisy situations, and it is better also to assess the degree of variation of the parameter over a number of blocks.

 Therefore, the threshold value T is calculated according to the following equation.

T = M' + K·d, where M' is the average of the measure over many consecutive frames, d is the standard deviation of the measure over those frames, and K is a constant (typically 2).
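The adaptive threshold T = M' + K·d can be sketched as follows. The window length and initial value are my assumptions; the gating on an independent speech/non-speech control signal follows the text above.

```python
import numpy as np
from collections import deque

class ThresholdAdapter:
    """Adaptive threshold T = M' + K*d over recent noise-only frames,
    adapted only while an independent control signal reports noise
    (window length and initial value are assumptions)."""
    def __init__(self, nframes=20, K=2.0, initial=1.0):
        self.history = deque(maxlen=nframes)
        self.K = K
        self.T = initial

    def update(self, M, speech_present):
        # Never adapt during speech, so the speech signal cannot raise
        # the threshold above itself and be thresholded out.
        if not speech_present:
            self.history.append(M)
            h = np.array(self.history)
            self.T = float(h.mean() + self.K * h.std())
        return self.T
```

In use, each frame's measure M is passed in together with the control circuit's speech/non-speech flag, and the returned T is supplied to the thresholder.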

In practice, adaptation should not restart immediately after the indication that speech is no longer present; rather, one should wait for confirmation that the measure has settled, to avoid repeated rapid switching between adapting and non-adapting states.

FIG. 3 shows a preferred embodiment of the present invention having the above-mentioned features. Input 1 receives the signal, which is sampled and digitized by an analog-to-digital converter (ADC) 2 and supplied to the input of an inverse-filter analyzer 3, which may in practice form part of the speech coder with which the voice activity detector operates, and which derives the coefficients Li (typically 8 in number) of a filter matching the inverse of the input signal spectrum. The digital signal is also fed to an autocorrelator 4 (which may be part of the analyzer 3), which generates the autocorrelation vector Ri of the input signal (or at least as many low-order terms as there are LPC coefficients). The operation of these parts of the device is as shown in FIGS. 1 and 2. The autocorrelation coefficients Ri are preferably averaged over several successive speech frames (each typically 5 to 20 ms), which improves their reliability. For this averaging, each set of autocorrelation coefficients output by autocorrelator 4 is stored in a buffer 4a, and an averager 4b generates a weighted sum of the current autocorrelation coefficients Ri and the coefficients from previous frames stored in, and supplied from, buffer 4a. The resulting averaged autocorrelation coefficients Rai are supplied to weighting and summing means 5, 6, which also receive, via buffer 15, the autocorrelation vector Ai of the inverse-filter coefficients Li stored from autocorrelator 14 during a noise period, and which form from Rai and Ai the measure M defined by Equation 1.
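The averaging performed by buffer 4a and averager 4b can be sketched as a weighted sum. The particular weighting (equal weight spread over the stored frames) is an assumption; the patent specifies only that a weighted sum of current and stored coefficients is formed.

```python
import numpy as np

def averaged_autocorr(current_R, stored_Rs, current_weight=0.5):
    """Averager 4b sketch: weighted sum of the current frame's autocorrelation
    vector and those from previous frames held in buffer 4a (the particular
    weights are an assumption)."""
    Ra = current_weight * np.asarray(current_R, dtype=float)
    if stored_Rs:
        w = (1.0 - current_weight) / len(stored_Rs)
        for prev in stored_Rs:
            Ra += w * np.asarray(prev, dtype=float)
    return Ra
```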

This measure is compared with a threshold value by a thresholder 7, and a logic result indicating whether speech is present is generated at output 8.

So that the inverse-filter coefficients Li match a good estimate of the inverse noise spectrum, it is desirable to update these coefficients during noise (and, of course, not during speech). However, if the speech/non-speech decision controlling the update were affected by the results of that update, a single incorrectly identified frame of signal could cause the voice activity detector to fall "out of lock" and misrecognize subsequent frames. A control signal generating circuit 20, i.e., a separate voice activity detector, is therefore provided, which forms an independent control signal indicating whether speech is present. This signal controls the inverse-filter analyzer 3 (or the buffer 15), so that the inverse-filter autocorrelation coefficients Ai used to form the measure M are updated only during "noise" periods. Circuit 20 includes an LPC analyzer 21 (which again may be part of the speech coder, and may in particular be realized by the analyzer 3), which analyzes the input signal to generate a set of LPC coefficients Mi, and an autocorrelator 21a (which may be realized by autocorrelator 3a), which obtains the autocorrelation coefficients Bi of Mi. If analyzer 21 is realized by analyzer 3, then Mi = Li and Bi = Ai. These autocorrelation coefficients are supplied to weighting and adding means 22, 23 (equivalent to 5, 6), which also receive the autocorrelation vector Ri of the input signal from autocorrelator 4. The spectral similarity between the current input frame and the previous frame is thereby calculated. As described above, this may be done by calculating the Itakura-Saito distortion value between the Ri of the current frame and the Bi of the previous frame, or by calculating the Itakura-Saito distortion value for the current frame's Ri and Bi and subtracting the corresponding value for the previous frame, stored in a buffer 24, to produce a spectral-difference signal (in each case the value is desirably divided by R0 for energy normalization). The buffer 24 is, of course, updated each frame. This spectral-difference signal, when compared with a threshold by thresholder 26, indicates whether speech is present, as described above. This method is good at discriminating unvoiced speech from noise (a task possible in conventional systems), but its ability to discriminate voiced speech from noise was found to be generally low. The circuit 20 therefore preferably also includes a voiced-speech detection circuit with a pitch analyzer 27 (which may in practice operate as part of the speech coder, in particular to measure the long-term predictor delay generated in a multi-pulse LPC coder). The pitch analyzer 27 generates a logic signal that is "true" when voiced speech is detected; this signal, and the output of thresholder 26 (which is generally "true" when unvoiced speech is present), are applied to the inputs of a NOR gate 28 to generate a signal that is "false" when speech is present and "true" when only noise is present. This signal is supplied to the buffer 15 (or to the inverse-filter analyzer 3), whereby the inverse-filter coefficients Li are updated only during noise periods.

A threshold adapter 29 is also connected to receive the non-speech control output of the control signal generation circuit 20; its output is supplied to the thresholder 7. The threshold adapter operates to increment or decrement the threshold, in steps proportional to the instantaneous threshold level, until the threshold approaches the noise power level (which is readily obtained, for example, from the weighting and adding means 22, 23). Preferably, when the input signal is very small, the threshold is automatically set to a low level, because at low signal levels the quantization of ADC 2 prevents the measure from producing reliable results.

In addition, a "hangover" generating means 30 is provided, which measures the period over which the thresholder 7 indicates speech and, when the presence of speech has been indicated for a period exceeding a predetermined time constant, holds its output high for a short "hangover" period afterwards. In this way, clipping in the middle of a low-level speech burst is avoided, while a suitable choice of time constant prevents short noise spikes erroneously indicated as speech from activating the hangover generator 30. Of course, all of the functions described above can be performed by a single, suitably programmed digital processing means, such as a digital signal processing (DSP) chip configured as part of an LPC codec (which is the preferred configuration), or a suitably programmed microcomputer or microcontroller chip with associated memory devices.
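The hangover behavior described above can be sketched as a small state machine. The specific frame counts are assumptions; the patent specifies only a time constant before the hangover is armed and a short hold afterwards.

```python
class HangoverGenerator:
    """Hangover means 30 sketch: after speech has been indicated for at least
    min_burst consecutive frames, hold the output high for `hangover` further
    frames; brief spikes never arm the hold (frame counts are assumptions)."""
    def __init__(self, min_burst=4, hangover=8):
        self.min_burst = min_burst
        self.hangover = hangover
        self.run = 0    # consecutive raw speech frames seen
        self.hold = 0   # remaining hangover frames

    def step(self, raw_speech_flag):
        if raw_speech_flag:
            self.run += 1
            if self.run >= self.min_burst:
                self.hold = self.hangover   # arm the hangover
            return True
        self.run = 0
        if self.hold > 0:
            self.hold -= 1
            return True
        return False
```

Because a short spike resets `run` before `min_burst` is reached, it passes through without triggering any hold, while a sustained burst keeps the output high briefly after the thresholder's raw decision drops.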

As described above, the voice activity detector can readily be implemented as part of an LPC codec. Alternatively, if the autocorrelation coefficients of the signal, or values related to them (such as the partial correlation or "parcor" coefficients), are transmitted to a remote station, voice activity detection can be performed remotely from the codec.

Continued from the front page: (31) Priority claim number 8820105.8; (32) Priority date August 24, 1988; (33) Priority claim country United Kingdom (GB). (72) Inventor: Boyd, Ivan, United Kingdom. (56) References: JP-A-62-111698 (JP, A); JP-A-62-150299 (JP, A); JP-A-59-115625 (JP, A). (58) Fields searched (Int. Cl.7, DB name): G10L 11/02, 15/04.

Claims (14)

    (57) [Claims]
  1. Apparatus comprising: (i) means for receiving a first input signal; (ii) means for periodically and adaptively generating a second signal representing an estimated noise signal component of the first signal; (iii) means for periodically forming, from the first and second signals, a measure of the spectral similarity between a portion of the input signal and the estimated noise signal component; and (iv) means for comparing the measure with a threshold value to produce an output indicating whether speech is present or absent; characterized in that (v) analysis means are provided operable to produce, for one of the input signal and the estimated noise signal component, filter coefficients having a spectral response which is the inverse of its frequency spectrum; and (vi) the means for forming the measure is operable to create a value proportional to the zero-order autocorrelation of the other of the input signal and the estimated noise signal component after filtering by a filter having those coefficients.
  2. Apparatus according to claim 1, characterized in that the means for forming the measure comprises means for calculating the autocorrelation coefficients Ai of the impulse response of the filter coefficients, means for calculating the autocorrelation coefficients Ri of the other of the input signal and the estimated noise signal component, and means connected to receive Ri and Ai for calculating the measure M from them.
  3. Apparatus according to claim 2, characterized in that the means for calculating the autocorrelation coefficients of the other of the input signal and the estimated noise signal component is arranged to calculate them from the autocorrelation coefficients of several successive portions of that signal.
  4. Apparatus according to claim 2 or 3, characterized in that M = R0·A0 + 2ΣRi·Ai, where Ai denotes the i-th autocorrelation coefficient of the impulse response of the filter.
  5. Apparatus according to claim 2 or 3, characterized in that M = (R0·A0 + 2ΣRi·Ai)/R0, where Ai denotes the i-th autocorrelation coefficient of the impulse response of the filter.
  6. Apparatus according to any one of claims 1 to 5, characterized in that said one of the input signal and the estimated noise signal component is the estimated noise signal component.
  7. Apparatus according to any one of the preceding claims, comprising a buffer connected to store the data from which the autocorrelation coefficients Ai of the filter response are obtained, the filter response being periodically calculated from the signal by LPC analysis means, the apparatus being connected and controlled such that the measure M is calculated using the stored data and such that the stored data are updated only during periods indicated as containing no speech.
  8. Apparatus according to claim 7, characterized in that the means for indicating that no speech is present, which controls the updating of the stored data, is a second voice activity detection means.
  9. Apparatus according to any one of claims 1 to 8, further comprising means for adjusting said threshold during periods when no speech is indicated.
  10. Apparatus according to claim 9, further comprising a second voice activity detection means arranged to inhibit adjustment of the threshold when speech is present.
  11. Apparatus according to claim 8 or 10, characterized in that said second voice activity detection means includes means for generating a measure of spectral similarity between a portion of the input signal and an earlier portion of the input signal.
  12. Apparatus for coding a speech signal, comprising an apparatus according to any one of claims 1 to 11.
  13. A mobile telephone apparatus comprising an apparatus according to any one of claims 1 to 12.
  14. A method of detecting voice activity in a first input signal, comprising: (a) periodically and adaptively generating a second signal representing an estimated noise signal component of the first signal; (b) periodically forming, from the first and second signals, a measure of the spectral similarity between a portion of the input signal and the estimated noise signal component; (c) comparing the measure with a threshold value to produce an output indicating whether speech is present or absent; and (d) creating, for one of the input signal and the estimated noise signal component, filter coefficients having a spectral response which is the inverse of its frequency spectrum; wherein (e) the measure is proportional to the zero-order autocorrelation of the other of the input signal and the estimated noise signal component after filtering by the filter having those coefficients.
JP50377289A 1988-03-11 1989-03-10 Voice operation characteristics detection Expired - Lifetime JP3321156B2 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
GB888805795A GB8805795D0 (en) 1988-03-11 1988-03-11 Voice activity detector
GB8805795 1988-03-11
GB888813346A GB8813346D0 (en) 1988-06-06 1988-06-06 Voice activity detection
GB8813346.7 1988-06-06
GB8820105.8 1988-08-24
GB888820105A GB8820105D0 (en) 1988-08-24 1988-08-24 Voice activity detection

Publications (2)

Publication Number Publication Date
JPH03504283A JPH03504283A (en) 1991-09-19
JP3321156B2 true JP3321156B2 (en) 2002-09-03

Family

ID=27263821

Family Applications (2)

Application Number Title Priority Date Filing Date
JP50377289A Expired - Lifetime JP3321156B2 (en) 1988-03-11 1989-03-10 Voice operation characteristics detection
JP32819899A Expired - Lifetime JP3423906B2 (en) 1988-03-11 1999-11-18 Voice operation characteristic detection device and detection method

Family Applications After (1)

Application Number Title Priority Date Filing Date
JP32819899A Expired - Lifetime JP3423906B2 (en) 1988-03-11 1999-11-18 Voice operation characteristic detection device and detection method

Country Status (16)

Country Link
EP (2) EP0335521B1 (en)
JP (2) JP3321156B2 (en)
KR (1) KR0161258B1 (en)
AU (1) AU608432B2 (en)
BR (1) BR8907308A (en)
CA (1) CA1335003C (en)
DE (2) DE68910859T2 (en)
DK (1) DK175478B1 (en)
ES (2) ES2188588T3 (en)
FI (2) FI110726B (en)
HK (1) HK135896A (en)
IE (1) IE61863B1 (en)
NO (2) NO304858B1 (en)
NZ (1) NZ228290A (en)
PT (1) PT89978B (en)
WO (1) WO1989008910A1 (en)

Families Citing this family (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2643593B2 (en) * 1989-11-28 1997-08-20 日本電気株式会社 Voice / modem signal identification circuit
CA2040025A1 (en) * 1990-04-09 1991-10-10 Hideki Satoh Speech detection apparatus with influence of input level and noise reduced
US5241692A (en) * 1991-02-19 1993-08-31 Motorola, Inc. Interference reduction system for a speech recognition device
FR2697101B1 (en) * 1992-10-21 1994-11-25 Sextant Avionique Speech detection method.
SE470577B (en) * 1993-01-29 1994-09-19 Ericsson Telefon Ab L M Method and apparatus for encoding and / or decoding background sounds
JPH06332492A (en) * 1993-05-19 1994-12-02 Matsushita Electric Ind Co Ltd Method and device for voice detection
SE501305C2 (en) * 1993-05-26 1995-01-09 Ericsson Telefon Ab L M Method and apparatus for discriminating between stationary and non-stationary signals
EP0633658A3 (en) * 1993-07-06 1996-01-17 Hughes Aircraft Co Voice activated transmission coupled AGC circuit.
IN184794B (en) * 1993-09-14 2000-09-30 British Telecomm
SE501981C2 (en) * 1993-11-02 1995-07-03 Ericsson Telefon Ab L M Method and apparatus for discriminating between stationary and non-stationary signals
US5742734A (en) * 1994-08-10 1998-04-21 Qualcomm Incorporated Encoding rate selection in a variable rate vocoder
FR2727236B1 (en) * 1994-11-22 1996-12-27 Alcatel Mobile Comm France Detection of voice activity
GB2317084B (en) * 1995-04-28 2000-01-19 Northern Telecom Ltd Methods and apparatus for distinguishing speech intervals from noise intervals in audio signals
GB2306010A (en) * 1995-10-04 1997-04-23 Univ Wales Medicine A method of classifying signals
FR2739995B1 (en) * 1995-10-13 1997-12-12 Massaloux Dominique Method and device for creating comfort noise in a digital speech transmission system
US5794199A (en) * 1996-01-29 1998-08-11 Texas Instruments Incorporated Method and system for improved discontinuous speech transmission
EP0909442B1 (en) * 1996-07-03 2002-10-09 BRITISH TELECOMMUNICATIONS public limited company Voice activity detector
US6618701B2 (en) 1999-04-19 2003-09-09 Motorola, Inc. Method and system for noise suppression using external voice activity detection
DE10052626A1 (en) * 2000-10-24 2002-05-02 Alcatel Sa Adaptive noise level estimator
CN1617606A (en) * 2003-11-12 2005-05-18 皇家飞利浦电子股份有限公司 Method and device for transmitting non voice data in voice channel
US7155388B2 (en) * 2004-06-30 2006-12-26 Motorola, Inc. Method and apparatus for characterizing inhalation noise and calculating parameters based on the characterization
US7139701B2 (en) * 2004-06-30 2006-11-21 Motorola, Inc. Method for detecting and attenuating inhalation noise in a communication system
FI20045315A (en) * 2004-08-30 2006-03-01 Nokia Corp Detection of voice activity in an audio signal
US8708702B2 (en) * 2004-09-16 2014-04-29 Lena Foundation Systems and methods for learning using contextual feedback
US8775168B2 (en) 2006-08-10 2014-07-08 Stmicroelectronics Asia Pacific Pte, Ltd. Yule walker based low-complexity voice activity detector in noise suppression systems
US8175871B2 (en) 2007-09-28 2012-05-08 Qualcomm Incorporated Apparatus and method of noise and echo reduction in multiple microphone audio systems
US8954324B2 (en) 2007-09-28 2015-02-10 Qualcomm Incorporated Multiple microphone voice activity detector
US8223988B2 (en) 2008-01-29 2012-07-17 Qualcomm Incorporated Enhanced blind source separation algorithm for highly correlated mixtures
WO2009130388A1 (en) 2008-04-25 2009-10-29 Nokia Corporation Calibrating multiple microphones
US8244528B2 (en) 2008-04-25 2012-08-14 Nokia Corporation Method and apparatus for voice activity determination
US8275136B2 (en) 2008-04-25 2012-09-25 Nokia Corporation Electronic device speech enhancement
ES2371619B1 (en) * 2009-10-08 2012-08-08 Telefónica, S.A. Voice segment detection procedure.
KR20120091068A (en) * 2009-10-19 2012-08-17 텔레폰악티에볼라겟엘엠에릭슨(펍) Detector and method for voice activity detection

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3509281A (en) * 1966-09-29 1970-04-28 Ibm Voicing detection system
US4052568A (en) * 1976-04-23 1977-10-04 Communications Satellite Corporation Digital voice switch
US4358738A (en) * 1976-06-07 1982-11-09 Kahn Leonard R Signal presence determination method for use in a contaminated medium
JPS6244732B2 (en) * 1979-08-31 1987-09-22 Nippon Denki Kk
JPS6245730B2 (en) * 1982-12-22 1987-09-29 Nippon Electric Co
EP0127718B1 (en) * 1983-06-07 1987-03-18 International Business Machines Corporation Process for activity detection in a voice transmission system
JPH036689B2 (en) * 1984-10-17 1991-01-30 Sharp Kk

Also Published As

Publication number Publication date
EP0548054B1 (en) 2002-12-11
DK215690D0 (en) 1990-09-07
JP2000148172A (en) 2000-05-26
DK175478B1 (en) 2004-11-08
CA1335003C (en) 1995-03-28
EP0335521A1 (en) 1989-10-04
NO982568D0 (en) 1998-06-04
KR0161258B1 (en) 1999-03-20
AU3355489A (en) 1989-10-05
NZ228290A (en) 1992-01-29
FI115328B (en) 2005-04-15
NO903936D0 (en) 1990-09-10
NO304858B1 (en) 1999-02-22
AU608432B2 (en) 1991-03-28
KR900700993A (en) 1990-08-17
FI110726B (en) 2003-03-14
PT89978A (en) 1989-11-10
FI904410A0 (en) 1990-09-07
DE68929442D1 (en) 2003-01-23
FI20010933A (en) 2001-05-04
EP0335521B1 (en) 1993-11-24
NO316610B1 (en) 2004-03-08
DK215690A (en) 1990-09-07
EP0548054A3 (en) 1994-01-12
EP0548054A2 (en) 1993-06-23
HK135896A (en) 1996-08-02
JP3423906B2 (en) 2003-07-07
JPH03504283A (en) 1991-09-19
DE68929442T2 (en) 2003-10-02
NO982568L (en) 1990-11-09
ES2188588T3 (en) 2003-07-01
WO1989008910A1 (en) 1989-09-21
PT89978B (en) 1995-03-01
DE68910859T2 (en) 1994-12-08
IE61863B1 (en) 1994-11-30
FI110726B1 (en)
BR8907308A (en) 1991-03-19
IE890774L (en) 1989-09-11
ES2047664T3 (en) 1994-03-01
FI904410D0 (en)
DE68910859D1 (en) 1994-01-05
FI115328B1 (en)
NO903936L (en) 1990-11-09

Similar Documents

Publication Publication Date Title
Shrawankar et al. Techniques for feature extraction in speech recognition system: A comparative study
KR101613673B1 (en) Audio codec using noise synthesis during inactive phases
Gonzalez et al. PEFAC-a pitch estimation algorithm robust to high levels of noise
Ghosh et al. Robust voice activity detection using long-term signal variability
Tanyer et al. Voice activity detection in nonstationary noise
Gerkmann et al. Unbiased MMSE-based noise power estimation with low complexity and low tracking delay
JP5596039B2 (en) Method and apparatus for noise estimation in audio signals
Martin Noise power spectral density estimation based on optimal smoothing and minimum statistics
US10181327B2 (en) Speech gain quantization strategy
US5649055A (en) Voice activity detector for speech signals in variable background noise
CA2099655C (en) Speech encoding
McAulay et al. Sinusoidal Coding.
Hansen et al. Constrained iterative speech enhancement with application to speech recognition
KR100719650B1 (en) Endpointing of speech in a noisy signal
FI122273B (en) A method and apparatus for selecting an encoding rate in a variable rate vocoder
CA2034354C (en) Signal processing device
Tan et al. Low-complexity variable frame rate analysis for speech recognition and voice activity detection
US4628529A (en) Noise suppression system
JP4137634B2 (en) Voice communication system and method for handling lost frames
JP4764118B2 (en) Band expanding system, method and medium for band limited audio signal
US9953661B2 (en) Neural network voice activity detection employing running range normalization
ES2329046T3 (en) Procedure and device for improving voice in the presence of fund noise.
EP1326479B2 (en) Method and apparatus for noise reduction, particularly in hearing aids
JP3363336B2 (en) Frame speech determination method and apparatus
EP1208563B1 (en) Noisy acoustic signal enhancement

Legal Events

Date Code Title Description
S111 Request for change of ownership or part of ownership

Free format text: JAPANESE INTERMEDIATE CODE: R313113

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20080621

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090621

Year of fee payment: 7

EXPY Cancellation because of completion of term