GB2360428A - Voice activity detection - Google Patents
- Publication number
- GB2360428A (application GB0006312A; granted publication GB2360428B)
- Authority
- GB
- United Kingdom
- Prior art keywords
- audio parameter
- unit
- averaging
- delay
- values
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
Abstract
Voice activity detection system includes a plurality of audio parameter delay units 104A-104G connected in series therebetween, the first audio parameter delay unit 104A being further connected to an audio parameter generator 120, a plurality of distance measure units 106A-106D, each connected to at least two of the delay units, an averaging unit 108, connected to the distance measure units, a plurality of averaging delay units 112A-112D connected in series therebetween, the first averaging delay unit 112A being further connected to the output of the averaging unit, and a digital logic unit 116 connected to the averaging delay units. In an alternative embodiment (fig 5), there is a single distance measure unit (206) coupled between a multi-stage delay unit (202) and a plurality of delay units (218) connected in series.
Description
2360428 VOICE ACTIVITY DETECTION APPARATUS AND METHOD
FIELD OF THE INVENTION
The present invention relates to voice processing systems in general, and to methods and apparatus for detecting voice activity in a low resource environment, in particular.
BACKGROUND OF THE INVENTION
Methods and apparatus for detecting voice activity are known in the art. A voice activity detector (VAD) operates under the assumption that speech is present in only part of an audio signal, while many intervals exhibit only silence or background noise.
A voice activity detector can be used for many purposes such as suppressing overall transmission activity in a transmission system, when there is no speech, thus potentially saving power and channel bandwidth.
When the VAD detects that speech activity has resumed, then it can reinitiate transmission activity.
A voice activity detector can also be used in conjunction with speech storage devices, by differentiating audio portions which include speech from those that are "speechless". The portions including speech are then stored in the storage device and the "speechless" portions are not stored.
Conventional methods for detecting voice are based, at least in part, on methods for detecting and assessing the power of a speech signal. The estimated power is compared to either a constant or an adaptive threshold to determine a decision. The main advantage of these methods is their low complexity, which makes them suitable for low resource implementations. The main disadvantage of such methods is that background noise can result in "speech" being detected when none is present, or in "speech" which is present not being detected because it is obscured and difficult to detect.
Some methods for detecting speech activity are directed at noisy mobile environments and are based on adaptive filtering of the speech signal. This reduces the noise content from the signal, prior to the final decision. The frequency spectrum and noise level may vary because the method will be used for different speakers and in different environments.
Hence, the input filter and thresholds are often adaptive so as to track these variations. Examples of these methods are provided in the GSM Voice Activity Detector (VAD) specifications for the half rate, full rate and enhanced full rate speech traffic channels (for example, GSM 06.42 for the half rate channel). Another such method is the "Multi-Boundary Voice Activity Detection Algorithm" proposed in ITU-T G.729 Annex B. These methods are more accurate in noisy environments but are significantly more complex to implement.
All of these methods require the speech signal to be input. Some applications employing speech decompression schemes require carrying out speech detection during the speech decompression process.
European Patent application No. 0785419A2 to Benyassine et al. is directed to a method for voice activity detection which includes the following steps: extracting a predetermined set of parameters from the incoming speech signal for each frame, and making a frame voicing decision of the incoming speech signal for each frame according to a set of difference measures extracted from the predetermined set of parameters.
SUMMARY OF THE PRESENT INVENTION
It is an object of the present invention to provide a method and an apparatus for detecting the presence of speech activity, which alleviates the disadvantages of the prior art.
It is a further object of the present invention to provide a method and an apparatus, which can classify speech activity by utilizing compressed speech parameters.
In accordance with the present invention, there is thus provided a voice activity detection apparatus including: a plurality of audio parameter delay units, a plurality of distance measure units, an averaging unit, a plurality of averaging delay units, and a digital logic unit. The audio parameter delay units are connected in series therebetween. The distance measure units are each connected to at least two of the delay units. The averaging unit is connected to the distance measure units. The averaging delay units are connected in series therebetween. The digital logic unit is connected to the averaging delay units.
The first audio parameter delay unit is further connected to an audio parameter generator. The first averaging delay unit is further connected to the output of the averaging unit.
In accordance with another aspect of the invention, all of the distance measure units but the first one, are replaced by distance measure delay units which delay the value provided by the first distance measure delay unit, for predetermined time periods.
In accordance with a further aspect of the invention, there is thus provided a method for detecting speech activity, including the steps of:
grouping audio parameters which are associated with a predetermined combination of audio frames, thereby producing a plurality of groups, determining a characteristic value for each of the groups, determining an average value for each of a plurality of selections of a plurality of the characteristic values, and determining the presence of speech activity from selected ones of the average values.
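The four steps of this method can be sketched end-to-end as follows. This is an illustrative sketch only, not the patented implementation: the per-frame "audio parameters" are stand-in float vectors, the characteristic value is a simple squared distance, and a hypothetical fixed threshold stands in for the decision logic.

```python
# Illustrative sketch of the claimed method: group frame n with frame n-K,
# compute a characteristic (distance) value per group, average recent
# characteristic values, and decide speech presence from the average.

def detect_speech(params, K=4, M2=4, threshold=0.5):
    """Return per-frame speech/no-speech decisions from parameter vectors."""
    decisions = []
    distances = []
    for n in range(len(params)):
        if n < K:
            decisions.append(False)  # not enough history yet
            continue
        # Step 1: group the parameters of frame n with those of frame n-K.
        current, delayed = params[n], params[n - K]
        # Step 2: characteristic value for the group (squared distance).
        d = sum((a - b) ** 2 for a, b in zip(current, delayed))
        distances.append(d)
        # Step 3: average over the last M2 characteristic values.
        window = distances[-M2:]
        a_n = sum(window) / len(window)
        # Step 4: decide from the average (hypothetical fixed threshold).
        decisions.append(a_n > threshold)
    return decisions
```

A stationary parameter stream yields zero distances and a "no speech" decision throughout, while a spectral change raises the averaged distance above the threshold.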
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be understood and appreciated more fully from the following detailed description taken in conjunction with the drawings in which:
Figure 1 is a schematic illustration of apparatus, constructed and operative in accordance with a preferred embodiment of the present invention; Figure 2 is an illustration of a method for operating the apparatus of Figure 1, operative in accordance with the present invention; Figure 3 is a schematic illustration of a two-state logic structure utilised in the preferred embodiment; Figure 4 shows more detail of a step of the method shown in Figure 2; and Figure 5 is a schematic illustration of an apparatus constructed and operative in accordance with a further embodiment of the invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
The present invention alleviates the disadvantages of the prior art by providing a method which utilizes conventional vocoder output data of a voice related stream for detecting voice activity therein. According to one aspect of the present invention, the method for voice activity detection (VAD) is based on the analysis of audio parameters such as Line Spectral Frequencies (LSF) parameters. The detection is based on a stationarity estimate of spectral characteristics of the incoming speech frames, which is represented by LSF parameters.
Reference is now made to Figure 1, which is a schematic illustration of an apparatus, generally referenced 100, constructed and operative in accordance with a preferred embodiment of the present invention. Apparatus 100 includes two delay arrays 102 and 110, a plurality of distance measure units 106A, 106B, 106C and 106D, an averaging unit 108, a subtraction unit 114 and a decision logic unit (DLU) 116. Delay array 102 includes a plurality of delay units 104A, 104B, 104C, 104D, 104E, 104F and 104G, all connected in series, so that each adds a further delay to the previous one.
Delay array 110 includes a plurality of delay units 112A, 112B, 112C and 112D, all connected in series, so that each adds a further delay to the previous one. Apparatus 100 is further connected to a Line Spectral Frequencies (LSF) generation unit 120, which can be a part of the voice encoder (vocoder) apparatus of an audio system. The LSF unit 120 produces LSF values for each received audio frame. It is noted that LSF unit 120 is only one example of an audio parameter generation unit.
The output of the LSF unit 120 is coupled to the input of delay unit 104A. The input of each of delay units 104A, 104B, 104C and 104D is connected to a respective one of distance measure units 106A, 106B, 106C and 106D. For example, the input of delay unit 104A is connected to distance measure unit 106A.
The output of each of the delay units is connected to a distance measure unit. Delay unit 104A has its output connected to distance measure unit 106B. Unit 104B has its output connected to unit 106C. Unit 104C has its output connected to unit 106D.
The output of delay units 104D, 104E, 104F and 104G is connected to a respective one of distance measure units 106A, 106B, 106C and 106D. For example, the output of delay unit 104D is connected to distance measure unit 106A. Hence, the LSF value L(n) at the input of delay unit 104A is associated with the value L(n-4) at the output of delay unit 104D.
Similarly, each of the LSF values L(n-1), L(n-2) and L(n-3) is associated with a respective one of LSF values L(n-5), L(n-6) and L(n-7), at a respective one of distance measure units 106B, 106C and 106D. According to another embodiment of the invention (not shown), the system includes a different number of delay units and can combine more than two LSF values, which are at different distances from each other, such as L(n) + L(n-4) + L(n-6).
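The delay-line pairing described above can be sketched with a bounded buffer playing the role of delay units 104A-104G. This is a minimal illustration, assuming an "LSF vector" is just a list of floats; once the line is full, position 0 holds L(n) and position K holds L(n-K), the pair seen by a distance measure unit.

```python
from collections import deque

M1 = 7              # number of audio parameter delay units (104A..104G)
K = (M1 + 1) // 2   # pairing distance: frame n is grouped with frame n-K

# A deque of capacity M1+1 models the serial delay line of Figure 1.
line = deque(maxlen=M1 + 1)

def push_frame(lsf):
    """Feed one LSF vector; return the (L(n), L(n-K)) pair once available."""
    line.appendleft(lsf)
    if len(line) <= K:
        return None  # delay line not yet filled to depth K
    return line[0], line[K]
```

Feeding frames 0, 1, 2, ... yields the pairs (4, 0), (5, 1), and so on, matching the n and n-4 grouping of the present example.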
The distance measure units 106A, 106B, 106C and 106D are all connected to the averaging unit 108. Averaging unit 108 is further connected to delay unit 112A, subtraction unit 114 and to DLU 116. The output of each of delay units 112A, 112B, 112C and 112D is connected to DLU 116. The output of delay unit 112A is further connected to the subtraction unit 114.
Reference is further made to Figure 2, which is an illustration of a method for operating the apparatus 100 of Figure 1, operative in accordance with another preferred embodiment of the present invention.
In step 150 a plurality of audio parameters are received. Each of the audio parameters is related to a predetermined audio frame. In the present example, the audio parameters include LSF values, which represent the short-time frequency spectrum characteristics of the signal envelope for each audio frame. With respect to Figure 1, delay unit 104A and distance measure unit 106A receive LSF values L(n), where L(n) is a vector L(n) = [l_1(n), l_2(n), ..., l_N(n)], n denotes the index of a selected frame and N denotes the number of spectrum frequencies within an LSF vector.
It is noted that LSF parameters are derived from the Linear Prediction Coefficients (LPCs), which are widely used by many modern speech compression and analysis schemes and are discussed in detail in A. M. Kondoz, Digital Speech: Coding for Low Bit Rate Communications Systems, New York: John Wiley & Sons, 1994.
In step 152 the audio parameters are grouped according to a predetermined pattern of audio frames. In the present example, each audio frame is associated with a voice frame which is four places ahead of it. Accordingly, the audio parameters of audio frame n are grouped with the audio parameters of audio frame n-4. In general, the current frame LSF vector L(n) can be applied to an M1-stage delay line (M1 odd), where the delay line produces pairs of LSF vectors with a delay of K = (M1 + 1)/2. It is noted that any other number can be used for the distance between the frames. In addition, further combinations can also be used, such as the combination (n, n-2, n-7) and the like.
Referring to Figure 1, distance measure unit 106A groups vector L(n) of frame n with vector L(n-4) of frame n-4. Distance measure unit 106B groups vector L(n-1) of frame n-1 with vector L(n-5) of frame n-5.
Distance measure unit 106C groups vector L(n-2) of frame n-2 with vector L(n-6) of frame n-6. Distance measure unit 106D groups vector L(n-3) of frame n-3 with vector L(n-7) of frame n-7.
In step 154, a characteristic value is determined for each group of audio values. In the present example, each distance measure unit 106A, 106B, 106C and 106D performs a two-stage operation. The first operation includes generating a vector v = [v(1), v(2), ..., v(N)], where each of the components v(i) of the vector is determined as follows:

    v(i) = l_i^2(j) + l_i^2(j-K) - 2 * l_i(j) * l_i(j-K),    i = 1, ..., N;  j = n, ..., (n-3)    (1)

The second operation includes a transformation of vector v to a vector d, according to the following expression:

    d(i) = SUM[j=1..T] c_j * v_j^2(i),    i = 1, ..., N    (2)

where c_j are coefficients and T is the number of elements in the summation.
The distance measure units 106A, 106B, 106C and 106D provide the vectors d(n), d(n-1), d(n-2), ..., d(n - (M1-1)/2) to averaging unit 108.
In step 156, an average value is determined for all of the present characteristic values. In the present example, averaging unit 108 applies the averaging expression which is as follows:
    a(n) = (2 / (M1 - 1)) * SUM[i=1..N] SUM[j=0..(M1-1)/2] d_i(n - j)    (3)

The measure a(n) is applied to a second, M2-stage delay line. With reference to Figure 1, this delay line includes four delay units 112A, 112B, 112C and 112D. The delay units 112A, 112B, 112C and 112D provide a vector A(n) to DLU 116, where A(n) = [a(n), a(n-1), a(n-2), ..., a(n-M2)]. Averaging unit 108 further provides the latest average value a(n) to DLU 116 and to subtraction unit 114. Delay unit 112A provides the previous average value a(n-1) to the subtraction unit 114. Subtraction unit 114 provides g(n) to DLU 116, where g(n) = a(n) - a(n-1). It is noted that additional signal energy values e(n) can also be provided as an enabling switch to the DLU 116 (step 158), by connecting DLU 116 to a signal energy detector (not shown), which is usually present in most voice oriented communication systems.
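The averaging stage and the subtraction unit 114 can be sketched as follows. The sketch treats equation (3) simply as the mean over all components of the d vectors currently in the window (normalisation constants aside), and g(n) as the first difference of successive averages; both function names are illustrative.

```python
def averaging_stage(d_vectors):
    """Average all components of all d vectors in the current window,
    standing in for equation (3) up to its normalisation constant."""
    total = sum(sum(d) for d in d_vectors)
    count = sum(len(d) for d in d_vectors)
    return total / count

def delta(a_now, a_prev):
    """Subtraction unit 114: g(n) = a(n) - a(n-1), the trend of the
    stationarity measure that the DLU can inspect."""
    return a_now - a_prev
```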
In step 160, a decision is produced according to the values, which are present. Reference is now made to Figures 3 and 4.
With reference to Figure 3, ST1 (referenced 170) and ST2 (referenced 172) are states which indicate "speech-on" and "speech-off" modes, respectively. DEC1, DEC2, DEC3 and DEC4 are state transition decision functions, which in the present example comply with the following rule: DEC3 = NOT (DEC1) and DEC4 = NOT (DEC2). The implementation of each of the decision functions can be according to a Boolean expression, which compares the value e(n) and components of the averaging vector A(n) with predetermined or variable threshold values.
It is noted that the decision logic can vary according to specific performance requirements, trading off "false alarm" and "miss detect" statistics, and the like. The logic can be either constant or adapted according to other components, such as a background noise characteristic estimator, voicing mode if available, a periodicity check, and the like. The instantaneous decision result can further be applied to an additional hangover function.
With reference to Figure 4, step 180 represents the initial stage of the decision phase, wherein the current state of the VAD (speech-on or speech-off) is detected. If the current state of the VAD is speech-on, then the system 100 proceeds to step 182. Otherwise, the system 100 proceeds to step 186.
In step 182, compliance with a speech-on-to-off transition condition is detected. Such a condition includes a predetermined combination of a(n) and e(n) with respect to predetermined values (note that, in the general case, the thresholds can be adaptive). When such compliance is detected, the system proceeds to step 184, which performs a transition of the VAD state to speech-off. Otherwise, this step is repeated until such compliance is detected. With reference to Figure 1, DLU 116 detects whether the received values comply with the predetermined condition.
In step 186, compliance with a speech-off-to-on transition condition is detected. Such a condition includes another predetermined combination of a(n) and e(n) with respect to predetermined values. When such compliance is detected, the system proceeds to step 188, which performs a transition of the VAD state to speech-on. Otherwise, this step is repeated until such compliance is detected.
After performing a VAD mode transition (either step 184 or step 188), the system proceeds back to step 180.
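The two-state decision logic of Figures 3 and 4 can be written as a minimal state machine. The transition predicates below are hypothetical threshold tests on a(n) and e(n); the patent leaves DEC1-DEC4 as implementation-defined Boolean expressions, so the thresholds and comparisons here are illustrative only.

```python
SPEECH_ON, SPEECH_OFF = "speech-on", "speech-off"

def step(state, a_n, e_n, on_thr=0.5, off_thr=0.2):
    """One pass through steps 180-188: detect the current state, test the
    relevant transition condition, and return the (possibly new) state."""
    if state == SPEECH_ON:
        # Steps 182/184: speech-on-to-off transition condition (hypothetical).
        if a_n < off_thr and e_n < off_thr:
            return SPEECH_OFF
        return SPEECH_ON
    # Steps 186/188: speech-off-to-on transition condition (hypothetical).
    if a_n > on_thr or e_n > on_thr:
        return SPEECH_ON
    return SPEECH_OFF
```

Using different thresholds for the on-to-off and off-to-on directions gives the hysteresis that keeps the detector from chattering near the decision boundary.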
Reference is now made to Figure 5, which is a schematic illustration of apparatus, generally referenced 200, constructed and operative in accordance with a further preferred embodiment of the present invention. Apparatus 200 includes a multi-stage delay unit 202, two delay arrays 204 and 210, a distance measure unit 206, an averaging unit 208, a subtraction unit 214 and a decision logic unit (DLU) 216. Delay array 204 includes a plurality of delay units 218A, 218B, ..., 218M2, all connected in series, so that each adds a further delay to the previous one.
It is noted that delay array 210 includes a plurality of delay units 212A, 212B, 212C and 212D, all connected in series, so that each adds a further delay stage to the previous one. System 200 is further connected to a Line Spectral Frequencies (LSF) generation unit 220, which can be a part of the voice encoder (vocoder) apparatus of an audio system. The LSF unit 220 produces LSF values for each received audio frame.
The input of multi-stage delay unit 202 is connected to LSF unit 220. The output of multi-stage delay unit 202 is connected to distance measure unit 206. Hence, the LSF value L(n) at the input of delay unit 202 is associated with an M1-stage delayed value L(n-M1) at the output of delay unit 202.
The output of distance measure unit 206 is connected to averaging unit 208 and to delay array 204. The output of each of the delay units 218A, 218B, ..., 218M2 is connected to the averaging unit 208, so that each provides a previously delayed distance measure output value to the averaging unit 208. For example, delay unit 218A provides a distance measure value which is respective of the pair L(n-1) and L(n-M1-1). Accordingly, only the first distance measure value has to be calculated; the rest are stored, delayed and provided to the averaging unit 208 at the appropriate timing.
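The economy of the Figure 5 embodiment can be sketched as follows: with a single distance measure unit 206, each distance value is computed once and then replayed from a delay line (units 218A through 218M2) instead of being recomputed for every averaging window. The class name and the squared-difference distance are illustrative assumptions, not the patent's exact measure.

```python
from collections import deque

class CachedDistanceLine:
    """Single distance measure unit plus a delay line of past results,
    modelling units 206, 218A..218M2 and averaging unit 208 of Figure 5."""

    def __init__(self, m2):
        # Holds the newest distance value plus m2 delayed ones.
        self.history = deque(maxlen=m2 + 1)

    def push(self, l_now, l_delayed):
        """Compute one new distance value; older values are reused from the
        delay line rather than recomputed. Returns the running average."""
        d = sum((a - b) ** 2 for a, b in zip(l_now, l_delayed))
        self.history.appendleft(d)
        return sum(self.history) / len(self.history)  # averaging unit 208
```

Compared with the Figure 1 arrangement, this trades a few words of storage for the M2-1 distance computations that would otherwise be repeated each frame.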
It will be appreciated by persons skilled in the art that the present invention is not limited to what has been particularly shown and described hereinabove. Rather the scope of the present invention is defined only by the claims, which follow.
Claims (22)
1. Voice activity detection apparatus comprising:
a plurality of audio parameter delay units connected in series therebetween, the first of said audio parameter delay units being further connected to an audio parameter generator; a plurality of distance measure units, each connected to at least two of said delay units; and an averaging unit, connected to said distance measure units.
2. The voice activity detection apparatus according to claim 1, wherein said first audio parameter delay unit receives a plurality of audio parameter values, respective of a predetermined speech period, from said audio parameter generator, each of the rest of said audio parameter delay units receives said audio parameter values from a preceding one of said audio parameter delay units, each of said distance measure units processes together audio parameter values received from selected ones of said audio parameter delay units connected thereto, thereby producing differential values, and said averaging unit produces an average value from said differential values.
3. The voice activity detection apparatus according to claim 1, further comprising:
a plurality of averaging delay units connected in series therebetween, the first of said averaging delay units being further connected to the output of said averaging unit; and a digital logic unit connected to said averaging delay units.
4. The voice activity detection apparatus according to claim 3, wherein said first averaging delay unit receives a plurality of processed audio parameter average values from said averaging unit, each of said averaging delay units delaying each of said processed audio parameter average values, said digital logic unit receives a plurality of successive processed audio parameter average values, the latest of said successive processed audio parameter average values received from said averaging unit and the rest of said successive processed audio parameter average values received from said averaging delay units, said digital logic unit processing said successive processed audio parameter average values, thereby producing a speech presence indication.
5. The voice activity detection apparatus according to claim 3, wherein said first audio parameter delay unit receives a plurality of audio parameter values from said audio parameter generator, each of the rest of said audio parameter delay units receives said audio parameter values from a preceding one of said audio parameter delay units, and each of said distance measure units processes together audio parameter values received from selected ones of said audio parameter delay units connected thereto, thereby producing differential values, said averaging unit produces a processed audio parameter average value from each set of said differential values, and wherein said first averaging delay unit receives said processed audio parameter average values from said averaging unit, each of said averaging delay units delaying each of said processed audio parameter average values, said digital logic unit receives a plurality of successive processed audio parameter average values, the latest of said successive processed audio parameter average values received from said averaging unit and the rest of said successive processed audio parameter average values received from said averaging delay units, said digital logic unit processes said successive processed audio parameter average values, thereby producing a speech presence indication.
6. The voice activity detection apparatus according to claim 1, wherein said audio parameters include line spectral frequencies.
7. The voice activity detection apparatus according to claim 1, further comprising an audio parameter generator, connected to said first one of said audio parameter delay units.
8. The voice activity detection apparatus according to claim 7, wherein said audio parameter generator comprises a line spectral frequencies generator.
9. The voice activity detection apparatus according to claim 3, further comprising a subtraction unit connected between the input and output of said first averaging delay unit and further to said digital logic unit, wherein said subtraction unit produces difference values from processed audio parameter average values received from said averaging unit and from processed audio parameter average values delayed by said first averaging delay unit, and wherein said digital logic unit processes said difference values together with said successive processed audio parameter average values, thereby producing a speech presence indication.
10. Voice activity detection apparatus comprising:
a plurality of audio parameter delay units connected in series therebetween, the first of said audio parameter delay units being further connected to an audio parameter generator; a distance measure unit, connected to at least two of said delay units; a plurality of distance measure delay units connected in series therebetween, the first of said distance measure delay units being further connected to said distance measure unit; and an averaging unit, connected to said distance measure unit and to said distance measure delay units.
11. The voice activity detection apparatus according to claim 10, wherein said first audio parameter delay unit receives a plurality of audio parameter values from said audio parameter generator, each of the rest of said audio parameter delay units receives said audio parameter values from a preceding one of said audio parameter delay units, said distance measure unit processes together audio parameter values received from the audio parameter delay units connected thereto, thereby producing differential values, said distance measure delay units delaying said differential values, and said averaging unit producing an average value from each set of said differential values.
12. The voice activity detection apparatus according to claim 10, further comprising:
a plurality of averaging delay units connected in series therebetween, the first of said averaging delay units being further connected to the output of said averaging unit; and a digital logic unit connected to said averaging delay units.
13. The voice activity detection apparatus according to claim 12, wherein said first averaging delay unit receives a plurality of processed audio parameter average values from said averaging unit, each of said averaging delay units delays each of said processed audio parameter average values, said digital logic unit receives a plurality of successive processed audio parameter average values, the latest of said successive processed audio parameter average values received from said averaging unit and the rest of said successive processed audio parameter average values received from said averaging delay units, said digital logic unit processes said successive processed audio parameter average values, thereby producing a speech presence indication.
14. The voice activity detection apparatus according to claim 12, wherein said first audio parameter delay unit receives a plurality of audio parameter values from said audio parameter generator, each of the rest of said audio parameter delay units receives said audio parameter values from a preceding one of said audio parameter delay units, said distance measure unit processes together audio parameter values received from the audio parameter delay units connected thereto, thereby producing differential values, said distance measure delay units delaying said differential values, and said averaging unit producing an average value from each set of said differential values, and wherein said first averaging delay unit receives said processed audio parameter average values from said averaging unit, each of said averaging delay units delays each of said processed audio parameter average values, said digital logic unit receives a plurality of successive processed audio parameter average values, the latest of said successive processed audio parameter average values received from said averaging unit and the rest of said successive processed audio parameter average values received from said averaging delay units, said digital logic unit processes said successive processed audio parameter average values, thereby producing a speech presence indication.
15. The voice activity detection apparatus according to claim 10, wherein said audio parameters include line spectral frequencies.
16. The voice activity detection apparatus according to claim 10, further comprising an audio parameter generator, connected to said first one of said audio parameter delay units.
17. The voice activity detection apparatus according to claim 16, wherein said audio parameter generator comprises a line spectral frequencies generator.
18. The voice activity detection apparatus according to claim 12, further comprising a subtraction unit connected between the input and output of said first averaging delay unit and further to said digital logic unit, wherein said subtraction unit produces difference values from processed audio parameter average values received from said averaging unit and from processed audio parameter average values delayed by said first averaging delay unit, and wherein said digital logic unit processes said difference values together with said successive processed audio parameter average values, thereby producing a speech presence indication.
19. A method for detecting speech activity, comprising the steps of:
grouping audio parameters, which are associated with a predetermined combination of audio frames, thereby producing a plurality of groups; determining a characteristic value for each of said groups; determining an average value for each of a plurality of selections of a plurality of said characteristic values; and determining the presence of speech activity from selected ones of said average values.
20. The method according to claim 19, further comprising the step of detecting the energy of audio samples associated with said audio parameters, prior to said step of determining the presence of speech activity.
21. The method according to claim 19, further comprising the preliminary step of receiving said audio parameters from an audio generator.
22. The method according to claim 19, further comprising the preliminary step of producing said audio parameters from a plurality of audio samples.
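The method steps of claims 19-22 can be sketched as follows. This is an illustrative interpretation only: the function name, the use of Euclidean distance as the characteristic value, the window sizes, and the threshold are all assumptions, not specifics taken from the patent.

```python
import numpy as np

def detect_speech_activity(lsf_frames, group_size=2, avg_window=4, threshold=0.05):
    """Sketch of the claimed method: group audio parameters from
    successive frames, derive one characteristic value per group,
    smooth those values by averaging, and decide speech presence
    from the resulting averages."""
    lsf_frames = np.asarray(lsf_frames, dtype=float)
    # Step 1: group parameters associated with a predetermined
    # combination of audio frames (here, each run of group_size frames).
    groups = [lsf_frames[i:i + group_size]
              for i in range(len(lsf_frames) - group_size + 1)]
    # Step 2: one characteristic value per group -- here, the Euclidean
    # distance between the first and last parameter vector of the group.
    characteristics = np.array([np.linalg.norm(g[-1] - g[0]) for g in groups])
    # Step 3: an average value over a sliding selection of
    # characteristic values.
    kernel = np.ones(avg_window) / avg_window
    averages = np.convolve(characteristics, kernel, mode="valid")
    # Step 4: speech presence from selected average values -- sustained
    # spectral movement suggests speech rather than stationary noise.
    return averages > threshold
```

With line spectral frequencies as the audio parameters (claim 15), stationary background noise yields near-constant parameter vectors and small distances, while speech moves the spectrum and pushes the averaged distance above the threshold.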
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB0006312A GB2360428B (en) | 2000-03-15 | 2000-03-15 | Voice activity detection apparatus and method |
DE60133998T DE60133998D1 (en) | 2000-03-15 | 2001-03-14 | METHOD AND DEVICE FOR LANGUAGE ACTIVITY DETECTION |
AT01958309T ATE395683T1 (en) | 2000-03-15 | 2001-03-14 | METHOD AND DEVICE FOR VOICE ACTIVITY DETECTION |
AU2001280027A AU2001280027A1 (en) | 2000-03-15 | 2001-03-14 | Voice activity detection apparatus and method |
EP01958309A EP1269462B1 (en) | 2000-03-15 | 2001-03-14 | Voice activity detection apparatus and method |
PCT/IB2001/001603 WO2001080220A2 (en) | 2000-03-15 | 2001-03-14 | Voice activity detection apparatus and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB0006312A GB2360428B (en) | 2000-03-15 | 2000-03-15 | Voice activity detection apparatus and method |
Publications (3)
Publication Number | Publication Date |
---|---|
GB0006312D0 GB0006312D0 (en) | 2000-05-03 |
GB2360428A true GB2360428A (en) | 2001-09-19 |
GB2360428B GB2360428B (en) | 2002-09-18 |
Family
ID=9887716
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB0006312A Expired - Fee Related GB2360428B (en) | 2000-03-15 | 2000-03-15 | Voice activity detection apparatus and method |
Country Status (6)
Country | Link |
---|---|
EP (1) | EP1269462B1 (en) |
AT (1) | ATE395683T1 (en) |
AU (1) | AU2001280027A1 (en) |
DE (1) | DE60133998D1 (en) |
GB (1) | GB2360428B (en) |
WO (1) | WO2001080220A2 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0785419A2 (en) * | 1996-01-22 | 1997-07-23 | Rockwell International Corporation | Voice activity detection |
WO1999031655A1 (en) * | 1997-12-12 | 1999-06-24 | Motorola Inc. | Apparatus and method for detecting and characterizing signals in a communication system |
WO2000017856A1 (en) * | 1998-09-18 | 2000-03-30 | Conexant Systems, Inc. | Method and apparatus for detecting voice activity in a speech signal |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2317084B (en) * | 1995-04-28 | 2000-01-19 | Northern Telecom Ltd | Methods and apparatus for distinguishing speech intervals from noise intervals in audio signals |
2000
- 2000-03-15 GB GB0006312A patent/GB2360428B/en not_active Expired - Fee Related

2001
- 2001-03-14 AU AU2001280027A patent/AU2001280027A1/en not_active Abandoned
- 2001-03-14 AT AT01958309T patent/ATE395683T1/en not_active IP Right Cessation
- 2001-03-14 EP EP01958309A patent/EP1269462B1/en not_active Expired - Lifetime
- 2001-03-14 WO PCT/IB2001/001603 patent/WO2001080220A2/en active IP Right Grant
- 2001-03-14 DE DE60133998T patent/DE60133998D1/en not_active Expired - Lifetime
Also Published As
Publication number | Publication date |
---|---|
GB0006312D0 (en) | 2000-05-03 |
ATE395683T1 (en) | 2008-05-15 |
WO2001080220A3 (en) | 2002-05-23 |
DE60133998D1 (en) | 2008-06-26 |
EP1269462B1 (en) | 2008-05-14 |
WO2001080220A2 (en) | 2001-10-25 |
GB2360428B (en) | 2002-09-18 |
EP1269462A2 (en) | 2003-01-02 |
AU2001280027A1 (en) | 2001-10-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP0909442B1 (en) | Voice activity detector | |
CN101826892B (en) | Echo canceller | |
EP0784311B1 (en) | Method and device for voice activity detection and a communication device | |
US5406635A (en) | Noise attenuation system | |
JP3423906B2 (en) | Voice operation characteristic detection device and detection method | |
JP3363336B2 (en) | Frame speech determination method and apparatus | |
US20050108004A1 (en) | Voice activity detector based on spectral flatness of input signal | |
EP3493205B1 (en) | Method and apparatus for adaptively detecting a voice activity in an input audio signal | |
JP3878482B2 (en) | Voice detection apparatus and voice detection method | |
US20060053009A1 (en) | Distributed speech recognition system and method | |
EP1229520A2 (en) | Silence insertion descriptor (sid) frame detection with human auditory perception compensation | |
CZ286743B6 (en) | Voice detector | |
US5533133A (en) | Noise suppression in digital voice communications systems | |
US4719649A (en) | Autoregressive peek-through comjammer and method | |
US6876965B2 (en) | Reduced complexity voice activity detector | |
JPH08221097A (en) | Detection method of audio component | |
GB2360428A (en) | Voice activity detection | |
JPH0844395A (en) | Voice pitch detecting device | |
Beritelli et al. | A low‐complexity speech‐pause detection algorithm for communication in noisy environments | |
JP3255077B2 (en) | Phone | |
JP3355473B2 (en) | Voice detection method | |
CN110910899B (en) | Real-time audio signal consistency comparison detection method | |
JPH09113350A (en) | Method and apparatus for predicting average level of background noise | |
JPH0875842A (en) | Radar signal processor | |
JPH0832527A (en) | Fading pitch estimating device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PCNP | Patent ceased through non-payment of renewal fee |
Effective date: 20100315 |