WO1995002239A1 - Voice-activated automatic gain control - Google Patents
Voice-activated automatic gain control
- Publication number
- WO1995002239A1 (PCT/US1994/006281)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio signal
- input audio
- gain
- background noise
- component
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
Definitions
- The invention relates to automatically controlling the gain of an audio amplifier.
- The gain of an audio amplifier is the ratio of the volume of the output of the audio amplifier to the volume of its input.
- One approach to maintaining the volume of speech at the output of an audio channel at a relatively constant level is to control the gain of the audio amplifier so that the energy at the output of the audio channel remains constant.
- This approach is problematic because it increases the gain when a person stops talking, and thereby increases the volume of the background noise.
- The constant change in background noise level is called "pumping" and tends to be quite distracting.
Summary of the Invention
- The invention offers an improved approach to maintaining the volume of speech at the output of an audio channel at a relatively constant level.
- The improved approach offers an advantage over prior approaches that tried to equalize the volume of the audio output signal based on all sounds (speech, chairs moving, background noise, footsteps, and so on), when what is really desired in a telecommunications environment is constant volume on speech only.
- The invention also avoids the problem of pumping.
- The invention features a device for automatically controlling the volume of an output audio signal that is generated from an input audio signal by an amplifier.
- The device includes a detector that signals when the input audio signal includes a desired component such as speech, and a gain controller that increases the gain of the amplifier only when the detector signals that the input audio signal includes speech.
- The device also includes an estimator that generates an estimate of a background noise component of the input audio signal.
- The detector compares the input audio signal to the estimate of the background noise component and, when the input audio signal is substantially equal to that estimate, signals that the input audio signal does not include speech.
- The detector subtracts the estimated background noise component from a representation of the input audio signal and examines the result of the subtraction to determine whether the input audio signal includes speech.
- This representation of the input audio signal is generated by performing a fast Fourier transform ("FFT") on the input audio signal.
- Use of an FFT is computationally efficient because optimized assembly code for performing FFTs is commonly available.
- The invention has been efficiently implemented using less than 15 percent of the instruction cycles of a Texas Instruments TMS320C31 floating-point digital signal processor clocked at 40 MHz, using 10,000 32-bit words of memory.
- The gain controller also controls the gain in other ways. First, it decreases the gain of the amplifier when the product of the loudness of the input audio signal and the gain is above a predetermined level; the gain controller determines the loudness of the input audio signal by measuring the peak energy of the input audio signal. Second, the gain controller increases the gain of the amplifier only when the product of the loudness of the input audio signal and the gain is below a predetermined level and the detector signals that the input audio signal includes speech. Third, the gain controller decreases the gain of the amplifier when the product of the loudness of a background noise component of the input audio signal and the gain is above a predetermined level.
- The gain controller also decreases the gain of the amplifier when, within a predetermined period, the detector does not signal that the input audio signal includes the desired speech component.
- Although this last method of gain control would seem to produce pumping, it has the opposite effect, because it decreases the volume of background noise when speech is not present, which is the period during which the background noise would be most noticeable.
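To make the interaction of these rules concrete, the sketch below shows one way the decision logic could be expressed in Python. It is an illustration only, not the patented implementation; the threshold values and the multiplicative step size are hypothetical.

```python
def update_gain(gain, peak_energy, noise_energy, speech_detected,
                target_level=1.0, noise_ceiling=0.1, step=1.05):
    """Illustrative voice-activated AGC update (hypothetical thresholds).

    gain            -- current amplifier gain
    peak_energy     -- loudness of the input, measured as peak frame energy
    noise_energy    -- estimated loudness of the background noise component
    speech_detected -- True when the voiced segment detector fires
    """
    # First rule: output too loud -> decrease, whether or not speech is present.
    if peak_energy * gain > target_level:
        return gain / step
    # Third rule: amplified background noise too loud -> decrease.
    if noise_energy * gain > noise_ceiling:
        return gain / step
    # Second rule: increase only when speech is present and the output is quiet.
    if speech_detected and peak_energy * gain < target_level:
        return gain * step
    return gain
```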
- Fig. 1 is a block diagram of a system using a voice-activated automatic gain control according to the invention.
- Fig. 2 is a block diagram of the voice-activated automatic gain control used in the system of Fig. 1.
- Fig. 3 is a block diagram of a voiced segment detector of the voice-activated automatic gain control of Fig. 2.
- Fig. 4 is a block diagram of a background noise estimator of the voiced segment detector of Fig. 3.
- Fig. 5 is a flowchart of the procedure implemented by a stationary estimator of the background noise estimator of Fig. 4.
- Fig. 6 is a flowchart of the procedure implemented by speech detection logic of the voiced segment detector of Fig. 3.
- Fig. 7 is a block diagram of the gain control logic of the voice-activated automatic gain control of Fig. 2.
- A voice transmission system 10 includes a voice-activated automatic gain control 12, an analog-to-digital converter 18, an amplifier 14, and a digital-to-analog converter 24.
- Voice-activated automatic gain control 12 automatically adjusts the gain of amplifier 14 so that the volume of a speech component of an analog audio output signal 26 remains at a relatively constant level.
- Analog-to-digital converter 18 converts an analog audio input signal 16 into a digital signal on a line 20 that is divided into frames, each having a 20 ms duration. Because analog-to-digital converter 18 samples input signal 16 at a sampling rate of 16 kHz, each frame of the digital signal on line 20 includes 320 samples.
- The digital signal on line 20 is input both to voice-activated automatic gain control 12, which uses it to produce a gain control signal on line 28, and to amplifier 14, which amplifies the digital signal on line 20 in response to the gain control signal on line 28 to produce an amplified digital signal on line 22.
- Digital-to-analog converter 24 converts the amplified digital signal on line 22 into analog audio output signal 26.
- Voice-activated automatic gain control 12 includes a voiced segment detector 30 and gain control logic 32.
- Voiced segment detector 30 looks for the vowel sounds of human voiced speech (as opposed to unvoiced consonant sounds, such as the "sh" in "she", which have no periodicity) and discriminates against non-human sounds such as doors closing, footsteps, finger snaps, and paper shuffling.
- When voiced segment detector 30 detects human speech, it produces a speech detection signal on line 44.
- Gain control logic 32 relies on the speech detection signal on line 44 in producing the gain control signal on line 28.
- A windowing function 34 reduces the effects of discontinuities introduced at the beginning and end of the frames of the digital signal on line 20 by converting each frame into a windowed frame on line 38.
- Windowing function 34 combines each frame of the digital signal on line 20 with a portion from the end of the immediately preceding frame to produce the windowed frame on line 38.
- The duration of this portion is chosen so that the windowed frame on line 38 encompasses two speech pitch periods, which ensures that the entire contents of a particular pitch period will always appear in at least one windowed frame on line 38.
- Each frame of the digital signal on line 20 is combined with the last 12 ms of the preceding frame to produce windowed frames on line 38 having durations of 32 ms.
- Thus, each windowed frame on line 38 includes the 320 samples from a frame of the digital signal on line 20 in combination with the last 192 samples of the immediately preceding frame.
- The 32 ms duration of each windowed frame ensures that detection of speech having a pitch period of 16 ms or less (corresponding to a pitch frequency of 62.5 Hz or more) will not be affected by frame discontinuities.
- Most males have a pitch frequency somewhat higher than 80 Hz, with the mean at about 100 Hz, and most females have a pitch frequency even higher than that.
- Each 512-sample windowed frame on line 38 is transformed using a fast Fourier transform ("FFT") 36 to produce a 257-component frequency spectrum 40.
- The frequency components of each frequency spectrum 40 are equally spaced in a range from 0 Hz to 8 kHz (half of the 16 kHz sampling frequency).
- Frequency spectra 40 are then input to both voiced segment detector 30 and gain control logic 32.
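A minimal sketch of this front end, assuming NumPy, appears below. The patent describes simple concatenation of the current frame with the tail of the preceding frame rather than a shaped (e.g., Hamming) window, and that is what is shown; the function name is illustrative.

```python
import numpy as np

FS = 16000     # sampling rate in Hz
FRAME = 320    # 20 ms frame at 16 kHz
OVERLAP = 192  # last 12 ms of the preceding frame

def frame_spectra(samples):
    """Yield one 257-component spectrum per 20 ms frame, each computed
    from a 512-sample (32 ms) windowed frame built by prepending the
    last 192 samples of the preceding frame."""
    prev_tail = np.zeros(OVERLAP)
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        frame = np.asarray(samples[start:start + FRAME], dtype=float)
        windowed = np.concatenate([prev_tail, frame])  # 512 samples
        prev_tail = frame[-OVERLAP:]
        # rfft of 512 real samples -> 257 complex bins spanning 0 Hz
        # to 8 kHz, spaced FS / 512 = 31.25 Hz apart.
        yield np.fft.rfft(windowed)
```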
- Voiced segment detector 30 works by determining whether sequences of windowed frames on line 38 resulting in frequency spectra 40 contain periodic signals, and whether the periodicity of those signals remains relatively constant over the sequence of windowed frames on line 38. When windowed frames on line 38 meet these criteria, voiced segment detector 30 signals that speech is present through the speech detection signal on line 44. While some non-speech sounds, such as musical instruments, meet these criteria, most non-speech sounds that occur in applications such as teleconferencing do not.
- Non-speech sounds occurring in teleconferencing applications can be broadly categorized as either constant background noise having relatively constant spectra (e.g., noise produced by fans, computer drives, or electronic circuits) or intermittent noise having spectra that change over time (e.g., finger snaps, footsteps, paper shuffling, or doors opening).
- Voiced segment detector 30 includes a background noise estimator 46 that analyzes each frequency spectrum 40 and produces a background noise estimate 42, an estimate of the average magnitude of each component of frequency spectrum 40 attributable to constant background noise.
- Background noise estimator 46 continually monitors the frequency spectra 40 and automatically updates background noise estimate 42 in response to changed conditions such as, for example, air conditioning fans turning on and off.
- Background noise estimator 46 develops background noise estimate 42 using two approaches.
- A stationary estimator 92 generates a stationary estimate 98 by examining one-second intervals of frequency spectra 40 that include only constant background noise, if such intervals exist.
- A running minimum estimator 94 develops a running estimate 100 by examining ten-second intervals of frequency spectra 40 having unrestricted contents.
- An estimate selector 96 selects between stationary estimate 98 and running estimate 100 to produce background noise estimate 42.
- Stationary estimator 92 looks for long sequences of frequency spectra 40 in which the spectral shape of each frequency spectrum 40 is substantially similar to that of the other frequency spectra 40, which indicates that the frequency spectra 40 only contain background noise.
- When stationary estimator 92 detects a sequence of frequency spectra 40 that meets this condition, it takes the average magnitude of each frequency component of the frequency spectra 40 in the central part of the sequence. Stationary estimator 92 excludes the frequency spectra 40 at the beginning and end of the sequence because those spectra potentially contain low-level speech components.
- Stationary estimator 92 uses the procedure illustrated in Fig. 5 to generate stationary estimate 98. For each frequency spectrum 40, stationary estimator 92 first generates the average spectral shape of previous frequency spectra 40 (step 102).
- The average spectral shape, a simplified summary of the frequency spectrum 40, includes a numerical value for each of the eight 1000 Hz frequency bands of frequency spectrum 40 and is generated according to equation 1:
- $\bar{S}_i(F_c) = 0.25 \sum_{F=F_c-4}^{F_c-1} \sum_{k=32i}^{32i+31} \left( R^2(k, F) + I^2(k, F) \right)$  (1)
- where $F$ designates a frequency spectrum 40, $F_c$ designates the current frequency spectrum 40, $i$ denotes a 1000 Hz frequency band, $k$ indexes the 32 frequency components of band $i$ (from $32i$ to $32i + 31$), and $R(k, F)$ and $I(k, F)$ are the real and imaginary components of the $k$th frequency component of a frequency spectrum 40.
- Next, stationary estimator 92 generates the spectral shape of the current frequency spectrum 40 according to equation 2:
- $S_i(F_c) = \sum_{k=32i}^{32i+31} \left( R^2(k, F_c) + I^2(k, F_c) \right)$  (2)
- where $F_c$ designates the current frequency spectrum 40, $i$ denotes a 1000 Hz frequency band, $k$ indexes the 32 frequency components of band $i$, and $R(k, F_c)$ and $I(k, F_c)$ are the real and imaginary components of the $k$th frequency component of frequency spectrum 40.
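Under the reading of equations 1 and 2 given above (32 bins per 1000 Hz band, and a 0.25 factor averaging the previous four spectra), the spectral shapes can be sketched as follows. This is an interpretation of the garbled original equations, not a verbatim transcription.

```python
import numpy as np

def spectral_shape(spectrum):
    """Eight-band spectral shape of one 257-component spectrum (equation 2):
    the sum of R^2 + I^2 over the 32 bins of each 1000 Hz band."""
    power = spectrum.real**2 + spectrum.imag**2
    return np.array([power[32 * i:32 * (i + 1)].sum() for i in range(8)])

def average_shape(previous_spectra):
    """Average spectral shape of recent spectra (equation 1), assuming
    the 0.25 factor denotes an average over the previous four spectra."""
    return 0.25 * sum(spectral_shape(s) for s in previous_spectra[-4:])
```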
- Stationary estimator 92 then compares the spectral shape of frequency spectrum 40 to the average spectral shape of previous frequency spectra 40 using a lower threshold (step 106). This comparison, made according to equations 3 and 4, determines whether the frequency spectrum 40 differs from the average of the previous frequency spectra 40 by more than the lower threshold.
- If it does, stationary estimator 92 classifies frequency spectrum 40 as sufficiently different from previous frequency spectra 40 that it includes a signal other than background noise, and from this determines that no stationary estimate can be developed (step 120). Otherwise, stationary estimator 92 compares the spectral shape of frequency spectrum 40 to the average spectral shape of previous frequency spectra 40 using an upper threshold (step 110), according to equations 5 and 6.
- If the difference exceeds the upper threshold, stationary estimator 92 classifies frequency spectrum 40 as containing a signal and determines that no stationary estimate can be developed (step 120). Otherwise, stationary estimator 92 classifies frequency spectrum 40 as a noise spectrum (step 114).
- When fifty consecutive frequency spectra 40 have been classified as noise spectra, stationary estimator 92 develops and outputs stationary estimate 98 (step 118).
- Stationary estimator 92 does so by averaging the tenth through the forty-first of the fifty frequency spectra 40 according to equation 7:
- $N_k = \frac{1}{32} \sum_{F=10}^{41} \left( R^2(k, F) + I^2(k, F) \right)$  (7)
- Running minimum estimator 94 generates running estimate 100 by finding, for each frequency component of frequency spectra 40, the average value of that component over the eight consecutive frequency spectra 40 that produce the minimum average value within the selected time duration. Put another way, for each frequency component $k$ of the 500 frequency spectra included in a ten-second interval, running minimum estimator 94 finds the starting spectrum $F_k$ that minimizes $M_k(F_k)$ of equation 8:
- $M_k(F) = \frac{1}{8} \sum_{j=0}^{7} \left( R^2(k, F+j) + I^2(k, F+j) \right)$  (8)
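A sketch of the running-minimum search, assuming NumPy and magnitude-squared spectra stacked as a (500, 257) array for one ten-second interval:

```python
import numpy as np

def running_minimum_estimate(power_spectra):
    """For each of the 257 frequency components, return the minimum
    average over every window of 8 consecutive spectra (equation 8).

    power_spectra -- array of shape (n_spectra, 257), e.g. (500, 257)
                     for ten seconds of magnitude-squared spectra
    """
    n, bins = power_spectra.shape
    cumsum = np.cumsum(power_spectra, axis=0)
    # Sum over each length-8 window: cumsum[i + 7] - cumsum[i - 1].
    window_sums = cumsum[7:] - np.vstack([np.zeros((1, bins)), cumsum[:-8]])
    return (window_sums / 8.0).min(axis=0)
```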
- Estimate selector 96 sets background noise estimate 42 equal to stationary estimate 98 if a recent stationary estimate 98 is available.
- Otherwise, estimate selector 96 sets background noise estimate 42 equal to running estimate 100 if two conditions are met.
- First, the time elapsed since estimate selector 96 last set background noise estimate 42 equal to stationary estimate 98 must be more than ten seconds.
- Second, the difference, D, between background noise estimate 42 and the new running estimate 100 must exceed a predefined threshold.
- The difference D, a sum of the squares of the relative differences between each frequency component of background noise estimate 42 and its corresponding frequency component in running estimate 100, is defined according to equation 9:
- $D = \sum_{k} \left( \frac{N_k - M_k}{N_k} \right)^2$  (9)
- where $N_k$ are the frequency components of background noise estimate 42 and $M_k$ are the frequency components of running estimate 100.
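The estimate selector's second condition follows directly from equation 9, as in the sketch below; the threshold value is left to the implementation and is not specified in the patent text.

```python
import numpy as np

def running_estimate_differs(noise_est, running_est, threshold):
    """Equation 9: sum of squared relative differences between the current
    background noise estimate N_k and the new running estimate M_k.
    Assumes all N_k are nonzero."""
    rel = (noise_est - running_est) / noise_est
    return float(np.sum(rel**2)) > threshold
```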
- A signal-versus-noise detector 48 compares each frequency spectrum 40 with the corresponding background noise estimate 42. If frequency spectrum 40 is sufficiently greater than the corresponding background noise estimate 42, detector 48 determines that a signal other than constant background noise is present and transmits frequency spectrum 40 to a magnitude squaring and noise subtraction unit 50 for further processing. Otherwise, detector 48 determines that only constant background noise is present and transmits a signal on line 58 that causes voiced segment detector 30 to signal, through the speech detection signal on line 44, that speech is not present. By transmitting the signal on line 58, detector 48 eliminates the need to further evaluate frequency spectrum 40.
- When a signal other than constant background noise is present, voiced segment detector 30 must determine whether the signal is speech or intermittent noise. To do so, voiced segment detector 30 determines the periodicity of the signal and whether this periodicity is similar to the periodicities of previous windowed frames on line 38. Because intermittent noise generally lacks similar periodicity over time, voiced segment detector 30 designates a windowed frame on line 38 as containing speech upon detection of such similar periodicity. Voiced segment detector 30 uses a technique known as autocorrelation to detect and estimate the periodicity of a windowed frame on line 38. A central theorem of signal processing is that convolution in the time domain is equivalent to multiplication in the frequency domain.
- Accordingly, the autocorrelation of a windowed frame on line 38 (which is equivalent to the convolution of the windowed frame with a time-reversed version of itself) is computed by multiplying the frequency spectrum 40 corresponding to the windowed frame by the complex conjugate of the same frequency spectrum 40, and then taking the inverse fast Fourier transform ("IFFT") of the result of the multiplication.
- Magnitude squaring and noise subtraction unit 50 performs the first portion of the autocorrelation by squaring the magnitude of each component of frequency spectrum 40, which is equivalent to multiplying frequency spectrum 40 by the complex conjugate of itself.
- Magnitude squaring and noise subtraction unit 50 generates the squared magnitudes $S_k$ of frequency spectrum 40 using equation 10:
- $S_k = R^2(k) + I^2(k)$  (10)
- Next, magnitude squaring and noise subtraction unit 50 subtracts the magnitude-squared frequency components $N_k$ of background noise estimate 42 from the magnitude-squared frequency components $S_k$ of frequency spectrum 40 using equation 11:
- $S'_k = S_k - N_k$  (11)
- Magnitude squaring and noise subtraction unit 50 outputs the results of the subtraction as a magnitude-squared, noise-reduced frequency spectrum 60.
- High pass filter 52 operates on the magnitude-squared, noise-reduced frequency spectrum 60 according to equation 12, producing an output 62 with components $H_k$.
- The output 62 of high pass filter 52 is transformed into a time domain signal 64 having 512 samples.
- IFFT 54 is taken on output 62, with each $H_k$ component from equation 12 representing the real part of the $k$th frequency component of output 62 and the imaginary part of the $k$th frequency component being zero.
- The output 64 of IFFT 54 approximates an autocorrelation of the periodic component of a windowed frame on line 38.
- Output 64 is only an approximation because zeroes were not appended to the windowed frame on line 38 prior to taking FFT 36, as would normally be done to correct for circular convolution artifacts.
- Windowing unit 34 substantially reduces the circular convolution artifacts by combining each frame of the digital signal on line 20 with a portion from the end of the immediately preceding frame, and thereby eliminates the need for appending zeroes. This, in turn, eliminates a significant computational burden that would have resulted from appending the zeroes.
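Putting the magnitude squaring, noise subtraction, and inverse transform together, the FFT-based autocorrelation can be sketched as below, assuming NumPy. The high pass filter of equation 12 is omitted, and the subtraction result is floored at zero here as a common safeguard that the patent text does not specify.

```python
import numpy as np

def approximate_autocorrelation(windowed_frame, noise_power=None):
    """Approximate autocorrelation of a 512-sample windowed frame.

    Squaring the magnitude of the spectrum is equivalent to multiplying
    it by its complex conjugate; the inverse FFT of the result is the
    (circular) autocorrelation of the frame.
    """
    spectrum = np.fft.rfft(windowed_frame)            # FFT 36
    power = (spectrum * spectrum.conj()).real         # equation 10
    if noise_power is not None:
        power = np.maximum(power - noise_power, 0.0)  # equation 11, floored
    return np.fft.irfft(power)                        # IFFT 54: 512 samples
```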
- Speech detection logic 56 generates the speech detection signal on line 44.
- First, speech detection logic 56 examines the signal on line 58 from detector 48 to determine whether frequency spectrum 40 contains just constant background noise or may contain speech (step 66). If frequency spectrum 40 contains only background noise, speech detection logic 56 declares no voiced segment and sets the speech detection signal on line 44 accordingly (step 80).
- Otherwise, speech detection logic 56 finds the maximum average peak of output 64 for lags of 70 to 220 samples, which corresponds to the range of human pitch (step 68). To find the maximum average peak, speech detection logic 56 generates, for each lag between 70 and 220 samples, the average magnitude of all pairs of samples spaced apart by that lag, and then selects the maximum of these average magnitudes.
- Speech detection logic 56, having determined the maximum magnitude of output 64, divides the selected average magnitude by this maximum magnitude and examines the result (step 70). If the ratio of these two magnitudes is less than or equal to a predetermined value, 0.7 in the illustrated embodiment, speech detection logic 56 declares no voiced segment and sets the speech detection signal on line 44 accordingly (step 80). If the ratio is greater than 0.7, speech detection logic 56 determines that output 64 is a periodic frame having a pitch period equal to the number of samples between the pair of samples having the maximum average amplitude.
- Next, speech detection logic 56 determines whether more than two of the previous ten outputs of IFFT 54 have had pitch (step 72). If so, speech detection logic 56 generates the standard deviation of the pitch periods of the previous ten outputs that have had pitch (step 74) and examines that standard deviation (step 76). If the standard deviation is less than a predetermined value, fifteen samples in the illustrated embodiment, an extended sound having consistent pitch in the range of human speech is present; in this case, speech detection logic 56 declares a voiced segment and sets the speech detection signal on line 44 accordingly (step 78). If the standard deviation is greater than or equal to the predetermined value, or if two or fewer of the previous ten outputs of IFFT 54 have had pitch, speech detection logic 56 declares no voiced segment and sets the speech detection signal on line 44 accordingly (step 80).
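A sketch of steps 68 through 80, assuming NumPy. The per-lag averaging of step 68 is simplified here to reading the autocorrelation directly at each lag, and the pitch-history bookkeeping is left to the caller.

```python
import numpy as np

def detect_voiced(autocorr, pitch_history, ratio_threshold=0.7,
                  max_std=15.0, min_pitched_frames=3):
    """Decide whether the current frame belongs to a voiced segment.

    autocorr      -- output 64 of IFFT 54 (512-sample autocorrelation)
    pitch_history -- pitch periods (in samples) of the previous ten
                     frames, with None for frames that had no pitch
    Returns (is_voiced, pitch_period_or_None).
    """
    # Step 68: best peak for lags of 70 to 220 samples (human pitch range).
    best_lag = 70 + int(np.argmax(autocorr[70:221]))
    # Step 70: compare the peak against the overall maximum (lag-0 energy).
    if autocorr[0] <= 0.0 or autocorr[best_lag] / autocorr[0] <= ratio_threshold:
        return False, None                    # step 80: no voiced segment
    # Steps 72-76: require a consistent pitch over recent pitched frames.
    recent = [p for p in pitch_history[-10:] if p is not None]
    if len(recent) >= min_pitched_frames and np.std(recent) < max_std:
        return True, best_lag                 # step 78: voiced segment
    return False, best_lag                    # pitched frame, not yet voiced
```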
- Gain control logic 32 first determines the peak energy of the digital signal on line 20.
- The peak energy is the energy of the frame, from the previous two seconds of the digital signal on line 20, that has the maximum energy of any of those frames (step 82). If the peak energy is greater than that necessary for analog audio output signal 26 to be at a suitable volume, gain control logic 32 sets the gain control signal on line 28 to reduce the gain of amplifier 14, without regard to the speech detection signal on line 44 (step 84).
- If instead the peak energy is below that necessary level and the speech detection signal on line 44 indicates that speech is present, gain control logic 32 sets the gain control signal on line 28 to increase the gain of amplifier 14 so that the speech component of analog audio output signal 26 will be at a suitable volume (step 86).
- Gain control logic 32 further limits the gain of amplifier 14 to prevent the constant background noise component of analog audio output signal 26 from exceeding a suitable volume (step 88) .
- Gain control logic 32 examines background noise estimate 42 and sets the gain control signal on line 28 to limit the gain of amplifier 14 accordingly. For example, if background noise estimate 42 indicates a high level of background noise, gain control logic 32 sets the gain control signal on line 28 so that amplifier 14 does not amplify the background noise above an acceptable volume. This limitation overrides any increase in gain that is indicated in step 86.
- Thus, if step 86 indicates that the gain of amplifier 14 should be increased to bring the volume of the speech component of analog audio output signal 26 to a suitable level, but this increase in gain would cause the volume of the background noise component of analog audio output signal 26 to exceed a suitable level, gain control logic 32 limits the increase.
- Finally, gain control logic 32 sets the gain control signal on line 28 to reduce the gain of amplifier 14 so that the volume of any constant background noise component of analog audio output signal 26 is at a fairly low, unobtrusive level (step 90). Therefore, the system of Figs. 1-7 offers an improved approach to maintaining the volume of speech at the output of an audio channel at a relatively constant level and avoids the problem of pumping.
Abstract
Apparatus (10) and method (30) for automatically controlling the volume of human speech in an output audio signal (26) generated from an input audio signal (16). A gain controller (12) increases the gain of an amplifier (14), used to produce the output signal (26) from the input signal (16), only when a detector (30) signals that the input signal (16) includes human speech.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US8805293A | 1993-07-07 | 1993-07-07 | |
US08/088,052 | 1993-07-07 | 1993-07-07 | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1995002239A1 (fr) | 1995-01-19 |
Family
ID=22209115
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1994/006281 WO1995002239A1 (fr) | 1993-07-07 | 1994-06-03 | Commande de gain automatique activee par la voix |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO1995002239A1 (fr) |
- 1994-06-03: PCT/US1994/006281 filed as WO1995002239A1 (active Application Filing)
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4637402A (en) * | 1980-04-28 | 1987-01-20 | Adelman Roger A | Method for quantitatively measuring a hearing defect |
US4696040A (en) * | 1983-10-13 | 1987-09-22 | Texas Instruments Incorporated | Speech analysis/synthesis system with energy normalization and silence suppression |
US5014318A (en) * | 1988-02-25 | 1991-05-07 | Fraunhofer Gesellschaft Zur Forderung Der Angewandten Forschung E. V. | Apparatus for checking audio signal processing systems |
US5293588A (en) * | 1990-04-09 | 1994-03-08 | Kabushiki Kaisha Toshiba | Speech detection apparatus not affected by input energy or background noise levels |
US5157760A (en) * | 1990-04-20 | 1992-10-20 | Sony Corporation | Digital signal encoding with quantizing based on masking from multiple frequency bands |
US5293450A (en) * | 1990-05-28 | 1994-03-08 | Matsushita Electric Industrial Co., Ltd. | Voice signal coding system |
US5146504A (en) * | 1990-12-07 | 1992-09-08 | Motorola, Inc. | Speech selective automatic gain control |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0786920A1 (fr) * | 1996-01-23 | 1997-07-30 | Koninklijke Philips Electronics N.V. | Système de transmission de signaux correlés |
CN102654420A (zh) * | 2012-05-04 | 2012-09-05 | 惠州市德赛汽车电子有限公司 | 一种音量曲线自动化测试方法及其系统 |
WO2014043024A1 (fr) * | 2012-09-17 | 2014-03-20 | Dolby Laboratories Licensing Corporation | Surveillance à long terme de profils d'émission et d'activité vocale pour la régulation d'une commande de gain |
US9521263B2 (en) | 2012-09-17 | 2016-12-13 | Dolby Laboratories Licensing Corporation | Long term monitoring of transmission and voice activity patterns for regulating gain control |
CN109817237A (zh) * | 2019-03-06 | 2019-05-28 | 小雅智能平台(深圳)有限公司 | 一种音频自动处理方法、终端及计算机可读存储介质 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| AK | Designated states | Kind code of ref document: A1. Designated state(s): CA JP |
| AL | Designated countries for regional patents | Kind code of ref document: A1. Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
| WD | Withdrawal of designations after international publication | Free format text: US |
| 122 | Ep: pct application non-entry in european phase | |
| NENP | Non-entry into the national phase | Ref country code: CA |