WO2001029826A1 - Method for implementing a noise suppressor in a speech recognition system - Google Patents

Info

Publication number
WO2001029826A1
WO2001029826A1 PCT/US2000/041316 US0041316W WO0129826A1 WO 2001029826 A1 WO2001029826 A1 WO 2001029826A1 US 0041316 W US0041316 W US 0041316W WO 0129826 A1 WO0129826 A1 WO 0129826A1
Authority
WO
WIPO (PCT)
Prior art keywords
channel
noise
speech
background noise
energy
Prior art date
Application number
PCT/US2000/041316
Other languages
English (en)
Inventor
Duanpei Wu
Miyuki Tanaka
Xavier Menendez-Pidal
Original Assignee
Sony Electronics Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1999-10-21
Filing date
2000-10-18
Publication date
2001-04-26
Application filed by Sony Electronics Inc.
Priority to AU22973/01A (AU2297301A)
Publication of WO2001029826A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise

Definitions

  • This invention relates generally to electronic speech detection systems, and relates more particularly to a method for implementing a noise suppressor in a speech recognition system.
  • Speech generally consists of one or more spoken utterances which each may include a single word or a series of closely-spaced words forming a phrase or a sentence.
  • speech detection systems typically determine the endpoints (the beginning and ending points) of a spoken utterance to accurately identify the specific sound data intended for analysis.
  • Conditions with significant ambient background-noise levels present additional difficulties when implementing a speech detection system.
  • Examples of such noisy conditions may include speech recognition in automobiles or in certain manufacturing facilities.
  • a speech recognition system may be required to selectively differentiate between a spoken utterance and the ambient background noise.
  • noisy speech 112 of FIG. 1(a) is therefore typically comprised of several components, including speech 114 of FIG. 1(b) and noise 116 of FIG. 1(c).
  • waveforms 112, 114, and 116 are presented for purposes of illustration only. The present invention may readily function and incorporate various other embodiments of noisy speech 112, speech 114, and noise 116.
  • SNR: signal-to-noise ratio
  • One known method for speech enhancement is Wiener filtering.
  • a feature extractor in a speech detector initially receives noisy speech data that is preferably generated by a sound sensor, an amplifier and an analog-to-digital converter.
  • the speech detector processes the noisy speech data in a series of individual data units called "windows" that each include sub-units called "frames".
  • the feature extractor responsively filters the received noisy speech into a predetermined number of frequency sub-bands or channels using a filter bank to thereby generate filtered channel energy to a noise suppressor.
  • the filtered channel energy is therefore preferably comprised of a series of discrete channels which the noise suppressor operates on concurrently.
  • a noise calculator in the noise suppressor preferably calculates channel background noise values for each channel of the filter bank, and responsively stores the channel background noise values into a memory device.
  • a speech energy calculator in the noise suppressor preferably calculates speech energy values for each channel of the filter bank, and responsively stores the speech energy values into the memory device.
  • a weighting module in the noise suppressor advantageously calculates individual weighting values for each calculated channel energy value. In a first embodiment, the weighting module calculates weighting values whose various channel values are related to the reciprocal of a channel average background noise variance value for the corresponding channel.
  • the weighting module may calculate the individual weighting values as being equal to the reciprocal of a minimum variance of channel background noise for the corresponding channel.
  • the weighting module therefore generates a total noise-suppressed channel energy that is the summation of each channel's channel energy value multiplied by that channel's calculated weighting value.
  • An endpoint detector then receives the noise-suppressed channel energy, and responsively detects corresponding speech endpoints.
  • a recognizer (418) receives the speech endpoints from the endpoint detector, and also receives feature vectors from the feature extractor, and responsively generates a recognition result using the endpoints and the feature vectors between the endpoints.
  • FIG. 1(a) is an exemplary waveform diagram for one embodiment of noisy speech energy
  • FIG. 1(b) is an exemplary waveform diagram for one embodiment of speech energy without noise energy
  • FIG. 1(c) is an exemplary waveform diagram for one embodiment of noise energy without speech energy
  • FIG. 2 is a block diagram of one embodiment for a computer system, in accordance with the present invention.
  • FIG. 3 is a block diagram of one embodiment for the memory of FIG. 2, in accordance with the present invention.
  • FIG. 4 is a block diagram of one embodiment for the speech detector of FIG. 3;
  • FIG. 5 is a schematic diagram of one embodiment for the filter bank of the FIG. 4 feature extractor
  • FIG. 6 is a block diagram of one embodiment for the noise suppressor of FIG. 4, in accordance with the present invention.
  • FIG. 7 is a waveform diagram of one exemplary embodiment for detecting speech energy, in accordance with the present invention.
  • FIG. 8 is a flowchart for one embodiment of method steps for suppressing background noise in a speech detection system, in accordance with the present invention.
  • the present invention relates to an improvement in speech recognition systems.
  • the following description is presented to enable one of ordinary skill in the art to make and use the invention and is provided in the context of a patent application and its requirements.
  • Various modifications to the preferred embodiment will be readily apparent to those skilled in the art and the generic principles herein may be applied to other embodiments.
  • the present invention is not intended to be limited to the embodiment shown, but is to be accorded the widest scope consistent with the principles and features described herein.
  • the present invention includes a method for implementing a noise suppressor in a speech recognition system that comprises a filter bank for separating source speech data into discrete frequency sub-bands to generate filtered channel energy, and a noise suppressor for weighting the frequency sub-bands to improve the signal-to-noise ratio of the resultant noise-suppressed channel energy.
  • the noise suppressor preferably includes a noise calculator for calculating channel background noise values, and a weighting module for calculating and applying calculated weighting values to the filtered channel energy to generate the noise-suppressed channel energy.
  • Referring now to FIG. 2, a block diagram of one embodiment for a computer system 210 is shown, in accordance with the present invention.
  • the FIG. 2 embodiment includes a sound sensor 212, an amplifier 216, an analog-to-digital converter 220, a central processing unit (CPU) 228, a memory 230, and an input/output device 232.
  • CPU central processing unit
  • sound sensor 212 detects ambient sound energy and converts the detected sound energy into an analog speech signal which is provided to amplifier 216 via line 214.
  • Amplifier 216 amplifies the received analog speech signal and provides an amplified analog speech signal to analog-to-digital converter 220 via line 218.
  • Analog-to-digital converter 220 then converts the amplified analog speech signal into corresponding digital speech data and provides the digital speech data via line 222 to system bus 224.
  • CPU 228 may then access the digital speech data on system bus 224 and responsively analyze and process the digital speech data to perform speech detection according to software instructions contained in memory 230. The operation of CPU 228 and the software instructions in memory 230 are further discussed below in conjunction with FIGS. 3-8. After the speech data is processed, CPU 228 may then advantageously provide the results of the speech detection analysis to other devices (not shown) via input/output interface 232.
  • Memory 230 may alternatively comprise various storage-device configurations, including Random-Access Memory (RAM) and non-volatile storage devices such as floppy disks or hard disk drives.
  • memory 230 includes a speech detector 310, energy registers 312, weighting value registers 314, and noise registers 316.
  • speech detector 310 includes a series of software modules which are executed by CPU 228 to analyze and detect speech data, and which are further described below in conjunction with FIG. 4. In alternate embodiments, speech detector 310 may readily be implemented using various other software and/or hardware configurations.
  • Energy registers 312, weighting value registers 314, and noise registers 316 contain respective variable values which are calculated and utilized by speech detector 310 to suppress background noise according to the present invention. The utilization and functionality of energy registers 312, weighting value registers 314, and noise registers 316 are further described below in conjunction with FIGS. 6 through 8.
  • Referring now to FIG. 4, a block diagram of one embodiment for the FIG. 3 speech detector 310 is shown.
  • speech detector 310 includes a feature extractor 410, a noise suppressor 412, an endpoint detector 414, and a recognizer 418.
  • analog-to-digital converter 220 (FIG. 2) provides digital speech data to feature extractor 410 within speech detector 310 via system bus 224.
  • a filter bank in feature extractor 410 then receives the speech data and responsively generates channel energy which is provided to noise suppressor 412 via path 428.
  • the filter bank in feature extractor 410 is a mel-frequency scaled filter bank which is further described below in conjunction with FIG. 5.
  • the channel energy from the filter bank in feature extractor 410 is also provided to a feature vector calculator in feature extractor 410 to generate feature vectors which are then provided to recognizer 418 via path 416.
  • the feature vector calculator is a mel-frequency cepstral coefficient (MFCC) feature vector calculator.
  • noise suppressor 412 responsively processes the received channel energy to suppress background noise. Noise suppressor 412 then provides noise-suppressed channel energy to endpoint detector 414 via path 430. The functionality and operation of noise suppressor 412 are further discussed below in conjunction with FIGS. 6 through 8.
  • Endpoint detector 414 analyzes the noise-suppressed channel energy received from noise suppressor 412, and responsively determines endpoints (beginning and ending points) for the particular spoken utterance represented by the noise-suppressed channel energy received via path 430. Endpoint detector 414 then provides the calculated endpoints to recognizer 418 via path 432. The operation of endpoint detector 414 is further discussed in co-pending U.S. Patent Application Serial No. 08/957,875, entitled “Method For Implementing A Speech Recognition System For Use During Conditions With Background Noise,” filed on October 20, 1997, which is hereby incorporated by reference.
  • recognizer 418 receives feature vectors via path 416 and endpoints via path 432, and responsively performs a speech detection procedure to advantageously generate a speech detection result to CPU 228 via path 424.
  • Verifier 440 preferably checks the segment of an utterance between the identified endpoints to determine whether the segment is a speech signal. This decision may be made based on the signal characteristics and a confidence index preferably generated using a confidence measure technique and a garbage modeling technique. Verifier 440 responsively generates an abort/confirm signal to recognizer 418.
  • the foregoing confidence measure technique is further discussed in co-pending U.S. Patent Application Serial No. 09/553,985, entitled “System And Method For Speech Verification Using A Confidence Measure,” filed on April 20, 2000, which is hereby incorporated by reference.
  • filter bank 610 is a mel-frequency scaled filter bank with "p" channels (channel 0 (614) through channel p-1 (622)).
  • various other implementations of filter bank 610 are equally possible.
  • filter bank 610 receives pre-emphasized speech data via path 612, and provides the speech data in parallel to channel 0 (614) through channel p-1 (622).
  • channel 0 (614) through channel p-1 (622) generate respective channel energies $E_0$ through $E_{p-1}$ which collectively form the channel energy provided to noise suppressor 412 via path 428 (FIG. 4).
  • Filter bank 610 thus processes the speech data received via path 612 to generate and provide filtered channel energy to noise suppressor 412 via path 428.
  • Noise suppressor 412 may then advantageously suppress the background noise contained in the received channel energy, in accordance with the present invention.
  • Referring now to FIG. 6, a block diagram of one embodiment for the FIG. 4 noise suppressor 412 is shown, in accordance with the present invention.
  • noise suppressor 412 preferably includes a noise calculator 634, a speech energy calculator 636, and a weighting module 638.
  • noise suppressor 412 preferably utilizes noise calculator 634 to identify and calculate channel background noise values for each channel of filter bank 610.
  • noise suppressor 412 preferably utilizes speech energy calculator 636 to calculate speech energy values for each channel of filter bank 610.
  • Noise suppressor 412 then preferably uses weighting module 638 to weight the channel speech energy from filter bank 610 with weighting values adapted to the channel background noise data to thereby advantageously increase the signal-to-noise ratio (SNR) of the channel energy.
  • the weighting values calculated and applied by weighting module 638 are preferably proportional to the SNRs of the respective channel energies.
  • noise suppressor 412 initially determines the channel energy for each of the channels transmitted from filter bank 610, and preferably stores corresponding channel energy values into energy registers 312 (FIG. 3).
  • Noise suppressor 412 also determines channel background noise values for each of the channels of filter bank 610, and preferably stores the channel background noise values into noise registers 316.
  • Weighting module 638 may then advantageously access the channel energy values and the channel background noise values to calculate weighting values that are preferably stored into weighting value registers 314. Finally, weighting module 638 applies the calculated weighting values to the corresponding channel energy values to generate noise-suppressed channel energy to endpoint detector 414 for use as endpoint detection parameters, in accordance with the present invention.
  • One embodiment for the performance of noise suppressor 412 may be illustrated by the following discussion. Let n denote an uncorrelated additive random noise vector from the background noise of the channel energy, let s be a random speech feature vector from the channel energy, and let y stand for a random noisy speech feature vector from the channel energy, all with dimension "p" to indicate the number of channels. Therefore, the relationship of the foregoing variables may be expressed by the following equation: $y = s + n$
  • weighting module 638 of the FIG. 6 embodiment primarily utilizes several principal weighting techniques.
  • be the estimated average energy vector of background noise n from the channel energy from filter bank 610, and let ⁇ be defined by the following formula.
  • weighting module 638 provides a method for calculating weighting values "w" whose various channel values are directly proportional to the SNR for the corresponding channel. Weighting module 638 may thus calculate weighting values using the following formula.
  • weighting module 638 sets the variance vector of the speech q to the unit vector, and sets the value ⁇ to 1.
  • the weighting value for a given channel thus becomes equal to the reciprocal of the background noise for that channel.
  • the weighting values $w_i$ may be defined by the following formula.
  • Weighting module 638 therefore generates noise-suppressed channel energy that is the summation of each channel energy value multiplied by that channel's calculated weighting value $w_i$.
  • the total noise-suppressed channel energy $E_T$ may therefore be defined by the following formula: $E_T = \sum_{i=0}^{p-1} w_i\, y_i$
  • Speech energy 910 represents an exemplary spoken utterance which has a beginning point $t_s$ shown at time 914 and an ending point $t_e$ shown at time 926.
  • the waveform of the FIG. 7 speech energy 910 is presented for purposes of illustration only and may alternatively comprise various other waveforms.
  • Speech energy 910 also includes a reliable island region which has a starting point $t_{sr}$ shown at time 918, and a stopping point $t_{er}$ shown at time 922.
  • speech detector 310 repeatedly recalculates the foregoing thresholds ($T_s$ 912, $T_e$ 920, $T_{sr}$ 916, and $T_{er}$ 920) in real time.
  • One method for calculating the foregoing thresholds ($T_s$ 912, $T_e$ 920, $T_{sr}$ 916, and $T_{er}$ 920) is further discussed in co-pending U.S. Patent Application Serial No. 08/957,875, entitled "Method For Implementing A Speech Recognition System For Use During Conditions With Background Noise," filed on October 20, 1997, which has previously been incorporated herein by reference.
  • the silent segment used to calculate channel background noise values preferably has signal energy below an ending noise-calculation threshold, and also below a beginning noise-calculation threshold.
  • the ending noise-calculation threshold may be expressed by the following formula.
  • the beginning noise-calculation threshold may be expressed by the following formula.
  • the respective weighting values may be reciprocally proportional to the variance of channel energy or channel average background noise.
  • channel average background noise $N_i(m)$ for channel m at frame i may be calculated by using the following iterative equation.
  • the coefficient of this iterative equation may be equal to 0.985, which is equivalent to a window size of 145 frames.
  • channel average background noise may utilize non-linear spectrum subtraction (NSS) to advantageously remove a mean value to produce a channel average background noise variance value $V_i(m)$ for channel m at frame i.
  • NSS non-linear spectrum subtraction
  • the channel average background noise variance value $V_i(m)$ for channel m at frame i may be calculated using the following iterative equation.
  • V,(m) ⁇ V,. ⁇ (m) + (l- ⁇ )
  • m = 0, 1, . . . , M - 1
  • the coefficient of this iterative equation may likewise be equal to 0.985, which is equivalent to a window size of 145 frames.
  • the weighting value $W_i(m)$ for a given channel of filter bank 610 may then preferably be set to the reciprocal of the channel average background noise variance value according to the following formula.
  • a saturation limit may be utilized to advantageously reduce the dynamic range of the weighting procedure by utilizing a different formula to calculate weighting values in certain instances where $V_i(m)$ is less than a pre-determined minimum value (MINV).
  • MINV is preferably equal to 0.00013. If the channel average background noise variance value $V_i(m)$ is less than MINV, then the weighting value $W_i(m)$ may be calculated according to the following formula.
  • weighting module 638 of noise suppressor 412 may then apply the calculated weighting values to respective corresponding channel energies to produce noise-suppressed channel energy for use by endpoint detector 414.
  • weighting module 638 may supply the weighting values to endpoint detector 414 which may responsively utilize the weighting values to calculate endpoint detection parameters according to the following formula.
  • $W_i(m)$ is a respective weighting value
  • $y_i(m)$ is channel signal energy of channel m at frame i
  • M is the total number of channels of filter bank 610.
  • In step 810 of the FIG. 8 embodiment, feature extractor 410 of speech detector 310 initially receives noisy speech data that is preferably generated by sound sensor 212, and that is then processed by amplifier 216 and analog-to-digital converter 220.
  • speech detector 310 processes the noisy speech data in a series of individual data units called "windows" that each include sub-units called "frames".
  • In step 812, feature extractor 410 filters the received noisy speech into a predetermined number of frequency sub-bands or channels using a filter bank 610 to thereby generate filtered channel energy to a noise suppressor 412.
  • the filtered channel energy is therefore preferably comprised of a series of discrete channels, and noise suppressor 412 operates on each channel.
  • a noise calculator 634 preferably identifies and calculates channel background noise values for each channel of filter bank 610, and responsively stores the channel background noise values into memory 230.
  • a weighting module 638 in noise suppressor 412 calculates weighting values for each channel of the channel energy.
  • weighting module 638 calculates weighting values whose various channel values are directly proportional to the SNR for the corresponding channel. For example, the weighting values may be equal to the corresponding channel's SNR raised to a selectable exponential power.
  • weighting module 638 calculates the individual weighting values as being equal to the reciprocal of the channel background noise for that corresponding channel. In step 820, weighting module 638 then generates noise-suppressed channel energy that is the sum of each channel's channel energy value multiplied by that channel's calculated weighting value.
  • an endpoint detector 414 receives the noise-suppressed channel energy, and responsively detects corresponding speech endpoints.
  • a recognizer 418 receives the speech endpoints from endpoint detector 414 and feature vectors from feature extractor 410, and responsively generates a result signal from speech detector 310.
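The weighting step described above (weights proportional to each channel's SNR raised to a selectable power, or equal to the reciprocal of the channel background noise, followed by a sum over channels) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the parameter names `speech_var`, `noise_energy`, and `alpha`, and the small `floor` guard against division by zero are hypothetical and do not appear in the source.

```python
from typing import List, Sequence


def snr_weights(speech_var: Sequence[float],
                noise_energy: Sequence[float],
                alpha: float = 1.0,
                floor: float = 1e-12) -> List[float]:
    """Per-channel weights equal to the channel SNR raised to the power alpha.

    Assumption: the channel SNR is taken as speech variance divided by
    average background-noise energy. `floor` is a hypothetical guard that
    keeps the division defined when a noise estimate is zero.
    """
    return [(q / max(n, floor)) ** alpha
            for q, n in zip(speech_var, noise_energy)]


def reciprocal_noise_weights(noise_energy: Sequence[float],
                             floor: float = 1e-12) -> List[float]:
    """Second technique: unit speech variance and an exponent of 1,
    so each weight is the reciprocal of the channel background noise."""
    unit = [1.0] * len(noise_energy)
    return snr_weights(unit, noise_energy, alpha=1.0, floor=floor)


def total_noise_suppressed_energy(channel_energy: Sequence[float],
                                  weights: Sequence[float]) -> float:
    """Sum of each channel energy multiplied by that channel's weight."""
    return sum(w * y for w, y in zip(weights, channel_energy))


if __name__ == "__main__":
    noise = [2.0, 0.5]            # illustrative per-channel noise energies
    energy = [1.0, 3.0]           # illustrative per-channel energies
    w = reciprocal_noise_weights(noise)
    print(w, total_noise_suppressed_energy(energy, w))
```

With `alpha=1.0` and unit speech variance, `snr_weights` reduces to the reciprocal-noise weighting, which is why the second helper simply delegates to the first. Channels with low background noise contribute more to the summed energy that the endpoint detector sees.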

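The per-frame noise tracking described in the definitions above (an iteratively updated channel average background noise $N_i(m)$, a variance value $V_i(m)$, weights set to the reciprocal of the variance, and a saturation limit at MINV) might look like the following sketch. The exact update equations did not survive in this text, so the exponential-smoothing form used here, the clamping of the variance to MINV as the saturation rule, the summed endpoint parameter, and all class and function names are assumptions made for illustration. Only the constants 0.985 and 0.00013 come from the source.

```python
from typing import List, Sequence

BETA = 0.985     # coefficient value quoted in the text
MINV = 0.00013   # minimum variance value quoted in the text


class ChannelNoiseTracker:
    """Tracks a running mean and variance of background noise per channel.

    Assumed update rule (not confirmed by the source):
        N_i(m) = BETA * N_{i-1}(m) + (1 - BETA) * y_i(m)
        V_i(m) = BETA * V_{i-1}(m) + (1 - BETA) * (y_i(m) - N_i(m)) ** 2
    """

    def __init__(self, num_channels: int) -> None:
        self.mean = [0.0] * num_channels   # N_i(m)
        self.var = [0.0] * num_channels    # V_i(m)
        self._started = False

    def update(self, silent_frame: Sequence[float]) -> None:
        """Feed the channel energies of one frame from a silent segment."""
        if not self._started:
            # The first frame initialises the mean; the variance stays zero.
            self.mean = list(silent_frame)
            self._started = True
            return
        for m, y in enumerate(silent_frame):
            self.mean[m] = BETA * self.mean[m] + (1.0 - BETA) * y
            dev = y - self.mean[m]
            self.var[m] = BETA * self.var[m] + (1.0 - BETA) * dev * dev

    def weights(self) -> List[float]:
        """W_i(m) = 1 / V_i(m), with V_i(m) clamped to MINV (assumed rule)."""
        return [1.0 / max(v, MINV) for v in self.var]


def endpoint_parameter(frame: Sequence[float],
                       weights: Sequence[float]) -> float:
    """Assumed form: weighted sum of channel energies over all M channels."""
    return sum(w * y for w, y in zip(weights, frame))


if __name__ == "__main__":
    tracker = ChannelNoiseTracker(num_channels=2)
    for frame in ([1.0, 2.0], [1.0, 4.0], [1.0, 1.0]):
        tracker.update(frame)
    print(tracker.weights())
```

Clamping the variance bounds every weight at `1 / MINV`, which limits the dynamic range of the weighting in the way the saturation limit is described as doing. A channel whose noise fluctuates strongly receives a smaller weight than a steady one.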
Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Noise Elimination (AREA)

Abstract

A method for implementing a noise suppressor (412) in a speech recognition system comprises a filter bank (610) for separating source speech data into discrete frequency sub-bands to generate filtered channel energy, and a noise suppressor (412) for weighting the frequency sub-bands to improve the signal-to-noise ratio of the resultant noise-suppressed channel energy. The noise suppressor (412) preferably includes a noise calculator (634) for calculating background noise values, a speech energy calculator (636) for calculating speech energy values for each channel of the filter bank (610), and a weighting module (638) for applying calculated weighting values to the filtered channel energy to generate the noise-suppressed channel energy.
PCT/US2000/041316 1999-10-21 2000-10-18 Method for implementing a noise suppressor in a speech recognition system WO2001029826A1 (fr)
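
The filter-bank stage named in the abstract, which separates speech data into discrete frequency sub-bands and reports one energy value per channel, can be illustrated with a small pure-Python sketch. The description says only that the filter bank is mel-frequency scaled. The triangular filter shape, the widely used `2595 * log10(1 + f / 700)` mel mapping, the naive DFT, and every function name below are assumptions chosen for illustration.

```python
import cmath
import math
from typing import List, Sequence


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Naive DFT power spectrum for bins 0 .. N/2 (fine for short frames)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spec.append(abs(acc) ** 2 / n)
    return spec


def mel_channel_energies(frame: Sequence[float],
                         sample_rate: float,
                         num_channels: int) -> List[float]:
    """Energy per channel from triangular filters spaced evenly in mel."""
    spec = power_spectrum(frame)
    nyquist = sample_rate / 2.0
    bin_hz = nyquist / (len(spec) - 1)
    top_mel = hz_to_mel(nyquist)
    # num_channels + 2 edge frequencies, equally spaced on the mel scale.
    edges = [mel_to_hz(top_mel * i / (num_channels + 1))
             for i in range(num_channels + 2)]
    energies = []
    for ch in range(num_channels):
        lo, mid, hi = edges[ch], edges[ch + 1], edges[ch + 2]
        total = 0.0
        for k, p in enumerate(spec):
            f = k * bin_hz
            if lo < f <= mid:            # rising slope of the triangle
                total += p * (f - lo) / (mid - lo)
            elif mid < f < hi:           # falling slope of the triangle
                total += p * (hi - f) / (hi - mid)
        energies.append(total)
    return energies


if __name__ == "__main__":
    # A 1 kHz tone sampled at 8 kHz, one 64-sample frame, 8 channels.
    tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(64)]
    print(mel_channel_energies(tone, 8000, 8))
```

The list returned for each frame plays the role of the filtered channel energy that the noise suppressor weights. A production system would use an FFT rather than this quadratic-time DFT, which is kept here only so the sketch has no dependencies.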

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU22973/01A AU2297301A (en) 1999-10-21 2000-10-18 Method for implementing a noise suppressor in a speech recognition system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US16084299P 1999-10-21 1999-10-21
US60/160,842 1999-10-21

Publications (1)

Publication Number Publication Date
WO2001029826A1 true WO2001029826A1 (fr) 2001-04-26

Family

ID=22578691

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/041316 WO2001029826A1 (fr) 1999-10-21 2000-10-18 Method for implementing a noise suppressor in a speech recognition system

Country Status (2)

Country Link
AU (1) AU2297301A (fr)
WO (1) WO2001029826A1 (fr)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4831551A (en) * 1983-01-28 1989-05-16 Texas Instruments Incorporated Speaker-dependent connected speech word recognizer
US5574824A (en) * 1994-04-11 1996-11-12 The United States Of America As Represented By The Secretary Of The Air Force Analysis/synthesis-based microphone array speech enhancer with variable signal distortion
US5617508A (en) * 1992-10-05 1997-04-01 Panasonic Technologies Inc. Speech detection device for the detection of speech end points based on variance of frequency band limited energy
US5706394A (en) * 1993-11-30 1998-01-06 At&T Telecommunications speech signal improvement by reduction of residual noise
US5727072A (en) * 1995-02-24 1998-03-10 Nynex Science & Technology Use of noise segmentation for noise cancellation
US5732390A (en) * 1993-06-29 1998-03-24 Sony Corp Speech signal transmitting and receiving apparatus with noise sensitive volume control
US5749068A (en) * 1996-03-25 1998-05-05 Mitsubishi Denki Kabushiki Kaisha Speech recognition apparatus and method in noisy circumstances
US5768473A (en) * 1995-01-30 1998-06-16 Noise Cancellation Technologies, Inc. Adaptive speech filter
US5806025A (en) * 1996-08-07 1998-09-08 U S West, Inc. Method and system for adaptive filtering of speech signals using signal-to-noise ratio to choose subband filter bank
US5806022A (en) * 1995-12-20 1998-09-08 At&T Corp. Method and system for performing speech recognition

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
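CN114006874A (zh) * 2020-07-14 2022-02-01 China Mobile Communications Group Jilin Co., Ltd. Resource block scheduling method, apparatus, storage medium, and base station
CN114006874B (zh) * 2020-07-14 2023-11-10 China Mobile Communications Group Jilin Co., Ltd. Resource block scheduling method, apparatus, storage medium, and base station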

Also Published As

Publication number Publication date
AU2297301A (en) 2001-04-30

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: JP