WO1991011696A1 - Method and apparatus for recognition of commands spoken in noisy environments - Google Patents
Method and apparatus for recognition of commands spoken in noisy environments
- Publication number
- WO1991011696A1 (PCT/US1991/000053)
- Authority
- WO
- WIPO (PCT)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
Definitions
- This invention relates generally to the field of word recognizers and in particular to those word recognizers which are capable of recognizing command words in noisy environments.
- For example, a policeman in a police car may activate numerous functions, such as turning on the siren, simply by uttering an appropriate utterance which contains a word command.
- A word recognizer, after receiving and processing the utterance, recognizes the word command and effectuates the desired function.
- A word recognizer recognizes the word command by extracting features which adequately represent the utterance, and making a decision as to whether these features meet particular criteria. These criteria may comprise correspondence to a set of pre-stored features representing the command words to be recognized.
- the word recognizer may be speaker dependent or speaker independent.
- a speaker independent word recognizer is designed to recognize the commands of potentially any number of users regardless of the differences in speech patterns, accents, and other variations in spoken words.
- The speaker independent word recognizer requires significantly more sophisticated processing capability and hence has been constrained to recognizing a limited number of command words.
- A speaker dependent word recognizer is designed to recognize the command words of a limited number of users by comparing the utterance to prestored voice templates which contain the voice features of those users. Therefore, it is necessary to train the word recognizer to recognize the voice features of each individual user. Training is commonly understood to be a process by which each individual user repeats a predetermined set of word commands a sufficient number of times so that an acceptable number of their voice features are extracted and stored as reference features.
- One of the important characteristics of a word recognizer is its capability to accurately recognize a word command under various noise conditions. Typically, word recognizers provide error rates of less than 1% in quiet environments.
- The error rate may be degraded by as much as 40% in environments where there is a 20 dB peak signal-to-noise ratio (SNR).
- One of the factors contributing to poor noise performance is the difference between the training condition under which the reference features are derived and the operating condition under which the utterance features are derived. Accordingly, due to this difference, comparison of the reference features and input utterance features may produce substantially erroneous results.
- Many word recognizers incorporate noise compensation techniques in means utilized to derive the reference features.
- a background noise estimator provides the ambient noise characteristics, and the prestored reference features are temporarily modified according to the characteristics of the ambient noise. The modified reference features and the input utterance features are then compared to each other, and the reference sample having features with the closest similarity to the features of the input utterance is declared as the recognized word.
- The features of the input utterance are represented by the amount of energy contained within a predetermined number of frequency bands.
- This technique is known as the filter banks method.
- The noise compensation is achieved by determining the background noise energy at every frequency band and subtracting it from the energy at the corresponding frequency band of the input utterance.
- the resulting features are then compared to the corresponding reference features, and again the reference sample having most similar features to the features of the input utterance is declared the recognized word.
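The filter-bank subtraction and nearest-reference comparison described above can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the zero floor on subtracted energies and the squared-difference distance are assumptions, and all names are hypothetical.

```python
def subtract_noise(utterance_bands, noise_bands):
    """Subtract per-band background-noise energy from the utterance's
    band energies, flooring at zero (a common hedge; the patent does
    not specify what happens when noise exceeds signal energy)."""
    return [max(0.0, u - n) for u, n in zip(utterance_bands, noise_bands)]

def nearest_reference(utterance_bands, references):
    """Return the reference sample whose band energies are most similar
    to the compensated utterance (Euclidean distance assumed)."""
    def dist(ref):
        return sum((u - r) ** 2 for u, r in zip(utterance_bands, ref))
    return min(references, key=dist)
```

The reference with the smallest distance would then be declared the recognized word, as the text describes.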
- This type of system suffers an inherent drawback in that the number of predetermined frequency bands is critical to the proper operation of the word recognizer. That is, dividing the voice spectrum into a high number of frequency bands causes degradation in the recognition accuracy of high pitched voices, and dividing the voice spectrum into a low number of frequency bands causes a smearing effect on the voice signal.
- Other noise compensation approaches in speech recognition utilize noise reduction techniques, wherein the signal-to-noise ratio is increased using various filtering techniques.
- However, practical improvements in SNR typically fall short of achieving substantial accuracy in recognizing word commands.
- Another method of noise compensation for a system utilized in a severe noise environment is to train the system in a comparable noise environment.
- Certain types of noise, such as acoustical background noise, are time variant in nature. Accordingly, it is not possible to predict or otherwise reproduce, during training, the actual time variant noise which will exist during a subsequent speech recognition mode.
- the word recognizer of the invention comprises a voice processing means for receiving an input utterance and determining features which adequately represent the utterance.
- a template means provides the pre-stored features of a set of reference samples which represent the recognizable command words.
- a noise analysis means determines ambient noise characteristics.
- a comparison means determines the distance between the features of the utterance and the reference samples. The comparison means is responsive to the ambient noise characteristics for modifying the determined distance.
- The word recognizer apparatus includes means for determining the minimum distance and selecting the reference sample based thereon.
- Figure 1 shows a block diagram of the word recognizer of the invention.
- FIG. 2 shows a block diagram of the voice processor shown in Figure 1.
- Figure 3 is the flow chart for extracting CSM features of an input utterance.
- Figure 4 shows the block diagram of the noise analyzer of Figure 1.
- Figure 5 shows a portion of the word recognizer of the invention which includes the block diagram of the template means of Figure 1.
- Figure 6 shows the graph of the power distribution of the reference sample and the input utterance for a command word.
- Figure 7 shows a portion of the word recognizer of the invention which includes the block diagram of the comparison means of Figure 1.
- Figure 8 is the flow chart of the steps taken according to the invention to recognize the word command in noisy environments.
- The word recognizer 100 comprises an isolated word recognizer which is capable of recognizing multiple spoken word commands having a pause therebetween.
- the word recognizer 100 includes a voice processor 110 for processing an input utterance containing one or more word commands.
- the input utterance is received through a microphone 103 which produces a voice signal representing the input utterance.
- a well known audio filter 105 is used to limit the frequency spectrum of the input utterance to a predetermined range. In the preferred embodiment of the invention, the range of the audio filter 105 is confined to a range of 200 Hz to 3200 Hz.
- The voice processor 110 divides the input utterance into frames of predetermined duration.
- the voice processor 110 provides, in each frame, those features of the input utterance which adequately characterize the input utterance. The detailed process by which these features are produced is described later. These features comprise frequency components and corresponding amplitudes as well as the power of the input utterance in each frame.
- a background noise analyzer 120 provides the characteristics of the ambient noise. These characteristics comprise signal to noise ratio in the frequency spectrum and the level of the ambient noise floor. Because the word recognizer 100 is an isolated word recognizer, the beginning and the end of the input utterance must be determined. In the preferred embodiment of the invention, this determination is made by comparing the power of the input utterance to the power of the ambient noise floor.
- a comparator 130 closes a switch 140, thereby allowing the features of the input utterance to be stored in a temporary feature storage means 150.
- Otherwise the switch 140 is opened, preventing features from being stored in the storage means 150. Accordingly, the end points of the input utterance are determined by comparing the ambient noise floor to the power of the input utterance.
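The endpoint determination just described — gating frames on whether their power exceeds the ambient noise floor — can be sketched as a minimal function. The function name and return convention are illustrative, not from the patent:

```python
def find_endpoints(frame_powers, noise_floor):
    """Return (start, end) frame indices of the utterance: the first and
    last frames whose power exceeds the ambient noise floor, mimicking
    the comparator 130 closing/opening switch 140. Returns None if no
    frame rises above the floor (i.e., silence)."""
    above = [i for i, p in enumerate(frame_powers) if p > noise_floor]
    if not above:
        return None
    return above[0], above[-1]
```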
- a template means 160 provides the features of a set of prestored reference samples. The features of the prestored reference samples are generated, during training, utilizing the same process as that which provides the features of the input utterance.
- the template means 160 aligns the end points of the reference sample with the end points of the input utterance.
- a comparison means 170 primarily comprising a microcomputer/controller provides the distance between the features of the input utterance and the reference samples. The detail of the process by which the distance between the features of the input utterance and the reference sample are produced is described later. The comparison means 170 then selects the reference sample having the minimum distance with the features of the input utterance and based thereon declares the word command. Noise compensation in the word recognizer of the invention is achieved by eliminating or modifying the distance between the features of the input utterance and the features of the reference sample having noise characteristics above a predetermined threshold.
- The block diagram of the voice processor 110 comprises an A/D converter 102 which samples the voice signals provided by microphone 103 of FIG. 1 at a suitable sampling rate, such as 8000 samples per second.
- a frame buffer 104 buffers the sampled signal and provides frames which consist of a predetermined number of consecutive voice samples.
- the framing technique utilized by the frame buffer 104 is well known in the art, and the frames provided by the preferred embodiment of the invention comprise 160 samples which correspond to a frame duration of 20 msec. It may be appreciated that depending on the duration of each input utterance a variable number of frames (designated as N) may be generated by the frame buffer 104.
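The framing step above — 8000 samples per second grouped into 160-sample, 20 ms frames — can be sketched as follows. Non-overlapping frames and dropping a trailing partial frame are assumptions; the patent only gives the frame size and duration:

```python
def frame_signal(samples, frame_len=160):
    """Split an 8 kHz sample stream into consecutive 160-sample frames
    (20 ms each). A trailing partial frame is dropped, so the number of
    frames N varies with the utterance duration, as the text notes."""
    n = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n)]
```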
- the features characterising each frame utterance may be parametric or discrete.
- the discrete features of the utterance frames may be provided by such known techniques as the filter banks method.
- the embodiment of the present invention utilizes a technique which provides the parametric features of the utterance frame.
- the parametric features of the utterance may be provided by such known techniques as linear predictive analysis (LPC) or composite sinusoidal modeling (CSM).
- The features of the utterance frames are provided utilizing conventional CSM analysis techniques as described in S. Sagayama and F. Itakura, "Duality Theory of Composite Sinusoidal Modelling and Linear Prediction", ICASSP '86 Proceedings, vol. 3, pp. 1261-1264, the disclosure of which is hereby incorporated by reference.
- the purpose of CSM analysis is to determine a set of CSM features which adequately characterize the frame utterance.
- The CSM features comprise CSM frequencies {f_i} and corresponding amplitudes {m_i}.
- The number of CSM features (designated as M) of each frame of the input utterance is related to the frequency range of the utterance. In utterances confined to a frequency range of 200 Hz to 3200 Hz, there usually exist four formant resonant frequencies below 3200 Hz. Thus, it is usually sufficient to utilize 4 CSM frequencies and amplitudes to characterize the input utterance frames. Therefore, in the preferred embodiment of the invention, the number of features M is equal to 4.
- A feature extractor 106 executes a feature extraction process utilizing conventional CSM techniques, as shown in the flow chart of FIG. 3.
- The CSM feature extractor 106 receives the input utterance frames and computes the autocorrelation of each frame at block 320.
- The interpolative correlation term is then computed, block 330.
- The feature extractor 106 also provides the power content P(n) of each input utterance frame, derived from the following equation: P(n) = Σ_k s_n(k)^2, summed over the K = 160 samples s_n(k) of frame n   (3)
- T(n) = ( m1^n, m2^n, ..., mM^n, f1^n, f2^n, ..., fM^n, P(n) )   (4)
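The per-frame power and the assembly of the feature vector T(n) of equation (4) can be sketched as follows. The sum-of-squares form of the power is an assumption about equation (3), which is not fully legible in this copy; function names are illustrative:

```python
def frame_power(frame):
    """Frame power as the sum of squared samples -- a plausible reading
    of the patent's equation (3)."""
    return sum(s * s for s in frame)

def feature_vector(amps, freqs, power):
    """Assemble T(n) = (m1..mM, f1..fM, P(n)) per equation (4):
    M CSM amplitudes, M CSM frequencies, and the frame power."""
    return tuple(amps) + tuple(freqs) + (power,)
```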
- The voice processor 110 described in FIG. 2 and FIG. 3 may be implemented by means of any suitable digital signal processor (DSP), such as the 56000 series family of DSPs manufactured by Motorola, Inc.
- the noise analyzer 120 continually monitors the background noise and provides characteristics thereof.
- the noise analyzer 120 includes a noise processing means 122 for producing the noise powers of the desired frequency spectrum.
- The noise processing means 122 utilizes well known analysis techniques, such as Fast Fourier Transform (FFT) analysis, to provide the noise power at the desired CSM frequencies.
- The noise processor 122 also receives the corresponding CSM amplitudes of the input utterance frames and produces the signal-to-noise ratios SNR(f) at the CSM frequencies.
- The noise analyzer 120 includes a well known noise averaging means 124 which provides the power of the noise floor Rn.
- The techniques for providing the ambient noise floor are well known in the art.
- the template storage means 162 stores the features of a set of reference samples representing word commands recognizable by the word recognizer 100. These reference features have been obtained during a training process. During the training process, a user repeats each of the desired word commands to be recognized a number of times. Preferably, the training of the word recognizer is performed in a quiet environment. The features of the user voice are extracted and stored in the template storage means 162 as the reference samples. During training, the utterances are processed identically to the processing of the input utterance. In fact, the voice processor 110 is used to generate the reference sample features during training of the word recognizer 100.
- The number of reference sample frames (designated as J) may be different from the number of the corresponding input utterance frames N. It should be noted that the power of each frame as derived from equation (3) is also included in the features of the reference sample. Accordingly, the features of each reference sample may be stored in the template storage means 162 as vectors:
- R(j) = ( m1^j, m2^j, ..., mM^j, f1^j, f2^j, ..., fM^j, P(j) )   (6)
- Each of these reference samples is selected and compared to the input utterance.
- the end points of the reference sample under comparison and the input utterance must be aligned.
- an end point aligner 164 is included in the template means 160 to alleviate end point misalignments.
- FIG. 6 shows in time domain the power contour 610 of a reference sample for a word command.
- The power contour 610 of the reference sample can actually be represented by a number of discrete powers corresponding to each frame. However, for the sake of simplicity and ease of understanding, the contour of the power distribution of the reference sample is shown as a solid line 610. Similarly, the power contour of an input utterance substantially corresponding to that of the reference sample is shown by a dotted line 620. As shown, the end points of the reference sample in a quiet background and the input utterance in a noisy environment are separated from each other by the ambient noise floor power Rn.
- If the end points of the reference sample are readjusted by a number of frames such that the remaining frames have powers above the noise floor power, the end points of the reference sample and the input utterance may be realigned. Therefore, the noise floor power Rn provided by the noise analyzer 120 constitutes a threshold by which the end points of the reference sample are readjusted. Referring back to FIG. 5, the end point aligner 164 skips those candidate endpoints whose power is below the noise floor power.
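The endpoint realignment performed by the aligner 164 — trimming reference frames whose power lies below the ambient noise floor — can be sketched as follows; the function name and frame-power representation are illustrative:

```python
def realign_endpoints(ref_frame_powers, noise_floor):
    """Skip leading and trailing reference-sample frames whose power is
    at or below the ambient noise floor Rn, so the reference endpoints
    line up with those detected for the noisy input utterance."""
    start, end = 0, len(ref_frame_powers) - 1
    while start <= end and ref_frame_powers[start] <= noise_floor:
        start += 1
    while end >= start and ref_frame_powers[end] <= noise_floor:
        end -= 1
    return ref_frame_powers[start:end + 1]
```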
- the end point aligner 164 may be implemented by means of any suitable microcomputer or DSP executing a suitable program for achieving the intended purpose thereof.
- The comparison means 170 comprises a well known microcomputer/controller, such as the 68000 family of microcomputers manufactured by Motorola, Inc.
- the comparison means 170 includes a controller 172, a computer 174, a RAM 176 and a ROM 178.
- the controller 172 performs several functions which include controlling the operation of the comparison means 170 and the template means 160 as well as interacting with the temporary storage means 150 and noise analyzer 120.
- the computer 174 performs the computational functions of the comparison means 170.
- the RAM 176 provides a temporary information storage for the computer 174 and the controller 172.
- the program containing the operational steps of the computer 174 and the controller 172 is stored in the ROM 178.
- the controller 172 receives the features of the input utterance from the temporary storage means 150.
- the features of the first reference sample after endpoint alignment are received from the template means 160.
- The computer 174 determines the distance between the features of the reference sample and the input utterance. In the preferred embodiment of the invention, only the frequency features of the frames of the utterance and the reference sample are utilized for computing the distance. The determined distance is called a local distance metric and is computed from the following equation: d(n, j) = Σ_{i=1}^{M} ( T(i,n) − R(i,j) )², where:
- T (i,n) represents the i th composite sinusoidal frequency in the n th frame of said utterance
- R (i, j) represents the i th composite sinusoidal frequency in the j th frame of said reference sample.
- The local distance metric d is modified by a function W(i,n) of the signal-to-noise ratio SNR(f) provided by the noise analyzer 120.
- The function W(i,n) may be defined as a weighting of the frequency terms, where K is a normalization constant.
- Alternatively, W(i,n) comprises a discrete function defined by: W(i,n) = 1 if the SNR at the ith CSM frequency of the nth frame is at or above the SNR threshold T, and W(i,n) = 0 otherwise. Accordingly, the ith frequency feature of the nth frame is eliminated if the SNR(f) at that frequency is below the SNR threshold.
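The discrete weighting just described — dropping a frequency term entirely when its SNR falls below the threshold — can be sketched together with the local distance. The squared-difference form of the distance is an assumption (the patent's equation is not reproduced in this copy), and all names are illustrative:

```python
def local_distance(t_freqs, r_freqs, snr, threshold):
    """SNR-weighted local distance between the CSM frequency features of
    an utterance frame (t_freqs) and a reference frame (r_freqs).
    With the discrete W(i,n): a term is kept only when the SNR at that
    CSM frequency meets the threshold, otherwise it is eliminated."""
    return sum((t - r) ** 2
               for t, r, s in zip(t_freqs, r_freqs, snr)
               if s >= threshold)
```

With a heavily noise-corrupted frequency band, its (large, misleading) difference simply drops out of the distance, which is the noise compensation the patent claims.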
- Alternatively, W(i,n) may comprise a continuously differentiable limiting function, such as the well known sigmoidal or hyperbolic tangent functions. It may be appreciated that for each frame of the input utterance there is a total of at most J local distances. The legal local-distance minimum of each input utterance frame is added to subsequent local distances. An accumulated distance is thus determined for each reference sample frame, block 840.
- a minimum distance may be obtained utilizing well known dynamic time warping techniques.
- the minimum distance utilizing such a technique is computed and stored.
- a decision is made to determine whether more reference samples are to be processed. After comparing all of the stored reference samples, block 870, the reference sample having the minimum distance is selected.
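The accumulated-distance search and final selection can be sketched with a standard dynamic-time-warping recursion. The patent names the technique but does not give this exact formulation; the step pattern (horizontal, vertical, diagonal moves) and all names are assumptions:

```python
def dtw_distance(local):
    """Minimum accumulated distance over a matrix of local distances
    local[n][j] (n: utterance frame, j: reference frame), using the
    classic DTW recursion with (n-1,j), (n,j-1), (n-1,j-1) predecessors."""
    N, J = len(local), len(local[0])
    INF = float("inf")
    acc = [[INF] * J for _ in range(N)]
    acc[0][0] = local[0][0]
    for n in range(N):
        for j in range(J):
            if n == 0 and j == 0:
                continue
            best = min(acc[n - 1][j] if n else INF,
                       acc[n][j - 1] if j else INF,
                       acc[n - 1][j - 1] if n and j else INF)
            acc[n][j] = local[n][j] + best
    return acc[-1][-1]

def recognize(local_matrices):
    """Select the reference sample (keyed by command word) whose DTW
    distance to the input utterance is minimum."""
    return min(local_matrices, key=lambda k: dtw_distance(local_matrices[k]))
```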
- The command word contained in the input utterance is recognized based on a decision on the selected reference sample. The decision also takes into consideration predetermined criteria before the recognized command word is declared. Such criteria may comprise a threshold minimum distance below which the recognized word is valid. These predetermined criteria prevent an invalid input utterance, which nonetheless produces a minimum distance, from being declared the recognized word command.
- the local distances between the features of the input utterance and the reference sample are relied upon in recognizing the command word.
- the local distances are modified as a function of the signal to noise ratio. Accordingly, the accuracy of the word recognizer under severe noise conditions is improved by eliminating or lessening the contribution of those local distances which have an undesirable noise characteristic.
Abstract
An input utterance containing a command to be recognized is processed (110), and features adequately representing the utterance are determined. Pre-stored features of a set of reference command samples (160) are compared (170) with the features of the utterance. Recognition of commands in noisy environments is improved by determining the distance between the features of the utterance and the features of the reference samples, and by modifying the distance in response to the background noise (120). The reference sample exhibiting the minimum distance is selected as the recognized command.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US47443590A | 1990-02-02 | 1990-02-02 | |
US474,435 | 1990-02-02 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1991011696A1 true WO1991011696A1 (fr) | 1991-08-08 |
Family
ID=23883519
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1991/000053 WO1991011696A1 (fr) | 1990-02-02 | 1991-01-02 | Procede et appareil de reconnaissance d'ordres prononces dans des environnements bruyants |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO1991011696A1 (fr) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2330677A (en) * | 1997-10-21 | 1999-04-28 | Lothar Rosenbaum | Phonetic control apparatus |
KR100450787B1 (ko) * | 1997-06-18 | 2005-05-03 | 삼성전자주식회사 | 스펙트럼의동적영역정규화에의한음성특징추출장치및방법 |
US6983245B1 (en) | 1999-06-07 | 2006-01-03 | Telefonaktiebolaget Lm Ericsson (Publ) | Weighted spectral distance calculator |
KR100714721B1 (ko) | 2005-02-04 | 2007-05-04 | 삼성전자주식회사 | 음성 구간 검출 방법 및 장치 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2137791A (en) * | 1982-11-19 | 1984-10-10 | Secr Defence | Noise Compensating Spectral Distance Processor |
US4829578A (en) * | 1986-10-02 | 1989-05-09 | Dragon Systems, Inc. | Speech detection and recognition apparatus for use with background noise of varying levels |
US4852181A (en) * | 1985-09-26 | 1989-07-25 | Oki Electric Industry Co., Ltd. | Speech recognition for recognizing the catagory of an input speech pattern |
US4897878A (en) * | 1985-08-26 | 1990-01-30 | Itt Corporation | Noise compensation in speech recognition apparatus |
US4918732A (en) * | 1986-01-06 | 1990-04-17 | Motorola, Inc. | Frame comparison method for word recognition in high noise environments |
US4933973A (en) * | 1988-02-29 | 1990-06-12 | Itt Corporation | Apparatus and methods for the selective addition of noise to templates employed in automatic speech recognition systems |
1991
- 1991-01-02 WO PCT/US1991/000053 patent/WO1991011696A1/fr unknown
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2137791A (en) * | 1982-11-19 | 1984-10-10 | Secr Defence | Noise Compensating Spectral Distance Processor |
US4897878A (en) * | 1985-08-26 | 1990-01-30 | Itt Corporation | Noise compensation in speech recognition apparatus |
US4852181A (en) * | 1985-09-26 | 1989-07-25 | Oki Electric Industry Co., Ltd. | Speech recognition for recognizing the catagory of an input speech pattern |
US4918732A (en) * | 1986-01-06 | 1990-04-17 | Motorola, Inc. | Frame comparison method for word recognition in high noise environments |
US4829578A (en) * | 1986-10-02 | 1989-05-09 | Dragon Systems, Inc. | Speech detection and recognition apparatus for use with background noise of varying levels |
US4933973A (en) * | 1988-02-29 | 1990-06-12 | Itt Corporation | Apparatus and methods for the selective addition of noise to templates employed in automatic speech recognition systems |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100450787B1 (ko) * | 1997-06-18 | 2005-05-03 | 삼성전자주식회사 | 스펙트럼의동적영역정규화에의한음성특징추출장치및방법 |
GB2330677A (en) * | 1997-10-21 | 1999-04-28 | Lothar Rosenbaum | Phonetic control apparatus |
US6983245B1 (en) | 1999-06-07 | 2006-01-03 | Telefonaktiebolaget Lm Ericsson (Publ) | Weighted spectral distance calculator |
KR100714721B1 (ko) | 2005-02-04 | 2007-05-04 | 삼성전자주식회사 | 음성 구간 검출 방법 및 장치 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): CA JP |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): AT BE CH DE DK ES FR GB GR IT LU NL SE |
|
NENP | Non-entry into the national phase |
Ref country code: CA |