JPH01274198A - Speech recognition device - Google Patents

Speech recognition device

Info

Publication number
JPH01274198A
Authority
JP
Japan
Prior art keywords
noise
vector
time series
feature vector
codebook
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP63104769A
Other languages
Japanese (ja)
Other versions
JPH0766271B2 (en)
Inventor
Tadashi Suzuki
忠 鈴木
Kunio Nakajima
中島 邦男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mitsubishi Electric Corp
Original Assignee
Mitsubishi Electric Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corp filed Critical Mitsubishi Electric Corp
Priority to JP63104769A priority Critical patent/JPH0766271B2/en
Publication of JPH01274198A publication Critical patent/JPH01274198A/en
Publication of JPH0766271B2 publication Critical patent/JPH0766271B2/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Abstract

PURPOSE: To improve the recognition rate under heavy noise by selecting, during pattern matching, the best noise-free feature vector time series from among the noise-free feature vector candidates on the basis of minimizing the distortion against a reference pattern. CONSTITUTION: A vector quantization means (fuzzy vector quantizer 20) receives a speech signal on which noise is superimposed and vector-quantizes its feature vector time series according to a noise-added codebook 6, and an inverse vector quantization means (inverse fuzzy vector quantizer 22) inverse-vector-quantizes the label candidate time series according to a noise-free codebook 9. A multi-vector recognition processing means (multi-vector recognition processing circuit 26) then selects a noise-free feature vector time series from the input noise-free feature vector candidate time series and performs speech recognition processing. Consequently, the speech feature vectors are not distorted by the noise removal processing, and the speech recognition rate does not decrease even under heavy noise.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a speech recognition device that recognizes input speech and has a noise removal function for suppressing noise superimposed on the input speech signal.

[Conventional Technology]

In speech recognition using spectral information of speech, deformation of the speech spectrum caused by superimposed noise severely degrades recognition performance. Improving robustness against noise is therefore an important problem in putting speech recognition devices to practical use. Noise-cancelling microphones are often used to suppress contamination by environmental noise, but there are still cases where a sufficient signal-to-noise ratio (S/N) cannot be obtained, or where noise is superimposed during transmission of the speech signal. Signal processing techniques that try to improve the S/N by removing only the noise from a speech signal that already contains noise are called noise suppression, noise removal, speech enhancement, and so on, and many methods have been proposed. Recently, as a noise suppression method based on a new concept, a method called the Spectral Mapping method, which removes noise from the standpoint of signal detection by using a known mapping relationship between a noise-superimposed signal space and a noise-free signal space generated by vector quantization, was proposed in the paper "Biing-Hwang Juang, L. R. Rabiner, Signal Restoration by Spectral Mapping, 1987 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH & SIGNAL PROCESSING, Volume 4, pp. 6.6.1-6.6.4, April 1987, Dallas" (hereinafter cited as document [1]), and it is considered effective as a noise suppression method for speech transmission.

Fig. 2 shows a possible configuration of a speech recognition device incorporating a noise removal circuit based on this Spectral Mapping method. Although there are various recognition schemes, a discrete word recognition device that holds word-unit templates and performs matching by DP (Dynamic Programming) will be described as an example.

In Fig. 2, 1 is an input terminal for the speech signal, 2 is the input speech signal, 3 is an analysis circuit that acoustically analyzes the input speech signal 2, 4 is a feature vector time series, 5 is a vector quantizer that vector-quantizes the feature vector time series output from the analysis circuit using a noise-added codebook 6, 7 is a label candidate time series of codewords, 8 is an inverse vector quantizer that inverse-vector-quantizes the label candidate time series 7, which is the output of the vector quantizer 5, using a noise-free codebook 9, 10 is a noise-free feature vector time series, and 11 is the noise removal circuit, described in document [1], composed of the vector quantizer 5, the noise-added codebook 6, the inverse vector quantizer 8, and the noise-free codebook 9. 12 is a pattern matching circuit that performs DP matching between the noise-free feature vector time series 10 and a reference pattern 15, expressed as a noise-free feature vector time series output from a template memory 14, and outputs a matching distortion 16; 13 is a recognition control circuit that designates the reference pattern for pattern matching and outputs the recognition result; 17 is address data that the recognition control circuit 13 sends to the template memory 14 to designate the reference pattern; 18 is the recognition result; and 19 is a recognition processing circuit composed of the pattern matching circuit 12, the template memory 14, and the recognition control circuit 13.

Next, the operation will be explained.

In the noise removal circuit 11, the noise-free codebook 9 is composed of codewords that are feature vectors of speech on which no noise is superimposed, and the noise-added codebook 6 is composed of codewords generated by applying, to each codeword of the noise-free codebook 9, processing that reproduces the noise condition of the noise-superimposed input speech signal, for example by adding noise in the time waveform domain so that the signal-to-noise ratio equals that of the noise-superimposed input speech signal 2 and re-analyzing the result to convert it into a noise-added feature vector. The input speech signal 2, on which noise is superimposed, applied to the input terminal 1 is acoustically analyzed by the analysis circuit 3 and output as the feature vector time series 4, {X(n) | n = 1, 2, ..., N}.

Here, N denotes the number of feature vectors. In the noise removal circuit 11, the vector quantizer 5 receives the feature vector time series 4 and, for the vector X(n) at an arbitrary index n, computes the likelihood against every codeword of the noise-added codebook 6; the labels of the L codewords with the largest likelihoods, in descending order, are taken as the label candidates {m_i(n) | i = 1, 2, ..., L}. This is done for n = 1, 2, ..., N, and the result is output as the label candidate time series 7, {M(n) | n = 1, 2, ..., N}, where M(n) = {m_i(n) | i = 1, 2, ..., L}. L is an integer equal to or greater than 1. The inverse vector quantizer 8 receives the label candidate time series 7 and, for each n, inverse-vector-quantizes the label candidates M(n) = {m_i(n) | i = 1, 2, ..., L} with the noise-free codebook 9 to obtain the noise-free feature vector candidates {Z_i(n) | i = 1, 2, ..., L}, and computes Y(n) as the average vector of these L noise-free feature vector candidates.

That is,

y_t(n) = (1/L) Σ_{i=1}^{L} z_{t,i}(n)        (Equation 1)

where y_t(n) and z_{t,i}(n) are the t-th dimension components of Y(n) and Z_i(n), respectively.
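As a rough illustration of the conventional processing just described, the following Python sketch builds a noise-added codebook from a noise-free one and averages the L inverse-quantized candidates as in (Equation 1). The Euclidean-distance "likelihood", the white-noise model, and the shortcut of adding noise directly to the codeword vectors instead of adding it to time waveforms and re-analyzing them are all simplifying assumptions made here for illustration; the patent does not fix these choices.

```python
import numpy as np

def build_noise_added_codebook(clean_codebook, snr_db, seed=None):
    """Add white noise to every codeword of the noise-free codebook at the given SNR.

    Simplification: noise is added directly to the codeword vectors; the patent
    adds noise to time waveforms and re-analyzes them into feature vectors.
    """
    rng = np.random.default_rng(seed)
    noisy = []
    for c in clean_codebook:
        p_signal = np.mean(c ** 2)
        p_noise = p_signal / (10.0 ** (snr_db / 10.0))
        noisy.append(c + rng.normal(0.0, np.sqrt(p_noise), size=c.shape))
    return np.stack(noisy)

def top_l_labels(x, noisy_codebook, L):
    """Labels of the L codewords of the noise-added codebook closest to x.

    "Likelihood" is implemented here as negative Euclidean distance.
    """
    d = np.linalg.norm(noisy_codebook - x, axis=1)
    return np.argsort(d)[:L]

def spectral_mapping_denoise(X, clean_codebook, noisy_codebook, L):
    """Conventional scheme: average the L noise-free candidates (Equation 1)."""
    Y = []
    for x in X:                               # X: (N, dim) noisy feature vectors
        labels = top_l_labels(x, noisy_codebook, L)
        candidates = clean_codebook[labels]   # Z_i(n), i = 1..L
        Y.append(candidates.mean(axis=0))     # Y(n) = (1/L) * sum_i Z_i(n)
    return np.stack(Y)
```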

In the recognition processing circuit 19, the template memory 14 sends to the pattern matching circuit 12 the reference pattern 15, {T(k) | k = 1, 2, ..., K} (K is the number of feature vectors), designated by the address data 17 output by the recognition control circuit 13. The pattern matching circuit 12 performs DP matching between the reference pattern 15, {T(k) | k = 1, 2, ..., K}, and the noise-free feature vector time series 10, {Y(n) | n = 1, 2, ..., N}, which is the output of the noise removal circuit 11. The recurrence formula of DP matching is, for example, as follows.

g(k, n) = min { g(k, n-1) + D(k, n),  g(k-1, n-1) + 2 × D(k, n),  g(k-1, n) + D(k, n) }        (Equation 2)

Here, D(k, n) is the distortion between the feature vector T(k) and the feature vector Y(n). (Equation 2) is given as one example, the case of DP matching without slope restriction. The value g(K, N) obtained from this recurrence formula (Equation 2) is normalized by dividing it by the city-block distance (K+N) and output as the matching distortion 16. The recognition control circuit 13 sequentially changes the reference pattern designated by the address data 17 and, from the matching distortion 16 output by the pattern matching circuit 12 for each reference pattern, outputs the label of the reference pattern that minimizes the matching distortion as the recognition result 18.
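The recurrence of (Equation 2) and the city-block normalization can be sketched as follows. The Euclidean distortion D(k, n) and the boundary condition g(1, 1) = D(1, 1) are assumptions made for illustration, since the patent leaves the distortion measure and initialization open.

```python
import numpy as np

def dp_matching(T, Y):
    """Matching distortion between template T (K, dim) and input Y (N, dim),
    using the slope-unconstrained recurrence of (Equation 2) and the
    city-block normalization by (K + N)."""
    K, N = len(T), len(Y)
    D = np.linalg.norm(T[:, None, :] - Y[None, :, :], axis=2)   # D[k, n], assumed Euclidean
    g = np.full((K, N), np.inf)
    g[0, 0] = D[0, 0]                                           # assumed boundary condition
    for k in range(K):
        for n in range(N):
            if k == 0 and n == 0:
                continue
            best = np.inf
            if n > 0:
                best = min(best, g[k, n - 1] + D[k, n])          # horizontal step
            if k > 0 and n > 0:
                best = min(best, g[k - 1, n - 1] + 2 * D[k, n])  # diagonal step, weight 2
            if k > 0:
                best = min(best, g[k - 1, n] + D[k, n])          # vertical step
            g[k, n] = best
    return g[K - 1, N - 1] / (K + N)
```

The recognition control described above then amounts to calling this function once per registered template and keeping the label with the smallest normalized distortion.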

[Problems to Be Solved by the Invention]

The mapping relationship between the noise-superimposed signal space and the noise-free signal space, which is the foundation of noise removal by the Spectral Mapping method, is formed by creating the noise-added codebook 6 by applying processing equivalent to noise superimposition to the noise-free codebook 9. However, because actual noise has statistical variance, errors arise in the mapping relationship due to the difference between the noise equivalently added when generating the mapping relationship and the noise actually superimposed on the input speech signal 2. Furthermore, as the level of the noise superimposed on the input speech signal 2 rises, the spectral envelope of the input speech signal 2 becomes smoother. The phonemic features contained in the feature vector time series 4 output from the analysis circuit 3 therefore disappear, and the mapping errors described above increase drastically. For example, if the number L of label candidates in the noise removal circuit 11 is set to 1, then when a mapping error occurs because of the variance of the noise, the distortion caused by that mapping error is reflected directly in the matching distortion 16 of the pattern matching, so recognition performance drops sharply as the level of the superimposed noise rises. If, on the other hand, L is set to 2 or more to increase the number of candidates, the probability that a candidate given by the correct mapping is included among the multiple candidates becomes higher, but it is not known which of those candidates is the one given by the correct mapping.

For this reason, document [1] adopts the improvement of outputting the average of the feature vectors of the multiple candidates. This averaging, however, mixes components of incorrect candidate feature vectors into the candidate feature vector that should have been selected, so distortion arises and recognition performance degrades. Consequently, although the method of document [1] is effective in improving the perceptual signal-to-noise ratio in speech transmission, it is not effective when applied to speech recognition. The conventional speech recognition device thus has the problem that the noise removal processing distorts the speech feature vectors and the speech recognition rate falls under heavy noise.

The present invention has been made to solve the problems described above, and its object is to obtain a speech recognition device in which the noise removal processing does not distort the speech feature vectors and the speech recognition rate does not fall even under heavy noise.

[Means for Solving the Problems]

The speech recognition device according to the present invention comprises: a noise-free codebook 9 whose codewords are feature vectors of speech on which no noise is superimposed; a noise-added codebook 6 generated by applying processing equivalent to noise superimposition to each codeword of the noise-free codebook 9; vector quantization means (fuzzy vector quantizer 20) for vector-quantizing the feature vector time series of a noise-superimposed input speech signal according to the noise-added codebook 6 and outputting a plurality of label candidate time series indicating labels of codewords; inverse vector quantization means (inverse fuzzy vector quantizer 22) for inverse-vector-quantizing the plurality of label candidate time series, which are the output signals of the vector quantization means (fuzzy vector quantizer 20), according to the noise-free codebook 9 and outputting a plurality of noise-free feature vector candidate time series; and multi-vector recognition processing means (multi-vector recognition processing circuit 26) for receiving, as input, the plurality of noise-free feature vector candidate time series, which are the output signals of the inverse vector quantization means (inverse fuzzy vector quantizer 22), selecting a noise-free feature vector time series, and performing speech recognition processing.

[Operation]

The vector quantization means (fuzzy vector quantizer 20) vector-quantizes the feature vector time series of the noise-superimposed input speech signal on the basis of the noise-added codebook 6 and supplies a plurality of label candidate time series to the inverse vector quantization means (inverse fuzzy vector quantizer 22). The inverse vector quantization means (inverse fuzzy vector quantizer 22) inverse-vector-quantizes the input plurality of label candidate time series on the basis of the noise-free codebook 9 and supplies a plurality of noise-free feature vector candidate time series to the multi-vector recognition processing means (multi-vector recognition processing circuit 26). The multi-vector recognition processing means (multi-vector recognition processing circuit 26) selects a noise-free feature vector time series from the input plurality of noise-free feature vector candidate time series and performs speech recognition processing.

[Embodiment of the Invention]

Fig. 1 is a block diagram showing the configuration of a speech recognition device according to an embodiment of the present invention. In Fig. 1, components corresponding to those shown in Fig. 2 are given the same reference numerals and their description is omitted. As in the conventional example, this embodiment will be described taking as an example a discrete word speech recognition device that performs recognition by DP matching against word-unit templates.
In Fig. 1, 20 is a fuzzy vector quantizer, serving as the vector quantization means, which vector-quantizes the feature vector time series 4, the output signal of the analysis circuit 3, using the noise-added codebook 6 and outputs a plurality of label candidate time series 21; 22 is an inverse fuzzy vector quantizer, serving as the inverse vector quantization means, which inverse-vector-quantizes the plurality of label candidate time series 21 with the noise-free codebook 9 and outputs a plurality of noise-free feature vector candidate time series 23; 24 is a noise-free multi-vector generation circuit composed of the noise-added codebook 6, the noise-free codebook 9, the fuzzy vector quantizer 20, and the inverse fuzzy vector quantizer 22; 25 is a multi-vector pattern matching circuit that receives the plurality of noise-free feature vector candidate time series 23, performs DP matching against the reference pattern 15, and outputs the matching distortion 16; and 26 is a multi-vector recognition processing circuit, serving as the multi-vector recognition processing means, composed of the multi-vector pattern matching circuit 25, the template memory 14, and the recognition control circuit 13.

This speech recognition device adopts a new scheme in which, by means of fuzzy vector quantization, noise removal of the input signal and pattern recognition are carried out simultaneously. It comprises the noise-free multi-vector generation circuit 24, which receives the feature vectors of the noise-superimposed input speech signal and outputs a plurality of noise-free feature vector candidate time series 23, and the multi-vector recognition processing circuit 26, which performs recognition using as input the plurality of noise-free feature vector candidate time series 23, the output signals of the noise-free multi-vector generation circuit 24. As described above, the noise-free multi-vector generation circuit 24 is composed of the noise-free codebook 9, whose codewords are feature vectors of speech on which no noise is superimposed; the noise-added codebook 6, generated by applying processing equivalent to noise superimposition to each codeword of the noise-free codebook 9; the fuzzy vector quantizer 20, which vector-quantizes the feature vector time series 4 of the noise-superimposed input speech signal according to the noise-added codebook 6 and outputs the plurality of label candidate time series 21; and the inverse fuzzy vector quantizer 22, which inverse-vector-quantizes the plurality of label candidate time series 21, the output signals of the fuzzy vector quantizer 20, according to the noise-free codebook 9 and outputs the plurality of noise-free feature vector candidate time series 23.

The multi-vector recognition processing circuit 26 receives the plurality of noise-free feature vector candidate time series, {Z(n) | n = 1, 2, ..., N} where Z(n) = {Z_i(n) | i = 1, 2, ..., L}, which are the output signals of the inverse fuzzy vector quantizer 22. From among the L candidates of the noise-free feature vector candidates Z(n) = {Z_i(n) | i = 1, 2, ..., L} at an arbitrary n, it selects, for each reference pattern, the candidate that maximizes the likelihood with respect to that reference pattern as the optimum noise-free feature vector. In this way it judges the reference pattern that finally gives the maximum likelihood over the whole time series up to n = N to be the recognition category, and at the same time determines the optimum noise-free feature vector time series.

The operation of this embodiment will be explained.

The fuzzy vector quantizer 20 receives the feature vector time series 4 of the input speech, {X(n) | n = 1, 2, ..., N}, and, for the feature vector X(n) at an arbitrary n, computes the likelihood against every codeword of the noise-added codebook; the labels of the L codewords with the largest likelihoods, in descending order, are taken as the label candidates {m_i(n) | i = 1, 2, ..., L}. This is done for n = 1, 2, ..., N, and the result is output as the plurality of label candidate time series 21, {M(n) | n = 1, 2, ..., N}, where M(n) = {m_i(n) | i = 1, 2, ..., L}. The inverse fuzzy vector quantizer 22 receives the plurality of label candidate time series 21 and, for each n, inverse-vector-quantizes the corresponding label candidates M(n) = {m_i(n) | i = 1, 2, ..., L} with the noise-free codebook 9 to obtain the plurality of noise-free feature vector candidates {Z_i(n) | i = 1, 2, ..., L}. This is done for n = 1, 2, ..., N, and the result is output as the plurality of noise-free feature vector candidate time series 23, {Z(n) | n = 1, 2, ..., N}, where Z(n) = {Z_i(n) | i = 1, 2, ..., L}.
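A minimal sketch of the noise-free multi-vector generation circuit 24, reusing the hypothetical top_l_labels() helper from the earlier sketch, is given below. The only difference from the conventional denoiser is that the L candidates are kept for every frame instead of being averaged.

```python
import numpy as np

def generate_multivector(X, clean_codebook, noisy_codebook, L):
    """Return Z with shape (N, L, dim): L noise-free candidates per input frame.

    Fuzzy VQ (20) keeps the top-L labels from the noise-added codebook; inverse
    fuzzy VQ (22) replaces each label with the matching noise-free codeword.
    Unlike the conventional circuit, no averaging is performed here.
    """
    Z = []
    for x in X:                                      # X: (N, dim) noisy feature vectors
        labels = top_l_labels(x, noisy_codebook, L)  # hypothetical helper from the earlier sketch
        Z.append(clean_codebook[labels])
    return np.stack(Z)
```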

The multi-vector pattern matching circuit 25 receives the plurality of noise-free feature vector candidate time series 23, {Z(n) | n = 1, 2, ..., N} where Z(n) = {Z_i(n) | i = 1, 2, ..., L}, which are the output signals of the inverse fuzzy vector quantizer 22, and performs DP matching (Equation 2) against the reference pattern 15, {T(k) | k = 1, 2, ..., K} (K is the number of feature vectors). However, because the plurality of noise-free feature vector candidate time series 23 has L feature vector candidates for each n, the distortion D(k, n) at arbitrary k and n is defined by the following expression:

D(k, n) = min_{1 ≤ i ≤ L} d(T(k), Z_i(n))        (Equation 3)

where d(*, *) denotes the distortion between feature vectors. In this way, the feature vector that is optimum with respect to the reference pattern 15 is selected from among the candidates. As in the conventional example, g(K, N) obtained from the recurrence formula (Equation 2) is normalized by dividing it by the city-block distance (K+N) and output as the matching distortion 16. By the principle of DP matching, minimizing the partial distortion of (Equation 3) minimizes the matching distortion of the whole pattern. The recognition control circuit 13 sequentially changes the reference pattern in the template memory 14 designated by the address data 17, examines the matching distortion 16 output by the multi-vector pattern matching circuit 25 for each reference pattern, and outputs the label of the reference pattern that minimizes the matching distortion 16 as the recognition result 18. For the reference pattern that minimizes the matching distortion 16, the vector time series optimally selected from the plurality of noise-free feature vector candidate time series 23 approximates the noise-free feature vector time series of a speech signal from which the noise superimposed on the input speech has been correctly removed, and the minimum matching distortion corresponds to the matching distortion between the noise-free input speech and the noise-free template of its own category.
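Continuing the earlier sketches, the multi-vector DP matching of (Equation 3) can be drafted as below. The Euclidean d(*, *) is again an assumption, and the recognize() loop is only an illustration of what the recognition control circuit 13 does: score every template and return the label with the smallest normalized matching distortion.

```python
import numpy as np

def multivector_dp_matching(T, Z):
    """Matching distortion between template T (K, dim) and candidate set Z (N, L, dim)."""
    K, N = len(T), len(Z)
    # D[k, n] = min over the L candidates of d(T(k), Z_i(n))   (Equation 3)
    diff = T[:, None, None, :] - Z[None, :, :, :]               # (K, N, L, dim)
    D = np.linalg.norm(diff, axis=3).min(axis=2)                # (K, N)
    g = np.full((K, N), np.inf)
    g[0, 0] = D[0, 0]                                           # assumed boundary condition
    for k in range(K):
        for n in range(N):
            if k == 0 and n == 0:
                continue
            best = np.inf
            if n > 0:
                best = min(best, g[k, n - 1] + D[k, n])
            if k > 0 and n > 0:
                best = min(best, g[k - 1, n - 1] + 2 * D[k, n])
            if k > 0:
                best = min(best, g[k - 1, n] + D[k, n])
            g[k, n] = best
    return g[K - 1, N - 1] / (K + N)  # city-block normalization, as in (Equation 2)

def recognize(templates, Z):
    """Illustrative recognition control loop: label of the template with the
    smallest normalized matching distortion."""
    scores = {label: multivector_dp_matching(T, Z) for label, T in templates.items()}
    return min(scores, key=scores.get)
```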

Note that DP matching according to (Equation 2) is given merely as one embodiment and does not restrict the scope of application of the present invention.

In this way, the noise-free multi-vector generation circuit 24 performs a fuzzy mapping of the feature vector time series 4 of the noise-superimposed input speech, with redundancy, onto the plurality of noise-free feature vector candidate time series 23 containing the correctly noise-removed signal, and the multi-vector recognition processing circuit 26 performs recognition processing while automatically selecting, through the processing that maximizes the likelihood using the noise-free reference patterns, the noise-free feature vector time series that minimizes the mapping distortion.

Therefore, in this embodiment, the distortion due to mapping errors and the distortion produced by averaging the feature vectors of multiple candidates, which are the problems of the conventional device described above, are eliminated, and the drop in recognition rate caused by them no longer occurs.

The present invention has been described above taking as an example a discrete word speech recognition device that registers word-unit templates, but the recognition unit is not limited to words; any unit such as CV (consonant-vowel), VCV (vowel-consonant-vowel), CVC (consonant-vowel-consonant), monosyllables, or morphemes may be used, and as long as templates are registered as time series of feature vectors, the same effect as in the above embodiment is obtained. Furthermore, in statistical recognition methods other than template matching, such as HMM (Hidden Markov Model), the same effect is obtained by using transition probability models in place of the reference patterns and selecting, for each transition probability model, the noise-free feature vector that maximizes the generation probability from among the noise-free feature vector candidates.
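As a loosely hedged illustration of the HMM variant mentioned above, the sketch below takes, at each frame of a Viterbi pass, the best emission log-probability over the L noise-free candidates, which realizes the per-model candidate selection described in the preceding paragraph. Gaussian emissions, the scipy call, and the simple model layout are assumptions made purely for illustration; the patent states only the selection principle.

```python
import numpy as np
from scipy.stats import multivariate_normal

def viterbi_with_candidates(Z, log_init, log_trans, means, covs):
    """Best log-likelihood of one word HMM for the candidate set Z (N, L, dim).

    At every frame the emission score of each state is the maximum over the L
    noise-free candidates Z_i(n). Gaussian emissions are assumed for illustration.
    """
    N, L, _ = Z.shape
    S = len(means)                                   # number of states
    flat = Z.reshape(N * L, -1)
    emission = np.empty((S, N))
    for s in range(S):
        logp = multivariate_normal(means[s], covs[s]).logpdf(flat)   # (N * L,)
        emission[s] = logp.reshape(N, L).max(axis=1)                 # best candidate per frame
    delta = log_init + emission[:, 0]
    for n in range(1, N):
        delta = (delta[:, None] + log_trans).max(axis=0) + emission[:, n]
    return delta.max()   # compare this score across word models; the largest wins
```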

Furthermore, the invention can also be applied to continuously uttered speech other than discrete word speech by placing, in front of this device, a segmentation circuit that cuts the continuous utterance into the registered speech units and supplying the cut-out speech as the input of this device. Needless to say, the means of realizing the present invention is not limited to dedicated hardware; it can also be realized by software processing on a general-purpose computer or a signal processing processor. Furthermore, the present invention can be extended and applied to devices for recognizing acoustic signals other than speech, image signals, character and graphic signals, and the like.

[Effects of the Invention]

As described above, according to the present invention, the device comprises: vector quantization means for vector-quantizing the feature vector time series of a noise-superimposed input speech signal according to the noise-added codebook and outputting a plurality of label candidate time series; inverse vector quantization means for inverse-vector-quantizing the plurality of label candidate time series, which are the output signals of the vector quantization means, according to the noise-free codebook and outputting a plurality of noise-free feature vector candidate time series; and multi-vector recognition processing means for receiving, as input, the plurality of noise-free feature vector candidate time series, which are the output signals of the inverse vector quantization means, selecting a noise-free feature vector time series, and performing speech recognition processing. The optimum noise-free feature vector time series can therefore be selected from among the plural noise-free feature vector candidates on the basis of minimizing the distortion against the reference pattern during pattern matching, so the noise removal processing produces no distortion in the speech feature vectors, and the effect of an extremely high recognition rate is obtained even under heavy noise.

[Brief Description of the Drawings]

Fig. 1 is a block diagram showing the configuration of a speech recognition device according to an embodiment of the present invention, and Fig. 2 is a block diagram showing the configuration of a conventional speech recognition device.

6: noise-added codebook; 9: noise-free codebook; 20: fuzzy vector quantizer (vector quantization means); 22: inverse fuzzy vector quantizer (inverse vector quantization means); 26: multi-vector recognition processing circuit (multi-vector recognition processing means).

Agent: Masu Oiwa (and two others)

Procedural Amendment (voluntary)
3. Person making the amendment: Representative Moriya Shiki
5. Subject of the amendment: Claims; Detailed Description of the Invention
6. Contents of the amendment
(1) The claims are amended as shown in the attached sheet.
(2) "time series" on page 4, line 3 of the specification is amended to "time series 4".
(3) "of codewords" on page 4, line 5 is amended to "composed of labels representing codewords".
(4) The expression "g(k, n) = min { g(k, n-1) + D(k, n), g(k-1, n-1) + 2 × D(k, n), g(k-1, n) + D(k, n) }" on page 7, lines 14 to 16 is amended.
(4) "city-block distance" on page 8, line 2 and page 17, line 9 is amended to "city-block distance (K+N)".
(5) "conventional" on page 9, line 20 is amended to "conventional, having a noise removal function".
(6) "indicating labels of codewords" on page 10, line 16 is amended to "composed of labels representing codewords".
(7) "receives a speech signal on which noise is superimposed, and ... of this input speech signal" on page 11, lines 12 to 13 is amended to "... of the input speech signal on which noise is superimposed".
(8) "codebook 9" on page 12, line 16 is amended to "codebook 6".
(9) "matching distortion" on page 13, lines 7 to 8 is amended to "matching distortion 16".
(10) "input speech signal" on page 14, line 8 is amended to "input speech signal 2".
(11) "time series" on page 14, line 19 is amended to "time series 23".
(12) "noise-added codebook" on page 15, line 14 is amended to "noise-added codebook 6".

Attached sheet
2. Claims
In a speech recognition device that recognizes input speech and has a noise removal function for suppressing noise superimposed on an input speech signal, the speech recognition device comprising: a noise-free codebook whose codewords are feature vectors of speech on which no noise is superimposed; a noise-added codebook generated by applying processing equivalent to noise superimposition to each codeword of the noise-free codebook; vector quantization means for vector-quantizing the feature vector time series of the noise-superimposed input speech signal according to the noise-added codebook and outputting a plurality of label candidate time series composed of labels representing codewords; inverse vector quantization means for inverse-vector-quantizing the plurality of label candidate time series, which are the output signals of the vector quantization means, according to the noise-free codebook and outputting a plurality of noise-free feature vector candidate time series; and multi-vector recognition processing means for receiving, as input, the plurality of noise-free feature vector candidate time series, which are the output signals of the inverse vector quantization means, selecting a noise-free feature vector time series, and performing speech recognition processing.

Claims (1)

[Claims]
In a speech recognition device that recognizes input speech and has a noise removal function for suppressing noise superimposed on an input speech signal, the speech recognition device comprising: a noise-free codebook whose codewords are feature vectors of speech on which no noise is superimposed; a noise-added codebook generated by applying processing equivalent to noise superimposition to each codeword of the noise-free codebook; vector quantization means for vector-quantizing the feature vector time series of the noise-superimposed input speech signal according to the noise-added codebook and outputting a plurality of label candidate time series indicating labels of codewords; inverse vector quantization means for inverse-vector-quantizing the plurality of label candidate time series, which are the output signals of the vector quantization means, according to the noise-free codebook and outputting a plurality of noise-free feature vector candidate time series; and multi-vector recognition processing means for receiving, as input, the plurality of noise-free feature vector candidate time series, which are the output signals of the inverse vector quantization means, selecting a noise-free feature vector time series, and performing speech recognition processing.
JP63104769A 1988-04-27 1988-04-27 Voice recognizer Expired - Fee Related JPH0766271B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP63104769A JPH0766271B2 (en) 1988-04-27 1988-04-27 Voice recognizer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP63104769A JPH0766271B2 (en) 1988-04-27 1988-04-27 Voice recognizer

Publications (2)

Publication Number Publication Date
JPH01274198A true JPH01274198A (en) 1989-11-01
JPH0766271B2 JPH0766271B2 (en) 1995-07-19

Family

ID=14389682

Family Applications (1)

Application Number Title Priority Date Filing Date
JP63104769A Expired - Fee Related JPH0766271B2 (en) 1988-04-27 1988-04-27 Voice recognizer

Country Status (1)

Country Link
JP (1) JPH0766271B2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5201004A (en) * 1990-05-22 1993-04-06 Nec Corporation Speech recognition method with noise reduction and a system therefor
JPH04298797A (en) * 1991-03-08 1992-10-22 Mitsubishi Electric Corp Voice recognition device
CN112002307A (en) * 2020-08-31 2020-11-27 广州市百果园信息技术有限公司 Voice recognition method and device
CN112002307B (en) * 2020-08-31 2023-11-21 广州市百果园信息技术有限公司 Voice recognition method and device

Also Published As

Publication number Publication date
JPH0766271B2 (en) 1995-07-19

Similar Documents

Publication Publication Date Title
KR101201146B1 (en) Method of noise reduction using instantaneous signal-to-noise ratio as the principal quantity for optimal estimation
KR101224755B1 (en) Multi-sensory speech enhancement using a speech-state model
EP0689194A1 (en) Method of and apparatus for signal recognition that compensates for mismatching
JP2004264816A (en) Method of iterative noise estimation in recursive framework
JPH05216490A (en) Apparatus and method for speech coding and apparatus and method for speech recognition
JP2002140089A (en) Method and apparatus for pattern recognition training wherein noise reduction is performed after inserted noise is used
KR20050000541A (en) Method of determining uncertainty associated with noise reduction
CN110335608B (en) Voiceprint verification method, voiceprint verification device, voiceprint verification equipment and storage medium
KR20040088368A (en) Method of speech recognition using variational inference with switching state space models
US6502073B1 (en) Low data transmission rate and intelligible speech communication
JP2019008120A (en) Voice quality conversion system, voice quality conversion method and voice quality conversion program
JP7335460B2 (en) clear text echo
JP2024529889A (en) Robust direct speech-to-speech translation
JP3703394B2 (en) Voice quality conversion device, voice quality conversion method, and program storage medium
US20230081543A1 (en) Method for synthetizing speech and electronic device
CN115240696B (en) Speech recognition method and readable storage medium
JPH01274198A (en) Speech recognition device
Shanthamallappa et al. Robust Automatic Speech Recognition Using Wavelet-Based Adaptive Wavelet Thresholding: A Review
US20220310061A1 (en) Regularizing Word Segmentation
Kurian et al. Connected digit speech recognition system for Malayalam language
WO2020166359A1 (en) Estimation device, estimation method, and program
JP2004191968A (en) Method and device for separating signal source
KR100353858B1 (en) Method for generating context-dependent phonelike units for speech recognition
US11335321B2 (en) Building a text-to-speech system from a small amount of speech data
JP2961916B2 (en) Voice recognition device

Legal Events

Date Code Title Description
LAPS Cancellation because of no payment of annual fees