JPH01274198A - Speech recognition device - Google Patents

Speech recognition device

Info

Publication number
JPH01274198A
Authority
JP
Japan
Prior art keywords
noise
vector
time series
feature vector
codebook
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP63104769A
Other languages
Japanese (ja)
Other versions
JPH0766271B2 (en)
Inventor
Tadashi Suzuki
忠 鈴木
Kunio Nakajima
中島 邦男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mitsubishi Electric Corp
Original Assignee
Mitsubishi Electric Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corp filed Critical Mitsubishi Electric Corp
Priority to JP63104769A priority Critical patent/JPH0766271B2/en
Publication of JPH01274198A publication Critical patent/JPH01274198A/en
Publication of JPH0766271B2 publication Critical patent/JPH0766271B2/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Abstract

PURPOSE: To improve the recognition rate under heavy noise by selecting, during pattern matching, the best noise-free feature vector time series from among the noise-free feature vector candidates on the basis of minimizing the distortion against a reference pattern. CONSTITUTION: A vector quantization means (fuzzy vector quantizer 20) receives a speech signal on which noise is superimposed and vector-quantizes its feature vector time series according to a noise-added codebook 6, and an inverse vector quantization means (inverse fuzzy vector quantizer 22) inverse-vector-quantizes the label candidate time series according to a noise-free codebook 9. A multi-vector recognition processing means (multi-vector recognition processing circuit 26) then selects a noise-free feature vector time series from the input noise-free feature vector candidate time series and performs speech recognition processing. Consequently, the speech feature vectors are not distorted by the noise removal processing, and the speech recognition rate does not decrease even under heavy noise.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a speech recognition device that recognizes input speech and has a noise removal function for suppressing noise superimposed on the input speech signal.

[Conventional Technology]

In speech recognition using spectral information of speech, deformation of the speech spectrum caused by superimposed noise severely degrades recognition performance. Improving robustness against noise is therefore an important problem in putting speech recognition devices to practical use. Noise-cancelling microphones are often used to suppress contamination by environmental noise, but there are still cases where a sufficient signal-to-noise ratio (S/N) cannot be obtained, or where noise is superimposed during transmission of the speech signal. Signal processing techniques that try to improve the S/N by removing only the noise from a speech signal that already contains noise are called noise suppression, noise removal, speech enhancement, and so on, and many methods have been proposed. Recently, as a noise suppression method based on a new concept, a method called the Spectral Mapping method, which removes noise from the standpoint of signal detection by using a known mapping relationship between a noise-superimposed signal space and a noise-free signal space generated by vector quantization, was proposed in the paper "Biing-Hwang Juang, L. R. Rabiner, Signal Restoration by Spectral Mapping, 1987 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH & SIGNAL PROCESSING, Volume 4, pp. 6.6.1-6.6.4, April 1987, Dallas" (hereinafter cited as document [1]), and it is considered effective as a noise suppression method for speech transmission.

Fig. 2 shows a possible configuration of a speech recognition device incorporating a noise removal circuit based on this Spectral Mapping method. Although there are various recognition schemes, a discrete word recognition device that holds word-unit templates and performs matching by DP (Dynamic Programming) will be described as an example.

In Fig. 2, 1 is an input terminal for the speech signal, 2 is the input speech signal, 3 is an analysis circuit that acoustically analyzes the input speech signal 2, 4 is a feature vector time series, 5 is a vector quantizer that vector-quantizes the feature vector time series output from the analysis circuit using a noise-added codebook 6, 7 is a label candidate time series of codewords, 8 is an inverse vector quantizer that inverse-vector-quantizes the label candidate time series 7, which is the output of the vector quantizer 5, using a noise-free codebook 9, 10 is a noise-free feature vector time series, and 11 is the noise removal circuit, described in document [1], composed of the vector quantizer 5, the noise-added codebook 6, the inverse vector quantizer 8, and the noise-free codebook 9. 12 is a pattern matching circuit that performs DP matching between the noise-free feature vector time series 10 and a reference pattern 15, expressed as a noise-free feature vector time series output from a template memory 14, and outputs a matching distortion 16; 13 is a recognition control circuit that designates the reference pattern for pattern matching and outputs the recognition result; 17 is address data that the recognition control circuit 13 sends to the template memory 14 to designate the reference pattern; 18 is the recognition result; and 19 is a recognition processing circuit composed of the pattern matching circuit 12, the template memory 14, and the recognition control circuit 13.

Next, the operation will be explained.

In the noise removal circuit 11, the noise-free codebook 9 is composed of codewords that are feature vectors of speech on which no noise is superimposed, and the noise-added codebook 6 is composed of codewords generated by applying, to each codeword of the noise-free codebook 9, processing that reproduces the noise condition of the noise-superimposed input speech signal, for example by adding noise in the time waveform domain so that the signal-to-noise ratio equals that of the noise-superimposed input speech signal 2 and re-analyzing the result to convert it into a noise-added feature vector. The input speech signal 2, on which noise is superimposed, applied to the input terminal 1 is acoustically analyzed by the analysis circuit 3 and output as the feature vector time series 4, {X(n) | n = 1, 2, ..., N}.

Here, N denotes the number of feature vectors. In the noise removal circuit 11, the vector quantizer 5 receives the feature vector time series 4 and, for the vector X(n) at an arbitrary index n, computes the likelihood against every codeword of the noise-added codebook 6; the labels of the L codewords with the largest likelihoods, in descending order, are taken as the label candidates {m_i(n) | i = 1, 2, ..., L}. This is done for n = 1, 2, ..., N, and the result is output as the label candidate time series 7, {M(n) | n = 1, 2, ..., N}, where M(n) = {m_i(n) | i = 1, 2, ..., L}. L is an integer equal to or greater than 1. The inverse vector quantizer 8 receives the label candidate time series 7 and, for each n, inverse-vector-quantizes the label candidates M(n) = {m_i(n) | i = 1, 2, ..., L} with the noise-free codebook 9 to obtain the noise-free feature vector candidates {Z_i(n) | i = 1, 2, ..., L}, and computes Y(n) as the average vector of these L noise-free feature vector candidates.

That is,

y_t(n) = (1/L) Σ_{i=1}^{L} z_{t,i}(n)        (Equation 1)

where y_t(n) and z_{t,i}(n) are the t-th dimension components of Y(n) and Z_i(n), respectively.
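As a rough illustration of the conventional processing just described, the following Python sketch builds a noise-added codebook from a noise-free one and averages the L inverse-quantized candidates as in (Equation 1). The Euclidean-distance "likelihood", the white-noise model, and the shortcut of adding noise directly to the codeword vectors instead of adding it to time waveforms and re-analyzing them are all simplifying assumptions made here for illustration; the patent does not fix these choices.

```python
import numpy as np

def build_noise_added_codebook(clean_codebook, snr_db, seed=None):
    """Add white noise to every codeword of the noise-free codebook at the given SNR.

    Simplification: noise is added directly to the codeword vectors; the patent
    adds noise to time waveforms and re-analyzes them into feature vectors.
    """
    rng = np.random.default_rng(seed)
    noisy = []
    for c in clean_codebook:
        p_signal = np.mean(c ** 2)
        p_noise = p_signal / (10.0 ** (snr_db / 10.0))
        noisy.append(c + rng.normal(0.0, np.sqrt(p_noise), size=c.shape))
    return np.stack(noisy)

def top_l_labels(x, noisy_codebook, L):
    """Labels of the L codewords of the noise-added codebook closest to x.

    "Likelihood" is implemented here as negative Euclidean distance.
    """
    d = np.linalg.norm(noisy_codebook - x, axis=1)
    return np.argsort(d)[:L]

def spectral_mapping_denoise(X, clean_codebook, noisy_codebook, L):
    """Conventional scheme: average the L noise-free candidates (Equation 1)."""
    Y = []
    for x in X:                               # X: (N, dim) noisy feature vectors
        labels = top_l_labels(x, noisy_codebook, L)
        candidates = clean_codebook[labels]   # Z_i(n), i = 1..L
        Y.append(candidates.mean(axis=0))     # Y(n) = (1/L) * sum_i Z_i(n)
    return np.stack(Y)
```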

In the recognition processing circuit 19, the template memory 14 sends to the pattern matching circuit 12 the reference pattern 15, {T(k) | k = 1, 2, ..., K} (K is the number of feature vectors), designated by the address data 17 output by the recognition control circuit 13. The pattern matching circuit 12 performs DP matching between the reference pattern 15, {T(k) | k = 1, 2, ..., K}, and the noise-free feature vector time series 10, {Y(n) | n = 1, 2, ..., N}, which is the output of the noise removal circuit 11. The recurrence formula of DP matching is, for example, as follows.

g(k, n) = min { g(k, n-1) + D(k, n),  g(k-1, n-1) + 2 × D(k, n),  g(k-1, n) + D(k, n) }        (Equation 2)

Here, D(k, n) is the distortion between the feature vector T(k) and the feature vector Y(n). (Equation 2) is given as one example, the case of DP matching without slope restriction. The value g(K, N) obtained from this recurrence formula (Equation 2) is normalized by dividing it by the city-block distance (K+N) and output as the matching distortion 16. The recognition control circuit 13 sequentially changes the reference pattern designated by the address data 17 and, from the matching distortion 16 output by the pattern matching circuit 12 for each reference pattern, outputs the label of the reference pattern that minimizes the matching distortion as the recognition result 18.
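The recurrence of (Equation 2) and the city-block normalization can be sketched as follows. The Euclidean distortion D(k, n) and the boundary condition g(1, 1) = D(1, 1) are assumptions made for illustration, since the patent leaves the distortion measure and initialization open.

```python
import numpy as np

def dp_matching(T, Y):
    """Matching distortion between template T (K, dim) and input Y (N, dim),
    using the slope-unconstrained recurrence of (Equation 2) and the
    city-block normalization by (K + N)."""
    K, N = len(T), len(Y)
    D = np.linalg.norm(T[:, None, :] - Y[None, :, :], axis=2)   # D[k, n], assumed Euclidean
    g = np.full((K, N), np.inf)
    g[0, 0] = D[0, 0]                                           # assumed boundary condition
    for k in range(K):
        for n in range(N):
            if k == 0 and n == 0:
                continue
            best = np.inf
            if n > 0:
                best = min(best, g[k, n - 1] + D[k, n])          # horizontal step
            if k > 0 and n > 0:
                best = min(best, g[k - 1, n - 1] + 2 * D[k, n])  # diagonal step, weight 2
            if k > 0:
                best = min(best, g[k - 1, n] + D[k, n])          # vertical step
            g[k, n] = best
    return g[K - 1, N - 1] / (K + N)
```

The recognition control described above then amounts to calling this function once per registered template and keeping the label with the smallest normalized distortion.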

[Problems to Be Solved by the Invention]

The mapping relationship between the noise-superimposed signal space and the noise-free signal space, which is the foundation of noise removal by the Spectral Mapping method, is formed by creating the noise-added codebook 6 by applying processing equivalent to noise superimposition to the noise-free codebook 9. However, because actual noise has statistical variance, errors arise in the mapping relationship due to the difference between the noise equivalently added when generating the mapping relationship and the noise actually superimposed on the input speech signal 2. Furthermore, as the level of the noise superimposed on the input speech signal 2 rises, the spectral envelope of the input speech signal 2 becomes smoother. The phonemic features contained in the feature vector time series 4 output from the analysis circuit 3 therefore disappear, and the mapping errors described above increase drastically. For example, if the number L of label candidates in the noise removal circuit 11 is set to 1, then when a mapping error occurs because of the variance of the noise, the distortion caused by that mapping error is reflected directly in the matching distortion 16 of the pattern matching, so recognition performance drops sharply as the level of the superimposed noise rises. If, on the other hand, L is set to 2 or more to increase the number of candidates, the probability that a candidate given by the correct mapping is included among the multiple candidates becomes higher, but it is not known which of those candidates is the one given by the correct mapping.

For this reason, document [1] adopts the improvement of outputting the average of the feature vectors of the multiple candidates. This averaging, however, mixes components of incorrect candidate feature vectors into the candidate feature vector that should have been selected, so distortion arises and recognition performance degrades. Consequently, although the method of document [1] is effective in improving the perceptual signal-to-noise ratio in speech transmission, it is not effective when applied to speech recognition. The conventional speech recognition device thus has the problem that the noise removal processing distorts the speech feature vectors and the speech recognition rate falls under heavy noise.

The present invention has been made to solve the problems described above, and its object is to obtain a speech recognition device in which the noise removal processing does not distort the speech feature vectors and the speech recognition rate does not fall even under heavy noise.

[Means for Solving the Problems]

The speech recognition device according to the present invention comprises: a noise-free codebook 9 whose codewords are feature vectors of speech on which no noise is superimposed; a noise-added codebook 6 generated by applying processing equivalent to noise superimposition to each codeword of the noise-free codebook 9; vector quantization means (fuzzy vector quantizer 20) for vector-quantizing the feature vector time series of a noise-superimposed input speech signal according to the noise-added codebook 6 and outputting a plurality of label candidate time series indicating labels of codewords; inverse vector quantization means (inverse fuzzy vector quantizer 22) for inverse-vector-quantizing the plurality of label candidate time series, which are the output signals of the vector quantization means (fuzzy vector quantizer 20), according to the noise-free codebook 9 and outputting a plurality of noise-free feature vector candidate time series; and multi-vector recognition processing means (multi-vector recognition processing circuit 26) for receiving, as input, the plurality of noise-free feature vector candidate time series, which are the output signals of the inverse vector quantization means (inverse fuzzy vector quantizer 22), selecting a noise-free feature vector time series, and performing speech recognition processing.

[Operation]

The vector quantization means (fuzzy vector quantizer 20) vector-quantizes the feature vector time series of the noise-superimposed input speech signal on the basis of the noise-added codebook 6 and supplies a plurality of label candidate time series to the inverse vector quantization means (inverse fuzzy vector quantizer 22). The inverse vector quantization means (inverse fuzzy vector quantizer 22) inverse-vector-quantizes the input plurality of label candidate time series on the basis of the noise-free codebook 9 and supplies a plurality of noise-free feature vector candidate time series to the multi-vector recognition processing means (multi-vector recognition processing circuit 26). The multi-vector recognition processing means (multi-vector recognition processing circuit 26) selects a noise-free feature vector time series from the input plurality of noise-free feature vector candidate time series and performs speech recognition processing.

[Embodiment of the Invention]

Fig. 1 is a block diagram showing the configuration of a speech recognition device according to an embodiment of the present invention. In Fig. 1, components corresponding to those shown in Fig. 2 are given the same reference numerals and their description is omitted. As in the conventional example, this embodiment will be described taking as an example a discrete word speech recognition device that performs recognition by DP matching against word-unit templates.
In Fig. 1, 20 is a fuzzy vector quantizer, serving as the vector quantization means, which vector-quantizes the feature vector time series 4, the output signal of the analysis circuit 3, using the noise-added codebook 6 and outputs a plurality of label candidate time series 21; 22 is an inverse fuzzy vector quantizer, serving as the inverse vector quantization means, which inverse-vector-quantizes the plurality of label candidate time series 21 with the noise-free codebook 9 and outputs a plurality of noise-free feature vector candidate time series 23; 24 is a noise-free multi-vector generation circuit composed of the noise-added codebook 6, the noise-free codebook 9, the fuzzy vector quantizer 20, and the inverse fuzzy vector quantizer 22; 25 is a multi-vector pattern matching circuit that receives the plurality of noise-free feature vector candidate time series 23, performs DP matching against the reference pattern 15, and outputs the matching distortion 16; and 26 is a multi-vector recognition processing circuit, serving as the multi-vector recognition processing means, composed of the multi-vector pattern matching circuit 25, the template memory 14, and the recognition control circuit 13.

This speech recognition device adopts a new scheme in which, by means of fuzzy vector quantization, noise removal of the input signal and pattern recognition are carried out simultaneously. It comprises the noise-free multi-vector generation circuit 24, which receives the feature vectors of the noise-superimposed input speech signal and outputs a plurality of noise-free feature vector candidate time series 23, and the multi-vector recognition processing circuit 26, which performs recognition using as input the plurality of noise-free feature vector candidate time series 23, the output signals of the noise-free multi-vector generation circuit 24. As described above, the noise-free multi-vector generation circuit 24 is composed of the noise-free codebook 9, whose codewords are feature vectors of speech on which no noise is superimposed; the noise-added codebook 6, generated by applying processing equivalent to noise superimposition to each codeword of the noise-free codebook 9; the fuzzy vector quantizer 20, which vector-quantizes the feature vector time series 4 of the noise-superimposed input speech signal according to the noise-added codebook 6 and outputs the plurality of label candidate time series 21; and the inverse fuzzy vector quantizer 22, which inverse-vector-quantizes the plurality of label candidate time series 21, the output signals of the fuzzy vector quantizer 20, according to the noise-free codebook 9 and outputs the plurality of noise-free feature vector candidate time series 23.

The multi-vector recognition processing circuit 26 receives the plurality of noise-free feature vector candidate time series, {Z(n) | n = 1, 2, ..., N} where Z(n) = {Z_i(n) | i = 1, 2, ..., L}, which are the output signals of the inverse fuzzy vector quantizer 22. From among the L candidates of the noise-free feature vector candidates Z(n) = {Z_i(n) | i = 1, 2, ..., L} at an arbitrary n, it selects, for each reference pattern, the candidate that maximizes the likelihood with respect to that reference pattern as the optimum noise-free feature vector. In this way it judges the reference pattern that finally gives the maximum likelihood over the whole time series up to n = N to be the recognition category, and at the same time determines the optimum noise-free feature vector time series.

The operation of this embodiment will be explained.

The fuzzy vector quantizer 20 receives the feature vector time series 4 of the input speech, {X(n) | n = 1, 2, ..., N}, and, for the feature vector X(n) at an arbitrary n, computes the likelihood against every codeword of the noise-added codebook; the labels of the L codewords with the largest likelihoods, in descending order, are taken as the label candidates {m_i(n) | i = 1, 2, ..., L}. This is done for n = 1, 2, ..., N, and the result is output as the plurality of label candidate time series 21, {M(n) | n = 1, 2, ..., N}, where M(n) = {m_i(n) | i = 1, 2, ..., L}. The inverse fuzzy vector quantizer 22 receives the plurality of label candidate time series 21 and, for each n, inverse-vector-quantizes the corresponding label candidates M(n) = {m_i(n) | i = 1, 2, ..., L} with the noise-free codebook 9 to obtain the plurality of noise-free feature vector candidates {Z_i(n) | i = 1, 2, ..., L}. This is done for n = 1, 2, ..., N, and the result is output as the plurality of noise-free feature vector candidate time series 23, {Z(n) | n = 1, 2, ..., N}, where Z(n) = {Z_i(n) | i = 1, 2, ..., L}.
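A minimal sketch of the noise-free multi-vector generation circuit 24, reusing the hypothetical top_l_labels() helper from the earlier sketch, is given below. The only difference from the conventional denoiser is that the L candidates are kept for every frame instead of being averaged.

```python
import numpy as np

def generate_multivector(X, clean_codebook, noisy_codebook, L):
    """Return Z with shape (N, L, dim): L noise-free candidates per input frame.

    Fuzzy VQ (20) keeps the top-L labels from the noise-added codebook; inverse
    fuzzy VQ (22) replaces each label with the matching noise-free codeword.
    Unlike the conventional circuit, no averaging is performed here.
    """
    Z = []
    for x in X:                                      # X: (N, dim) noisy feature vectors
        labels = top_l_labels(x, noisy_codebook, L)  # hypothetical helper from the earlier sketch
        Z.append(clean_codebook[labels])
    return np.stack(Z)
```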

The multi-vector pattern matching circuit 25 receives the plurality of noise-free feature vector candidate time series 23, {Z(n) | n = 1, 2, ..., N} where Z(n) = {Z_i(n) | i = 1, 2, ..., L}, which are the output signals of the inverse fuzzy vector quantizer 22, and performs DP matching (Equation 2) against the reference pattern 15, {T(k) | k = 1, 2, ..., K} (K is the number of feature vectors). However, because the plurality of noise-free feature vector candidate time series 23 has L feature vector candidates for each n, the distortion D(k, n) at arbitrary k and n is defined by the following expression:

D(k, n) = min_{1 ≤ i ≤ L} d(T(k), Z_i(n))        (Equation 3)

where d(*, *) denotes the distortion between feature vectors. In this way, the feature vector that is optimum with respect to the reference pattern 15 is selected from among the candidates. As in the conventional example, g(K, N) obtained from the recurrence formula (Equation 2) is normalized by dividing it by the city-block distance (K+N) and output as the matching distortion 16. By the principle of DP matching, minimizing the partial distortion of (Equation 3) minimizes the matching distortion of the whole pattern. The recognition control circuit 13 sequentially changes the reference pattern in the template memory 14 designated by the address data 17, examines the matching distortion 16 output by the multi-vector pattern matching circuit 25 for each reference pattern, and outputs the label of the reference pattern that minimizes the matching distortion 16 as the recognition result 18. For the reference pattern that minimizes the matching distortion 16, the vector time series optimally selected from the plurality of noise-free feature vector candidate time series 23 approximates the noise-free feature vector time series of a speech signal from which the noise superimposed on the input speech has been correctly removed, and the minimum matching distortion corresponds to the matching distortion between the noise-free input speech and the noise-free template of its own category.
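Continuing the earlier sketches, the multi-vector DP matching of (Equation 3) can be drafted as below. The Euclidean d(*, *) is again an assumption, and the recognize() loop is only an illustration of what the recognition control circuit 13 does: score every template and return the label with the smallest normalized matching distortion.

```python
import numpy as np

def multivector_dp_matching(T, Z):
    """Matching distortion between template T (K, dim) and candidate set Z (N, L, dim)."""
    K, N = len(T), len(Z)
    # D[k, n] = min over the L candidates of d(T(k), Z_i(n))   (Equation 3)
    diff = T[:, None, None, :] - Z[None, :, :, :]               # (K, N, L, dim)
    D = np.linalg.norm(diff, axis=3).min(axis=2)                # (K, N)
    g = np.full((K, N), np.inf)
    g[0, 0] = D[0, 0]                                           # assumed boundary condition
    for k in range(K):
        for n in range(N):
            if k == 0 and n == 0:
                continue
            best = np.inf
            if n > 0:
                best = min(best, g[k, n - 1] + D[k, n])
            if k > 0 and n > 0:
                best = min(best, g[k - 1, n - 1] + 2 * D[k, n])
            if k > 0:
                best = min(best, g[k - 1, n] + D[k, n])
            g[k, n] = best
    return g[K - 1, N - 1] / (K + N)  # city-block normalization, as in (Equation 2)

def recognize(templates, Z):
    """Illustrative recognition control loop: label of the template with the
    smallest normalized matching distortion."""
    scores = {label: multivector_dp_matching(T, Z) for label, T in templates.items()}
    return min(scores, key=scores.get)
```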

Note that DP matching according to (Equation 2) is given merely as one embodiment and does not restrict the scope of application of the present invention.

In this way, the noise-free multi-vector generation circuit 24 performs a fuzzy mapping of the feature vector time series 4 of the noise-superimposed input speech, with redundancy, onto the plurality of noise-free feature vector candidate time series 23 containing the correctly noise-removed signal, and the multi-vector recognition processing circuit 26 performs recognition processing while automatically selecting, through the processing that maximizes the likelihood using the noise-free reference patterns, the noise-free feature vector time series that minimizes the mapping distortion.

Therefore, in this embodiment, the distortion due to mapping errors and the distortion produced by averaging the feature vectors of multiple candidates, which are the problems of the conventional device described above, are eliminated, and the drop in recognition rate caused by them no longer occurs.

The present invention has been described above taking as an example a discrete word speech recognition device that registers word-unit templates, but the recognition unit is not limited to words; any unit such as CV (consonant-vowel), VCV (vowel-consonant-vowel), CVC (consonant-vowel-consonant), monosyllables, or morphemes may be used, and as long as templates are registered as time series of feature vectors, the same effect as in the above embodiment is obtained. Furthermore, in statistical recognition methods other than template matching, such as HMM (Hidden Markov Model), the same effect is obtained by using transition probability models in place of the reference patterns and selecting, for each transition probability model, the noise-free feature vector that maximizes the generation probability from among the noise-free feature vector candidates.
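As a loosely hedged illustration of the HMM variant mentioned above, the sketch below takes, at each frame of a Viterbi pass, the best emission log-probability over the L noise-free candidates, which realizes the per-model candidate selection described in the preceding paragraph. Gaussian emissions, the scipy call, and the simple model layout are assumptions made purely for illustration; the patent states only the selection principle.

```python
import numpy as np
from scipy.stats import multivariate_normal

def viterbi_with_candidates(Z, log_init, log_trans, means, covs):
    """Best log-likelihood of one word HMM for the candidate set Z (N, L, dim).

    At every frame the emission score of each state is the maximum over the L
    noise-free candidates Z_i(n). Gaussian emissions are assumed for illustration.
    """
    N, L, _ = Z.shape
    S = len(means)                                   # number of states
    flat = Z.reshape(N * L, -1)
    emission = np.empty((S, N))
    for s in range(S):
        logp = multivariate_normal(means[s], covs[s]).logpdf(flat)   # (N * L,)
        emission[s] = logp.reshape(N, L).max(axis=1)                 # best candidate per frame
    delta = log_init + emission[:, 0]
    for n in range(1, N):
        delta = (delta[:, None] + log_trans).max(axis=0) + emission[:, n]
    return delta.max()   # compare this score across word models; the largest wins
```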

Furthermore, the invention can also be applied to continuously uttered speech other than discrete word speech by placing, in front of this device, a segmentation circuit that cuts the continuous utterance into the registered speech units and supplying the cut-out speech as the input of this device. Needless to say, the means of realizing the present invention is not limited to dedicated hardware; it can also be realized by software processing on a general-purpose computer or a signal processing processor. Furthermore, the present invention can be extended and applied to devices for recognizing acoustic signals other than speech, image signals, character and graphic signals, and the like.

[Effects of the Invention]

As described above, according to the present invention, the device comprises: vector quantization means for vector-quantizing the feature vector time series of a noise-superimposed input speech signal according to the noise-added codebook and outputting a plurality of label candidate time series; inverse vector quantization means for inverse-vector-quantizing the plurality of label candidate time series, which are the output signals of the vector quantization means, according to the noise-free codebook and outputting a plurality of noise-free feature vector candidate time series; and multi-vector recognition processing means for receiving, as input, the plurality of noise-free feature vector candidate time series, which are the output signals of the inverse vector quantization means, selecting a noise-free feature vector time series, and performing speech recognition processing. The optimum noise-free feature vector time series can therefore be selected from among the plural noise-free feature vector candidates on the basis of minimizing the distortion against the reference pattern during pattern matching, so the noise removal processing produces no distortion in the speech feature vectors, and the effect of an extremely high recognition rate is obtained even under heavy noise.

[Brief Description of the Drawings]

Fig. 1 is a block diagram showing the configuration of a speech recognition device according to an embodiment of the present invention, and Fig. 2 is a block diagram showing the configuration of a conventional speech recognition device.

6: noise-added codebook; 9: noise-free codebook; 20: fuzzy vector quantizer (vector quantization means); 22: inverse fuzzy vector quantizer (inverse vector quantization means); 26: multi-vector recognition processing circuit (multi-vector recognition processing means).

Agent: Masu Oiwa (and two others)

Procedural Amendment (voluntary)
3. Person making the amendment: Representative Moriya Shiki
5. Subject of the amendment: Claims; Detailed Description of the Invention
6. Contents of the amendment
(1) The claims are amended as shown in the attached sheet.
(2) "time series" on page 4, line 3 of the specification is amended to "time series 4".
(3) "of codewords" on page 4, line 5 is amended to "composed of labels representing codewords".
(4) The expression "g(k, n) = min { g(k, n-1) + D(k, n), g(k-1, n-1) + 2 × D(k, n), g(k-1, n) + D(k, n) }" on page 7, lines 14 to 16 is amended.
(4) "city-block distance" on page 8, line 2 and page 17, line 9 is amended to "city-block distance (K+N)".
(5) "conventional" on page 9, line 20 is amended to "conventional, having a noise removal function".
(6) "indicating labels of codewords" on page 10, line 16 is amended to "composed of labels representing codewords".
(7) "receives a speech signal on which noise is superimposed, and ... of this input speech signal" on page 11, lines 12 to 13 is amended to "... of the input speech signal on which noise is superimposed".
(8) "codebook 9" on page 12, line 16 is amended to "codebook 6".
(9) "matching distortion" on page 13, lines 7 to 8 is amended to "matching distortion 16".
(10) "input speech signal" on page 14, line 8 is amended to "input speech signal 2".
(11) "time series" on page 14, line 19 is amended to "time series 23".
(12) "noise-added codebook" on page 15, line 14 is amended to "noise-added codebook 6".

Attached sheet
2. Claims
In a speech recognition device that recognizes input speech and has a noise removal function for suppressing noise superimposed on an input speech signal, the speech recognition device comprising: a noise-free codebook whose codewords are feature vectors of speech on which no noise is superimposed; a noise-added codebook generated by applying processing equivalent to noise superimposition to each codeword of the noise-free codebook; vector quantization means for vector-quantizing the feature vector time series of the noise-superimposed input speech signal according to the noise-added codebook and outputting a plurality of label candidate time series composed of labels representing codewords; inverse vector quantization means for inverse-vector-quantizing the plurality of label candidate time series, which are the output signals of the vector quantization means, according to the noise-free codebook and outputting a plurality of noise-free feature vector candidate time series; and multi-vector recognition processing means for receiving, as input, the plurality of noise-free feature vector candidate time series, which are the output signals of the inverse vector quantization means, selecting a noise-free feature vector time series, and performing speech recognition processing.

Claims (1)

[Claims]
In a speech recognition device that recognizes input speech and has a noise removal function for suppressing noise superimposed on an input speech signal, the speech recognition device comprising: a noise-free codebook whose codewords are feature vectors of speech on which no noise is superimposed; a noise-added codebook generated by applying processing equivalent to noise superimposition to each codeword of the noise-free codebook; vector quantization means for vector-quantizing the feature vector time series of the noise-superimposed input speech signal according to the noise-added codebook and outputting a plurality of label candidate time series indicating labels of codewords; inverse vector quantization means for inverse-vector-quantizing the plurality of label candidate time series, which are the output signals of the vector quantization means, according to the noise-free codebook and outputting a plurality of noise-free feature vector candidate time series; and multi-vector recognition processing means for receiving, as input, the plurality of noise-free feature vector candidate time series, which are the output signals of the inverse vector quantization means, selecting a noise-free feature vector time series, and performing speech recognition processing.
JP63104769A 1988-04-27 1988-04-27 Voice recognizer Expired - Fee Related JPH0766271B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP63104769A JPH0766271B2 (en) 1988-04-27 1988-04-27 Voice recognizer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP63104769A JPH0766271B2 (en) 1988-04-27 1988-04-27 Voice recognizer

Publications (2)

Publication Number Publication Date
JPH01274198A true JPH01274198A (en) 1989-11-01
JPH0766271B2 JPH0766271B2 (en) 1995-07-19

Family

ID=14389682

Family Applications (1)

Application Number Title Priority Date Filing Date
JP63104769A Expired - Fee Related JPH0766271B2 (en) 1988-04-27 1988-04-27 Voice recognizer

Country Status (1)

Country Link
JP (1) JPH0766271B2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5201004A (en) * 1990-05-22 1993-04-06 Nec Corporation Speech recognition method with noise reduction and a system therefor
JPH04298797A (en) * 1991-03-08 1992-10-22 Mitsubishi Electric Corp Voice recognition device
CN112002307A (en) * 2020-08-31 2020-11-27 广州市百果园信息技术有限公司 Voice recognition method and device
CN112002307B (en) * 2020-08-31 2023-11-21 广州市百果园信息技术有限公司 Voice recognition method and device

Also Published As

Publication number Publication date
JPH0766271B2 (en) 1995-07-19

Similar Documents

Publication Publication Date Title
KR101201146B1 (en) Method of noise reduction using instantaneous signal-to-noise ratio as the principal quantity for optimal estimation
KR101224755B1 (en) Multi-sensory speech enhancement using a speech-state model
EP0689194A1 (en) Method of and apparatus for signal recognition that compensates for mismatching
JP2004264816A (en) Method of iterative noise estimation in recursive framework
JPH05216490A (en) Apparatus and method for speech coding and apparatus and method for speech recognition
JP2002140089A (en) Method and apparatus for pattern recognition training wherein noise reduction is performed after inserted noise is used
KR20050000541A (en) Method of determining uncertainty associated with noise reduction
CN110335608B (en) Voiceprint verification method, voiceprint verification device, voiceprint verification equipment and storage medium
KR20040088368A (en) Method of speech recognition using variational inference with switching state space models
US6502073B1 (en) Low data transmission rate and intelligible speech communication
JP2019008120A (en) Voice quality conversion system, voice quality conversion method and voice quality conversion program
JP7335460B2 (en) clear text echo
JP2024529889A (en) Robust direct speech-to-speech translation
JP3703394B2 (en) Voice quality conversion device, voice quality conversion method, and program storage medium
US20230081543A1 (en) Method for synthetizing speech and electronic device
CN115240696B (en) Speech recognition method and readable storage medium
JPH01274198A (en) Speech recognition device
Shanthamallappa et al. Robust Automatic Speech Recognition Using Wavelet-Based Adaptive Wavelet Thresholding: A Review
US20220310061A1 (en) Regularizing Word Segmentation
Kurian et al. Connected digit speech recognition system for Malayalam language
WO2020166359A1 (en) Estimation device, estimation method, and program
JP2004191968A (en) Method and device for separating signal source
KR100353858B1 (en) Method for generating context-dependent phonelike units for speech recognition
US11335321B2 (en) Building a text-to-speech system from a small amount of speech data
JP2961916B2 (en) Voice recognition device

Legal Events

Date Code Title Description
LAPS Cancellation because of no payment of annual fees