JPH04298797A

JPH04298797A - Voice recognition device

Info

Publication number: JPH04298797A
Application number: JP3043517A
Authority: JP
Inventors: Toshiyuki Hanazawa; 利行花沢
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1991-03-08
Filing date: 1991-03-08
Publication date: 1992-10-22
Anticipated expiration: 2014-10-12
Also published as: JP2961916B2

Abstract

PURPOSE: To provide a speech recognition device that recognizes input speech and has a noise removal function for suppressing noise superimposed on the input speech signal, in which the noise removal process does not average the speech feature vectors themselves, so that the recognition rate does not drop even in noisy environments. CONSTITUTION: A neighboring codeword selector 20 outputs, from a noise-added codebook 6, a plurality of neighboring codeword candidates 21 that are close to the input speech signal, together with distance candidates 22 between each neighboring codeword candidate and the input speech signal. A movement vector calculator 23 calculates movement vector candidates 24 of the neighboring codeword candidates between before and after noise superposition. A noise remover 25 takes as input the movement vector candidates 24, the distance candidates, and the feature vector 4 of the input speech signal, and removes the noise by using the weighted mean of the movement vector candidates to correct the movement of the input speech signal's feature vector caused by noise superposition.

Description

[Detailed description of the invention]

[0001]

[Field of Industrial Application] The present invention relates to a noise-robust speech recognition device having a noise removal function for suppressing noise superimposed on an input speech signal.

[0002]

[Description of the Related Art] In speech recognition based on spectral information, distortion of the speech spectrum caused by superimposed noise severely degrades recognition performance. Improving noise robustness is therefore an important issue in bringing speech recognition devices into practical use. Noise-cancelling microphones are often used to suppress contamination by environmental noise, but they may still fail to provide a sufficient signal-to-noise ratio (S/N), and noise may also be superimposed while the speech signal is being transmitted. Signal processing techniques that try to improve the S/N by removing only the noise from a speech signal that already contains noise are called noise suppression, noise removal, or speech enhancement, and many methods have been proposed.

[0003] Recently, a noise suppression method based on a new concept, called the spectral mapping method, was proposed in Biing-Hwang Juang and L. R. Rabiner, "Signal Restoration by Spectral Mapping," 1987 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Volume 4, pp. 6.6.1-6.6.4, Dallas, April 1987 (hereinafter cited as Reference [1]). The method removes noise using a known mapping between a noise-superimposed signal space and a noise-free signal space, both generated by vector quantization, and is reported to be effective as a noise suppression method for speech transmission.

[0004] FIG. 2 shows an example configuration of a speech recognition device incorporating a noise removal circuit based on the spectral mapping method. Recognition devices can use various recognition methods; here, an isolated word recognition device that holds word-based templates and performs DP (dynamic programming) matching is described as an example.

[0005] In FIG. 2, 1 is a speech signal input terminal; 2 is the input speech signal; 3 is an analysis circuit that acoustically analyzes the input speech signal 2; 4 is a feature vector; 5 is a vector quantizer that vector-quantizes the feature vector 4 output from the analysis circuit using a noise-added codebook 6; 7 is a label candidate consisting of labels that represent codewords; 8 is an inverse vector quantizer that inverse-vector-quantizes the label candidate 7, the output of the vector quantizer 5, using a noise-free codebook 9; 10 is a noise-free feature vector; and 11 is a noise removal circuit composed of the vector quantizer 5, the noise-added codebook 6, the inverse vector quantizer 8, and the noise-free codebook 9. 12 is a pattern matching circuit that performs DP matching between the time series of noise-free feature vectors 10 and a reference pattern 15, expressed as a time series of noise-free feature vectors output from a template memory 14, and outputs a matching distortion 16; 13 is a recognition control circuit that specifies the reference pattern used in pattern matching and outputs the recognition result; 17 is address data that the recognition control circuit 13 sends to the template memory 14 to specify a reference pattern; 18 is the recognition result; and 19 is a recognition processing circuit composed of the pattern matching circuit 12, the template memory 14, and the recognition control circuit 13.

[0006] Next, the operation is described. In the noise removal circuit 11, the noise-free codebook 9 consists of codewords that are feature vectors of speech with no superimposed noise. The noise-added codebook 6 is generated by processing each codeword of the noise-free codebook 9 so that it takes on the noise condition of the noise-superimposed input speech signal: for example, noise is added in the time-waveform domain so that the signal-to-noise ratio equals that of the noise-superimposed input speech signal 2, and the result is re-analyzed and converted into a noise-added feature vector. The noise-superimposed input speech signal 2 entering the input terminal 1 is acoustically analyzed by the analysis circuit 3 and output as the feature vector 4, X(k) (k is the time index, k = 1, 2, ..., K).
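The time-domain step of building the noise-added codebook, adding noise so the mixture reaches a given SNR, can be sketched as below. This is a minimal illustration assuming SNR is measured as the ratio of mean sample powers; the names `mix_at_snr` and `snr_db` are hypothetical. The patent does not say which waveform stands for each codeword, and the re-analysis into a feature vector is not shown.

```python
import math

def mix_at_snr(speech, noise, target_db):
    """Add noise to a speech waveform so the mixture has the target SNR (dB).

    SNR is taken as 10*log10(P_speech / P_noise), with P the mean sample power.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0.0:
        raise ValueError("noise has zero power")
    # Scale factor that brings the noise power to p_speech / 10**(target_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (target_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]

def snr_db(speech, noisy):
    """Measured SNR of a mixture, treating (noisy - speech) as the noise."""
    p_s = sum(s * s for s in speech)
    p_r = sum((y - s) ** 2 for s, y in zip(speech, noisy))
    return 10.0 * math.log10(p_s / p_r)
```

Measuring the mixture with `snr_db` should return the requested `target_db` up to rounding error.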

[0007] In the noise removal circuit 11, the vector quantizer 5 receives the feature vector 4, X(k), computes its distance to every codeword in the noise-added codebook 6, and outputs the labels of the M closest codewords, in order of increasing distance, as the label candidate Ln(k) = {lni(k) | i = 1, 2, ..., M}. M is an integer of 1 or more.

[0008] The inverse vector quantizer 8 inverse-vector-quantizes the label candidate 7, Ln(k) = {lni(k) | i = 1, 2, ..., M}, with the noise-free codebook 9 to obtain the feature vector candidates {vci(k) | i = 1, 2, ..., M}, and obtains the noise-free feature vector Y(k) as the average vector of these M feature vector candidates. That is,

[0009]

[Math 1]
$$Y(k) = \frac{1}{M}\sum_{i=1}^{M} v_{ci}(k)$$
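The conventional noise removal circuit 11 described in [0007] to [0009] can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the Euclidean distance, the list-of-lists codebook layout (the two codebooks index-aligned), and the function names are assumptions. It also makes the averaging problem discussed in [0014] concrete.

```python
import math

def top_m_labels(x, codebook, m):
    """Vector quantizer 5: labels (indices) of the m codewords nearest to x."""
    order = sorted(range(len(codebook)), key=lambda j: math.dist(x, codebook[j]))
    return order[:m]

def spectral_mapping(x, noisy_cb, clean_cb, m):
    """Conventional noise removal (Math 1): average the noise-free codewords
    whose noise-added counterparts are nearest to x."""
    labels = top_m_labels(x, noisy_cb, m)       # label candidate Ln(k)
    dim = len(clean_cb[0])
    # Inverse vector quantization, then the plain average over the m candidates.
    return [sum(clean_cb[j][d] for j in labels) / m for d in range(dim)]
```

With clean codewords (0, 0), (4, 0), (0, 4) and noise-added versions shifted by (1, 1), an input of (1.2, 1.1) maps to (0, 0) for M = 1 but to (2, 0) for M = 2, a point that matches neither clean codeword.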

[0010] In the recognition processing circuit 19, the template memory 14 sends to the pattern matching circuit 12 the reference pattern 15, {T(l) | l = 1, 2, ..., L} (L is the number of feature vectors), specified by the address data 17 output by the recognition control circuit 13. The pattern matching circuit 12 performs DP matching between the reference pattern 15, {T(l) | l = 1, 2, ..., L}, and the time series of noise-free feature vectors 10 output by the noise removal circuit 11, {Y(k) | k = 1, 2, ..., K}. The DP matching recurrence is, for example, as follows.

[0011]

[Math 2]

[0012] Here, g(k, l) is the distortion between the feature vector Y(k) and the feature vector T(l). The above equation is an example of DP matching without a slope constraint. G(K, L) in this recurrence (Equation 2) is normalized by dividing it by (K + L), the sum of the lengths of the reference pattern 15, {T(l) | l = 1, 2, ..., L}, and of the time series of noise-free feature vectors 10 output by the noise removal circuit 11, {Y(k) | k = 1, 2, ..., K}, and the result is output as the matching distortion 16.
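Because [Math 2] is not reproduced in this text, the sketch below assumes the common symmetric recurrence without a slope constraint, G(k, l) = min{G(k-1, l) + g, G(k-1, l-1) + 2g, G(k, l-1) + g} with g = g(k, l). Every path through it carries a total weight of K + L, which is consistent with the normalization described above. The Euclidean frame distance and the function name are also assumptions.

```python
import math

def dp_matching_distortion(Y, T):
    """Normalized matching distortion G(K, L) / (K + L) between an input
    sequence Y (K frames) and a reference pattern T (L frames).

    ASSUMPTION: symmetric recurrence with step weights 1, 2, 1 and no slope
    constraint; the patent's own [Math 2] is not reproduced in this text.
    """
    K, L = len(Y), len(T)
    INF = float("inf")
    # G[k][l] for k = 0..K, l = 0..L; row 0 and column 0 act as the boundary.
    G = [[INF] * (L + 1) for _ in range(K + 1)]
    G[0][0] = 0.0
    for k in range(1, K + 1):
        for l in range(1, L + 1):
            g = math.dist(Y[k - 1], T[l - 1])  # local distortion g(k, l), assumed Euclidean
            G[k][l] = min(G[k - 1][l] + g,          # advance in the input only
                          G[k - 1][l - 1] + 2 * g,  # diagonal step, weight 2
                          G[k][l - 1] + g)          # advance in the reference only
    return G[K][L] / (K + L)
```

For example, `dp_matching_distortion([[0], [1], [1], [2]], [[0], [1], [2]])` returns 0.0, because the repeated input frame is absorbed by a step that advances in the input only.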

[0013] The recognition control circuit 13 successively changes the reference pattern specified by the address data 17 and, from the matching distortions 16 output by the pattern matching circuit 12 for each reference pattern, outputs the label of the reference pattern with the smallest matching distortion as the recognition result 18.
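The selection loop of the recognition control circuit 13 amounts to an argmin over reference patterns. In this sketch the dictionary of templates and the pluggable `distortion` function are illustrative assumptions; in the device, the distortion would be the normalized DP matching distortion 16.

```python
def recognize(Y, templates, distortion):
    """Recognition control (sketch): try each reference pattern in turn and
    return (label, score) for the one with the smallest matching distortion.

    templates: dict mapping a word label to its reference pattern.
    distortion: function (Y, T) -> float, e.g. a normalized DP distortion.
    """
    best_label, best_score = None, float("inf")
    for label, T in templates.items():
        score = distortion(Y, T)
        if score < best_score:
            best_label, best_score = label, score
    return best_label, best_score
```

Passing the distortion in as a function keeps the control loop independent of the matching method, in line with [0032], which allows any recognition method that uses a time series of feature vectors.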

[0014]

[Problem to Be Solved by the Invention] The mapping between the noise-superimposed signal space and the noise-free signal space, which underlies noise removal by the spectral mapping method, is formed by applying processing equivalent to noise superposition to the noise-free codebook 9 to create the noise-added codebook 6. However, as the level of noise superimposed on the input speech signal 2 rises, the spectral envelope of the input speech signal 2 flattens. The phonetic distinctiveness of the feature vector time series 4 output by the analysis circuit 3 therefore decreases; when it is vector-quantized with the noise-added codebook 6, it becomes less likely to be quantized to the codeword given by the correct mapping, and mapping errors increase sharply. For example, if the number M of label candidates in the noise removal circuit 11 is set to 1, the distortion caused by any mapping error is carried directly into the matching distortion 16 of the pattern matching, so recognition performance drops rapidly as the level of superimposed noise rises. The prior art therefore increases the number of candidates by setting M to 2 or more. This raises the probability that a candidate from the correct mapping is among the candidates, but it is not known which candidate that is, so the average of the candidates' feature vectors is output. This averaging, however, mixes components of the incorrect candidates' feature vectors into the feature vector of the candidate that should have been selected, blurring phonetic distinctiveness and degrading recognition performance. Consequently, although the prior art is effective in improving the perceived signal-to-noise ratio in speech transmission, it is less effective when applied to speech recognition. In short, conventional speech recognition devices with a noise removal function have the problem that the noise removal process averages the speech feature vectors, which blurs phonetic features and lowers the recognition rate under high noise.

[0015] The present invention was made to solve the above problem. Its object is to provide a speech recognition device in which the noise removal process does not average the speech feature vectors, so that the recognition rate does not drop even under high noise.

[0016]

[Means for Solving the Problems] A speech recognition device according to the present invention comprises: a noise-free codebook whose codewords are feature vectors of speech with no superimposed noise; a noise-added codebook generated by applying processing equivalent to noise superposition to each codeword of the noise-free codebook; neighboring codeword selection means for outputting, from the noise-added codebook, a plurality of neighboring codeword candidates that are close to the input speech signal, together with the distance candidates between each neighboring codeword candidate and the input speech signal; movement vector calculation means for calculating the movement vectors of the neighboring codewords between before and after noise superposition; and means that takes as input the plurality of movement vector candidates output by the movement vector calculation means, the distance candidates output by the neighboring codeword selection means, and the feature vector of the noise-superimposed input speech signal, and removes noise by using the weighted average of the movement vector candidates to correct the movement of the input speech signal's feature vector caused by noise superposition.

[0017]

[Operation] The noise removal means of this invention removes noise by using the weighted average of the movement vector candidates output by the movement vector calculation means to correct the movement of the input speech signal's feature vector caused by noise superposition. Noise can therefore be removed without averaging the feature vectors themselves.

[0018]

[Embodiment] FIG. 1 is a block diagram showing the configuration of a speech recognition device according to an embodiment of the present invention. In FIG. 1, components corresponding to those in FIG. 2 are given the same reference numerals, and their descriptions are omitted. Like the conventional example, this embodiment is described using an isolated word recognition device that performs recognition by DP matching against word-based templates.

[0019] In FIG. 1, 20 is a neighboring codeword selector that outputs, from the noise-added codebook 6, a plurality of neighboring codeword candidates 21 that are close to the feature vector 4 of the input speech signal, together with distance candidates 22 between each neighboring codeword candidate and the input speech signal. The distance here is, for example, the Euclidean distance to the feature vector 4 of the input speech signal. 23 is a movement vector calculator that calculates the movement vector candidates 24 of the neighboring codeword candidates 21, output by the neighboring codeword selector 20, between before and after noise superposition. 25 is a noise remover that takes as input the movement vector candidates 24, the distance candidates 22 output by the neighboring codeword selector 20, and the feature vector 4 output by the analysis circuit 3, and outputs a noise-removed feature vector 27. 26 is a noise removal circuit composed of the noise-added codebook 6, the noise-free codebook 9, the neighboring codeword selector 20, the movement vector calculator 23, and the noise remover 25.

[0020] The principle of the noise removal method used in the speech recognition device according to the present invention is as follows. The noise-added codebook 6 is obtained by processing each codeword of the noise-free codebook 9 so that it takes on the noise condition of the noise-superimposed input speech signal: for example, by adding noise in the time-waveform domain so that the signal-to-noise ratio equals that of the noise-superimposed input speech signal 2, re-analyzing the result, and converting it into a noise-added feature vector. This process of noise superposition can be regarded as a movement of each codeword in the feature space. Because the feature vector of each codeword is generally obtained from the time-waveform domain through nonlinear processing, the movement caused by noise superposition differs from codeword to codeword. This per-codeword movement in the feature space is called a movement vector. To remove the noise component from a noise-superimposed codeword, it suffices to correct the movement caused by noise superposition, that is, to subtract the movement vector. The feature vector X(k) of the input speech does not exactly match any codeword of the noise-added codebook 6, but the movement of X(k) caused by noise superposition can be assumed to be close to the movement vectors of the codewords near X(k). The speech recognition device according to the present invention therefore removes noise by subtracting the weighted average of the movement vectors of the M codewords {vni(k) | i = 1, 2, ..., M} selected from the noise-added codebook 6 in the neighborhood of the input speech feature vector X(k).
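To see why the movement differs from codeword to codeword, consider a hypothetical log band-power feature (the patent does not name its acoustic analysis). If speech and noise are assumed uncorrelated so that band powers add, the same noise shifts each feature component by log(S + N) - log(S), which depends on the clean power S. The function names below are illustrative.

```python
import math

def log_power_feature(band_powers):
    """Hypothetical nonlinear analysis: natural-log power per frequency band."""
    return [math.log(p) for p in band_powers]

def movement_vector(clean_powers, noise_powers):
    """Shift of a codeword's feature vector when noise is added.

    Assumes speech and noise are uncorrelated, so band powers simply add.
    """
    clean = log_power_feature(clean_powers)
    noisy = log_power_feature([s + n for s, n in zip(clean_powers, noise_powers)])
    return [yn - yc for yn, yc in zip(noisy, clean)]
```

With unit noise power in both bands, a codeword with band powers (100, 1) moves by about (0.01, 0.69), while one with (1, 100) moves by about (0.69, 0.01): the same noise produces different movement vectors.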

[0021] Next, the operation of this embodiment is described. The neighboring codeword selector 20 outputs, from the noise-added codebook 6, the M codeword candidates 21 nearest to the feature vector 4 of the input speech signal, X(k), namely Vn(k) = {vni(k) | i = 1, 2, ..., M}, together with the distance candidates 22 between these neighboring codeword candidates and the input speech signal, D(k) = {di(k) | i = 1, 2, ..., M}.
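The neighboring codeword selector 20 can be sketched as a sorted nearest-neighbor search that also returns the distances. As an assumption for illustration, it returns codeword indices rather than the codewords themselves: since each noise-added codeword is made from the noise-free codeword at the same position, an index lets the next stage look up both. The function name is hypothetical.

```python
import math

def select_neighbors(x, noisy_codebook, m):
    """Neighboring codeword selector (sketch).

    Returns (indices, distances) for the m codewords in the noise-added
    codebook nearest to x, using Euclidean distance as in [0019].
    """
    dists = [math.dist(x, c) for c in noisy_codebook]
    order = sorted(range(len(noisy_codebook)), key=dists.__getitem__)[:m]
    return order, [dists[j] for j in order]
```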

[0022] The movement vector calculator 23 takes as input the neighboring codeword candidates 21 output by the neighboring codeword selector 20, Vn(k) = {vni(k) | i = 1, 2, ..., M}, and the codewords of the noise-free codebook 9, and calculates the movement vector candidates 24 of the neighboring codeword candidates 21 between before and after noise superposition, B(k) = {bi(k) | i = 1, 2, ..., M}. The movement vector candidates 24, B(k) = {bi(k) | i = 1, 2, ..., M}, are calculated as follows.

[0023]

[Math 3]
$$b_i(k) = v_{ni}(k) - v_{ci}(k), \quad i = 1, 2, \ldots, M$$

[0024] Here, vci(k) is a codeword of the noise-free codebook 9, and vni(k) is a codeword of the noise-added codebook 6.
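Taking the movement vector as the noise-added codeword minus its noise-free counterpart (the direction that [0020] says must be subtracted to remove noise), the movement vector calculator 23 reduces to a per-index difference. This sketch assumes the two codebooks are index-aligned lists; the function name is hypothetical.

```python
def movement_vectors(indices, noisy_codebook, clean_codebook):
    """Movement vector calculator (sketch): b_i(k) = v_ni(k) - v_ci(k)
    for each selected neighbor index, assuming index-aligned codebooks."""
    return [[n - c for n, c in zip(noisy_codebook[j], clean_codebook[j])]
            for j in indices]
```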

[0025] The noise remover 25 takes as input the movement vector candidates 24, the distance candidates 22 output by the neighboring codeword selector 20, and the feature vector 4 output by the analysis circuit 3, and outputs the noise-removed feature vector 27, Z(k), which is calculated as follows.

[0026]

[Math 4]

[0027]

[Math 5]

[0028] Here, wi is the weighting coefficient applied to the movement vector candidates B(k) = {bi(k) | i = 1, 2, ..., M}, and p is a constant that determines the weighting of the weighted average, for example p = 1.
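[Math 4] and [Math 5] are not reproduced in this text. The sketch below assumes, from [0020] and [0028], that Z(k) = X(k) - (sum of wi bi(k)) / (sum of wi), and that the weights fall off with distance as wi = 1 / di(k)^p. Both the weight formula and the small `eps` guard against a zero distance are assumptions, not the patent's stated equations; the function name is hypothetical.

```python
def remove_noise(x, moves, dists, p=1.0, eps=1e-12):
    """Noise remover (sketch): subtract the weighted mean movement vector.

    ASSUMPTION: w_i = 1 / (d_i + eps)**p, so nearer neighbors count more;
    the patent's own weight formula ([Math 5]) is not reproduced here.
    """
    weights = [1.0 / (d + eps) ** p for d in dists]
    total = sum(weights)
    # Weighted average of the movement vector candidates, component by component.
    mean_move = [sum(w * b[j] for w, b in zip(weights, moves)) / total
                 for j in range(len(x))]
    # Correct the feature vector's movement caused by noise superposition.
    return [xi - mi for xi, mi in zip(x, mean_move)]
```

With p = 0 every weight becomes 1, which gives the simple-average variant mentioned in [0031].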

[0029] The noise-removed feature vector 27, Z(k), is sent to the pattern matching circuit 12 for recognition processing. The subsequent processing is identical to the method described in the prior art section, so its description is omitted.

[0030] As described above, this embodiment removes noise from the viewpoint of feature vector movement. This reduces the phonetic blurring caused by averaging the feature vectors of multiple candidates, which was the problem with the conventional device described above, and prevents a significant drop in the recognition rate.

[0031] This embodiment uses a weighted average of the movement vectors, but the present invention also covers setting all weighting coefficients wi to 1, that is, using a simple average of the movement vectors.

[0032] This embodiment has been described using an isolated word speech recognition device with registered word-based templates as an example, but the recognition unit is not limited to words and may be any unit, such as a phoneme or a syllable. Likewise, although DP matching was used as the example recognition method, the invention can be applied to any recognition method that uses a time series of feature vectors.

[0033] The invention can also be used not only for speech recognition but also as a noise suppression method for speech transmission.

[0034]

[Effects of the Invention] As described above, the present invention comprises: a noise-free codebook 9 whose codewords are feature vectors of speech with no superimposed noise; a noise-added codebook 6 generated by applying processing equivalent to noise superposition to each codeword of the noise-free codebook 9; neighboring codeword selection means for outputting, from the noise-added codebook 6, a plurality of neighboring codeword candidates that are close to the input speech signal, together with the distance candidates between each neighboring codeword candidate and the input speech signal; movement vector calculation means for calculating the movement vectors of the neighboring codewords between before and after noise superposition; and means that takes as input the plurality of movement vector candidates output by the movement vector calculation means, the distance candidates output by the neighboring codeword selection means, and the feature vector time series of the noise-superimposed input speech signal, and removes noise by using the weighted average of the movement vector candidates to correct the movement of the input speech signal's feature vectors caused by noise superposition. This reduces the phonetic blurring caused by averaging the feature vectors of multiple candidates and prevents a significant drop in the recognition rate even under high noise.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing the configuration of a speech recognition device according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of a conventional speech recognition device.

[Explanation of symbols]

1 Speech signal input terminal
2 Input speech signal
3 Analysis circuit
4 Feature vector
5 Vector quantizer
6 Noise-added codebook
7 Label candidate
8 Inverse vector quantizer
9 Noise-free codebook
10 Noise-free feature vector
11 Noise removal circuit
12 Pattern matching circuit
13 Recognition control circuit
14 Template memory
15 Reference pattern
16 Matching distortion
17 Address data
18 Recognition result
19 Recognition processing circuit
20 Neighboring codeword selector
21 Neighboring codeword candidates
22 Distance candidates
23 Movement vector calculator
24 Movement vector candidates
25 Noise remover
26 Noise removal circuit
27 Noise-removed feature vector

Claims

[Claims]

Claim 1: A speech recognition device that recognizes input speech and has a noise removal function for suppressing noise superimposed on an input speech signal, comprising: a noise-free codebook whose codewords are feature vectors of speech with no superimposed noise; a noise-added codebook generated by applying processing equivalent to noise superposition to each codeword of the noise-free codebook; neighboring codeword selection means for outputting, from the noise-added codebook, a plurality of neighboring codeword candidates that are close to the input speech signal, together with distance candidates between each of the neighboring codeword candidates and the input speech signal; movement vector calculation means for calculating movement vectors of the neighboring codewords between before and after noise superposition; and means that receives as input a plurality of movement vector candidates output by the movement vector calculation means, the distance candidates output by the neighboring codeword selection means, and a feature vector of the noise-superimposed input speech signal, and removes noise by using a weighted average of the movement vector candidates to correct the movement of the feature vector of the input speech signal caused by noise superposition.