JPH04298797A - Voice recognition device - Google Patents

Voice recognition device

Publication number: JPH04298797A
Authority: JP (Japan)
Legal status: Granted
Application number: JP3043517A
Other languages: Japanese (ja)
Other versions: JP2961916B2 (en)
Inventor: Toshiyuki Hanazawa (利行 花沢)
Current assignee: Mitsubishi Electric Corp
Original assignee: Mitsubishi Electric Corp
Priority date: 1991-03-08
Filing date: 1991-03-08
Publication date: 1992-10-22

Events:
Application filed by Mitsubishi Electric Corp
Priority to JP3043517A (patent/JP2961916B2/en)
Publication of JPH04298797A
Application granted
Publication of JP2961916B2
Anticipated expiration
Current status: Expired - Fee Related

Abstract

PURPOSE: To provide a speech recognition device that has a noise removal function for suppressing noise superimposed on an input speech signal and that recognizes the input speech, in which noise is removed without averaging the speech feature vectors themselves, so that the recognition rate does not drop even in noisy environments. CONSTITUTION: A neighborhood codeword selector 20 outputs, from a noise-added codebook 6, a plurality of neighborhood codeword candidates 21 that are close to the input speech signal, together with distance candidates 22 between each candidate and the input speech signal. A movement vector calculator 23 calculates movement vector candidates 24, i.e., the displacement of each neighborhood codeword candidate between before and after noise superposition. A noise remover 25 receives the movement vector candidates 24, the distance candidates 22, and the feature vector 4 of the input speech signal, and removes the noise by using a weighted mean of the movement vector candidates to correct the displacement of the input feature vector caused by noise superposition.

Description

[Detailed Description of the Invention]

[0001] [Field of the Invention] The present invention relates to a noise-robust speech recognition device having a noise removal function that suppresses noise superimposed on an input speech signal.

[0002] [Description of the Related Art] In speech recognition that uses spectral information of speech, deformation of the speech spectrum caused by superimposed noise markedly degrades recognition performance. Improving robustness to noise is therefore an important issue in putting speech recognition devices into practical use. Noise-cancelling microphones are often used to suppress contamination by environmental noise, but there are still cases where a sufficient signal-to-noise ratio (S/N) cannot be obtained, or where noise is superimposed while the speech signal is being transmitted. Signal processing techniques that try to improve the S/N by removing only the noise from a speech signal that already contains it are known as noise suppression, noise removal, or speech enhancement, and many methods have been proposed.

[0003] Recently, a noise suppression method based on a new concept, called the spectral mapping method, has been proposed in Biing-Hwang Juang and L. R. Rabiner, "Signal Restoration by Spectral Mapping," 1987 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 4, pp. 6.6.1-6.6.4, April 1987, Dallas (hereinafter cited as Reference [1]). It removes noise by using a known mapping between a noise-superimposed signal space and a noise-free signal space, both generated by vector quantization, and is regarded as effective as a noise suppression method for speech transmission.

[0004] FIG. 2 shows an example configuration of a speech recognition device incorporating a noise removal circuit based on the spectral mapping method. Recognition devices use various recognition methods; here, an isolated word recognizer that holds word-level templates and uses DP (dynamic programming) matching is described as an example.

[0005] In FIG. 2, 1 is a speech signal input terminal; 2 is the input speech signal; 3 is an analysis circuit that acoustically analyzes the input speech signal 2; 4 is a feature vector; 5 is a vector quantizer that vector-quantizes the feature vector 4 output from the analysis circuit using a noise-added codebook 6; 7 is a label candidate consisting of labels that represent codewords; 8 is an inverse vector quantizer that inverse-vector-quantizes the label candidate 7, which is the output of the vector quantizer 5, using a noise-free codebook 9; 10 is a noise-free feature vector; and 11 is a noise removal circuit composed of the vector quantizer 5, the noise-added codebook 6, the inverse vector quantizer 8, and the noise-free codebook 9. 12 is a pattern matching circuit that performs DP matching between the time series of noise-free feature vectors 10 and a reference pattern 15, expressed as a time series of noise-free feature vectors output from a template memory 14, and outputs a matching distortion 16; 13 is a recognition control circuit that specifies the reference pattern used in pattern matching and outputs the recognition result; 17 is address data that the recognition control circuit 13 sends to the template memory 14 to specify a reference pattern; 18 is the recognition result; and 19 is a recognition processing circuit composed of the pattern matching circuit 12, the template memory 14, and the recognition control circuit 13.

[0006] Next, the operation will be described. In the noise removal circuit 11, the noise-free codebook 9 consists of codewords that are feature vectors of speech with no superimposed noise. The noise-added codebook 6 is generated by processing each codeword of the noise-free codebook 9 so that it takes on the noise condition of the noise-superimposed input speech signal: for example, noise is added in the time-waveform domain so that the signal-to-noise ratio equals that of the noise-superimposed input speech signal 2, and the result is re-analyzed and converted into a noise-added feature vector. The noise-superimposed input speech signal 2 entering the input terminal 1 is acoustically analyzed by the analysis circuit 3 and output as the feature vector 4, X(k), where k is the frame index in the time series (k = 1, 2, ..., K).

[0007] In the noise removal circuit 11, the vector quantizer 5 takes the feature vector 4, X(k), as input, computes its distance to every codeword of the noise-added codebook 6, and outputs the labels of the codewords ranked first through M-th in ascending order of distance as the label candidates Ln(k) = {l_ni(k) | i = 1, 2, ..., M}, where M is an integer greater than or equal to 1.

[0008] The inverse vector quantizer 8 inverse-vector-quantizes the label candidates 7, Ln(k) = {l_ni(k) | i = 1, 2, ..., M}, using the noise-free codebook 9 to obtain the feature vector candidates {v_ci(k) | i = 1, 2, ..., M}, and obtains the noise-free feature vector Y(k) as the mean of these M feature vector candidates. That is,

[0009]

$$ Y(k) = \frac{1}{M} \sum_{i=1}^{M} v_{ci}(k) \qquad \text{[Math 1]} $$
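The prior-art mapping of [0007]-[0009] (nearest noisy codewords, then the mean of their paired clean codewords) can be sketched as follows. The Euclidean distance, the toy codebooks, and the name `spectral_mapping_prior_art` are assumptions for illustration only.

```python
import math

def spectral_mapping_prior_art(x, noisy_codebook, clean_codebook, m):
    """Prior-art spectral mapping: mean of the clean codewords paired with the m nearest noisy codewords."""
    distances = [math.dist(x, c) for c in noisy_codebook]
    labels = sorted(range(len(noisy_codebook)), key=distances.__getitem__)[:m]
    dim = len(clean_codebook[0])
    # Y(k) = (1/M) * sum of the selected noise-free codewords
    return [sum(clean_codebook[i][j] for i in labels) / m for j in range(dim)]

noisy = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
clean = [[0.0, 2.0], [2.0, 0.0], [9.0, 9.0]]
print(spectral_mapping_prior_art([0.9, 0.1], noisy, clean, 1))  # [2.0, 0.0]
print(spectral_mapping_prior_art([0.9, 0.1], noisy, clean, 2))  # [1.0, 1.0]
```

With M = 2 the output [1.0, 1.0] lies between two clean codewords and matches neither, which is the kind of blurring the problem statement in [0014] attributes to averaging.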

[0010] In the recognition processing circuit 19, the template memory 14 sends to the pattern matching circuit 12 the reference pattern 15, {T(l) | l = 1, 2, ..., L} (L is the number of feature vectors), specified by the address data 17 output from the recognition control circuit 13. The pattern matching circuit 12 performs DP matching between the reference pattern 15, {T(l) | l = 1, 2, ..., L}, and the time series of noise-free feature vectors 10 output from the noise removal circuit 11, {Y(k) | k = 1, 2, ..., K}. An example of the DP matching recurrence is as follows.

[0011]

[Math 2]

[0012] Here, g(k, l) is the distortion between the feature vector Y(k) and the feature vector T(l). The above equation is an example of DP matching without slope constraints. G(K, L) obtained from this recurrence (Equation 2) is normalized by dividing it by (K + L), the sum of the lengths of the reference pattern 15, {T(l) | l = 1, 2, ..., L}, and of the time series of noise-free feature vectors 10 output from the noise removal circuit 11, {Y(k) | k = 1, 2, ..., K}, and the result is output as the matching distortion 16.

[0013] The recognition control circuit 13 successively changes the reference pattern specified by the address data 17 and, from the matching distortions 16 output by the pattern matching circuit 12 for each reference pattern, outputs the label of the reference pattern with the smallest matching distortion as the recognition result 18.

[0014] [Problem to Be Solved by the Invention] The mapping between the noise-superimposed signal space and the noise-free signal space, which is the foundation of noise removal by the spectral mapping method, is formed by applying processing equivalent to noise superposition to the noise-free codebook 9 to create the noise-added codebook 6. However, as the level of noise superimposed on the input speech signal 2 rises, the spectral envelope of the input speech signal 2 flattens. The phonemic distinctiveness contained in the feature vector time series 4 output from the analysis circuit 3 therefore decreases, and when these vectors are quantized with the noise-added codebook 6, they become less likely to be quantized to the codeword given by the correct mapping, so mapping errors increase sharply. For example, if the number M of label candidates in the noise removal circuit 11 is set to 1, the distortion caused by a mapping error is reflected directly in the matching distortion 16 of pattern matching, so recognition performance drops rapidly as the superimposed noise level rises. The prior art therefore uses an improvement in which M is set to 2 or more to increase the number of candidates. This raises the probability that the candidate given by the correct mapping is among the candidates, but it is not known which of them is correct, so the average of the candidates' feature vectors is output. This averaging, however, mixes components of the incorrect candidates' feature vectors into the feature vector of the candidate that should have been selected, blurring the phonemic features and degrading recognition performance. Consequently, although the prior art is effective for improving the perceived signal-to-noise ratio in speech transmission, it is less effective when applied to speech recognition. In short, conventional speech recognition devices with a noise removal function have the problem that the noise removal process averages the speech feature vectors, which blurs the phonemic features and lowers the recognition rate under high noise.

[0015] The present invention has been made to solve the above problem, and its object is to obtain a speech recognition device in which the noise removal process does not average the speech feature vectors and the recognition rate does not drop even under high noise.

[0016] [Means for Solving the Problem] A speech recognition device according to the present invention comprises: a noise-free codebook whose codewords are feature vectors of speech with no superimposed noise; a noise-added codebook generated by applying processing equivalent to noise superposition to each codeword of the noise-free codebook; neighborhood codeword selection means for outputting, from the noise-added codebook, a plurality of neighborhood codeword candidates at a small distance from the input speech signal and distance candidates between each neighborhood codeword candidate and the input speech signal; movement vector calculation means for calculating the movement vector of each neighborhood codeword between before and after noise superposition; and means that receives the plurality of movement vector candidates output by the movement vector calculation means, the distance candidates output by the neighborhood codeword selection means, and the feature vector of the noise-superimposed input speech signal, and that uses a weighted average of the movement vector candidates to correct the displacement of the input speech feature vector caused by noise superposition, thereby removing the noise.

[0017] [Operation] The noise removal means of the present invention removes noise by using a weighted average of the plurality of movement vector candidates output by the movement vector calculation means to correct the displacement of the input speech feature vector caused by noise superposition, so noise can be removed without averaging the feature vectors themselves.

[0018] [Embodiment] FIG. 1 is a block diagram showing the configuration of a speech recognition device according to an embodiment of the present invention. In FIG. 1, components corresponding to those in FIG. 2 are given the same reference numerals, and their description is omitted. As in the conventional example, this embodiment is described using an isolated word recognition device that performs recognition by DP matching against word-level templates.

[0019] In FIG. 1, 20 is a neighborhood codeword selector that outputs, from the noise-added codebook 6, a plurality of neighborhood codeword candidates 21 at a small distance from the feature vector 4 of the input speech signal, together with distance candidates 22 between each neighborhood codeword candidate and the input speech signal. The distance here is, for example, the Euclidean distance to the feature vector 4 of the input speech signal. 23 is a movement vector calculator that calculates the movement vector candidates 24, i.e., the movement of the neighborhood codeword candidates 21 output by the neighborhood codeword selector 20 between before and after noise superposition. 25 is a noise remover that receives the movement vector candidates 24, the distance candidates 22 output by the neighborhood codeword selector 20, and the feature vector 4 output by the analysis circuit 3, and outputs a noise-removed feature vector 27. 26 is a noise removal circuit composed of the noise-added codebook 6, the noise-free codebook 9, the neighborhood codeword selector 20, the movement vector calculator 23, and the noise remover 25.

[0020] The principle of the noise removal method of the speech recognition device according to the present invention is as follows. The noise-added codebook 6 is obtained by processing each codeword of the noise-free codebook 9 so that it takes on the noise condition of the noise-superimposed input speech signal, for example by adding noise in the time-waveform domain so that the signal-to-noise ratio equals that of the noise-superimposed input speech signal 2, then re-analyzing the result and converting it into a noise-added feature vector. This noise superposition process can be regarded as a movement of each codeword in the feature space. Because the feature vector of each codeword is generally obtained from the time waveform through nonlinear processing, the movement caused by noise superposition differs from codeword to codeword. This per-codeword movement in the feature space is called a movement vector. To remove the noise component from a noise-superimposed codeword, it suffices to correct the movement caused by noise superposition, that is, to subtract the movement vector. The feature vector X(k) of the input speech does not exactly match any codeword of the noise-added codebook 6, but the movement of X(k) caused by noise superposition can be considered close to the movement vectors of codewords near X(k). The speech recognition device according to the present invention therefore removes noise by subtracting the weighted average of the movement vectors of the M codewords {v_ni(k) | i = 1, 2, ..., M} near X(k), selected from the noise-added codebook 6.

[0021] Next, the operation of this embodiment will be described. The neighborhood codeword selector 20 outputs, from the noise-added codebook 6, the M neighborhood codeword candidates 21 closest to the input speech feature vector 4, X(k), namely Vn(k) = {v_ni(k) | i = 1, 2, ..., M}, and the distance candidates 22 between these neighborhood codeword candidates and the input speech signal, D(k) = {d_i(k) | i = 1, 2, ..., M}.

[0022] The movement vector calculator 23 receives the neighborhood codeword candidates 21 output by the neighborhood codeword selector 20, Vn(k) = {v_ni(k) | i = 1, 2, ..., M}, and the codewords of the noise-free codebook 9, and calculates the movement vector candidates 24, B(k) = {b_i(k) | i = 1, 2, ..., M}, i.e., the movement of the neighborhood codeword candidates 21 between before and after noise superposition. The movement vector candidates 24 are calculated as follows.

[0023]

$$ b_i(k) = v_{ni}(k) - v_{ci}(k), \quad i = 1, 2, \ldots, M \qquad \text{[Math 3]} $$

[0024] Here, v_ci(k) is the codeword of the noise-free codebook 9 that corresponds to v_ni(k), and v_ni(k) is the codeword of the noise-added codebook 6.
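Following [0020], where noise is removed by subtracting the movement caused by noise superposition, the sketch below takes each movement vector candidate as the noise-added codeword minus its paired noise-free codeword. The index pairing and the name `movement_vectors` are assumptions of this sketch; the toy codebooks are made up.

```python
def movement_vectors(indices, noisy_codebook, clean_codebook):
    """Movement vector candidates B(k): b_i = v_ni - v_ci for each selected codeword index.

    Sign convention (noise-added minus noise-free) is chosen so that subtracting
    b_i moves a noisy vector back toward the noise-free space.
    """
    return [
        [n - c for n, c in zip(noisy_codebook[i], clean_codebook[i])]
        for i in indices
    ]

clean = [[0.0, 2.0], [2.0, 0.0]]
noisy = [[1.0, 3.0], [2.5, 0.5]]
print(movement_vectors([1, 0], noisy, clean))  # [[0.5, 0.5], [1.0, 1.0]]
```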

[0025] The noise remover 25 receives the plurality of movement vector candidates 24, the distance candidates 22 output by the neighborhood codeword selector 20, and the feature vector 4 output by the analysis circuit 3, and outputs the noise-removed feature vector 27, Z(k), which is calculated as follows.

[0026]

[Math 4]

[0027]

[Math 5]

[0028] Here, w_i is the weight coefficient for the movement vector candidate b_i(k) in B(k) = {b_i(k) | i = 1, 2, ..., M}, and p is a constant that determines the weighting of the weighted average, for example p = 1.

[0029] The noise-removed feature vector 27, Z(k), is sent to the pattern matching circuit 12 for recognition processing. The subsequent processing is identical to the method described in the prior art section, so its description is omitted.

[0030] As described above, this embodiment removes noise from the viewpoint of feature vector movement. It therefore reduces the blurring of phonemic features caused by averaging the feature vectors of multiple candidates, which is the problem of the conventional device described above, and a significant drop in recognition rate can be suppressed.

[0031] This embodiment uses a weighted average of the movement vectors, but the present invention also includes setting all weight coefficients w_i to 1, that is, using a simple average of the movement vectors.

[0032] In this embodiment, the present invention has been described using an isolated word speech recognition device that registers word-level templates as an example, but the recognition unit is not limited to words and may be any unit, such as a phoneme or a syllable. Likewise, although DP matching has been described as the recognition method, the invention can be applied to any method that performs recognition using a time series of feature vectors.

[0033] Furthermore, the invention can be used not only for speech recognition but also as a noise suppression method for speech transmission.

[0034] [Effects of the Invention] As described above, the present invention comprises: the noise-free codebook 9 whose codewords are feature vectors of speech with no superimposed noise; the noise-added codebook 6 generated by applying processing equivalent to noise superposition to each codeword of the noise-free codebook 9; neighborhood codeword selection means for outputting, from the noise-added codebook 6, a plurality of neighborhood codeword candidates at a small distance from the input speech signal and distance candidates between each candidate and the input speech signal; movement vector calculation means for calculating the movement vector of each neighborhood codeword relative to its state before noise superposition; and means that receives the plurality of movement vector candidates output by the movement vector calculation means, the distance candidates output by the neighborhood codeword selection means, and the feature vector time series of the noise-superimposed input speech signal, and that removes noise by using a weighted average of the movement vector candidates to correct the displacement of the input speech feature vectors caused by noise superposition. This reduces the blurring of phonemic features caused by averaging the feature vectors of multiple candidates, and a significant drop in recognition rate can be suppressed even under high noise.

[Brief Description of the Drawings]

[FIG. 1] is a block diagram showing the configuration of a speech recognition device according to an embodiment of the present invention.

[FIG. 2] is a block diagram showing the configuration of a conventional speech recognition device.

[Description of Reference Numerals]

1 Speech signal input terminal
2 Input speech signal
3 Analysis circuit
4 Feature vector
5 Vector quantizer
6 Noise-added codebook
7 Label candidates
8 Inverse vector quantizer
9 Noise-free codebook
10 Noise-free feature vector
11 Noise removal circuit
12 Pattern matching circuit
13 Recognition control circuit
14 Template memory
15 Reference pattern
16 Matching distortion
17 Address data
18 Recognition result
19 Recognition processing circuit
20 Neighborhood codeword selector
21 Neighborhood codeword candidates
22 Distance candidates
23 Movement vector calculator
24 Movement vector candidates
25 Noise remover
26 Noise removal circuit
27 Noise-removed feature vector

Claims (1)

[Claims]

[Claim 1] A speech recognition device that has a noise removal function for suppressing noise superimposed on an input speech signal and that recognizes the input speech, the device comprising: a noise-free codebook whose codewords are feature vectors of speech on which no noise is superimposed; a noise-added codebook generated by applying processing equivalent to noise superposition to each codeword of the noise-free codebook; neighborhood codeword selection means for outputting, from the noise-added codebook, a plurality of neighborhood codeword candidates at a small distance from the input speech signal and distance candidates between each of the neighborhood codeword candidates and the input speech signal; movement vector calculation means for calculating the movement vector of each neighborhood codeword relative to its state before noise superposition; and means that receives the plurality of movement vector candidates output by the movement vector calculation means, the distance candidates output by the neighborhood codeword selection means, and the feature vector of the noise-superimposed input speech signal, and that uses a weighted average of the movement vector candidates to correct the displacement of the feature vector of the input speech signal caused by noise superposition, thereby removing the noise.
JP3043517A 1991-03-08 1991-03-08 Voice recognition device Expired - Fee Related JP2961916B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP3043517A JP2961916B2 (en) 1991-03-08 1991-03-08 Voice recognition device

Publications (2)

Publication Number Publication Date
JPH04298797A (en) 1992-10-22
JP2961916B2 (en) 1999-10-12

Family

ID=12665940

Family Applications (1)

Application Number Title Priority Date Filing Date
JP3043517A Expired - Fee Related JP2961916B2 (en) 1991-03-08 1991-03-08 Voice recognition device

Country Status (1)

Country Link
JP (1) JP2961916B2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013175869A (en) * 2012-02-24 2013-09-05 Nippon Telegr & Teleph Corp <Ntt> Acoustic signal enhancement device, distance determination device, methods for the same, and program
CN109346067A (en) * 2018-11-05 2019-02-15 珠海格力电器股份有限公司 The processing method and processing device of voice messaging, storage medium

Citations (1)

Publication number Priority date Publication date Assignee Title
JPH01274198A (en) * 1988-04-27 1989-11-01 Mitsubishi Electric Corp Speech recognition device

Also Published As

Publication number Publication date
JP2961916B2 (en) 1999-10-12

Similar Documents

Publication Publication Date Title
Liu et al. Efficient cepstral normalization for robust speech recognition
EP0970462B1 (en) Recognition system
CN112447191A (en) Signal processing device and signal processing method
Acero et al. Robust speech recognition by normalization of the acoustic space.
JP4295118B2 (en) Pattern recognition
US8391505B2 (en) Reverberation suppressing apparatus and reverberation suppressing method
JP3154487B2 (en) A method of spectral estimation to improve noise robustness in speech recognition
US9536538B2 (en) Method and device for reconstructing a target signal from a noisy input signal
US6202047B1 (en) Method and apparatus for speech recognition using second order statistics and linear estimation of cepstral coefficients
JPH07271394A (en) Removal of signal bias for sure recognition of telephone voice
US20070276662A1 (en) Feature-vector compensating apparatus, feature-vector compensating method, and computer product
EP1345209A2 (en) Method and apparatus for feature domain joint channel and additive noise compensation
US20220068288A1 (en) Signal processing apparatus, signal processing method, and program
CN101790752A (en) Multiple microphone voice activity detector
JPH01291298A (en) Adaptive voice recognition device
US7987090B2 (en) Sound-source separation system
US20240105199A1 (en) Learning method based on multi-channel cross-tower network for jointly suppressing acoustic echo and background noise
US20030182114A1 (en) Robust parameters for noisy speech recognition
JP2797949B2 (en) Voice recognition device
Park et al. Unsupervised speech domain adaptation based on disentangled representation learning for robust speech recognition
JPH04298797A (en) Voice recognition device
Park et al. Modeling acoustic transitions in speech by modified hidden Markov models with state duration and state duration-dependent observation probabilities
CN112652321A (en) Voice noise reduction system and method based on deep learning phase friendlier
Techini et al. Robust front-end based on MVA and HEQ post-processing for Arabic speech recognition using hidden Markov model toolkit (HTK)
Matsumoto et al. Unsupervised speaker adaptation from short utterances based on a minimized fuzzy objective function

Legal Events

Date Code Title Description
LAPS Cancellation because of no payment of annual fees