JPH05289695A

JPH05289695A - Voice recognition system under noise

Info

Publication number: JPH05289695A
Application number: JP4085839A
Authority: JP
Inventors: Toshihiro Kasuya; 敏宏糟谷; Noriya Murakami; 憲也村上
Original assignee: N T T DATA TSUSHIN KK; NTT Data Communications Systems Corp
Current assignee: N T T DATA TSUSHIN KK; NTT Data Group Corp
Priority date: 1992-04-08
Filing date: 1992-04-08
Publication date: 1993-11-05

Abstract

PURPOSE: To remove background noise efficiently while accounting for minute fluctuations in background noise power, and to improve the recognition rate for speech uttered in a noisy environment. CONSTITUTION: A noisy speech recognition system identifies input speech by transforming the input speech vector using two quantities: the direction in which the input speech vector changes under minute fluctuations of noise power, calculated from spectral information on the background noise mixed into the input speech, and a change amount. The system is provided with a vector movement amount calculation unit 10 that analyzes the noise section preceding the speech section and calculates the change amount used for transforming the input speech vector according to the variance of the noise power in that noise section. Because the change amount is determined from analysis of the noise section, speech recognition accurately reflects the temporal variation of the noise power.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech recognition system used in noisy conditions, and more particularly to a noisy speech recognition system suited to improving the recognition rate for speech uttered in noise.

[0002]

[Prior Art] Speech mixed with background noise, that is, noisy speech, yields feature parameters such as spectra that differ from those extracted from speech uttered in a noise-free environment. Therefore, when recognizing noisy speech, maintaining a high identification rate requires either some form of noise removal or an identification method that accounts for the deformation of the parameters.

[0003] One conventional technique for improving the recognition rate of such noisy speech is the spectral subtraction method (hereinafter, the SS method). In the SS method, preprocessing such as the spectrum analysis described below is performed first. To remove the noise component mixed into the speech, the noise spectrum is estimated from a noise-only section, containing no speech, that precedes the speech section, and this estimate is subtracted from the spectrum obtained from the noise-contaminated speech to obtain the speech spectrum. The features of the input speech are then computed and matched against standard patterns registered in advance, and the input speech is identified according to whether the similarity is within a predetermined threshold.
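The subtraction step described above can be sketched as follows. This is a minimal illustration, not the procedure of any cited reference: it assumes per-frame power spectra are already available as lists of floats, and the function names and the `floor` parameter are hypothetical.

```python
from typing import List, Sequence


def estimate_noise_spectrum(noise_frames: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-frame spectra of the noise-only section, bin by bin."""
    n_frames = len(noise_frames)
    n_bins = len(noise_frames[0])
    return [sum(frame[k] for frame in noise_frames) / n_frames for k in range(n_bins)]


def spectral_subtraction(frame: Sequence[float], noise: Sequence[float],
                         floor: float = 0.0) -> List[float]:
    """Subtract the noise estimate from one noisy frame, clamping each bin at `floor`."""
    return [max(p - n, floor) for p, n in zip(frame, noise)]


# Hypothetical 3-bin power spectra: two noise-only frames, then one noisy speech frame.
noise_frames = [[1.0, 2.0, 1.0], [3.0, 2.0, 1.0]]
noisy_speech = [10.0, 2.5, 0.5]
noise_est = estimate_noise_spectrum(noise_frames)
clean_est = spectral_subtraction(noisy_speech, noise_est)
```

The clamp matters because the noise estimate is an average: in any given frame the true noise may be smaller than the estimate, which would otherwise produce negative power.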

[0004] Another generally known approach to handling the change in feature parameters caused by superimposed noise is to create the standard patterns from speech uttered under the background noise of the place where the speech recognition device is used. In the multi-template method, for example, several signal-to-noise ratio (SNR) levels are set, noise is superimposed on speech at each level, and multiple standard patterns are created from the resulting noisy signals. The feature vector extracted from the input speech is then matched against the template whose SNR is closest to that of the input speech, thereby identifying speech in noise.
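The template selection in the multi-template method can be sketched as below. The SNR levels, the template-set labels, and the function names are hypothetical placeholders; the text does not specify how the input SNR is measured, so it is taken here as a simple power ratio in decibels.

```python
import math
from typing import Dict


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels from two power values."""
    return 10.0 * math.log10(signal_power / noise_power)


def pick_template_set(input_snr_db: float, template_sets: Dict[float, str]) -> str:
    """Return the template set whose SNR level is closest to the input SNR."""
    nearest = min(template_sets, key=lambda level: abs(level - input_snr_db))
    return template_sets[nearest]


# Hypothetical template sets prepared at three SNR levels.
template_sets = {0.0: "templates_0dB", 10.0: "templates_10dB", 20.0: "templates_20dB"}
chosen = pick_template_set(snr_db(signal_power=50.0, noise_power=4.0), template_sets)
```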

[0005] Further prior art for improving the recognition rate of noisy speech, which forms the basis of the present invention, is described in, for example, S. F. Boll, "IEEE Trans. ASSP-27, No. 2 (1979)" and, by Kasuya et al., the proposers of the present invention, "Proceedings of the Acoustical Society of Japan, 1-5-5 (October 1991)". In this approach, the input speech vector is given a deformation that accounts for minute fluctuations of the background noise before identification is performed, which reduces the drop in identification rate caused by noise power fluctuations. The technique is based on a model in which the background noise spectrum can be regarded as roughly constant and only its power fluctuates minutely. By transforming the input speech vector in consideration of the effect that noise power fluctuation has on the feature vector extracted from the speech, the input is made to follow the power change. This conventional noisy speech recognition technique, on which the present invention is based, is described below with reference to FIG. 5.

[0006] FIG. 5 is a block diagram showing the configuration of a conventional noisy speech recognition system. In the figure, 1 is an input device for inputting speech, background noise, and the like; 52 is a speech recognition processing device that performs recognition processing on the input speech; 3 is an output device that outputs the recognition result of the speech recognition processing device 52; and 4 is a standard pattern storage device that registers the standard patterns used in the recognition processing of the speech recognition processing device 52. The speech recognition processing device 52 comprises: a preprocessing unit 5 that performs preprocessing such as LPC spectrum analysis on the speech input from the input device 1; a feature extraction unit 6 that extracts a feature vector from the input speech preprocessed by the preprocessing unit 5; a feature vector transformation unit 7, which is the distinguishing feature of this noisy speech recognition system and which transforms the feature vector extracted by the feature extraction unit 6 based on the change in noise power caused by temporal change of the background noise; an identification unit 8 that identifies the input speech from the distance between the input speech vector transformed by the feature vector transformation unit 7 and the feature vectors of the standard patterns registered in the standard pattern storage device 4; and a vector change direction calculation unit 9 that calculates the vector change direction used by the feature vector transformation unit 7 when transforming the feature vector. The feature vectors of the standard patterns registered in the standard pattern storage device 4 are extracted in advance by analyzing noise-free speech. With this configuration, in the speech recognition processing device 52, the vector change direction calculation unit 9 calculates the direction in which the input speech moves when a minute noise power (background noise power) is added to it. Then, using a uniquely determined variance ratio (σ) that indicates the amount of change of the input speech vector along that direction, the feature vector is given tolerance (redundancy) along that direction before matching is performed.

[0007] The operation of the speech recognition processing device 52 is described below. This device uses the linear predictive coding (LPC) cepstrum, or the cepstrum, as the feature for speech recognition. The cepstrum is defined by Equation 1.

[Equation 1]

Here, s denotes the speech signal, and the cepstrum in Equation 1 is written Cs, meaning the cepstrum of the speech signal s. The cepstrum Cs is the feature used as the input speech vector; with the SS method and similar techniques, it is an estimate of the cepstrum from which the noise component has already been subtracted in preprocessing. In that case, the speech signal s is the signal inversely estimated from the cepstrum Cs.

[0008] Now assume that minute noise Δn is newly mixed into the speech signal s. This represents either a fluctuation of the noise power added to the speech or a power fluctuation due to error in estimating the noise power. Under this noise power fluctuation, s in Equation 1 is replaced by s + Δn, and the expression expands as in Equation 2.

[Equation 2]

Here, let ΔC denote the change in the cepstrum caused by the minute noise Δn, and further assume

[Equation 3]

Then the following is obtained:

[Equation 4]

The vector ΔC calculated from Equation 4 is taken as the direction in which the cepstrum changes when noise is added, and is used to transform the input speech vector as shown in FIG. 6.
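As a hedged sketch only, the following shows one standard way a derivation of this kind can run. It assumes the cepstrum is defined as the inverse Fourier transform of the log spectrum; the symbols S(ω) and ΔN(ω) (spectra of s and Δn) and the operator F⁻¹ are assumptions introduced here, and the exact form of the original Equations 1 to 4 may differ.

```latex
\begin{align}
C_s &= \mathcal{F}^{-1}\bigl[\log S(\omega)\bigr]
  && \text{(assumed form of Equation 1)} \\
C_{s+\Delta n} &= \mathcal{F}^{-1}\bigl[\log\bigl(S(\omega) + \Delta N(\omega)\bigr)\bigr]
  = C_s + \mathcal{F}^{-1}\Bigl[\log\Bigl(1 + \frac{\Delta N(\omega)}{S(\omega)}\Bigr)\Bigr]
  && \text{(assumed form of Equation 2)} \\
\frac{\Delta N(\omega)}{S(\omega)} &\ll 1
  && \text{(assumed form of Equation 3)} \\
\Delta C &= C_{s+\Delta n} - C_s \approx \mathcal{F}^{-1}\Bigl[\frac{\Delta N(\omega)}{S(\omega)}\Bigr]
  && \text{(assumed form of Equation 4)}
\end{align}
```

The last step uses the first-order approximation log(1 + x) ≈ x for small x, which is why the small-noise assumption is needed before ΔC can be written in closed form.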

[0009] FIG. 6 is an explanatory diagram showing the processing involved in transforming the input speech vector in the noisy speech recognition system of FIG. 5. The figure shows how the input speech vector is transformed when the cepstrum change direction ΔC described above is taken into account. The input speech vector 62, as seen from the reference vector 61, is moved in the direction 64 in which noise causes the input speech vector to vary, by an amount based on a predetermined value (the variance ratio σ), and the resulting vector 63 is regarded as the new input speech vector. Similarity is thus evaluated as large along the direction in which the noise power fluctuates and as small along the direction perpendicular to it, so that the apparent standard pattern changes as shown by the ellipse in the figure. Using such a distance measure has the effect of applying to the input speech vector a deformation that accounts for noise power fluctuation.
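One hypothetical way to realize such an elliptical distance measure is sketched below. The formula is an assumption made for illustration: it uses squared Euclidean distance, splits the difference vector into components parallel and perpendicular to the noise-change direction, and divides the parallel component by σ, so that σ > 1 gives more tolerance along that direction. The function name and parameters are not from the source.

```python
import math
from typing import Sequence


def directional_distance(x: Sequence[float], ref: Sequence[float],
                         direction: Sequence[float], sigma: float) -> float:
    """Squared distance that discounts the component along `direction` by `sigma`."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]               # unit vector along the change direction
    diff = [a - b for a, b in zip(x, ref)]
    along = sum(d * u for d, u in zip(diff, unit))     # projection onto the change direction
    perp_sq = sum(d * d for d in diff) - along * along # squared perpendicular remainder
    return perp_sq + (along / sigma) ** 2


# With sigma = 1 this reduces to the ordinary squared Euclidean distance.
plain = directional_distance([3.0, 4.0], [0.0, 0.0], [1.0, 0.0], sigma=1.0)
relaxed = directional_distance([3.0, 4.0], [0.0, 0.0], [1.0, 0.0], sigma=3.0)
```

Points at equal distance under this measure lie on an ellipse elongated along the change direction, which matches the ellipse drawn around the standard pattern in FIG. 6.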

[0010] This technique, which uses an identification measure that accounts for minute fluctuations in noise power, uses the spectral information of the noise mixed into the input speech to calculate the direction in which the input speech vector changes under minute temporal fluctuations of the noise power. It then changes the input speech vector so that similarity along that direction is evaluated as larger than along the perpendicular direction, thereby performing identification that accounts for noise power fluctuation. In this technique, however, the amount of change along the change direction of the input speech vector is fixed at a single, uniquely determined value. Consequently, the temporal variation of the noise power cannot be reflected accurately.

[0011]

[Problems to Be Solved by the Invention] The problem to be solved is that, in the prior art, the amount of change along the change direction of the input speech vector is uniquely determined, so the temporal variation of the noise power cannot be reflected accurately. An object of the present invention is to solve this problem of the prior art and to provide a noisy speech recognition system that removes background noise efficiently while accounting for minute fluctuations in background noise power, and that improves the recognition rate for speech uttered in noise.

[0012]

[Means for Solving the Problems] To achieve the above object, the noisy speech recognition system of the present invention is (1) a noisy speech recognition system that uses spectral information on the background noise mixed into the input speech to calculate the direction in which the input speech vector changes under minute fluctuations of the noise power of the background noise, transforms the input speech vector using the calculated change direction and a change amount uniquely set in advance, and identifies the input speech by matching the transformed input speech vector against the feature vectors of standard patterns, the system being characterized in that it is provided with a vector movement amount calculation unit that analyzes the noise section preceding the speech section and calculates, based on the variance of the noise power in that noise section, the change amount used for transforming the input speech vector.

[0013]

[Operation] In the present invention, the vector movement amount calculation unit analyzes the noise section preceding the speech section, obtains a time series of power, and calculates its variance. Then, assuming that the fluctuation of the noise power continues into the speech section, the weighting coefficient is determined using the calculated variance value (σ) as the estimate of that fluctuation. By using the power variance of the noise section preceding the speech section in this way to set an appropriate amount of tolerance (redundancy) for the input speech vector, speech recognition can better account for the characteristics of the noise, and identification performance for noise-contaminated speech can be improved over the prior art. In addition, when the SS method is used, the adverse effects of removing too much or too little noise can be further reduced.

[0014]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram showing one embodiment of the configuration of the noisy speech recognition system according to the present invention. In the figure, 1 is an input device for inputting speech; 2 is the speech recognition processing device according to the present invention, which performs identification processing on the input speech; 3 is an output device for outputting the identification result produced by the speech recognition processing device 2; and 4 is a standard pattern storage device that stores the standard patterns used in the identification processing by the speech recognition processing device 2. The feature vectors of the standard patterns registered in the standard pattern storage device 4 are extracted in advance by analyzing noise-free speech. The speech recognition processing device 2 comprises: a preprocessing unit 5 that performs preprocessing such as LPC spectrum analysis on the input speech; a feature extraction unit 6 that extracts a feature vector from the preprocessed input speech; a feature vector transformation unit 7 that transforms the feature vector extracted by the feature extraction unit 6 based on the change in noise power caused by temporal change of the background noise; an identification unit 8 that identifies the input speech from the distance between the input speech vector transformed by the feature vector transformation unit 7 and the feature vectors of the standard patterns registered in the standard pattern storage device 4; a vector change direction calculation unit 9 that calculates the vector change direction used by the feature vector transformation unit 7 when transforming the feature vector; and, according to the present invention, a vector movement amount calculation unit 10 that calculates the vector change amount used in that transformation. The vector movement amount calculation unit 10 in turn comprises a noise section detection unit 11 that detects the noise section preceding the speech section, a power variance calculation unit 12 that calculates the variance of the noise power in the noise section detected by the noise section detection unit 11, and a variance ratio determination unit 13 that calculates, based on the noise power variance calculated by the power variance calculation unit 12, the variance ratio (σ) indicating the change amount used for transforming the input speech vector.

[0015] As shown in FIGS. 2 and 3, described later, the variance (σn) of the noise power correlates with the spread of the features. The noisy speech recognition system of this embodiment therefore uses the variance of the noise power to estimate the spectral variance, calculates the direction and amount of movement that result when a minute noise power (background noise power) is added to the input speech, gives the input speech vector tolerance (redundancy) for that direction and amount, and then performs matching. Specifically, the preprocessing unit 5 performs LPC spectrum analysis on the speech (speech plus noise) input from the input device 1 and applies processing such as the SS method to remove noise to some extent. The feature extraction unit 6 then extracts the feature vector. Meanwhile, the vector movement amount calculation unit 10 calculates the variance (σn) of the background noise power and, from it, the input speech vector change amount (variance ratio σ) suited to the type of noise. The vector change direction calculation unit 9 obtains the cepstrum change direction (ΔC) for the case where minute background noise is added to the input speech. The feature vector transformation unit 7 then transforms the input speech vector using the change direction (ΔC) calculated by the vector change direction calculation unit 9 and the change amount (variance ratio σ) calculated by the vector movement amount calculation unit 10. Based on the resulting input speech vector, the identification unit 8 calculates the similarity (matching) to the feature vectors of the standard patterns registered in the standard pattern storage device 4 and outputs the identification result to the output device 3. Using the variance of the noise power to estimate the spectral variance in this way allows an appropriate amount of tolerance to be set for the input speech vector, which improves identification performance for noise-contaminated speech over the prior art. In addition, when the SS method is used, the adverse effects of removing too much or too little noise can be further reduced.
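The change-direction step of this flow (the role of unit 9) can be illustrated numerically. The sketch below is not the patent's formula: it assumes the cepstrum is the inverse DFT of the log power spectrum and takes the first-order change as the inverse DFT of the noise-to-speech spectral ratio. All function names and the 4-bin example spectra are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def inverse_dft_real(values: Sequence[float]) -> List[float]:
    """Real part of the inverse DFT of a real-valued sequence (naive O(K^2))."""
    k_bins = len(values)
    return [
        sum(v * cmath.exp(2j * math.pi * k * n / k_bins)
            for k, v in enumerate(values)).real / k_bins
        for n in range(k_bins)
    ]


def cepstrum(power_spectrum: Sequence[float]) -> List[float]:
    """Cepstrum taken as the inverse DFT of the log power spectrum (assumed definition)."""
    return inverse_dft_real([math.log(p) for p in power_spectrum])


def change_direction(speech_spec: Sequence[float],
                     noise_spec: Sequence[float]) -> List[float]:
    """First-order cepstrum change per unit of added noise: inverse DFT of N/S."""
    return inverse_dft_real([n / s for n, s in zip(noise_spec, speech_spec)])


# Hypothetical 4-bin spectra (symmetric, as for a real signal); the noise is flat.
speech_spec = [4.0, 2.0, 1.0, 2.0]
noise_spec = [1.0, 1.0, 1.0, 1.0]
delta_c = change_direction(speech_spec, noise_spec)
```

A quick way to sanity-check the approximation is to add a very small multiple of `noise_spec` to `speech_spec`, recompute `cepstrum`, and compare the difference quotient with `delta_c`.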

[0016] FIG. 2 is an explanatory diagram showing an example of the temporal power change of background noise input to the speech recognition processing device in FIG. 1. FIG. 2(a) shows the change in noise power 21 in a station concourse, and FIG. 2(b) shows an example of the change in noise power 22 inside a telephone booth. In FIG. 2(a), 23 denotes the power variance value (σn) calculated from the noise power 21, and in FIG. 2(b), 24 denotes the power variance value (σn) calculated from the noise power 22. These values differ depending on the type of noise.

[0017] FIG. 3 is an explanatory diagram showing the component distribution of the features of speech containing each of the noises in FIG. 2. FIGS. 3(a) and 3(b) were obtained by superimposing the noise shown in FIGS. 2(a) and 2(b), respectively, on several thousand samples of the speech /a/, extracting features from the resulting speech data, performing principal component analysis, and plotting the result for the first principal component and the second component. In FIG. 3(a), the feature spread 31 corresponds to the data indicated by the noise power 21 in FIG. 2(a), and in FIG. 3(b), the feature spread 32 corresponds to the data indicated by the noise power 22 in FIG. 2(b). As FIGS. 2 and 3 show, the variance (σn) of the noise power correlates with the spread of the features, so using the variance of the noise power to estimate the spectral variance can be considered effective.

[0018] FIG. 4 is a flowchart showing one embodiment of the processing, according to the present invention, of the vector movement amount calculation unit in FIG. 1. This embodiment shows the flow of processing that determines the input speech vector change amount from a speech input waveform containing noise. First, the noise section is detected from the input speech waveform (step 401). The noise section is divided into frames in the same way as the speech section, and the noise spectrum and power are obtained for each frame under the same conditions. From this information, the variance (σn) of the noise power, which serves as an index of how much the noise power varies, is obtained (step 402). Because the transformation of the input speech vector assumes fluctuation of the noise power, analyzing the noise section allows the actual degree of fluctuation to be reflected. In practice, a value proportional to the obtained variance (σn) is set as the variance ratio (σ) according to Equation 5 (step 403).

[Equation 5]

Here, λ is an experimentally determined constant. By newly adding the vector movement amount calculation unit that performs this processing to the conventional noisy speech recognition system shown in FIG. 5, the noisy speech recognition system shown in FIG. 1 determines the variance ratio (σ), which was conventionally fixed at a single value, based on analysis of the noise section preceding the speech section. It can therefore determine the input speech vector change amount more accurately as the type of input noise changes. This improves the identification rate.
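Steps 401 to 403 can be sketched as follows, under stated assumptions. The proportional form σ = λ·σn is inferred from the description of Equation 5 rather than copied from it; σn is taken as the population variance of per-frame mean-square power (the text does not say whether variance or standard deviation, or linear or logarithmic power, is meant); and noise section detection (step 401) is assumed to have been done already. The names `frame_powers`, `variance_ratio`, and `lam` are hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence


def frame_powers(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each full, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def variance_ratio(noise_samples: Sequence[float], frame_len: int, lam: float) -> float:
    """Step 402: variance of frame power; step 403: scale by the constant `lam`."""
    sigma_n = pvariance(frame_powers(noise_samples, frame_len))
    return lam * sigma_n


# Hypothetical noise-only section of four samples, split into two frames.
noise_section = [1.0, 1.0, 3.0, 3.0]
powers = frame_powers(noise_section, frame_len=2)
sigma = variance_ratio(noise_section, frame_len=2, lam=0.5)
```

A steadier noise source, such as the telephone booth case in FIG. 2(b), would give a smaller frame-power variance and therefore a smaller σ than a fluctuating source such as the station concourse.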

[0019] As described above with reference to FIGS. 1 to 4, the noisy speech recognition system of this embodiment determines the variance ratio (σ), which was conventionally fixed at a single value, based on analysis of the noise section preceding the speech section, and can determine the input speech vector change amount more accurately as the type of input noise changes. This improves the identification rate. The present invention is not limited to the embodiment described with reference to FIGS. 1 to 4.

[0020]

[Effects of the Invention] According to the present invention, the amount of change along the change direction of the input speech vector is determined based on analysis of the noise section preceding the speech section, so that speech recognition accurately reflects the temporal variation of the noise power. Background noise can thus be removed efficiently while accounting for minute fluctuations in background noise power, and the recognition rate for speech uttered in noise can be improved.

[0021]

[Brief description of drawings]

FIG. 1 is a block diagram showing one embodiment of the configuration of the noisy speech recognition system according to the present invention.

FIG. 2 is an explanatory diagram showing an example of the temporal power change of background noise input to the speech recognition processing device in FIG. 1.

FIG. 3 is an explanatory diagram showing the component distribution of the features of speech containing each of the noises in FIG. 2.

FIG. 4 is a flowchart showing one embodiment of the processing, according to the present invention, of the vector movement amount calculation unit in FIG. 1.

FIG. 5 is a block diagram showing the configuration of a conventional noisy speech recognition system.

FIG. 6 is an explanatory diagram showing the processing involved in transforming the input speech vector in the noisy speech recognition system of FIG. 5.

[Explanation of symbols]

1: Input device
2: Speech recognition processing device
3: Output device
4: Standard pattern storage device
5: Preprocessing unit
6: Feature extraction unit
7: Feature vector transformation unit
8: Identification unit
9: Vector change direction calculation unit
10: Vector movement amount calculation unit
11: Noise section detection unit
12: Power variance calculation unit
13: Variance ratio determination unit
21, 22: Noise power
23, 24: Power variance value (σn)
31, 32: Feature spread
52: Speech recognition processing device
61: Reference vector
62: Input speech vector
63: Vector
64: Direction in which noise causes the input speech vector to vary

Claims

[Claims]

1. A noisy speech recognition system that uses spectral information on background noise mixed into input speech to calculate the direction in which an input speech vector changes when the noise power of the background noise fluctuates minutely, transforms the input speech vector using the calculated change direction and a change amount uniquely set in advance, and identifies the input speech by matching the transformed input speech vector against feature vectors of standard patterns, the system being characterized by comprising vector movement amount calculation means for analyzing a noise section preceding a speech section and calculating, based on the variance of the noise power in the noise section, the change amount used for transforming the input speech vector.