JPH07239696A

JPH07239696A - Voice recognition device

Info

Publication number: JPH07239696A
Application number: JP6029283A
Authority: JP
Inventors: Hiroaki Kokubo; 浩明小窪; Nobuo Hataoka; 信夫畑岡; Akio Amano; 明雄天野
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1994-02-28
Filing date: 1994-02-28
Publication date: 1995-09-12

Abstract

PURPOSE: To correct the change in spectral slope in voiced sections, where the speech power is relatively high and utterance deformation readily occurs, by having an adaptive first-order inverse filter unit perform inverse filtering only in speech sections judged to be voiced. CONSTITUTION: Speech data divided at fixed intervals is input to an autocorrelation function calculation unit 201, which calculates the autocorrelation function of the input speech. When the input speech is voiced, a large peak should exist in the autocorrelation function at the lag corresponding to the repetition period. A peak detection unit 202 therefore detects the maximum among the peaks of the autocorrelation function, excluding the zeroth-order peak. A decision unit 203 compares this maximum with a threshold prepared beforehand and judges the input speech to be voiced when the maximum exceeds the threshold. In the voiced sections, the change in spectral slope caused by utterance deformation is then corrected by the subsequent inverse filtering.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech recognition device, and in particular to a noise-robust speech recognition device that operates stably even under noise.

[0002]

[Prior Art] To put a speech recognition device into practical use, noise-robustness techniques that allow speech uttered under noise to be recognized correctly are indispensable.

[0003] One cause of degraded recognition performance in speech recognition under noise is the effect of noise superimposed on the speech. In the field of speech recognition, spectral subtraction is known as a very effective method for reducing the effect of such superimposed noise. This method removes the noise component by subtracting an estimated noise spectrum from the short-time spectrum of the input speech.

[0004] However, apart from the effect of superimposed noise, it is known that the stress of speaking in a noisy environment changes the manner of speaking (utterance deformation), which adversely affects recognition. Proposed countermeasures against utterance deformation include performing speaker adaptation by treating the deformed speech as the speech of a different speaker, and normalizing the features of the deformed speech.

[0005] One phenomenon of utterance deformation is that the high-frequency components of the speech are emphasized, changing the slope of the spectrum. Adaptive first-order inverse filtering can be applied to normalize this spectral slope. The method flattens the frequency characteristic by adaptively inverse filtering with the first-order linear prediction coefficient, which corresponds to the slope of the spectrum.

[0006]

[Problem to Be Solved by the Invention] For deformed speech that contains no noise, the adaptive first-order inverse filter effectively improves recognition performance by correcting the change in spectral slope. However, speech uttered in a real environment has noise superimposed on it. In non-speech sections such as pauses, and in sections where the speech level is low, such as consonants, the inverse filter therefore ends up being applied to the noise component, which adversely affects recognition. For example, many noises, such as automobile running noise, have their power concentrated in the low-frequency band. Applying an inverse filter that flattens the frequency characteristic to such noise emphasizes the high-frequency noise components. Adaptive first-order inverse filtering of speech uttered under noise must therefore be performed without affecting the noise component.

[0007]

[Means for Solving the Problem] To achieve the above object, the speech recognition device of the present invention comprises: a speech input unit for inputting the speech to be recognized; an adaptive first-order inverse filter unit that inverse filters the speech signal using the first-order PARCOR coefficient obtained from the speech signal input to the speech input unit; an analysis unit that calculates feature vectors of the inverse-filtered speech signal; a matching unit that recognizes the input speech by obtaining the similarity between standard patterns registered in advance and the feature vectors obtained by the analysis unit; and a determination unit that determines a characteristic of the input signal. The device has means for performing inverse filtering in the adaptive first-order inverse filter unit only when the input signal satisfies the condition of the determination unit.

[0008]

[Operation] Many variations of the present invention are conceivable; the operation of a representative means is described here.

[0009] In a speech recognition device having an adaptive first-order inverse filter unit that performs inverse filtering to flatten the frequency characteristic, a voiced/unvoiced determination unit is provided to determine whether the input speech is voiced or unvoiced, and the adaptive first-order inverse filter unit performs inverse filtering only on speech sections determined to be voiced. With this processing, in voiced sections, where the speech power is relatively high and utterance deformation readily has an effect, the change in spectral slope caused by utterance deformation can be corrected by inverse filtering. In unvoiced sections, where the speech power is relatively low and noise readily has an effect, omitting the inverse filtering avoids adverse effects such as emphasis of the noise component by the filtering.

[0010] Therefore, according to the present invention, recognition performance can be improved for speech with utterance deformation uttered in a noisy environment.

[0011]

[Embodiments] Embodiments of the present invention are described below.

[0012] FIG. 1 is a block diagram of a speech recognition system for explaining one embodiment of the present invention. In FIG. 1, 101 is a speech input unit, 102 is a voiced/unvoiced determination unit, 103 is an adaptive first-order inverse filter unit, 104 is an analysis unit, 105 is a matching unit, 106 is a standard pattern storage unit, and 107 is a switch unit. Only an outline of this embodiment is given with FIG. 1; the individual parts are described in detail from FIG. 2 onward. Speech input to the speech input unit 101 is converted into a digital signal by A/D conversion and then divided at fixed intervals (usually several tens of ms) into analysis frames. The speech data divided into analysis frames is input to the voiced/unvoiced determination unit 102, which determines whether or not it is voiced. When the input speech frame is determined to be voiced, the adaptive first-order inverse filter unit 103 performs adaptive first-order inverse filtering to flatten the frequency characteristic of the speech data. The adaptive first-order inverse filter is described in detail later. When the input speech frame is determined not to be voiced, the processing of the adaptive first-order inverse filter unit 103 is skipped. Next, the analysis unit 104 calculates feature parameters from the input speech divided into frames. The standard pattern storage unit 106 stores standard patterns (feature vector sequences) of the recognition vocabulary, calculated in advance. Naturally, the feature vectors of the stored standard patterns are calculated with the same analysis chain as this system. The matching unit 105 calculates the distance between the standard patterns stored in the standard pattern storage unit 106 and the feature vectors of the input speech analyzed by the analysis unit 104. The word whose standard pattern has the smallest distance among those matched by the matching unit 105 is determined to be the recognized word for the input speech and is output as the recognition result.
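The frame division and the role of the switch unit 107 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `split_frames` and `gate_frames` are hypothetical, frames are taken as non-overlapping, and the voicing test and the filter are passed in as placeholder callables.

```python
from typing import Callable, List, Sequence

Frame = List[float]

def split_frames(samples: Sequence[float], frame_len: int) -> List[Frame]:
    """Divide the digitized signal into fixed-length analysis frames.

    Assumption: non-overlapping frames; an incomplete tail is dropped.
    """
    return [list(samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def gate_frames(frames: Sequence[Frame],
                is_voiced: Callable[[Frame], bool],
                inverse_filter: Callable[[Frame], Frame]) -> List[Frame]:
    """Apply inverse_filter only to frames accepted by is_voiced.

    Frames judged unvoiced pass through unchanged, which is the
    behaviour the text attributes to the switch unit.
    """
    return [inverse_filter(f) if is_voiced(f) else list(f) for f in frames]
```

Every frame, filtered or not, would then go on to feature analysis and matching.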

[0013] Next, each processing unit outlined with FIG. 1 is described in detail.

[0014] First, the voiced/unvoiced determination unit 102 is described. FIG. 2 illustrates one embodiment of the voiced/unvoiced determination unit 102. In FIG. 2, 201 is an autocorrelation function calculation unit, 202 is a peak detection unit, and 203 is a decision unit. Voiced speech is identified by the periodicity of the input speech data. The speech data divided at fixed intervals is input to the autocorrelation function calculation unit 201, which calculates the autocorrelation function of the input speech x(i) by the processing shown in Equation 1.

[0015]

[Equation 1]

[0016] FIG. 3 shows an example of an autocorrelation function calculated from actual speech data. If the input speech is voiced, a large peak should exist in the autocorrelation function at the lag corresponding to the repetition period (pitch). The peak detection unit 202 therefore detects the maximum among the peaks of the autocorrelation function, excluding the zeroth-order peak (in the example of FIG. 3, the value at the 68th sample). The decision unit 203 compares the peak maximum obtained by the peak detection unit 202 with a threshold prepared in advance and determines that the input speech is voiced when the peak maximum exceeds the threshold.
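The decision described here can be sketched in a few lines of Python. Equation 1 is not reproduced in this text, so the usual short-time form r(τ) = Σ x(i)·x(i+τ) is assumed. Normalizing the peak by r(0), skipping lags below `min_lag` to exclude the zeroth-order peak, and the threshold value are further assumptions made for illustration; all names are hypothetical.

```python
from typing import List, Sequence, Tuple

def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """r(tau) = sum_i x(i) * x(i + tau) for tau = 0..max_lag (assumed form)."""
    n = len(x)
    return [sum(x[i] * x[i + tau] for i in range(n - tau))
            for tau in range(max_lag + 1)]

def is_voiced_autocorr(frame: Sequence[float],
                       min_lag: int = 20,
                       max_lag: int = 160,
                       threshold: float = 0.3) -> Tuple[bool, int]:
    """Return (voiced?, lag of the largest peak away from lag 0)."""
    r = autocorrelation(frame, max_lag)
    if r[0] <= 0.0:
        return False, 0            # silent frame: nothing to decide
    best_lag, best_val = 0, 0.0
    for tau in range(max(min_lag, 1), max_lag):
        is_peak = r[tau] >= r[tau - 1] and r[tau] >= r[tau + 1]
        if is_peak and r[tau] / r[0] > best_val:
            best_lag, best_val = tau, r[tau] / r[0]
    return best_val > threshold, best_lag
```

For a frame with a 68-sample period, as in the FIG. 3 example, the detected lag would be expected at 68.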

[0017] FIG. 4 is a diagram for explaining a second embodiment of the voiced/unvoiced determination unit 102. Whereas the embodiment described with FIG. 2 uses the peak value of the autocorrelation function of the input speech for the voiced/unvoiced decision, the embodiment of FIG. 4 uses the peak value of the modified correlation function of the input speech (the autocorrelation of the prediction residual of linear prediction analysis). In FIG. 4, 401 is a linear prediction analysis unit, 402 is a linear prediction inverse filter unit, 403 is an autocorrelation function calculation unit, 404 is a peak detection unit, and 405 is a decision unit. The speech data divided at fixed intervals is input to the linear prediction analysis unit 401, which performs linear prediction analysis on it and outputs linear prediction coefficients. Linear prediction analysis is a very common analysis technique in speech signal processing, and many good books explain it in detail, such as Furui's "Digital Speech Processing". Several algorithms have been proposed for calculating linear prediction coefficients; as one example, FIG. 5 shows the processing flow of the Levinson-Durbin algorithm. The linear prediction inverse filter unit 402 applies an inverse filter to the input speech using the linear prediction coefficients a(i) obtained by the linear prediction analysis unit 401. The inverse filtering is performed by the processing shown in Equation 2.

[0018]

[Equation 2]

[0019] The filter output ε(n) corresponds to the prediction error of the linear prediction analysis and is called the residual. The residual calculated by the linear prediction inverse filter unit 402 is input to the autocorrelation function calculation unit 403, which calculates the autocorrelation function of the prediction residual. The correlation function calculated here is called the modified correlation function of the input speech. Once the modified correlation function has been calculated, the peak detection unit 404 detects the maximum among its peaks, excluding the zeroth-order peak. The decision unit 405 compares the peak maximum obtained by the peak detection unit 404 with a threshold prepared in advance and determines that the input speech is voiced when the peak maximum exceeds the threshold.
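Neither the FIG. 5 flow nor Equation 2 is reproduced in this text, so the following is only a sketch of the textbook Levinson-Durbin recursion and residual computation. It assumes the sign convention A(z) = 1 + Σ a(i)·z^(-i), so that ε(n) = x(n) + Σ a(i)·x(n-i); the patent's own convention may differ. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Solve the LPC normal equations from autocorrelation values r[0..order].

    Returns (a, err) with a[0] = 1.0, a[1..order] the prediction
    coefficients, and err the final prediction error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break                   # degenerate frame: stop early
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err              # reflection coefficient of stage m
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= (1.0 - k * k)
    return a, err

def lpc_residual(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """epsilon(n) = x(n) + sum_i a[i] * x(n - i); samples before the frame are 0."""
    p = len(a) - 1
    return [x[n] + sum(a[i] * x[n - i] for i in range(1, min(p, n) + 1))
            for n in range(len(x))]
```

The autocorrelation of the residual returned by `lpc_residual` is the modified correlation function, which is then searched for a pitch peak in the same way as in the first embodiment.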

[0020] FIG. 6 is a diagram for explaining a third embodiment of the voiced/unvoiced determination unit 102. Like the first and second embodiments, this embodiment determines voiced/unvoiced from the periodicity of the data. Here the peak value of the high-quefrency component of the cepstrum is used to judge periodicity. In FIG. 6, 601 is a cepstrum calculation unit, 602 is a peak detection unit, and 603 is a decision unit. The speech data divided at fixed intervals is input to the cepstrum calculation unit 601. The cepstrum calculation unit 601 computes the cepstrum by transforming the input speech data into the frequency domain with an FFT, taking the logarithm, and transforming back into the time domain with an IFFT. FIG. 7 shows an example of a cepstrum calculated from actual speech data. The horizontal axis of the cepstrum is called quefrency; the spectral envelope components are concentrated in the low-quefrency part, and the fundamental frequency is obtained from the peak in the high-quefrency part. The peak detection unit 602 detects the peak value of this high-quefrency part from the computed cepstrum. The decision unit 603 compares the peak value obtained by the peak detection unit 602 with a threshold prepared in advance and determines that the input speech is voiced when the peak value exceeds the threshold.
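The FFT, logarithm, IFFT chain can be illustrated with the standard library alone. In this sketch a naive O(N²) DFT stands in for the FFT, the logarithm is taken of the spectrum magnitude, and `min_quefrency` and `threshold` are illustrative assumptions rather than values from the patent. The function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple

def real_cepstrum(frame: Sequence[float]) -> List[float]:
    """Cepstrum = inverse DFT of the log magnitude of the DFT of the frame."""
    n = len(frame)
    spectrum = [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)) for k in range(n)]
    # A small floor keeps log() defined when a bin is exactly zero.
    log_mag = [math.log(max(abs(s), 1e-12)) for s in spectrum]
    # The log magnitude of a real signal is even in k, so the inverse DFT
    # is real up to rounding error and only the real part is kept.
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n)
                for k in range(n)).real / n for q in range(n)]

def is_voiced_cepstrum(frame: Sequence[float],
                       min_quefrency: int = 20,
                       threshold: float = 0.1) -> Tuple[bool, int]:
    """Return (voiced?, quefrency of the largest high-quefrency value)."""
    c = real_cepstrum(frame)
    half = len(c) // 2              # the upper half mirrors the lower half
    peak_q = max(range(min_quefrency, half), key=lambda q: c[q])
    return c[peak_q] > threshold, peak_q
```

The quefrency of the detected peak, in samples, corresponds to the pitch period, so a production implementation would restrict the search range to plausible pitch periods for the sampling rate in use.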

[0021] Many other methods for determining voiced or unvoiced exist, and needless to say they can also be applied to this embodiment.

[0022] Next, the adaptive first-order inverse filter unit 103 is described. FIG. 8 is a diagram for explaining one embodiment of the adaptive first-order inverse filter unit 103. In FIG. 8, 801 is a first-order PARCOR coefficient calculation unit and 802 is an inverse filter unit. The first-order PARCOR coefficient calculation unit 801 calculates the first-order PARCOR coefficient k1 of the input speech data from Equation 3.

[0023]

[Equation 3]

[0024] Here, r0 and r1 are the zeroth-order and first-order terms of the autocorrelation function, respectively. The inverse filter unit 802 filters the input speech data using the first-order PARCOR coefficient calculated by the first-order PARCOR coefficient calculation unit 801. Inverse filtering with the first-order PARCOR coefficient flattens the spectrum, and in the field of speech recognition it has previously been reported to be effective in compensating for spectral tilt caused by high-frequency loss on telephone lines and by individual differences. With the input signal denoted x(n) and the first-order PARCOR coefficient denoted k1, the adaptive first-order inverse filter is formulated by the following equation.

[0025]

[Equation 4]
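Equations 3 and 4 are not reproduced in this text. A common way to write an adaptive first-order inverse filter is k1 = r1 / r0 followed by y(n) = x(n) - k1·x(n-1); the sketch below assumes exactly that form and sign convention, which may differ from the patent's. The function names are hypothetical.

```python
from typing import List, Sequence

def first_order_parcor(x: Sequence[float]) -> float:
    """k1 = r1 / r0 (assumed form); returns 0.0 for an all-zero frame."""
    r0 = sum(v * v for v in x)
    r1 = sum(x[i] * x[i + 1] for i in range(len(x) - 1))
    return r1 / r0 if r0 > 0.0 else 0.0

def adaptive_first_order_inverse_filter(x: Sequence[float]) -> List[float]:
    """y(n) = x(n) - k1 * x(n - 1) (assumed form), with x(-1) taken as 0."""
    k1 = first_order_parcor(x)
    return [x[n] - (k1 * x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
```

Because k1 is recomputed for each frame, the filter adapts to the tilt of that frame: after filtering, the first-order correlation of the output is much closer to zero, which corresponds to a flatter spectrum.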

[0026] Equation 5 follows from Equations 4 and 1.

[0027]

[Equation 5]

[0028] Using Equation 5, the autocorrelation function can be filtered directly, which requires less processing than filtering the waveform signal directly.
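Equation 5 itself is not reproduced in this text. Assuming the filter has the form y(n) = x(n) - k1·x(n-1), expanding the autocorrelation of y gives r'(τ) = (1 + k1²)·r(τ) - k1·(r(τ-1) + r(τ+1)), with r(-1) = r(1). The sketch below implements that assumed relation; it is exact when the frame is treated as zero outside its limits and the one-sample tail of the filter output is kept. The function names are hypothetical.

```python
from typing import List, Sequence

def autocorr(x: Sequence[float], max_lag: int) -> List[float]:
    """r(tau) = sum_i x(i) * x(i + tau), zero outside the frame."""
    n = len(x)
    return [sum(x[i] * x[i + tau] for i in range(n - tau))
            for tau in range(max_lag + 1)]

def filter_autocorrelation(r: Sequence[float], k1: float,
                           max_lag: int) -> List[float]:
    """Autocorrelation of the first-order filtered signal, from r alone.

    Needs r[0..max_lag + 1]; uses the symmetry r(-1) = r(1) at tau = 0.
    """
    return [(1.0 + k1 * k1) * r[tau] - k1 * (r[abs(tau - 1)] + r[tau + 1])
            for tau in range(max_lag + 1)]
```

Computing p + 1 filtered lags this way costs a handful of multiplications per lag, whereas filtering the waveform and recomputing its autocorrelation costs on the order of the frame length per lag.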

[0029] The effect of the adaptive first-order inverse filter is explained with FIG. 9. In FIG. 9, 901 is the spectrum of normally uttered speech and 902 is the spectrum of speech with utterance deformation. 903 is the spectrum of the normally uttered speech after adaptive first-order inverse filtering, and 904 is the spectrum of the deformed speech after adaptive first-order inverse filtering. Comparing 901 and 902, the power of the high-frequency components is higher in 902 than in 901, and there is a large difference between the two spectra. In the spectra 903 and 904 of the speech subjected to adaptive first-order inverse filtering, by contrast, the difference between the two is smaller. In other words, using the adaptive first-order inverse filter makes it possible to correct the effect of utterance deformation.

[0030] Next, the analysis unit 104 is described. From the input speech, the analysis unit 104 calculates the speech feature parameters that the matching unit 105 uses for distance calculation. Many feature parameters are used in speech recognition, including the LPC cepstrum, the mel cepstrum, band-pass filter outputs, and the FFT spectrum. This embodiment is described for the case of the LPC cepstrum, the most commonly used. FIG. 10 is a block diagram for explaining one embodiment of the analysis unit 104. In FIG. 10, 1001 is a linear prediction analysis unit and 1002 is a cepstrum calculation unit. For the speech data input to the linear prediction analysis unit 1001, linear prediction coefficients (LPC coefficients) are obtained according to the analysis processing flow shown in FIG. 5. The cepstrum calculation unit 1002 calculates the LPC cepstrum (c1, ..., cn) from the LPC coefficients (a1, ..., an) by the recursion shown in Equation 6.

[0031]

[Equation 6]
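Equation 6 is not reproduced in this text. The textbook recursion for the cepstrum of the all-pole model 1/A(z), assuming the convention A(z) = 1 + Σ a_i·z^(-i), is c_1 = -a_1 and c_n = -a_n - Σ_{k=1..n-1} (k/n)·c_k·a_{n-k}, with a_n = 0 for n > p. The sketch below implements that assumed form; with the opposite sign convention for a_i the signs flip accordingly. The function name is hypothetical.

```python
from typing import List, Sequence

def lpc_to_cepstrum(a: Sequence[float], n_ceps: int) -> List[float]:
    """Convert LPC coefficients [a1..ap] into LPC cepstrum [c1..c_n_ceps].

    Assumes A(z) = 1 + sum_i a_i z^-i and the model 1 / A(z).
    """
    p = len(a)
    c: List[float] = []
    for n in range(1, n_ceps + 1):
        acc = -a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if 1 <= n - k <= p:
                acc -= (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c
```

For a single pole at z = 0.5, that is a = [-0.5], the recursion reproduces the series 0.5^n / n, which matches the power-series expansion of -ln(1 - 0.5·z^(-1)).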

[0032] Finally, the matching unit 105 is described. FIG. 11 is a diagram for explaining the matching unit 105. In FIG. 11, 1101 is a DP matching unit and 1102 is a minimum distance determination unit. The DP matching unit 1101 calculates the distance between the sequence of feature parameters for each input speech frame obtained by the analysis unit 104 (nth-order LPC cepstra in this embodiment) and the standard patterns (feature vector sequences of registered speech) stored in the standard pattern storage unit 106. Naturally, when the standard patterns were created, the analysis parameters were obtained after applying adaptive first-order inverse filtering only to the voiced sections, as for the input speech. DP matching, also called DTW (Dynamic Time Warping), uses dynamic programming to normalize variations in the utterance duration of speech patterns, and it has long been used for isolated word recognition. For details of DP matching, see Furui, "Digital Speech Processing" (Tokai University Press). When the DP matching unit 1101 has finished calculating the distances to all the standard patterns, the minimum distance determination unit 1102 finds the standard pattern with the smallest distance. The speech recognition system takes the registered word of the minimum-distance standard pattern obtained by the minimum distance determination unit 1102 as the recognition result.
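DP matching followed by minimum-distance selection can be sketched as below. This is a basic symmetric DTW with a Euclidean local distance and no slope constraints or path-length normalization; the patent does not specify these details, so they are assumptions. The names `dtw_distance` and `recognize` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

def dtw_distance(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Accumulated distance of the best time alignment between a and b."""
    inf = float("inf")
    d: List[List[float]] = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = math.dist(a[i - 1], b[j - 1])   # local frame distance
            d[i][j] = cost + min(d[i - 1][j],      # stretch b
                                 d[i][j - 1],      # stretch a
                                 d[i - 1][j - 1])  # advance both
    return d[len(a)][len(b)]

def recognize(features: Sequence[Vector],
              templates: Dict[str, Sequence[Vector]]) -> Tuple[str, float]:
    """Return the word whose standard pattern is nearest to the input."""
    scores = {word: dtw_distance(features, t) for word, t in templates.items()}
    best = min(scores, key=lambda word: scores[word])
    return best, scores[best]
```

A call such as `recognize(cepstra, {"up": up_pattern, "down": down_pattern})` would return the nearest registered word together with its distance; the dictionary keys and variable names here are placeholders.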

[0033] As described above, according to this embodiment, in voiced sections, where the speech power is relatively high and utterance deformation readily has an effect, the change in spectral slope caused by utterance deformation can be corrected by inverse filtering. In unvoiced sections, where the speech power is relatively low and noise readily has an effect, omitting the inverse filtering avoids adverse effects such as emphasis of the noise component by the filtering.

[0034] Therefore, according to the present invention, recognition performance can be improved for speech with utterance deformation uttered in a noisy environment.

[0035] As a second embodiment of the present invention, instead of switching the inverse filtering on or off by a voiced/unvoiced decision, consider switching the filtering on or off by determining whether the input speech frame is a vowel. FIG. 12 is a system block diagram for explaining the second embodiment of the present invention. In FIG. 12, 1201 is a speech input unit, 1202 is a vowel determination unit, 1203 is an adaptive first-order inverse filter unit, 1204 is an analysis unit, 1205 is a matching unit, 1206 is a standard pattern storage unit, and 1207 is a switch unit. As in the first embodiment, speech input to the speech input unit 1201 is converted into a digital signal by A/D conversion and then divided at fixed intervals. The speech data divided into analysis frames is input to the vowel determination unit 1202, which determines whether or not it is a vowel. When the input speech frame is determined to be a vowel, the adaptive first-order inverse filter unit 1203 performs adaptive first-order inverse filtering to flatten the frequency characteristic of the speech data. When the input speech frame is determined not to be a vowel, the processing of the adaptive first-order inverse filter unit 1203 is skipped. Next, the analysis unit 1204 calculates feature parameters from the input speech. The standard pattern storage unit 1206 stores standard patterns of the recognition vocabulary, calculated in advance. The matching unit 1205 calculates the distance between the standard patterns stored in the standard pattern storage unit 1206 and the feature vectors of the input speech analyzed by the analysis unit 1204. The word whose standard pattern has the smallest distance among those matched by the matching unit 1205 is determined to be the recognized word for the input speech and is output as the recognition result.

[0036] Next, each processing unit is described in detail. A description of the speech input unit 1201, the adaptive first-order inverse filter unit 1203, the analysis unit 1204, the matching unit 1205, and the standard pattern storage unit 1206 would duplicate the description given for the first embodiment. Their description is therefore omitted, and only the vowel determination unit 1202 is described.

[0037] A vowel section is characterized by frames that have pitch and relatively high power continuing for a certain time (about 60 ms). This embodiment is described using a vowel determination method based on the magnitude of the power as an example. FIG. 13 shows one embodiment of the vowel determination unit 1202. In FIG. 13, 1301 is a power calculation unit and 1302 is a decision unit. The power calculation unit 1301 calculates the short-time power of each analysis frame of the input speech. In this embodiment, the zeroth-order term of the autocorrelation function is used as the short-time power. Once the autocorrelation function has been calculated here, the adaptive first-order inverse filter unit 1203 and the analysis unit 1204 need not calculate it again. The decision unit 1302 holds a threshold for the speech power prepared in advance, and when the power of the input speech exceeds this threshold for a fixed number of consecutive frames, it determines that section to be a vowel section.
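The power-based vowel decision can be sketched as a run-length test on per-frame power. The threshold and the minimum run length (`min_run`, in frames) are illustrative parameters, and marking a whole run after it ends is one possible reading of the text; the function names are hypothetical.

```python
from typing import List, Sequence

def frame_power(frame: Sequence[float]) -> float:
    """Short-time power as the zeroth-order autocorrelation term r(0)."""
    return sum(v * v for v in frame)

def vowel_frames(powers: Sequence[float], threshold: float,
                 min_run: int) -> List[bool]:
    """Flag frames inside runs of at least min_run consecutive frames
    whose power exceeds threshold."""
    flags = [False] * len(powers)
    start = None
    # A sentinel below any threshold closes a run that reaches the end.
    for i, p in enumerate(list(powers) + [float("-inf")]):
        if p > threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                for j in range(start, i):
                    flags[j] = True
            start = None
    return flags
```

With frames of a few tens of milliseconds, a `min_run` of two or three frames would roughly correspond to the 60 ms duration mentioned in the text.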

[0038] Of course, many other methods for determining vowel sections exist, and needless to say they can also be applied to this embodiment.

[0039] As described above, according to the second embodiment, in vowel sections, where the speech power is relatively high and utterance deformation readily has an effect, the change in spectral slope caused by utterance deformation can be corrected by inverse filtering. In sections other than vowels (consonant sections and silent sections), where the speech power is relatively low and noise readily has an effect, omitting the inverse filtering avoids adverse effects such as emphasis of the noise component by the filtering.

【００４０】したがって、本発明によれば、騒音環境下
で発声した発声変形を伴う音声の認識性能を向上させる
ことが可能となる。Therefore, according to the present invention, it is possible to improve the recognition performance for voice that is uttered in a noisy environment and accompanied by utterance deformation.

【００４１】ここまでの説明は入力音声として発声変形
をおこした音声が入力すると想定して説明した。発声変
形は、高騒音環境で発声した場合にのみ問題となる現象
であり、静かな環境では発声変形はおこらない。そこ
で、第三の実施例として、測定した周囲の騒音レベルの
大きさによって適応逆フィルタ処理の有無を切り替える
方法について説明する。図１４は本発明の第三の実施例
を説明するためのシステムブロック図である。図１４に
おいて、１４０１は音声入力部、１４０２は雑音レベル
測定部、１４０３は騒音判定部、１４０４はスイッチ
部、１４０５は適応一次逆フィルタ部、１４０６は分析
部、１４０７は標準パタン格納部、１４０８は標準パタ
ン選択部、１４０９は照合部である。音声入力部１４０
１から入力した音声信号はＡ／Ｄ変換によってディジタ
ル信号に変換された後、一定間隔（通常は数十ms）毎に
分割される（分析フレーム）。分析フレーム毎に分割さ
れた入力データは雑音レベル測定部１４０２において雑
音レベルが測定される。雑音レベル測定部１４０２につ
いてはあとで説明する。騒音判定部１４０３は雑音レベ
ル測定部１４０２で求められた雑音レベルから入力信号
の騒音の大小を判定する。つまり、雑音レベル測定部１
４０２で求められた雑音レベルがしきい値よりも大きい
場合に騒音が大であると判定する。スイッチ部１４０４
は騒音の大小によって処理を切り替える。もし騒音が大
であるときには適応一次逆フィルタ部１４０５に処理を
移す。逆に騒音が小である場合は分析部１４０６に処理
を移す。適応一次逆フィルタ部１４０５では適応一次逆
フィルタリングをおこない音声データの周波数特性を平
坦にする。また、分析部１４０６ではフレーム毎に分割
した入力音声から特徴パラメータを計算する。適応一次
逆フィルタ部、分析部の詳細についてはすでに説明し
た。標準パタン格納部１４０７には認識対象単語の標準
パタンが格納してある。本実施例の場合には適応一次逆
フィルタ処理を経由して分析した標準パタンと適応一次
逆フィルタ処理をおこなわずに分析した標準パタンの二
種類が格納されている。標準パタン選択部１４０８は騒
音判定部１４０３で判定された騒音の大小によって標準
パタンを選択する。つまり、騒音が大であるときは適応
一次逆フィルタ処理を経由して分析した標準パタンを用
い、騒音が小であるときには適応一次逆フィルタ処理を
おこなわずに分析した標準パタンを用いる。照合部１４
０９は、標準パタン選択部１４０８で選択された標準パ
タンと、分析部１４０６で分析された入力音声の特徴ベ
クトルとの間で距離計算をおこなう。このとき照合部１
４０９で照合した標準パタンのうち距離が一番小さい単
語が入力音声の認識単語であると判定される。The description so far has assumed that the input voice has undergone utterance deformation. Utterance deformation is a phenomenon that becomes a problem only when speaking in a high-noise environment; it does not occur in a quiet environment. Therefore, as a third embodiment, a method of switching the adaptive inverse filter processing on or off according to the measured ambient noise level is described. FIG. 14 is a system block diagram for explaining the third embodiment of the present invention. In FIG. 14, 1401 is a voice input unit, 1402 is a noise level measuring unit, 1403 is a noise determination unit, 1404 is a switch unit, 1405 is an adaptive first-order inverse filter unit, 1406 is an analysis unit, 1407 is a standard pattern storage unit, 1408 is a standard pattern selection unit, and 1409 is a collation unit.
The voice signal input from the voice input unit 1401 is converted into a digital signal by A/D conversion and then divided at regular intervals (usually several tens of ms) into analysis frames. The noise level measuring unit 1402 measures the noise level of the input data divided into analysis frames. The noise level measuring unit 1402 is described later. The noise determination unit 1403 determines whether the noise of the input signal is large or small from the noise level obtained by the noise level measuring unit 1402. That is, when the noise level obtained by the noise level measuring unit 1402 is larger than a threshold value, the noise is determined to be large. The switch unit 1404 switches the processing according to the magnitude of the noise. If the noise is large, processing passes to the adaptive first-order inverse filter unit 1405. Conversely, if the noise is small, processing passes to the analysis unit 1406. The adaptive first-order inverse filter unit 1405 performs adaptive first-order inverse filtering to flatten the frequency characteristic of the voice data. The analysis unit 1406 calculates feature parameters from the input voice divided into frames. The details of the adaptive first-order inverse filter unit and the analysis unit have already been described.
The standard pattern storage unit 1407 stores the standard patterns of the words to be recognized. In this embodiment, two types are stored: standard patterns analyzed after adaptive first-order inverse filter processing, and standard patterns analyzed without adaptive first-order inverse filter processing. The standard pattern selection unit 1408 selects the standard patterns according to the magnitude of the noise determined by the noise determination unit 1403. That is, when the noise is large, the standard patterns analyzed after adaptive first-order inverse filter processing are used, and when the noise is small, the standard patterns analyzed without adaptive first-order inverse filter processing are used. The collation unit 1409 calculates the distance between the standard patterns selected by the standard pattern selection unit 1408 and the feature vectors of the input voice analyzed by the analysis unit 1406. Among the standard patterns collated by the collation unit 1409, the word with the smallest distance is determined to be the recognized word for the input voice.
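The switching performed by the switch unit 1404 and the standard pattern selection unit 1408 can be illustrated with a short sketch. It is written under stated assumptions rather than taken from the embodiment: the filter is assumed to have the form y[n] = x[n] - k1 * x[n-1] with k1 = r(1)/r(0) (sign conventions for the first-order PARCOR coefficient vary), and the names `route_frame`, `noise_threshold`, and the keys "filtered" and "unfiltered" are hypothetical.

```python
def first_order_inverse_filter(frame):
    """Flatten spectral tilt with an adaptive first-order inverse filter.

    Assumed form: y[n] = x[n] - k1 * x[n-1], where k1 = r(1) / r(0) is
    computed from the frame itself. The first sample is passed through.
    """
    r0 = sum(x * x for x in frame)
    if r0 == 0.0:
        return list(frame)  # silent frame: nothing to flatten
    r1 = sum(frame[n] * frame[n - 1] for n in range(1, len(frame)))
    k1 = r1 / r0
    return [frame[0]] + [frame[n] - k1 * frame[n - 1]
                         for n in range(1, len(frame))]


def route_frame(frame, noise_level, noise_threshold):
    """Return (frame to analyze, key of the standard pattern set to use).

    When the measured noise level exceeds the threshold, the frame is
    inverse filtered and the patterns built with filtering are selected;
    otherwise the frame is analyzed as is with the unfiltered patterns.
    """
    if noise_level > noise_threshold:
        return first_order_inverse_filter(frame), "filtered"
    return list(frame), "unfiltered"
```

Returning the pattern-set key together with the frame keeps the two decisions tied to a single noise judgment, so the input and the standard patterns are always processed the same way.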

【００４２】ここで、雑音レベル測定部１４０２につい
てくわしく説明する。図１５は雑音レベル測定部１４０
２を説明するための図である。図１５において、１５０
１は音声区間検出部、１５０２は雑音パワ計算部であ
る。音声区間検出部１５０１は入力信号から音声区間を
検出する。音声区間検出については古井の「ディジタル
音声処理」など詳しく解説されている。一般的な例とし
ては、一定しきい値以上の短時間パワが一定時間以上継
続した区間を基準に音声区間を決定する。雑音パワ計算
部１５０２はフレーム毎に計算される短時間パワの平均
をとる。この平均処理は音声区間検出部１５０１で音声
区間が検出されるまで継続する。この処理によって、音
声区間が検出されたときには騒音レベルの測定が完了し
ている。Here, the noise level measuring unit 1402 is described in detail. FIG. 15 is a diagram for explaining the noise level measuring unit 1402. In FIG. 15, 1501 is a voice section detection unit and 1502 is a noise power calculation unit. The voice section detection unit 1501 detects a voice section from the input signal. Voice section detection is explained in detail in, for example, Furui's "Digital Speech Processing". As a typical example, a voice section is determined on the basis of a section in which short-time power at or above a certain threshold value continues for at least a certain time. The noise power calculation unit 1502 takes the average of the short-time power calculated for each frame. This averaging continues until the voice section detection unit 1501 detects a voice section. As a result, the noise level measurement is already complete when the voice section is detected.
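The noise level measurement described above can be sketched as follows. The sketch assumes the simple power-and-duration rule given in the text as a typical example of voice section detection. The names `measure_noise_level`, `speech_threshold`, and `min_speech_frames` are hypothetical, and the choice to exclude the frames of the detected voice section from the average is an assumption made for this illustration.

```python
def measure_noise_level(frame_powers, speech_threshold, min_speech_frames):
    """Return (noise_level, speech_start_index).

    noise_level is the mean short-time power of the frames that precede the
    detected voice section. speech_start_index is the index of the first
    frame of that section, or None if no voice section is found.
    """
    noise_sum = 0.0
    noise_count = 0
    pending = []  # powers of a candidate voice run that is not yet confirmed
    for i, power in enumerate(frame_powers):
        if power >= speech_threshold:
            pending.append(power)
            if len(pending) >= min_speech_frames:
                start = i - len(pending) + 1
                level = noise_sum / noise_count if noise_count else 0.0
                return level, start
        else:
            # The candidate run was too short, so count it as noise after all.
            noise_sum += sum(pending) + power
            noise_count += len(pending) + 1
            pending = []
    # Input ended without a confirmed voice section.
    noise_sum += sum(pending)
    noise_count += len(pending)
    level = noise_sum / noise_count if noise_count else 0.0
    return level, None
```

For example, with frame powers `[1.0, 3.0, 2.0, 10.0, 12.0, 11.0, 9.0]`, a threshold of 8.0, and a minimum run of 3 frames, the voice section starts at index 3 and the noise level is the mean of the first three values.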

【００４３】以上説明したように、第三の実施例によれ
ば、発声変形がおこりやすい高騒音環境で発声した場合
にのみ、発声変形によるスペクトルの傾き変化を逆フィ
ルタリングで補正することができる。一方、発声変形の
おこらない静かな環境では逆フィルタリング処理を省く
ことで、逆フィルタリング処理の悪影響を避けることが
できる。As described above, according to the third embodiment, the change in spectral tilt caused by utterance deformation can be corrected by inverse filtering only when the voice is uttered in a high-noise environment, where utterance deformation is likely to occur. On the other hand, in a quiet environment, where utterance deformation does not occur, the inverse filtering is omitted, which avoids its adverse effects.

【００４４】したがって、本発明によれば、静かな環境
での使用時における性能劣化を生じることなく、騒音環
境下で発声した発声変形を伴う音声の認識性能を向上さ
せることが可能となる。Therefore, according to the present invention, it is possible to improve the recognition performance for voice that is uttered in a noisy environment and accompanied by utterance deformation, without degrading performance when the device is used in a quiet environment.

【００４５】また、この第三の実施例の実施例と第一、
第二の実施例との併用も可能である。たとえば、図１６
で示す第四の実施例では、第三の実施例に有声／無声判
定部１６１０を追加した。この実施例によれば、比較的
音声のパワが大きく、発声変形の影響が生じやすい有声
音の区間は、発声変形によるスペクトルの傾き変化を逆
フィルタリングで補正することができる。一方、比較的
音声パワが小さく、騒音の影響を受けやすい無声音の区
間は逆フィルタリングを省くことで、フィルタリングに
よって雑音成分を強調する等の悪影響を避けることがで
きる。また、発声変形のおこらない静かな環境では全区
間において、逆フィルタリング処理を省くことで、逆フ
ィルタリング処理の悪影響を避けることができる。The third embodiment can also be combined with the first and second embodiments. For example, in the fourth embodiment shown in FIG. 16, a voiced/unvoiced determination unit 1610 is added to the third embodiment. According to this embodiment, in voiced sections, where the voice power is relatively large and the influence of utterance deformation is likely to appear, the change in spectral tilt caused by utterance deformation can be corrected by inverse filtering. On the other hand, in unvoiced sections, where the voice power is relatively small and the signal is easily affected by noise, the inverse filtering is omitted, which avoids adverse effects such as the filtering emphasizing noise components. Furthermore, in a quiet environment, where utterance deformation does not occur, the inverse filtering is omitted in all sections, which avoids its adverse effects.
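The combined condition of the fourth embodiment (filter only when the noise is large and the frame is voiced) can be sketched as below. The voiced/unvoiced test follows the autocorrelation-peak idea of the first embodiment, but the details are assumptions for illustration: the peak is taken as the maximum over a lag range rather than over detected local peaks, it is normalized by r(0), and the names `min_lag`, `max_lag`, and `peak_threshold` are hypothetical.

```python
def autocorr(frame, lag):
    """Autocorrelation of one frame at the given non-negative lag."""
    return sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))


def is_voiced(frame, min_lag, max_lag, peak_threshold):
    """Judge a frame voiced when its largest normalized autocorrelation
    value over lags min_lag..max_lag (lag 0 excluded) exceeds the threshold."""
    r0 = autocorr(frame, 0)
    if r0 == 0.0:
        return False  # silent frame
    peak = max(autocorr(frame, lag) / r0
               for lag in range(min_lag, max_lag + 1))
    return peak > peak_threshold


def should_inverse_filter(frame, noise_is_large,
                          min_lag, max_lag, peak_threshold):
    """Apply inverse filtering only in voiced frames of a noisy environment."""
    return noise_is_large and is_voiced(frame, min_lag, max_lag,
                                        peak_threshold)
```

Checking the noise judgment first means that in a quiet environment no frame is filtered, regardless of the voiced/unvoiced result.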

【００４６】したがって、本発明によれば、静かな環境
での使用時における性能劣化を生じることなく、騒音環
境下で発声した発声変形を伴う音声の認識性能を向上さ
せることが可能となる。Therefore, according to the present invention, it is possible to improve the recognition performance for voice that is uttered in a noisy environment and accompanied by utterance deformation, without degrading performance when the device is used in a quiet environment.

【００４７】もちろん、第三の実施例と母音判定部との
併用も同様に有効であることは言うまでもない。Of course, it goes without saying that combining the third embodiment with the vowel determination unit is equally effective.

【００４８】[0048]

【発明の効果】以上述べてきたように、本発明によれ
ば、比較的音声のパワが大きく、発声変形の影響が生じ
やすい有声音の区間は、発声変形によるスペクトルの傾
き変化を逆フィルタリングで補正することができる。一
方、比較的音声パワが小さく、騒音の影響を受けやすい
無声音の区間は逆フィルタリングを省くことで、フィル
タリングによって雑音成分を強調する等の悪影響を避け
ることができる。As described above, according to the present invention, in voiced sections, where the voice power is relatively large and the influence of utterance deformation is likely to appear, the change in spectral tilt caused by utterance deformation can be corrected by inverse filtering. On the other hand, in unvoiced sections, where the voice power is relatively small and the signal is easily affected by noise, the inverse filtering is omitted, which avoids adverse effects such as the filtering emphasizing noise components.

【００４９】したがって、本発明によって騒音環境下で
発声した発声変形を伴う音声の認識性能を向上させるこ
とが可能となる。Therefore, the present invention makes it possible to improve the recognition performance for voice that is uttered in a noisy environment and accompanied by utterance deformation.

[Brief description of drawings]

【図１】本発明の第一の実施例を説明するためのブロッ
ク図である。FIG. 1 is a block diagram for explaining a first embodiment of the present invention.

【図２】有声／無声判定部の一実施例を説明するための
ブロック図である。FIG. 2 is a block diagram for explaining an example of a voiced / unvoiced determination unit.

【図３】音声データの自己相関関数の一例を示す図であ
る。FIG. 3 is a diagram showing an example of an autocorrelation function of voice data.

【図４】有声／無声判定部の第二の実施例を説明するた
めのブロック図である。FIG. 4 is a block diagram for explaining a second embodiment of a voiced / unvoiced determination unit.

【図５】線形予測分析部の一実施例を説明するための処
理フローである。FIG. 5 is a processing flow for explaining an example of a linear prediction analysis unit.

【図６】有声／無声判定部の第三の実施例を説明するた
めのブロック図である。FIG. 6 is a block diagram illustrating a third embodiment of a voiced / unvoiced determination unit.

【図７】音声データから計算したケプストラムの一例を
示す図である。FIG. 7 is a diagram showing an example of a cepstrum calculated from audio data.

【図８】適応一次逆フィルタ部の一実施例を説明するた
めの図である。FIG. 8 is a diagram for explaining an example of an adaptive first-order inverse filter unit.

【図９】適応一次逆フィルタの効果を説明するための図
である。FIG. 9 is a diagram for explaining the effect of an adaptive first-order inverse filter.

【図１０】分析部の一実施例を説明するための図であ
る。FIG. 10 is a diagram for explaining an example of an analysis unit.

【図１１】照合部の一実施例を説明するための図であ
る。FIG. 11 is a diagram illustrating an example of a matching unit.

【図１２】本発明の第二の実施例を説明するための図で
ある。FIG. 12 is a diagram for explaining the second embodiment of the present invention.

【図１３】母音判定部の一実施例を説明するための図で
ある。FIG. 13 is a diagram illustrating an example of a vowel determination unit.

【図１４】本発明の第三の実施例を説明するための図で
ある。FIG. 14 is a diagram for explaining the third embodiment of the present invention.

【図１５】雑音レベル測定部を説明するための図であ
る。FIG. 15 is a diagram for explaining a noise level measuring unit.

【図１６】本発明の第四の実施例を説明するための図で
ある。FIG. 16 is a diagram for explaining the fourth embodiment of the present invention.

[Explanation of symbols]

１０１…音声入力部、１０２…有声／無声判定部、１０
３…適応一次逆フィルタ部、１０４…分析部、１０５…
照合部、１０６…標準パタン格納部、１０７…スイッチ
部。101 ... Voice input unit, 102 ... Voiced/unvoiced determination unit, 103 ... Adaptive first-order inverse filter unit, 104 ... Analysis unit, 105 ... Collation unit, 106 ... Standard pattern storage unit, 107 ... Switch unit.

Claims

[Claims]

1. A voice recognition device having a voice input unit for inputting a voice to be recognized, an adaptive first-order inverse filter unit for performing inverse filtering on the voice signal using a first-order PARCOR coefficient obtained from the voice signal input to the voice input unit, an analysis unit for calculating a feature vector of the inversely filtered voice signal, and a collation unit for recognizing the input voice by obtaining the similarity between standard patterns registered in advance and the feature vector obtained by the analysis unit,
the voice recognition device being characterized by comprising a determination unit for determining a characteristic of the input signal, wherein the adaptive first-order inverse filter unit performs inverse filtering only when the input signal satisfies the condition of the determination unit.

2. The voice recognition device according to claim 1, characterized in that a noise level measuring unit for measuring the magnitude of noise mixed into the input voice signal is provided, and the adaptive first-order inverse filter unit performs inverse filtering only when the determination unit determines that the noise level exceeds a threshold value.

3. The voice recognition device according to claim 1, characterized in that a voiced/unvoiced determination unit for determining whether the input voice is voiced or unvoiced is provided as the determination unit, and the adaptive first-order inverse filter unit performs inverse filtering only on a voice section determined to be voiced.

4. The voice recognition apparatus according to claim 3, wherein the voiced / unvoiced determination unit determines voiced or unvoiced voice using a peak of an autocorrelation function calculated from the input signal.

5. The voice recognition device according to claim 3, characterized in that the voiced/unvoiced determination unit determines whether the voice is voiced or unvoiced using a peak of a modified correlation function (the autocorrelation coefficients of the prediction residual of linear prediction analysis) calculated from the input signal.

6. The voice recognition device described, characterized in that the voiced/unvoiced determination unit determines whether the voice is voiced or unvoiced using a high-quefrency portion of a cepstrum calculated from the input signal.

7. The voice recognition device according to claim 1, characterized in that a vowel determination unit for classifying the input voice into vowel sections and sections other than vowels is provided as the determination unit, and the adaptive first-order inverse filter unit performs inverse filtering only on a voice section determined to be a vowel.

8. The voice recognition device according to claim 7, wherein the vowel determination unit determines the vowel section using a short-time power value calculated from the input signal.

9. The voice recognition device according to any one of claims 1 to 8, characterized in that an autocorrelation calculation unit for calculating an autocorrelation function of the input voice signal is provided, and the adaptive first-order inverse filter unit performs inverse filtering on the autocorrelation function calculated by the autocorrelation calculation unit.