JPH06149285A

JPH06149285A - Speech recognizing device

Info

Publication number: JPH06149285A
Application number: JP4294884A
Authority: JP
Inventors: Hiroyuki Fujimoto; 博之藤本; Kazuya Sako; 和也佐古; Shoji Fujimoto; 昇治藤本
Original assignee: Denso Ten Ltd
Current assignee: Denso Ten Ltd
Priority date: 1992-11-04
Filing date: 1992-11-04
Publication date: 1994-05-27
Anticipated expiration: 2017-10-15
Also published as: JP3335389B2

Abstract

PURPOSE: To reduce misrecognition in a speech recognizing device that controls equipment by recognizing speech. CONSTITUTION: The speech recognizing device, which preprocesses an input speech signal, recognizes the speech, and controls the equipment on the basis of the recognition result, is provided with a parameter setting part 10 and a parameter switching part 11. The parameter setting part 10 stores, for each of the speech spectrum patterns obtained by roughly classifying speech spectrum patterns according to their features, parameters that optimize the preprocessing. The parameter switching part 11 analyzes the frequency of the input speech signal, decides which of the roughly classified speech spectrum patterns the analyzed spectrum belongs to, and switches the parameters set by the parameter setting part 10.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech recognition device for controlling equipment by recognizing speech, and in particular to reducing misrecognition in speech recognition.

[0002]

[Prior Art] The following technique has conventionally existed in this field. FIG. 5 is a diagram showing a control system using a conventional speech recognition device. As shown in the figure, the control system includes: a microphone 200 that captures the voice of a speaker in a vehicle cabin 300; a speaker direction/distance determination unit 201 that identifies the speaker from the direction of the voice arriving at the microphone 200 and the distance from the sound source; a preprocessing unit 202 for speech recognition, connected to the speaker direction/distance determination unit 201, that performs adaptive processing to remove noise from the speech signal of the identified speaker and further performs automatic gain control (AGC); a speech recognition unit 203, connected to the preprocessing unit 202, that recognizes which registered word the speech matches; various control units 204 that form control signals based on the word recognized by the speech recognition unit 203; a speech synthesis unit 205 that receives the recognized word via the various control units 204 and synthesizes it into speech; a loudspeaker 206, connected to the speech synthesis unit 205, that reproduces the synthesized speech; and an audio system 207, an air conditioner 208, a telephone 209, a navigation system 210, an auto-drive system 211, and the like, which are controlled by the various control units 204.

[0003] That is, speech captured by the microphone 200 passes through the speaker direction/distance determination unit 201 and the preprocessing unit 202 and is recognized by the speech recognition unit 203. The result is conveyed to the speaker through the loudspeaker 206 via the various control units 204 and the speech synthesis unit 205, and the audio system 207 and the other devices are each controlled by the various control units 204. Here, the microphone 200 through the loudspeaker 206 constitute the speech recognition device. In such a control system, a high speech recognition rate is required from the viewpoint of control reliability. The performance of the speech recognition unit 203 must therefore be improved, but the recognition rate is also strongly affected by the result of the signal processing in the preceding stage. For the microphones 200 in particular, the speaker direction/distance determination unit 201 therefore optimizes the differences in delay and in gain between the microphones. In addition, the preprocessing unit 202 optimizes the tap length, delay amount, and update coefficient of the noise-reduction adaptive filter (ADF), the set value of the automatic gain control (AGC), and the cutoff frequency and stopband (attenuation) characteristic of the band-limiting filter.
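To make the three ADF parameters concrete, the following is a minimal sketch of a noise canceller in which `taps`, `delay`, and `mu` play the roles of tap length, delay amount, and update coefficient. The text does not say which adaptation algorithm the ADF uses; the LMS update here, the use of a separate noise reference input, and all names are assumptions made for illustration.

```python
def lms_noise_canceller(primary, reference, taps, delay, mu):
    """Assumed LMS noise canceller (not taken from the patent text).

    primary   : samples containing speech plus noise
    reference : samples correlated with the noise only
    taps      : number of filter coefficients (tap length)
    delay     : samples by which the primary input is delayed (delay amount)
    mu        : step size of the coefficient update (update coefficient)
    Returns the error signal, i.e. the primary input with the estimated
    noise subtracted.
    """
    weights = [0.0] * taps
    output = []
    for n in range(len(primary)):
        # Most recent `taps` reference samples, newest first, zero-padded.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        d = primary[n - delay] if n - delay >= 0 else 0.0
        noise_estimate = sum(w * xk for w, xk in zip(weights, x))
        e = d - noise_estimate
        # LMS coefficient update: w <- w + mu * e * x
        weights = [w + mu * e * xk for w, xk in zip(weights, x)]
        output.append(e)
    return output
```

A longer tap length lets the filter model a longer noise path, and a larger update coefficient adapts faster at the cost of stability, which is one reason such values tend to be tuned empirically.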

[0004]

[Problems to Be Solved by the Invention] In the preprocessing unit 202 of the conventional speech recognition device, however, the optimizations described above are carried out empirically. They are strongly affected by the speaker and are therefore difficult to achieve, so the recognition rate varies widely from speaker to speaker and a high recognition rate cannot be obtained stably.

[0005] In view of the above problem, an object of the present invention is therefore to provide a speech recognition device capable of signal preprocessing that raises the recognition rate even for different speakers.

[0006]

[Means for Solving the Problems] To solve the above problem, the present invention provides a parameter setting unit and a parameter switching unit in a speech recognition device that preprocesses an input speech signal, recognizes the speech, and controls equipment on the basis of the recognition result. Speech spectrum patterns are roughly classified according to their features, and the parameter setting unit stores, for each of the roughly classified speech spectrum patterns, parameters with which the preprocessing can be optimized.

[0007] The parameter switching unit performs frequency analysis on the input speech signal, determines which of the roughly classified speech spectrum patterns the analyzed spectrum belongs to, and switches the parameters set by the parameter setting unit accordingly. Furthermore, the speech spectrum patterns are roughly classified with reference to the first formant frequency, which characterizes the spectrum pattern of speech, and the parameters are switched according to the first formant frequency of the input signal.

[0008]

[Operation] According to the speech recognition device of the present invention, speech spectrum patterns are roughly classified according to their features, and parameters with which the preprocessing can be optimized are stored for each of the roughly classified patterns. The input speech signal is frequency-analyzed, the pattern to which the analyzed spectrum belongs is determined, and the parameters are switched accordingly. Parameters that were conventionally fixed are thus made variable according to the speaker, so the variation in recognition rate between speakers is eliminated and a high recognition rate can be obtained stably.

[0009] Furthermore, because the speech spectrum patterns are roughly classified with reference to the first formant frequency, which characterizes the spectrum pattern of speech, and the parameters are switched according to the first formant frequency of the input signal, the device can be realized easily. In addition, when the speech spectrum patterns are classified into male and female with reference to the first formant frequency and the male and female parameters are switched according to the first formant frequency of the input signal, the marked difference between male and female first formant frequencies makes the device even easier to realize.

[0010]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a diagram showing a speech recognition device according to an embodiment of the present invention. Like components are denoted by the same reference numerals or symbols throughout the drawings. The speech recognition device shown in the figure comprises: a plurality of microphones 200 that capture the voice of a speaker; a speaker direction/distance determination unit 201, connected to the microphones 200, that identifies the speaker from the direction of the voice and the distance from the sound source; a preprocessing unit 202 for speech recognition, connected to the speaker direction/distance determination unit 201, that performs adaptive processing to remove noise from the speech signal of the identified speaker and further performs automatic gain control (AGC); a speech recognition unit 203, connected to the preprocessing unit 202, that recognizes which registered word the speech matches and outputs the result to the various control units 204 (see FIG. 5); a parameter setting unit 10 that switches and sets the parameters of the various processes of the preprocessing unit 202; and a parameter switching determination unit 11 that determines, on the basis of the speech signal from the microphones 200, whether to switch the parameters of the parameter setting unit 10.

[0011] Next, the parameter setting unit 10 and the parameter switching determination unit 11 are described. FIG. 2 is a flowchart explaining signal processing in the parameter setting unit 10 and the parameter switching determination unit 11 of FIG. 1. As shown in the figure, steps 1 and 2 are performed outside the recognition system and determine the parameter values described below in advance; step 3 and the subsequent steps are performed inside the recognition system.

[0012] First, in step 1, on the basis of speech from the microphone 200, the parameter switching determination unit 11 roughly classifies speech into n patterns according to differences in the spectrum pattern of the speech waveform (the first formant frequency), and these n spectrum patterns are stored. The storage technique itself is well known, so its description is omitted. A method of classifying speech into n patterns by the first formant frequency is described below. Speech production is first explained briefly. The physical factors that determine the acoustic characteristics of speech are said to be the characteristics of the sound source, the resonance characteristics of the vocal tract, and the radiation characteristics of sound waves from the lips or nostrils. FIG. 3 is a diagram showing the spectrum of a speech wave. As shown in the figure, the intensity of the speech falls off with a constant slope as the frequency rises, and there are several peaks corresponding to resonances of the vocal tract; these are called formants. The peak at the lowest frequency is called the first formant. The first formant frequency, at which the first formant occurs, differs from person to person. The present inventor noticed that the recognition rate varies in correspondence with this variation in the first formant frequency. The preprocessing can therefore be optimized by changing the parameters to be set in the preprocessing unit 202 according to the first formant frequency. Accordingly, the parameter switching determination unit 11 stores, as spectrum patterns, n first formant frequency regions, for example regions of fixed width centered on first formant frequencies of 100 Hz, 125 Hz, 150 Hz, and 175 Hz.
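The stored regions of step 1 can be sketched as follows. The centre frequencies are the example values given in the text; the half-width of 12.5 Hz and the function name are assumptions for illustration, since the text only says that the regions have a fixed width.

```python
# Example centre frequencies from the text (n = 4).
CENTERS_HZ = (100.0, 125.0, 150.0, 175.0)
# Assumed half-width; chosen so that neighbouring regions just touch.
HALF_WIDTH_HZ = 12.5


def region_index(f1_hz, centers=CENTERS_HZ, half_width=HALF_WIDTH_HZ):
    """Return the index of the stored region containing f1_hz, or None."""
    for i, center in enumerate(centers):
        if center - half_width <= f1_hz < center + half_width:
            return i
    return None
```

With these assumed widths, a measured first formant frequency of 130 Hz falls in the region centered on 125 Hz, while 250 Hz falls outside every stored region.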

[0013] In step 2, simulation and emulation are repeated for the n kinds of speech patterns classified by first formant frequency in step 1, and the optimum value of each control parameter is determined. These optimum values have no theoretical basis and are determined by rules of thumb obtained through experiment. The simulation does not use the speech recognition device of this control system; instead, for example, the preprocessing unit 202 and the speech recognition device are implemented on a personal computer, and the optimum parameters are found under ideal conditions for each of the classified first formants. The emulation uses the speech recognition device of the actual control system, which is built with a DSP (Digital Signal Processor), to check whether the parameters determined by simulation are usable in practice.
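One way to picture the offline search of step 2 is an exhaustive sweep over candidate parameter values, scored separately for each speech pattern. This is only a sketch under stated assumptions: the text does not describe how candidates are generated or scored, and `tune_parameters`, `grid`, and the `recognition_rate` callback are hypothetical names standing in for the simulation.

```python
from itertools import product


def tune_parameters(patterns, grid, recognition_rate):
    """Pick, for each pattern, the candidate with the best score.

    patterns         : iterable of pattern identifiers (e.g. region indices)
    grid             : dict mapping parameter name -> list of candidate values
    recognition_rate : hypothetical callback (pattern, params) -> score,
                       standing in for a simulation run
    """
    names = sorted(grid)
    best = {}
    for pattern in patterns:
        candidates = [
            dict(zip(names, values))
            for values in product(*(grid[name] for name in names))
        ]
        best[pattern] = max(candidates, key=lambda p: recognition_rate(pattern, p))
    return best
```

The resulting dictionary holds one parameter set per pattern, which is the kind of table that would then be checked on the real hardware during emulation.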

[0014] As described above, the parameters are the tap length, delay amount, and update coefficient of the noise-reduction adaptive filter (ADF) in the preprocessing unit 202, the set value of the automatic gain control (AGC), the cutoff frequency and stopband (attenuation) characteristic of the band-limiting filter, and so on. In step 3, the optimum parameters obtained as described above are stored, for each first formant frequency, in the parameter setting unit 10, which consists of a memory.
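The contents of the parameter setting unit 10 after step 3 can be pictured as a small lookup table with one record per first formant region. The field names and every numeric value below are placeholders invented for illustration; the text lists the kinds of parameters but gives no values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreprocessParams:
    adf_taps: int            # tap length of the adaptive filter
    adf_delay: int           # delay amount, in samples
    adf_mu: float            # update coefficient
    agc_set_value: float     # AGC set value
    cutoff_hz: float         # cutoff frequency of the band-limiting filter
    attenuation_db: float    # stopband (attenuation) characteristic


# Placeholder values only; keys are the example region centres in Hz.
PARAMS_BY_CENTER_HZ = {
    100: PreprocessParams(64, 8, 0.010, 0.50, 3400.0, 24.0),
    125: PreprocessParams(64, 8, 0.012, 0.50, 3400.0, 24.0),
    150: PreprocessParams(48, 6, 0.015, 0.55, 3600.0, 24.0),
    175: PreprocessParams(48, 6, 0.018, 0.55, 3600.0, 24.0),
}


def params_for(center_hz):
    """Read the stored parameter set for one first formant region."""
    return PARAMS_BY_CENTER_HZ[center_hz]
```

Keeping the whole set in one immutable record means that switching patterns replaces all preprocessing parameters together rather than one at a time.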

[0015] In step 4, the speech input to the microphone 200 is spectrum-analyzed by the parameter switching determination unit 11, and the speaker's speech pattern is compared with the n speech patterns stored in the parameter switching determination unit 11. That is, the first formant frequency is obtained by the spectrum analysis and is compared with the n stored first formant frequency regions to find which region it belongs to.

[0016] In step 5, on the basis of the comparison in step 4, the parameter switching determination unit 11 selects, from the n speech patterns, the one most similar to the speaker's speech pattern. In step 6, the control parameters for the speech pattern selected in step 5 are read from the memory of the parameter setting unit 10, the preprocessing unit 202 preprocesses the signal using these parameters, and the speech recognition unit 203 performs speech recognition on the preprocessed signal.
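Steps 4 to 6 can be sketched end to end as: compute a magnitude spectrum, take the lowest-frequency peak as the first formant (following the definition used in this document), pick the nearest stored centre, and read out its parameter set. The naive DFT, the relative peak threshold of 0.25, and all function names are assumptions for illustration; a real device would use an FFT on a DSP.

```python
import cmath
import math


def magnitude_spectrum(signal, sample_rate):
    """Naive DFT magnitudes for bins 0..N/2 (slow, but dependency-free)."""
    n = len(signal)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * sample_rate / n)
        mags.append(abs(acc))
    return freqs, mags


def lowest_peak_hz(freqs, mags, rel_threshold=0.25):
    """Lowest-frequency local maximum that reaches rel_threshold * global max."""
    floor = rel_threshold * max(mags)
    for k in range(1, len(mags) - 1):
        if mags[k] >= floor and mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]:
            return freqs[k]
    return None


def most_similar_pattern(f1_hz, centers):
    """Step 5: index of the stored centre closest to the measured value."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - f1_hz))


def select_params(signal, sample_rate, centers, table):
    """Steps 4 to 6: analyse, choose the pattern, read out its parameters."""
    freqs, mags = magnitude_spectrum(signal, sample_rate)
    f1 = lowest_peak_hz(freqs, mags)
    if f1 is None:
        return None
    return table[most_similar_pattern(f1, centers)]
```

Choosing the nearest centre rather than requiring the value to fall inside a region means that some parameter set is always selected, which matches the wording "most similar" in step 5.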

[0017] Thus, according to this embodiment, whereas the optimum preprocessing parameters were conventionally fixed, they are changed according to the speaker, so a high recognition rate can be obtained stably regardless of the speaker. FIG. 4 is a flowchart explaining another example of signal processing in the parameter setting unit 10 and the parameter switching determination unit 11 of FIG. 1. As shown in the figure, step 11 is signal processing performed outside the recognition system, and step 12 and the subsequent steps are signal processing performed inside the recognition system. In step 11, the optimum value of each control parameter is determined separately for men and for women by simulation and emulation. This classification is used because, in the pattern of the speech spectrum, the first formant frequency generally lies between 100 Hz and 175 Hz for men and between 200 Hz and 300 Hz for women. That is, the difference between men and women appears particularly clearly in the first formant frequency. As noted above, the optimum values of the control parameters have no theoretical basis and are determined by rules of thumb.

[0018] In step 12, the control parameters determined in step 11, that is, the two sets of optimum parameters, one for men and one for women, are stored in the parameter setting unit 10. In step 13, the parameter switching determination unit 11 analyzes the spectrum pattern of the speech input from the microphone 200 and determines from the first formant frequency whether the speech pattern is male or female.

[0019] In step 14, the applicable set of parameters is selected, according to the sex of the speaker, from the two sets stored in the memory of the parameter setting unit 10. In step 15, the parameters selected in step 14 are set in the preprocessing unit 202 and speech recognition is performed. Compared with the preceding example, this signal processing example has the advantage of a simpler configuration.
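The two-way switch of steps 13 and 14 reduces to a single threshold on the measured first formant frequency. The two ranges are those stated in the text; placing the decision boundary at the midpoint of the gap between them (187.5 Hz) is an assumption, as are the names used here.

```python
# Ranges stated in the text.
MALE_RANGE_HZ = (100.0, 175.0)
FEMALE_RANGE_HZ = (200.0, 300.0)
# Assumed decision boundary: midpoint of the gap between the two ranges.
BOUNDARY_HZ = (MALE_RANGE_HZ[1] + FEMALE_RANGE_HZ[0]) / 2


def classify_speaker(f1_hz):
    """Step 13: decide which of the two speech patterns the input matches."""
    return "male" if f1_hz < BOUNDARY_HZ else "female"


def select_params(f1_hz, table):
    """Step 14: pick the stored parameter set for that pattern."""
    return table[classify_speaker(f1_hz)]
```

Because only one comparison and two stored parameter sets are needed, this variant is simpler than the n-region version, consistent with the advantage noted in the text.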

[0020]

[Effects of the Invention] As described above, according to the present invention, speech spectrum patterns are roughly classified according to their features, parameters with which the preprocessing can be optimized are stored for each of the roughly classified patterns, the input speech signal is frequency-analyzed, the pattern to which the analyzed spectrum belongs is determined, and the parameters are switched accordingly. Parameters that were conventionally fixed are thus made variable according to the speaker, so the variation in recognition rate between speakers is eliminated and a high recognition rate can be obtained stably. Because the speech spectrum patterns are classified with reference to the first formant frequency, which characterizes the spectrum pattern of speech, and the parameters are switched according to the first formant frequency of the input signal, the device can be realized easily.

[Brief description of drawings]

FIG. 1 is a diagram showing a speech recognition device according to an embodiment of the present invention.

FIG. 2 is a flowchart explaining signal processing in the parameter setting unit 10 and the parameter switching determination unit 11 of FIG. 1.

FIG. 3 is a diagram showing the spectrum of a speech wave.

FIG. 4 is a flowchart explaining another example of signal processing in the parameter setting unit 10 and the parameter switching determination unit 11 of FIG. 1.

FIG. 5 is a diagram showing a control system using a conventional speech recognition device.

[Explanation of symbols]

10 ... Parameter setting unit
11 ... Parameter switching determination unit
200 ... Microphone
201 ... Speaker direction/distance determination unit
202 ... Preprocessing unit
203 ... Speech recognition unit

Claims

1. A speech recognition device that preprocesses an input speech signal, recognizes the speech, and controls equipment on the basis of the recognition result, the device comprising: a parameter setting unit (10) that stores, for each of the speech spectrum patterns obtained by roughly classifying speech spectrum patterns according to their features, parameters with which the preprocessing can be optimized; and a parameter switching unit (11) that performs frequency analysis on the input speech signal, determines which of the roughly classified speech spectrum patterns the analyzed spectrum belongs to, and switches the parameters set by the parameter setting unit (10).

2. The speech recognition device according to claim 1, wherein the speech spectrum patterns are roughly classified on the basis of a first formant frequency that characterizes the spectrum pattern of the speech, and the parameters are switched to those of the most similar spectrum pattern on the basis of the result of the rough classification.