JPH02232697A

JPH02232697A - Voice recognition device

Info

Publication number: JPH02232697A
Application number: JP1053200A
Authority: JP
Inventors: Kimiharu Shimizu; 公治清水; Haruyasu Yamaoka; 晴康山岡; Kunikazu Suzuki; 邦一鈴木; Kazuo Nakamura; 一雄中村; Yutaka Uono; 宇尾野　豊; Hiroshige Asada; 博重浅田
Original assignee: NipponDenso Co Ltd
Current assignee: Denso Corp
Priority date: 1989-03-06
Filing date: 1989-03-06
Publication date: 1990-09-14
Anticipated expiration: 2013-01-21
Also published as: JP2701431B2

Abstract

PURPOSE: To improve the recognition rate by providing an acoustic analysis means that extracts feature data from the portion of the microphone's voice signal corresponding to the detected voice section and outputs the data to a comparison means. CONSTITUTION: A noise cut filter 2 attenuates the band of the voice signal from the microphone 1 that contains the most noise components, and a voice section detection means 6 detects the voice section from the output of the noise cut filter 2. An acoustic analysis means 8 then extracts feature data from the voice signal of the microphone 1 within that voice section and outputs the data to the comparison means 14, which compares it with standard data stored in a voice storage means 12. Because the feature data are extracted from the input voice signal without passing it through the noise cut filter 2, and the voice is recognized from these feature data, the recognition rate is improved.

Description

[Detailed Description of the Invention]

[Industrial Field of Application] The present invention relates to a voice recognition device that determines voice information, and in particular to a voice recognition device for use in environments with a large amount of non-stationary noise.

[Prior Art] Voice recognition devices are known that recognize a voice signal from the similarity between a keyword voice signal uttered by an operator and registered voice signals, and that use the result to control the operation of various devices. Noise from the environment in which such a device is used can cause the speech to be misrecognized. For example, when a voice recognition device is built into an air conditioner, the noise and vibration of the air conditioner itself, together with other external noise, can cause misrecognition, and countermeasures have been taken against this. For example, while the air conditioner is running, the voice signal is passed through a noise cut filter that cuts the low-frequency band, as shown in Fig. 5, to attenuate the noise signal. As a result, the low-frequency noise generated by the air conditioner, or external noise, shown in Fig. 6 can be attenuated as shown in Fig. 7. Feature data are then extracted from the voice signal that has passed through the noise cut filter and compared with standard data stored in advance, the voice information is determined from the match, and the air conditioner or other equipment is controlled accordingly.

[Problems to Be Solved by the Invention] However, because these conventional voice recognition devices process the input voice signal with the noise cut filter, the filter reduces the noise but can also attenuate part of the voice signal itself. For example, when the vowel "a" is input as the voice signal, the microphone outputs a voice signal with the frequency components shown in Fig. 8. When this voice signal is processed by a noise cut filter with the characteristic described above, part of the formant structure that characterizes the voice is removed.
Specifically, among the formant frequencies, the peaks of the low-frequency first and second formants shown in Fig. 8 are lost. As a result, when features are extracted after filtering, the feature quantities of the voice signal that matter most for recognition are reduced, which lowers the recognition rate. The present invention aims to solve this problem.
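The size of this loss can be estimated with the magnitude response of the simplest low-cut filter, a first-order RC high-pass stage, |H(f)| = (f/f_c)/√(1 + (f/f_c)²). The sketch below assumes a cutoff f_c = 1 kHz and uses 800 Hz as a rough first-formant frequency for "a"; neither value comes from this document, and the actual filter of Fig. 5 may be of higher order.

```python
import math


def lowcut_gain(f_hz, cutoff_hz):
    """Magnitude response |H(f)| of a first-order (RC) low-cut, i.e. high-pass, filter."""
    r = f_hz / cutoff_hz
    return r / math.sqrt(1.0 + r * r)


CUTOFF_HZ = 1000.0  # hypothetical cutoff; this document gives no value for Fig. 5

for label, f in [
    ("low-frequency hum", 100.0),
    ("first formant of /a/ (rough)", 800.0),
    ("upper voice band", 3000.0),
]:
    g = lowcut_gain(f, CUTOFF_HZ)
    print(f"{label:30s} {f:6.0f} Hz  gain {g:.3f}  ({20 * math.log10(g):+.1f} dB)")
```

With these assumed values the 100 Hz hum is reduced by about 20 dB, but the 800 Hz component is itself attenuated by about 4 dB, which is the kind of feature loss described above.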
The object of the invention is to provide a voice recognition device that extracts the features of the voice signal without being affected by the filter that attenuates noise, thereby improving the recognition rate.

[Means for Solving the Problems] To achieve this object, the present invention adopts the following configuration. In a voice recognition device having voice storage means that stores in advance standard data corresponding to voices, and comparison means that compares the standard data with feature data corresponding to a voice signal input from a microphone, the device comprises: a noise cut filter that attenuates the band of the voice signal containing many noise components and outputs the result; voice section detection means that detects a voice section from the signal output by the noise cut filter; and acoustic analysis means that extracts feature data from the portion of the voice signal from the microphone corresponding to the voice section and outputs the feature data to the comparison means.

[Function] In the voice recognition device configured as above, the noise cut filter attenuates the band of the microphone's voice signal that contains many noise components, and the voice section detection means detects the voice section from the output of the noise cut filter. The acoustic analysis means then extracts feature data from the portion of the microphone's voice signal corresponding to the voice section and outputs it to the comparison means, which compares the feature data with the standard data stored in the voice storage means. Feature data can therefore be extracted from the input voice signal without passing it through the noise cut filter, and because the voice is recognized from these feature data, the recognition rate improves.
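The signal flow of the [Means] and [Function] paragraphs can be sketched as follows. The filters, detector, and analyzer are passed in as callables because the document leaves their internals open; the function name `recognize` and its parameters are hypothetical. What the sketch encodes is that the noise-cut output is used only to locate the voice section, while the features are taken from the band-passed microphone signal, which bypasses the noise cut filter.

```python
from typing import Callable, Optional, Sequence, Tuple

Signal = Sequence[float]
Section = Tuple[int, int]  # (start_sample, end_sample), end exclusive


def recognize(
    mic: Signal,
    noise_cut: Callable[[Signal], Signal],          # noise cut filter 2
    band_pass: Callable[[Signal], Signal],          # band-pass filter 4
    detect: Callable[[Signal], Optional[Section]],  # voice section detection unit 6
    analyze: Callable[[Signal], object],            # acoustic analysis unit 8
):
    """Dual-path front end: detect on the filtered path, analyze the unfiltered path."""
    section = detect(noise_cut(mic))  # detection only ever sees the noise-cut signal
    if section is None:
        return None                   # no voice section: nothing is analyzed
    start, end = section
    # Assumes both paths have the same delay, so sample indices line up.
    voiced = band_pass(mic)[start:end]
    return analyze(voiced)
```

In a test where the "noise cut filter" simply halves the signal, the analyzer still receives the original amplitudes, because only the section boundaries cross from one path to the other.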
[Embodiments] An embodiment of the present invention is described in detail below with reference to the drawings. Fig. 1 is a schematic configuration diagram of a voice recognition device according to one embodiment of the present invention. Reference numeral 1 denotes a well-known microphone that converts the voice uttered by the operator into an electrical signal and outputs it as a voice signal. The microphone 1 is connected to both a noise cut filter 2 and a band-pass filter 4. The noise cut filter 2 attenuates, in the noise-containing voice signal input from the microphone 1, the frequency band that contains noise components.
In this embodiment, as shown in Fig. 2, the noise cut filter 2 has a correction characteristic matched to human hearing: it strongly attenuates the low- and high-frequency ranges and is most sensitive around 2 kHz to 4 kHz. Filters without such a hearing-correction characteristic may also be used, depending on the environment. If the noise spectrum contains many high-frequency components, a high-cut filter may be used as the noise cut filter. If the device is placed in an environment whose noise contains many low-frequency components, as shown in Fig. 6, a filter with a frequency characteristic that attenuates low-frequency components, as shown in Fig. 5, may be used. If the noise is concentrated in a particular mid-frequency band, a mid-band cut filter may be used. Furthermore, a digital filter, whose characteristic can be changed by a program and which therefore offers a high degree of freedom, may be used, with its characteristic switched according to the ambient noise conditions and the like. Note that if the attenuation and other characteristics of the filter are set too strongly, the noise is attenuated further but the pure voice signal is attenuated as well; if they are set too weakly, the noise is attenuated only slightly, which affects the voice section detection described later.

The other filter, the band-pass filter 4, removes from the noise-containing voice signal input from the microphone 1 all components outside the voice band, for example outside a frequency band of approximately 200 Hz to 4 kHz. A digital filter may also be used for this filter.

The noise cut filter 2 is connected to a voice section detection unit 6. The voice section detection unit 6 compares the power of the voice signal output from the noise cut filter 2 with a preset threshold to detect the start and end of the voice section, and outputs a voice section while the power is above the threshold and a silent section while it is below. A single fixed threshold may be used, or multiple thresholds, or a threshold varied according to the ambient noise. The voice section may also be detected using the slope of the voice spectrum, pitch information, and the like in combination. The voice section detection unit 6 is connected, together with the band-pass filter 4, to an acoustic analysis unit 8, which receives the voice-band signal that has passed through the band-pass filter 4 and the voice section signal. In addition to being used to detect the voice section, the signal that has passed through the noise cut filter 2 may also be used to control the gain of an amplifier (not shown). Lowering the amplifier gain when the input signal is large and raising it when the signal is small increases the dynamic range of the voice signal. If the reference signal for this control contains many noise components, accurate gain control is impossible; using the signal from which the noise components have been removed by the noise cut filter 2 as the reference makes more accurate gain control possible.
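One standard curve with the shape described for Fig. 2 is the A-weighting curve of IEC 61672-1, which falls off steeply at low frequencies, peaks at about +1.3 dB near 2.5 kHz, and is normalized to 0 dB at 1 kHz. The sketch below evaluates it purely as an illustration; this document does not state that the noise cut filter 2 uses A-weighting, and a working filter would realize such a curve as an analog or digital filter rather than evaluate it point by point.

```python
import math


def a_weighting_db(f_hz):
    """A-weighting gain in dB at frequency f_hz (IEC 61672-1 analytic form)."""
    f2 = f_hz * f_hz
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.00  # +2.00 dB normalizes the curve to 0 dB at 1 kHz


for f in (100.0, 1000.0, 2500.0, 10000.0):
    print(f"{f:7.0f} Hz  {a_weighting_db(f):+6.1f} dB")
```

Like the Fig. 2 characteristic, the curve strongly attenuates low-frequency components such as machine hum while keeping the 2 kHz to 4 kHz region at or slightly above unity gain.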
Based on the input voice section signal, the acoustic analysis unit 8 analyzes the spectrum of the voice-band signal that has passed through the band-pass filter 4 within the voice section and extracts feature parameters representing the characteristics of the voice. For this extraction, feature parameters are extracted at fixed intervals from the voice signal within the input voice section, for example by the well-known fast Fourier transform (FFT), a band-pass filter bank, or linear predictive analysis, and are output as a time series of feature vectors. The acoustic analysis unit 8 may additionally apply processing that reduces noise components when analyzing the voice signal. The acoustic analysis unit 8 can be selectively connected to a voice storage unit 12 or a voice comparison unit 14 via a changeover switch 10. The voice storage unit 12 stores the extracted feature data, for example a time series of vectors, as standard data. The voice comparison unit 14 compares the standard data stored in the voice storage unit 12 with the feature data input via the changeover switch 10, and calculates and outputs their similarity. The voice comparison unit 14 is connected to a determination unit 16.
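Of the analysis methods listed (FFT, band-pass filter bank, linear predictive analysis), a filter bank built from DFT bins is the simplest to sketch. The function below frames the voice-section signal at a fixed interval, applies a Hann window, and sums DFT power into equal-width bands to give one log-energy vector per frame. The frame length, hop, band count, and the name `band_energy_features` are assumptions, and a naive DFT stands in for an FFT only to keep the example dependency-free.

```python
import cmath
import math


def band_energy_features(signal, frame_len=128, hop=64, n_bands=8):
    """Time series of log band-energy vectors, one vector per analysis frame."""
    half = frame_len // 2                  # keep bins 0 .. frame_len/2 - 1
    bins_per_band = half // n_bands
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [x * w for x, w in zip(signal[start:start + frame_len], window)]
        # Naive DFT; an FFT computes the same bins faster.
        power = [
            abs(sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n, x in enumerate(frame))) ** 2
            for k in range(half)
        ]
        # Sum adjacent bins into bands and take the log energy of each band.
        vector = [
            math.log10(sum(power[b * bins_per_band:(b + 1) * bins_per_band]) + 1e-12)
            for b in range(n_bands)
        ]
        features.append(vector)
    return features
```

At 8 kHz sampling, the default 128-sample frame gives 62.5 Hz bins and eight 500 Hz bands, so a 1250 Hz tone falls in band 2 of every frame.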
Based on the similarities from the voice comparison unit 14, the determination unit 16 outputs the signal corresponding to the most similar standard data when that match also satisfies a predetermined condition. This signal is output from an output terminal 18 and controls other equipment 20 connected to the output terminal 18, for example an air conditioner.

Next, the operation of this embodiment is described. First, when the operator utters a word, for example "unten" ("run"), it is picked up by the microphone 1, converted into an electrical signal, and output as a voice signal. At this time, the microphone 1 also picks up external noise other than the voice before and after the utterance, so it outputs a voice signal with superimposed noise, as shown in Fig. 3. This voice signal is input to both the noise cut filter 2 and the band-pass filter 4. In the noise cut filter 2, the noise components superimposed on the voice signal are attenuated, as shown in Fig. 4, and the result is output to the voice section detection unit 6. The voice section detection unit 6 compares the power of this signal with the predetermined threshold, judges sections below the threshold to be silent and sections above it to be voice, and outputs the voice section signal to the acoustic analysis unit 8. Note that giving the noise cut filter 2 a characteristic matched to human hearing allows an analysis resembling human perception, which is ideal for voice recognition.
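The power-threshold detection performed by the voice section detection unit 6 can be sketched as below, assuming a single fixed threshold in dB and non-overlapping frames; the frame length, the -30 dB threshold, and the function name are illustrative. The input is meant to be the output of the noise cut filter 2, and the returned sample range is what the acoustic analysis unit 8 would use to select the band-passed signal.

```python
import math


def detect_voice_section(signal, frame_len=160, threshold_db=-30.0):
    """Return (start, end) sample indices of the voice section, or None if silent."""
    active = []
    for i in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[i:i + frame_len]
        power = sum(x * x for x in frame) / frame_len          # mean frame power
        active.append(10.0 * math.log10(power + 1e-12) > threshold_db)
    if not any(active):
        return None
    first = active.index(True)                                 # start of voice section
    last = len(active) - 1 - active[::-1].index(True)          # end of voice section
    return first * frame_len, (last + 1) * frame_len
```

Because the section runs from the first to the last frame above the threshold, short pauses inside a word do not split it. The multiple or noise-adaptive thresholds mentioned above would replace the constant `threshold_db`.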

Meanwhile, the band-pass filter 4 attenuates the components of the input voice signal outside the voice band and outputs the result to the acoustic analysis unit 8. From the voice signal output by the band-pass filter 4 and the voice section signal from the voice section detection unit 6, the acoustic analysis unit 8 analyzes the voice signal within the voice section and extracts feature data. That is, feature data are extracted only from the voice signal in the section where the operator spoke; the other sections are treated as noise, and no feature extraction is performed on them.

When the changeover switch 10 is set to the voice storage unit 12 side, the feature data output from the acoustic analysis unit 8 are stored in the voice storage unit 12 as standard data. In this way, the voice storage unit 12 stores, as standard data, the feature data of words, short phrases, and the like uttered by a specific speaker; for example, if the utterance is "unten", its feature data are stored. When the changeover switch 10 is set to the voice comparison unit 14 side, the voice comparison unit 14 compares the feature data output from the acoustic analysis unit 8 with the standard data stored in the voice storage unit 12, and calculates and outputs the similarity between them. Based on the similarity output from the voice comparison unit 14, the determination unit 16 determines what the operator said and outputs a signal corresponding to the utterance, which is sent to the other equipment 20 via the output terminal 18. For example, when the device is used with an air conditioner and the utterance is "unten", a signal that starts the air conditioner is output; alternatively, a signal corresponding to the utterance may be output to perform control such as raising the set temperature. Although this embodiment uses speaker-dependent voice recognition as an example, the invention can equally be applied to speaker-independent voice recognition.

As described above, in the voice recognition device of this embodiment, the noise cut filter 2 attenuates the noise and the voice section detection unit 6 detects the voice section from the filtered signal. The acoustic analysis unit 8 then extracts feature data based on this voice section and on the voice signal that does not pass through the noise cut filter 2.
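The registration and recognition modes selected by the changeover switch 10 can be sketched with a template store and a distance-based comparison. This document only says that a similarity is calculated; dynamic time warping (DTW), which aligns feature sequences of different lengths, is swapped in here as one common choice, so a lower distance means a higher similarity. The rejection threshold stands in for the unspecified "predetermined condition" of the determination unit 16, and all names and word labels are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Length-normalized DTW distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)


class TemplateRecognizer:
    """Hypothetical model of voice storage 12, comparison 14 and determination 16."""

    def __init__(self, reject_distance):
        self.templates = {}                  # word -> standard data (feature sequence)
        self.reject_distance = reject_distance

    def register(self, word, features):
        """Switch 10 on the storage side: keep the features as standard data."""
        self.templates[word] = features

    def recognize(self, features):
        """Switch 10 on the comparison side: return the best word, or None if rejected."""
        if not self.templates:
            return None
        distances = {w: dtw_distance(features, t) for w, t in self.templates.items()}
        best = min(distances, key=distances.get)
        return best if distances[best] <= self.reject_distance else None
```

Rejecting matches that are merely the closest, but still far away, keeps an unregistered utterance or residual noise from triggering a control signal to the equipment 20.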

By operating the changeover switch 10, these feature data are either stored in the voice storage unit 12 as standard data or compared with the standard data by the voice comparison unit 14.

The acoustic analysis unit 8 therefore extracts feature data from a voice signal that has not passed through the noise cut filter 2. Instead of extracting voice features from a signal in which the noise cut filter 2 has also attenuated voice components, feature data can be extracted properly from an unattenuated voice signal. The features of the voice are thus captured accurately, the voice comparison unit 14 calculates the similarity between the feature data and the standard data more accurately, and the voice recognition rate improves. In addition, even when noise overlaps the spoken voice, the voice section can be detected accurately because detection is performed on the signal whose noise components have been attenuated by the noise cut filter. Conventionally, noise overlapping the voice section of the utterance caused the voice section to be detected too wide, and feature data extracted over that section could cause recognition errors; detecting the voice section properly improves the recognition rate.

The present invention is in no way limited to this embodiment and can be implemented in various forms without departing from the gist of the invention.

[Effects of the Invention] As described in detail above, the voice recognition device of the present invention attenuates the noise components of an input voice containing noise and detects the voice section from the resulting signal, so the voice section can be detected accurately, while the voice information is input to the acoustic analysis unit and analyzed without any loss. This makes the device highly effective at improving recognition performance in noisy environments, particularly environments with a large amount of non-stationary noise.
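The detection effect can be illustrated with a synthetic signal: a strong 50 Hz hum standing in for low-frequency machine noise, plus a 1 kHz tone burst standing in for the utterance. The sketch assumes an 8 kHz sampling rate, a first-order 300 Hz low-cut filter in the role of the noise cut filter 2, and a -20 dB frame-power threshold; all of these are illustrative values, and a real utterance is not a pure tone.

```python
import math

FS = 8000     # sampling rate in Hz (assumed)
FRAME = 160   # 20 ms frames


def lowcut(x, cutoff_hz, fs=FS):
    """First-order RC high-pass (low-cut) filter, standing in for noise cut filter 2."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    a = rc / (rc + 1.0 / fs)
    y = [0.0] * len(x)
    for n in range(1, len(x)):
        y[n] = a * (y[n - 1] + x[n] - x[n - 1])
    return y


def active_frames(x, threshold_db=-20.0):
    """Indices of frames whose mean power exceeds the threshold."""
    idx = []
    for k, i in enumerate(range(0, len(x) - FRAME + 1, FRAME)):
        power = sum(v * v for v in x[i:i + FRAME]) / FRAME
        if 10.0 * math.log10(power + 1e-12) > threshold_db:
            idx.append(k)
    return idx


def test_signal():
    """One second of 50 Hz hum with a 1 kHz burst in samples 3200-4799 (frames 20-29)."""
    sig = []
    for n in range(FS):
        hum = 0.5 * math.sin(2.0 * math.pi * 50.0 * n / FS)
        burst = 0.3 * math.sin(2.0 * math.pi * 1000.0 * n / FS) if 3200 <= n < 4800 else 0.0
        sig.append(hum + burst)
    return sig


raw = test_signal()
print("raw:     ", active_frames(raw))
print("filtered:", active_frames(lowcut(raw, 300.0)))
```

On the raw signal every frame exceeds the threshold, so the detected section covers the entire recording, which is the widened section described above; after the low-cut filter only the frames of the burst remain.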

[Brief Description of the Drawings]

Fig. 1 is a schematic configuration diagram of a voice recognition device according to one embodiment of the present invention; Fig. 2 is a graph showing the characteristic of a noise cut filter having a hearing-correction characteristic; Fig. 3 is a graph showing the voice signal from the microphone; Fig. 4 is a graph of the signal processed by the noise cut filter; Fig. 5 is a graph showing the characteristic of a low-cut filter; Fig. 6 is a graph showing the level of noise concentrated in the low-frequency range; Fig. 7 is a graph of the level of the noise of Fig. 6 after processing by the low-cut filter; and Fig. 8 is a graph explaining formant frequencies.

1: microphone; 2: noise cut filter; 6: voice section detection unit; 8: acoustic analysis unit; 12: voice storage unit; 14: voice comparison unit.

Agent: Patent attorney Tsutomu Adachi (and two others)

Claims

[Claims]

(1) A voice recognition device having voice storage means for storing in advance standard data corresponding to voices, and comparison means for comparing the standard data with feature data corresponding to a voice signal input from a microphone, the device comprising: a noise cut filter that attenuates a band of the voice signal containing many noise components and outputs the result; voice section detection means for detecting a voice section from the signal output by the noise cut filter; and acoustic analysis means for extracting feature data from the portion of the voice signal from the microphone corresponding to the voice section and outputting the feature data to the comparison means.

(2) The voice recognition device according to claim 1, wherein the noise cut filter has a correction characteristic that matches or approximates human hearing characteristics.