JPH06110492A

JPH06110492A - Speech recognition device

Info

Publication number: JPH06110492A
Application number: JP5131866A
Authority: JP
Inventors: Kenichi Oishi (大石健一)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1992-08-13
Filing date: 1993-06-02
Publication date: 1994-04-22

Abstract

PURPOSE: To provide means that effectively removes noise components other than the speech uttered by the user. CONSTITUTION: The device is provided with a vowel detection processing part 4, which detects vowel parts in the spectrum data of a given word section, and a noise decision processing part 5, which decides whether the word section contains speech uttered by the user or background noise, according to information received from the vowel detection processing part 4 on the presence or absence, or the proportion, of vowel parts in the spectrum data of that word section.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a method, for use in a voice input device, of discriminating between a user's uttered voice and sudden, loud noise that is difficult to distinguish from it.

[0002] In recent years, voice recognition devices have gradually been put into practical use and can now serve as auxiliary input means. In addition, with factory automation (FA) and similar trends, computers are being introduced into environments where conventional input means such as keyboards cannot be used, and there is a demand to operate voice recognition devices as a new input means replacing the conventional ones. It is therefore important to put into practical use a voice recognition device that enables input with high accuracy and a high recognition rate.

[0003]

2. Description of the Related Art

Conventional voice recognition devices mainly use the presence or absence of input sound, or the magnitude of the input volume, as the criterion for detecting input speech. Consequently, a conventional voice recognition device operates only when it detects some sound, or when it detects a volume within a certain level range.

[0004] However, as described above, there is strong demand for operating voice recognition devices in high-noise environments such as FA environments, where the background contains many noises, such as impact sounds, whose volume is comparable to that of the user's voice. As a result, the voice recognition device often ends up treating such noise as a recognition target as well.

[0005]

Problems to be Solved by the Invention

As described above, conventional voice recognition devices misrecognize background noise as input speech and output meaningless processing or execution results. This has frequently hindered the operation of systems that include a voice recognition device.

[0006] The present invention has been made in view of these conventional problems. Its object is to improve the performance of a voice recognition device by effectively eliminating input noise other than the speech uttered by the user, and thereby to ensure the smooth operation of systems that include a voice recognition device.

[0007]

Means for Solving the Problems

According to the present invention, the above objects are achieved by the means set forth in the claims.

[0008] That is, the invention of claim 1 is a voice recognition device having a sampling processing unit that samples input speech at each unit time to generate voice data, a word section detection processing unit that detects, from the voice data, word sections valid as input data, and a spectrum analysis processing unit that analyzes the frequency components of the voice data to generate spectrum data. The device further comprises a vowel detection processing unit that detects vowel portions in the spectrum data of a given word section, and a noise discrimination processing unit that determines, based on information received from the vowel detection processing unit about the presence or absence, or the proportion, of vowel portions in the spectrum data of that word section, whether the word section was produced by the user's uttered voice or caused by background noise.

[0009] The invention of claim 2 is a voice recognition device provided with means for holding the spectrum data of each vowel in advance, and with a vowel detection processing unit that detects vowel portions in the input spectrum data by applying a mathematical distance measure and a distance threshold between the stored spectrum data of each vowel and the input spectrum data.
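A minimal Python sketch of the claim-2 vowel detector follows. It assumes each spectrum frame is a short list of band energies and uses Euclidean distance as the mathematical distance measure; the claim does not name a specific measure, and the function names, template values, and threshold are hypothetical.

```python
import math
from typing import Dict, List, Optional


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length spectrum vectors."""
    if len(a) != len(b):
        raise ValueError("spectrum vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def detect_vowel(frame: List[float],
                 vowel_templates: Dict[str, List[float]],
                 threshold: float) -> Optional[str]:
    """Return the label of the nearest stored vowel template if its distance
    is within `threshold`; otherwise return None (frame is not a vowel)."""
    best_label, best_dist = None, math.inf
    for label, template in vowel_templates.items():
        d = euclidean(frame, template)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= threshold else None


# Hypothetical 4-band templates for two vowels.
templates = {"a": [8.0, 6.0, 2.0, 1.0], "i": [7.0, 1.0, 5.0, 3.0]}
print(detect_vowel([7.5, 5.5, 2.2, 1.1], templates, 1.5))  # close to "a"
print(detect_vowel([1.0, 1.0, 1.0, 1.0], templates, 1.5))  # far from both
```

Attaching the returned label (or None) to each frame corresponds to the vowel information that the noise discrimination stage later inspects.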

[0010] The invention of claim 3 is a voice recognition device comprising: a sampling processing unit that samples input speech at each unit time to generate voice data; a word section detection processing unit that detects, from the voice data, word sections valid as input data; and a noise discrimination processing unit that judges the steepness of the onset from the magnitude of the volume change at the beginning of a word section, and thereby determines whether the word section was produced by the user's uttered voice or by impulsive background noise.

[0011] The invention of claim 4 is a voice recognition device comprising: a power calculation unit that generates a power value from the waveform amplitude of the input speech; a sampling processing unit that samples the input speech at each unit time to generate voice data; a word section detection processing unit that detects, from the voice data, word sections valid as input data; and a noise discrimination processing unit that judges the steepness of the onset from the magnitude of the difference between the power values of the input sound at the beginning of a word section and immediately before it, and thereby determines whether the word section was produced by the user's uttered voice or by impulsive background noise.
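The claim-4 onset test can be sketched as follows. It assumes non-overlapping frames, frame power expressed as 10*log10 of the mean squared amplitude, and a fixed rise threshold in dB; these choices and the names `frame_power_db` and `is_impulsive_onset` are illustrative assumptions, not taken from the patent, which only states that the power value is derived from the waveform amplitude.

```python
import math
from typing import List, Sequence


def frame_power_db(samples: Sequence[float], frame_len: int) -> List[float]:
    """Split samples into non-overlapping frames and return each frame's
    power in dB (10*log10 of the mean squared amplitude)."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        mean_sq = sum(x * x for x in frame) / frame_len
        powers.append(10.0 * math.log10(mean_sq + 1e-12))  # floor avoids log(0)
    return powers


def is_impulsive_onset(powers: List[float], word_start: int,
                       rise_threshold_db: float) -> bool:
    """Compare the power of the first frame of the word section with the
    frame immediately before it; a rise above the threshold is treated as
    a steep (impulsive) onset."""
    if word_start < 1:
        raise ValueError("need a frame before the word section")
    rise = powers[word_start] - powers[word_start - 1]
    return rise > rise_threshold_db
```

For example, a jump from amplitude 0.01 to 0.5 between frames gives a rise of roughly 34 dB and is flagged at a 20 dB threshold, while a step from 0.01 to 0.02 (about 6 dB) is not.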

[0012] The invention of claim 5 is a voice recognition device comprising: a power calculation unit that generates a power value from the waveform amplitude of the input speech; a sampling processing unit that samples the input speech at each unit time to generate voice data; a word section detection processing unit that detects, from the voice data, word sections valid as input data; and a noise discrimination processing unit that judges the steepness of the onset from the magnitude of the difference between the power values of the input sound at the beginning of a word section and at its first local power maximum, and thereby determines whether the word section was produced by the user's uttered voice or by impulsive background noise.
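In the claim-5 variant the reference point is the first local power maximum after the word start rather than the preceding frame. A hypothetical pair of helpers, assuming a per-frame power list (for example in dB) and that "first local maximum" means the first frame not exceeded by its successor:

```python
from typing import List


def first_local_max(powers: List[float], word_start: int) -> int:
    """Index of the first local power maximum at or after word_start:
    walk forward while power keeps rising. Returns the last index if the
    power rises all the way to the end of the list."""
    i = word_start
    while i + 1 < len(powers) and powers[i + 1] > powers[i]:
        i += 1
    return i


def onset_rise_to_peak(powers: List[float], word_start: int) -> float:
    """Power difference between the word start and its first local maximum."""
    return powers[first_local_max(powers, word_start)] - powers[word_start]
```

The patent uses only the magnitude of this difference; dividing it by the number of frames between the two points would be one further (assumed) way to express steepness as a slope.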

[0013] The invention of claim 6 is a voice recognition device comprising: a sampling processing unit that samples input speech at each unit time to generate voice data; a word section detection processing unit that detects, from the voice data, word sections valid as input data; a spectrum analysis processing unit that analyzes the frequency components of the voice data as vector values of suitable order along the frequency axis and generates spectrum data; and a noise discrimination processing unit that judges the steepness of the onset from the magnitude of the difference in vector length between the spectrum data at the beginning of a word section and immediately before it, and thereby determines whether the word section was produced by the user's uttered voice or by impulsive background noise.

[0014] The invention of claim 7 is a voice recognition device comprising: a power calculation unit that generates a power value from the waveform amplitude of the input speech; a sampling processing unit that samples the input speech at each unit time to generate voice data; a word section detection processing unit that detects, from the voice data, word sections valid as input data; a spectrum analysis processing unit that analyzes the frequency components of the voice data as vector values of suitable order along the frequency axis and generates spectrum data; and a noise discrimination processing unit that judges the steepness of the onset from the magnitude of the difference in vector length between the spectrum data at the beginning of a word section and at its first local power maximum, and thereby determines whether the word section was produced by the user's uttered voice or by impulsive background noise.

[0015] The invention of claim 8 is a voice recognition device comprising: a sampling processing unit that samples input speech at each unit time to generate voice data; a word section detection processing unit that detects, from the voice data, word sections valid as input data; a spectrum analysis processing unit that analyzes the frequency components of the voice data and generates spectrum data; and a noise discrimination processing unit that judges the steepness of the onset from the magnitude of the spectral change at the beginning of a word section, and thereby determines whether the word section was produced by the user's uttered voice or by impulsive background noise.

[0016] The invention of claim 9 is a voice recognition device comprising: a power calculation unit that generates a power value from the waveform amplitude of the input speech; a sampling processing unit that samples the input speech at each unit time to generate voice data; a word section detection processing unit that detects, from the voice data, word sections valid as input data; a spectrum analysis processing unit that analyzes the frequency components of the voice data and generates spectrum data; and a noise discrimination processing unit that judges the steepness of the onset from the magnitude of the spectral change between the beginning of a word section and its first local power maximum, and thereby determines whether the word section was produced by the user's uttered voice or by impulsive background noise.

[0017] The invention of claim 10 is a voice recognition device comprising: a sampling processing unit that samples input speech at each unit time to generate voice data; a word section detection processing unit that detects, from the voice data, word sections valid as input data; a spectrum analysis processing unit that analyzes the frequency components of the voice data as vector values of suitable order along the frequency axis and generates spectrum data; and a noise discrimination processing unit that judges the steepness of the onset from the magnitude of the inter-vector distance between the spectrum data at the beginning of a word section and immediately before it, and thereby determines whether the word section was produced by the user's uttered voice or by impulsive background noise.

[0018] The invention of claim 11 is a voice recognition device comprising: a power calculation unit that generates a power value from the waveform amplitude of the input speech; a sampling processing unit that samples the input speech at each unit time to generate voice data; a word section detection processing unit that detects, from the voice data, word sections valid as input data; a spectrum analysis processing unit that analyzes the frequency components of the voice data as vector values of suitable order along the frequency axis and generates spectrum data; and a noise discrimination processing unit that judges the steepness of the onset from the magnitude of the inter-vector distance between the spectrum data at the beginning of a word section and at its first local power maximum, and thereby determines whether the word section was produced by the user's uttered voice or by impulsive background noise.

[0019] FIG. 1 illustrates the principle of the present invention. In the figure, the sampling processing unit 1 generates voice data by converting the analog signal of the input speech into a digital signal at each unit time. The word section detection processing unit 2 uses the volume information of the voice data to detect the word sections of the voice data that are valid as input data. The spectrum analysis processing unit 3 analyzes the frequency components of the voice data and generates spectrum data.
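The patent states only that unit 2 finds word sections from volume information. One common simple approach, assumed here for illustration, is energy-threshold endpoint detection: a word section is a run of frames whose power stays above a threshold for a minimum number of frames. The function name, threshold, and minimum length are hypothetical.

```python
from typing import List, Optional, Tuple


def detect_word_sections(powers: List[float], threshold_db: float,
                         min_frames: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, of runs whose power
    stays above threshold_db for at least min_frames consecutive frames."""
    sections: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, p in enumerate(powers):
        if p > threshold_db and start is None:
            start = i                          # a candidate section begins
        elif p <= threshold_db and start is not None:
            if i - start >= min_frames:        # keep only long-enough runs
                sections.append((start, i))
            start = None
    if start is not None and len(powers) - start >= min_frames:
        sections.append((start, len(powers)))  # section runs to the end
    return sections
```

Short bursts such as a single loud frame are dropped by `min_frames`, but a sustained impact sound would still pass; that residual case is what the vowel- and onset-based noise discrimination described in this document is meant to catch.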

[0020] The vowel detection processing unit 4 detects vowel portions in the spectrum data by judging their similarity to vowel spectrum data prepared in advance. The noise discrimination processing unit 5 determines, from the presence or absence of vowel portions in a word section, whether the word section detected by the word section detection processing unit 2 was produced by the speaker's input speech or caused by background noise.

[0021] FIG. 3 also illustrates the principle of the present invention. In the figure, the sampling processing unit 21, the word section detection processing unit 22, and the spectrum analysis processing unit 23 are the same as those shown in FIG. 1. The noise discrimination processing unit 25 determines whether a detected word section is the speaker's input speech or impulsive background noise, for example from the magnitude of the volume change or the spectral change at the beginning of the word section.

[0022]

Operation

As shown in FIG. 1, in the present invention the vowel detection processing unit 4 monitors the spectrum data in a word section and attaches information indicating a vowel portion to spectrum data that can be judged to be vowel-like.

[0023] It is generally known that in many languages, including Japanese, words reliably contain vowel phonemes, and the noise discrimination processing unit 5 of FIG. 1 exploits this property. That is, it checks whether the word section contains no portions carrying vowel information, or only extremely few, and if so, judges the word section to be noise.

[0024] On the other hand, if vowel information is attached to at least a certain number of portions, the device judges the input to be speech uttered by the user and starts operation. In this way, the voice recognition device according to the present invention excludes background noise when it is input, and outputs a recognition result only when the user's uttered voice is input.

[0025] As shown in FIG. 3, in the present invention the word section detection processing unit 22 also detects word sections on the basis of the voice data generated by the sampling processing unit 21, and the spectrum analysis unit 23 outputs spectrum data. The noise discrimination processing unit 25 refers to the voice data, the word sections, and the spectrum data, and calculates the power change and the spectral change from the voice data and spectrum data of the first element of the word section and of the element immediately preceding it.

[0026] It then evaluates, for each change amount, the likelihood that it stems from impulsive background noise. Impulsive background noise is generally known to show a relatively steep power or spectral change compared with speech (the human voice). Therefore, if this evaluation finds a high likelihood of noise, the word section is rejected on the assumption that impulsive background noise was input; if it finds a low likelihood, the voice data of the word section are output to the next-stage speech recognition processing unit 26.

[0027] In this way, when impulsive noise is input, the voice recognition device according to the present invention reliably rejects it, so that recognition results can be output for the user's uttered voice only.

[0028]

Embodiments

FIG. 2 is a block diagram of an embodiment of the present invention. In the figure, the vowel detection processing unit 14 holds vowel spectrum data 17 for various vowels in advance and calculates the similarity between these data and each unit portion of the input spectrum data 16 obtained from the input speech. It thereby determines whether each unit portion of the input spectrum data 16 is a vowel, and detects the vowel portions.

[0029] The detection result is indicated, for example, by attaching to the input spectrum data 16 vowel information 18 showing that a given unit portion of the data is a vowel portion.

[0030] The noise discrimination processing unit 15 examines the spectrum data within the word section detected by the word section detection processing unit 12 and classifies it into portions with and without vowel information 18. Then, for example, if the portions carrying vowel information 18 exceed a certain threshold, the data are treated as meaningful input speech; if they are at or below the threshold, they are judged to be background noise.

[0031] In the above embodiment, the noise discrimination processing unit 15 uses the total of the portions carrying vowel information 18 as the criterion for discriminating noise; alternatively, for example, their proportion of the entire word section may be used as the criterion. The sampling processing unit 11 and the spectrum analysis processing unit 13 operate in the same manner as the units of the same names shown in FIG. 1.
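Both decision criteria of [0030] and [0031] can be sketched in one hypothetical function, assuming the vowel information is available as one boolean per spectrum frame of the word section. The threshold values are placeholders, not figures from the patent.

```python
from typing import List


def is_background_noise(vowel_flags: List[bool], criterion: str = "count",
                        count_threshold: int = 3,
                        ratio_threshold: float = 0.2) -> bool:
    """Decide whether a word section is background noise from per-frame
    vowel flags (True = vowel information attached to that frame).

    criterion="count": noise if the number of vowel frames does not exceed
    count_threshold (the total-based criterion).
    criterion="ratio": noise if the fraction of vowel frames in the section
    does not exceed ratio_threshold (the proportion-based alternative)."""
    if not vowel_flags:
        return True  # an empty section carries no vowel evidence
    n_vowel = sum(vowel_flags)
    if criterion == "count":
        return n_vowel <= count_threshold
    if criterion == "ratio":
        return n_vowel / len(vowel_flags) <= ratio_threshold
    raise ValueError(f"unknown criterion: {criterion!r}")
```

The ratio form has the practical advantage that one threshold works for both short and long words, whereas a fixed count penalizes short utterances; that trade-off is an observation about the sketch, not a statement from the patent.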

[0032] FIG. 4 is a block diagram of another embodiment of the present invention. In FIG. 4, the word sections detected by a word section detection processing unit (not shown) and the spectrum data output by the spectrum analysis unit are passed to the noise discrimination processing unit 35 as "spectrum data with word section information".

[0033] This "spectrum data with word section information" is represented as a vector of intensities, one per dimension, with a suitable number of dimensions taken along the frequency axis. If the number of dimensions of this vector were made infinite, the sum of its element values would equal the integral of the original spectrum data; the sum of the vector's elements is therefore defined as the vector length and used as a pseudo power.

[0034] The noise discrimination processing unit 35 refers to the data at the beginning of the word section and the data immediately preceding it, and calculates the difference in vector length as the power change and the inter-vector distance as the spectral change. A power threshold and a spectrum threshold are set in advance so that, for power and for spectrum respectively, the distributions of speech and of impulsive background noise can be separated.
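The two change amounts of [0034] can be sketched as below. Vector length follows the definition in [0033] (sum of elements, not the Euclidean norm); the inter-vector distance is assumed to be Euclidean, since the patent does not name a metric. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def vector_length(frame: List[float]) -> float:
    """Pseudo power: the sum of the spectrum vector's elements."""
    return sum(frame)


def onset_changes(frames: List[List[float]],
                  word_start: int) -> Tuple[float, float]:
    """Return (power_change, spectrum_change) between the first frame of the
    word section and the frame immediately before it."""
    if word_start < 1:
        raise ValueError("need a frame before the word section")
    prev, first = frames[word_start - 1], frames[word_start]
    power_change = vector_length(first) - vector_length(prev)
    spectrum_change = math.sqrt(sum((a - b) ** 2 for a, b in zip(first, prev)))
    return power_change, spectrum_change
```

For frames `[1, 1, 1]` followed by `[4, 5, 1]`, the pseudo power rises by 7 and the inter-vector distance is 5.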

[0035] The power change and the spectral change are then compared with their respective thresholds. For example, if both are judged to indicate noise, the word section is rejected on the assumption that impulsive background noise was input. If it is not rejected, the spectrum data of the word section are passed to a speech recognition processing unit (not shown) as "speech-discriminated data".
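The rejection rule can be sketched as a small hypothetical function, assuming each change amount is judged noise-like when it exceeds its threshold (impulsive noise rises more steeply than speech). The default combines the two judgments with a logical AND as in [0035]; the `mode="or"` option reflects the logical-OR alternative that the document mentions for the same unit.

```python
def reject_as_impulsive(power_change: float, spectrum_change: float,
                        power_threshold: float, spectrum_threshold: float,
                        mode: str = "and") -> bool:
    """Return True if the word section should be rejected as impulsive
    background noise.

    mode="and": reject only when both the power and the spectrum judgments
    indicate noise. mode="or": reject when either one does."""
    power_noise = power_change > power_threshold
    spectrum_noise = spectrum_change > spectrum_threshold
    if mode == "and":
        return power_noise and spectrum_noise
    if mode == "or":
        return power_noise or spectrum_noise
    raise ValueError(f"unknown mode: {mode!r}")
```

AND is the more conservative choice (fewer genuine utterances discarded, more noise let through); OR rejects more aggressively.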

[0036] In the above embodiment, the difference in vector length is used as the power change calculated by the noise discrimination processing unit 35; alternatively, for example, the difference in the volume of the original input speech, that is, in the power value obtained from the waveform amplitude, may be used. The power value can be calculated by providing a power calculation processing unit.

[0037] Also, a vector of intensities in each of a suitable number of dimensions along the frequency axis is used as the spectrum data obtained by the spectrum analysis unit, and the inter-vector distance is used as the spectral change obtained by the noise discrimination processing unit; however, other spectral representation parameters, such as the cepstrum, and the distance between those parameters may be used instead.
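The patent names the cepstrum as an alternative but does not specify a variant. As an assumption, this sketch uses the DCT-II of log band energies (the MFCC-style cepstrum) and a Euclidean distance between cepstral vectors; the function names and the log floor are hypothetical.

```python
import math
from typing import List


def cepstrum_from_bands(band_energies: List[float], n_coeffs: int) -> List[float]:
    """DCT-II of the log band energies, keeping the first n_coeffs terms."""
    k_total = len(band_energies)
    logs = [math.log(e + 1e-12) for e in band_energies]  # floor avoids log(0)
    return [sum(logs[k] * math.cos(math.pi * n * (k + 0.5) / k_total)
                for k in range(k_total))
            for n in range(n_coeffs)]


def cepstral_distance(a: List[float], b: List[float], n_coeffs: int = 8) -> float:
    """Euclidean distance between the cepstra of two spectrum frames."""
    ca = cepstrum_from_bands(a, n_coeffs)
    cb = cepstrum_from_bands(b, n_coeffs)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))
```

A useful property of this choice: scaling every band by the same gain shifts only the zeroth coefficient (for `n_coeffs` no larger than the number of bands), so overall loudness and spectral shape are separated. Dropping `c0` would make the spectral-change measure insensitive to loudness, leaving power to the separate power test.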

[0038] Further, the noise discrimination processing unit 35 refers to the data at the beginning of the word section and the data immediately preceding it when measuring the change; it may instead refer to the first local-maximum power data in the opening part of the word section and the data immediately preceding the word section. In addition, the noise discrimination processing unit 35 uses the logical AND of the power and spectrum judgments as its noise criterion, but the logical OR may be used if necessary.

[0039]

Effects of the Invention

As described above, according to the present invention, even when background noise is mixed into the input speech, only the background noise is excluded, so unnecessary recognition processing is not executed by mistake. This avoids the output of meaningless recognition results and contributes greatly to improving the operating efficiency of systems that use a voice recognition device.

[Brief description of drawings]

FIG. 1 is a diagram illustrating the principle of the present invention.

FIG. 2 is a configuration diagram of an embodiment of the present invention.

FIG. 3 is a diagram illustrating the principle of the present invention.

FIG. 4 is a configuration diagram of another embodiment of the present invention.

[Explanation of symbols]

1, 11, 21: Sampling processing unit
2, 12, 22: Word section detection processing unit
3, 13, 23: Spectrum analysis processing unit
4, 14: Vowel detection processing unit
5, 15, 25, 35: Noise discrimination processing unit
16: Input spectrum data
17: Vowel spectrum data
18: Vowel information
26: Speech recognition processing unit

Claims

1. A voice recognition device having a sampling processing unit (1) that samples input speech at each unit time to generate voice data, a word section detection processing unit (2) that detects, from the voice data, word sections valid as input data, and a spectrum analysis processing unit (3) that analyzes the frequency components of the voice data to generate spectrum data, the voice recognition device comprising: a vowel detection processing unit (4) that detects vowel portions in the spectrum data of a given word section; and a noise discrimination processing unit (5) that determines, based on information received from the vowel detection processing unit (4) about the presence or absence, or the proportion, of vowel portions in the spectrum data of that word section, whether the word section was produced by a voice uttered by the user or caused by background noise.

2. The voice recognition device according to claim 1, further comprising means for holding the spectrum data of each vowel in advance, and a vowel detection processing unit that detects vowel portions in the input spectrum data by applying a mathematical distance measure and a distance threshold between the spectrum data of each vowel and the input spectrum data.

3. A voice recognition device comprising: a sampling processing unit that samples input speech at each unit time to generate voice data; a word section detection processing unit that detects, from the voice data, word sections valid as input data; and a noise discrimination processing unit that judges the steepness of the onset from the magnitude of the volume change at the beginning of a word section and thereby determines whether the word section was produced by the user's uttered voice or by impulsive background noise.

4. A voice recognition device comprising: a power calculation unit that generates a power value from the waveform amplitude of input speech; a sampling processing unit that samples the input speech at each unit time to generate voice data; a word section detection processing unit that detects, from the voice data, word sections valid as input data; and a noise discrimination processing unit that judges the steepness of the onset from the magnitude of the difference between the power values of the input sound at the beginning of a word section and immediately before it, and thereby determines whether the word section was produced by the user's uttered voice or by impulsive background noise.

5. A speech recognition device comprising: a power calculation unit that generates a power value from the waveform amplitude of input speech; a sampling processing unit that samples the input speech at every unit time to generate speech data; a word section detection processing unit that detects, from the speech data, a word section valid as input data; and a noise discrimination processing unit that evaluates the sharpness of the rising edge from the difference in power value of the input sound between the beginning of the word section and the part where the power first reaches a maximum, and thereby discriminates whether the word section is due to speech uttered by the user or to impulsive background noise.
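The claim 5 variant compares the start of the word with its first power maximum instead of the preceding frame. The sketch below takes per-frame powers in dB (an assumption) and locates the first peak as the frame after which power stops increasing. The claim names only the power difference; dividing it by the number of frames to the peak, as `is_impulsive` does, is a hypothetical decision rule with an arbitrary 15 dB-per-frame threshold.

```python
from typing import Sequence, Tuple


def rise_to_first_peak(powers_db: Sequence[float], start: int) -> Tuple[float, int]:
    """Return (power rise in dB, frames taken) from `start` to the first power peak.

    The first peak is the first frame after which power stops increasing.
    """
    peak = start
    while peak + 1 < len(powers_db) and powers_db[peak + 1] >= powers_db[peak]:
        peak += 1
    return powers_db[peak] - powers_db[start], peak - start


def is_impulsive(powers_db: Sequence[float], start: int, max_slope_db: float = 15.0) -> bool:
    """Hypothetical rule: flag noise when the average rise per frame is too steep."""
    rise, frames = rise_to_first_peak(powers_db, start)
    if frames == 0:
        return False  # power does not rise after the start frame
    return rise / frames > max_slope_db
```

A burst that climbs 50 dB in a single frame is flagged, while speech-like power that climbs 25 dB over four frames is not.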

6. A speech recognition device comprising: a sampling processing unit that samples input speech at every unit time to generate speech data; a word section detection processing unit that detects, from the speech data, a word section valid as input data; a spectrum analysis processing unit that analyzes the frequency components of the speech data as vector values of an appropriate order in the frequency direction to generate spectrum data; and a noise discrimination processing unit that discriminates, from the difference in vector length of the spectrum data between the beginning of the word section and the part immediately preceding it, whether the word section is due to speech uttered by the user or to impulsive background noise.
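For claim 6, the "vector length" of a spectrum frame can be read as its Euclidean norm. The sketch below compares the norm at the word start with that of the preceding frame. Treating the spectrum as a plain list of floats, the function names, and the threshold of 3.0 (in whatever units the spectrum uses) are all assumptions.

```python
import math
from typing import Sequence


def vector_length(spectrum: Sequence[float]) -> float:
    """Euclidean length (L2 norm) of one spectrum vector."""
    return math.sqrt(sum(c * c for c in spectrum))


def is_impulsive_by_length(
    spectra: Sequence[Sequence[float]], start: int, max_jump: float = 3.0
) -> bool:
    """True if the spectrum vector grows by more than `max_jump` at the word start."""
    if start < 1:
        raise ValueError("start must have a preceding frame")
    jump = vector_length(spectra[start]) - vector_length(spectra[start - 1])
    return jump > max_jump
```

Because the norm summarizes energy across all frequency bins, this check behaves much like the power comparison of claim 4, but it reuses the spectrum data already produced for recognition.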

7. A speech recognition device comprising: a power calculation unit that generates a power value from the waveform amplitude of input speech; a sampling processing unit that samples the input speech at every unit time to generate speech data; a word section detection processing unit that detects, from the speech data, a word section valid as input data; a spectrum analysis processing unit that analyzes the frequency components of the speech data as vector values of an appropriate order in the frequency direction to generate spectrum data; and a noise discrimination processing unit that evaluates the sharpness of the rising edge from the difference in vector length of the spectrum data between the beginning of the word section and the first maximum-power part, and thereby discriminates whether the word section is due to speech uttered by the user or to impulsive background noise.

8. A speech recognition device comprising: a sampling processing unit that samples input speech at every unit time to generate speech data; a word section detection processing unit that detects, from the speech data, a word section valid as input data; a spectrum analysis processing unit that analyzes the frequency components of the speech data to generate spectrum data; and a noise discrimination processing unit that evaluates the sharpness of the rising edge from the magnitude of the spectral change at the beginning of the word section, and thereby determines whether the word section is due to speech uttered by the user or to impulsive background noise.

9. A speech recognition device comprising: a power calculation unit that generates a power value from the waveform amplitude of input speech; a sampling processing unit that samples the input speech at every unit time to generate speech data; a word section detection processing unit that detects, from the speech data, a word section valid as input data; a spectrum analysis processing unit that analyzes the frequency components of the speech data to generate spectrum data; and a noise discrimination processing unit that evaluates the steepness of the rise from the magnitude of the spectral change between the beginning of the word section and the first maximum-power part, and thereby discriminates whether the word section is due to speech uttered by the user or to impulsive background noise.

10. A speech recognition device comprising: a sampling processing unit that samples input speech at every unit time to generate speech data; a word section detection processing unit that detects, from the speech data, a word section valid as input data; a spectrum analysis processing unit that analyzes the frequency components of the speech data as vector values of an appropriate order in the frequency direction to generate spectrum data; and a noise discrimination processing unit that evaluates the steepness of the rising edge from the magnitude of the vector distance between the spectrum data at the beginning of the word section and the spectrum data of the portion immediately preceding it, and thereby determines whether the word section is due to speech uttered by the user or to impulsive background noise.
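Claim 10 uses the distance between spectrum vectors rather than the difference of their lengths. A minimal sketch, assuming Euclidean distance and a hypothetical threshold of 3.0, might look like this; the function names are illustrative only.

```python
import math
from typing import Sequence


def onset_spectral_distance(spectra: Sequence[Sequence[float]], start: int) -> float:
    """Euclidean distance between the spectrum at `start` and the one just before it."""
    if start < 1:
        raise ValueError("start must have a preceding frame")
    return math.dist(spectra[start], spectra[start - 1])


def is_impulsive_by_distance(
    spectra: Sequence[Sequence[float]], start: int, max_distance: float = 3.0
) -> bool:
    """Hypothetical rule: flag noise when the onset spectral jump exceeds `max_distance`."""
    return onset_spectral_distance(spectra, start) > max_distance
```

Unlike the vector-length check of claim 6, the distance also responds to a change in spectral shape: the frames `[3.0, 4.0]` and `[4.0, 3.0]` have equal lengths but are about 1.41 apart.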

11. A speech recognition device comprising: a power calculation unit that generates a power value from the waveform amplitude of input speech; a sampling processing unit that samples the input speech at every unit time to generate speech data; a word section detection processing unit that detects, from the speech data, a word section valid as input data; a spectrum analysis processing unit that analyzes the frequency components of the speech data as vector values of an appropriate order in the frequency direction to generate spectrum data; and a noise discrimination processing unit that evaluates the steepness of the rise from the magnitude of the vector distance between the spectrum data at the beginning of the word section and at the first maximum-power part, and thereby discriminates whether the word section is due to speech uttered by the user or to impulsive background noise.