JPH02293797A

JPH02293797A - Voice recognizing device

Info

Publication number: JPH02293797A
Application number: JP1114733A
Authority: JP
Inventors: Yumi Takizawa; 滝沢　由実
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1989-05-08
Filing date: 1989-05-08
Publication date: 1990-12-04
Anticipated expiration: 2010-06-07
Also published as: JPH0754434B2

Abstract

PURPOSE: To reduce misrecognition even in a low-S/N environment. The device calculates the S/N (the ratio of input speech power to noise power), chooses a threshold according to whether the S/N is high or low, and uses that threshold both to detect the section of the input speech and to correct the section of the registered speech before matching.
CONSTITUTION: Analysis section 2 calculates the background noise power for every unit time, and temporary threshold setting section 3 sets a threshold equal to the average noise power plus a predetermined value. The power of the input speech is then calculated, and S/N calculating section 4 calculates the S/N. When the S/N is at or above a predetermined value, threshold setting section 5 sets the peak value minus a predetermined value as the threshold; when the S/N is low, the threshold from temporary threshold setting section 3 is used. Section detecting section 6 detects the speech section in the contents of input speech buffer 10 using this threshold. When the S/N is high, the contents of registered speech buffer 9 are matched against those of buffer 10 without section correction; when the S/N is low, the contents of buffer 9 are first corrected by the section correcting section and then matched. This eliminates misalignment in matching and reduces misrecognition even in a low-S/N environment.

Description

[Detailed Description of the Invention]

Industrial Field of Application

The present invention relates to a speech recognition device.

Prior Art

In recent years, with the development of speech recognition technology, speech recognition devices have begun to be put into practical use in various fields. Practical use, however, requires solving a variety of problems that arise when a recognizer is actually operated. One of them is that when the SN ratio at speech input is low, noise is erroneously detected as part of a speech section, which leads to misrecognition.

To address this problem, a conventional speech recognition device inputs background noise immediately before the speech is input, measures its power, sets the threshold for detecting speech sections above that power, and detects the speech section using the threshold so set.

With this method, noise is not erroneously detected as a speech section even in an environment with a low SN ratio, and the misrecognition rate is reduced.

A conventional speech recognition device of this kind will now be described with reference to the drawings. FIG. 3 is a block diagram of a conventional registered-word speech recognition device. In the figure, 1 is a speech input terminal, 2 is an analysis section, 16 is a threshold setting section, 17 is a section detection section, 18 is a matching section, 19 is a recognition result output terminal, 20 is a buffer for registered speech, 21 is a buffer for input speech, and 22 and 23 are switches. The operation of the speech recognition device configured as described above is explained below.

First, at registration, immediately before speech is input, a background noise signal of a predetermined duration is input from speech input terminal 1, and analysis section 2 calculates the signal power for each unit time. The results are passed to threshold setting section 16, which averages these powers and sets the average plus a predetermined value (for example, 6 dB) as the section detection threshold.
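The noise-based threshold can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `noise_threshold_db` is hypothetical, and it assumes the per-unit-time powers are already expressed in dB and averaged directly in that domain, which the text does not specify. The `margin_db` parameter covers the 6 dB of this example as well as the margins used by temporary threshold setting section 3 in the embodiments below.

```python
from statistics import mean


def noise_threshold_db(noise_power_db, margin_db=6.0):
    """Section detection threshold = average background-noise power + margin.

    Hypothetical helper. Assumes each entry of noise_power_db is the power
    of one unit time already expressed in dB, averaged in the dB domain.
    """
    if not noise_power_db:
        raise ValueError("need at least one background-noise frame")
    return mean(noise_power_db) + margin_db
```

For example, noise frames of 40, 42 and 41 dB average to 41 dB, giving a 47 dB threshold with the 6 dB margin.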

When a word to be registered is spoken, analysis section 2 calculates the signal power and the feature parameters for each unit time from the signal input at speech input terminal 1. The power results are passed to section detection section 17 and the feature parameters to input speech buffer 21. If, for example, LPC cepstrum analysis is used, a predetermined number of cepstral coefficients are calculated as the feature parameters. Section detection section 17 then compares the signal power of each unit time with the section detection threshold set earlier and determines as the speech section the portion in which the signal power stays at or above the threshold continuously for 60 msec or more. Even if the signal power falls below the threshold, the portion is still treated as part of the speech section unless the below-threshold stretch lasts 60 msec or more. The feature parameters for the determined speech section are then read from input speech buffer 21 and stored in registered speech buffer 20. The processing from speech input onward is repeated for every word in the recognition vocabulary.
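The 60 msec rule can be sketched as follows, assuming (hypothetically) a 10 msec unit time so that 60 msec corresponds to 6 frames, and reading the rule as: a section must contain a run of at least 60 msec at or above the threshold, and dips below the threshold shorter than 60 msec are bridged. The function name `detect_sections` is illustrative, not taken from the source.

```python
def detect_sections(power_db, threshold_db, min_frames=6):
    """Return speech sections as (start, end) frame pairs, end exclusive.

    min_frames=6 assumes a 10 msec unit time, so 6 frames = 60 msec.
    A section must contain a run of at least min_frames frames at or
    above the threshold; dips below it shorter than min_frames are
    bridged and stay inside the section.
    """
    # 1. Collect runs of consecutive frames at or above the threshold.
    runs = []
    start = None
    for i, p in enumerate(power_db):
        if p >= threshold_db and start is None:
            start = i
        elif p < threshold_db and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(power_db)))

    # 2. Bridge gaps shorter than min_frames, tracking the longest run.
    merged = []  # entries: [start, end, longest_run]
    for s, e in runs:
        if merged and s - merged[-1][1] < min_frames:
            merged[-1][1] = e
            merged[-1][2] = max(merged[-1][2], e - s)
        else:
            merged.append([s, e, e - s])

    # 3. Keep only sections anchored by a run of >= min_frames frames.
    return [(s, e) for s, e, longest in merged if longest >= min_frames]
```

A 3-frame dip inside a word is absorbed into the section, while an isolated 30 msec burst of noise above the threshold is rejected.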

At recognition, a section detection threshold is set from the background noise as at registration, and the input speech is then analyzed and its speech section detected, using the same analysis and section detection methods as at registration. After the speech section is detected, matching section 18 matches the input speech against each registered speech and outputs the word with the shortest distance as the recognition result from recognition result output terminal 19. Switch 22 routes the analysis results to threshold setting section 16 when noise is input immediately before speech input, and to section detection section 17 and input speech buffer 21 during speech input. Switch 23 routes the feature parameters to registered speech buffer 20 at registration and to matching section 18 at recognition.
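The text says only that the word with the shortest distance is output; it does not name the distance measure. The sketch below assumes dynamic time warping (DTW) with a Euclidean frame distance over cepstral vectors, a common choice for template-based word recognition, purely for illustration. `dtw_distance` and `recognize` are hypothetical names.

```python
import math


def dtw_distance(a, b):
    """DTW distance between two sequences of feature vectors (assumed measure)."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between frame i of a and frame j of b.
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def recognize(input_feats, templates):
    """Return the word whose registered template is closest to the input."""
    return min(templates, key=lambda w: dtw_distance(input_feats, templates[w]))
```

Because every frame must be aligned to something, extra frames at either end of a section would add cost to the match, which is one way to see why misaligned section endpoints can cause misrecognition.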

Problem to Be Solved by the Invention

With the above configuration, however, although noise can be rejected regardless of changes in noise power, the start and end positions of the speech section shift with changes in the noise or in the speaking power. If conditions differ between the utterance of the registered (or standard) speech and the utterance of the input speech, the two are matched over differing speech sections, so misrecognition is likely.

In view of this problem, the present invention provides a speech recognition device that prevents misalignment of speech sections between registered (or standard) speech and input speech and reduces misrecognition caused by differences in conditions.

Means for Solving the Problems

To achieve the above object, the speech recognition device according to claim 1 comprises: an analysis section that detects the power of the input signal per unit time; an SN ratio calculation section that calculates the SN ratio; a threshold setting section that determines a section detection threshold taking the SN ratio into account; a section detection section that detects the speech section of the input signal using the determined threshold; a section correction section that corrects the registered or standard speech section; and a matching section that matches the registered or standard speech against the input speech and outputs a recognition result.

The speech recognition device according to claim 2 comprises: an analysis section that detects the power of the input signal per unit time; a threshold setting section that sets a section detection threshold taking into account the peak value of the signal power and the noise power value; a section detection section that detects the speech section of the input signal using the set threshold; a section correction section that corrects the section of the registered or standard speech; and a matching section that matches the registered or standard speech against the input speech and outputs a recognition result.

Operation

In the speech recognition device according to claim 1, the SN ratio calculation section calculates the SN ratio, and the threshold determination section sets as the threshold a value at or above the noise power when the SN ratio is low, and the peak value minus a predetermined value when the SN ratio is high. The section detection section then detects the section of the input speech using this threshold, the section correction section corrects the registered or standard speech section using the same threshold, and the matching section matches the registered or standard speech against the input speech.

In the speech recognition device according to claim 2, the threshold setting section compares the noise power value plus a predetermined value with the peak value minus a predetermined value and sets the larger of the two as the threshold. The section detection section then detects the section of the input speech using this threshold, the section correction section corrects the registered or standard speech section using the same threshold, and the matching section matches the registered or standard speech against the input speech.

Embodiments

FIG. 1 is a block diagram of a registered-word speech recognition device according to a first embodiment of the present invention (corresponding to the invention of claim 1).

In the figure, 1 is a speech input terminal, 2 is an analysis section, 3 is a temporary threshold setting section, 4 is an SN ratio calculation section, 5 is a threshold setting section, 6 is a section detection section, 7 is a section correction section, 8 is a matching section, 9 is a buffer for registered speech, 10 is a buffer for input speech, 11 is a recognition result output terminal, and 12 and 13 are switches. Components identical to those of the conventional example (see FIG. 3) are given the same reference numerals.

The operation of the speech recognition device configured as described above is explained below.

First, at registration, immediately before speech is input, a background noise signal of a predetermined duration is input from speech input terminal 1, and analysis section 2 calculates the signal power for each unit time. The results are passed to temporary threshold setting section 3.

Temporary threshold setting section 3 averages these powers and sets the average plus a predetermined value (8 dB in this embodiment) as the temporary section detection threshold.

When a word to be registered is spoken, analysis section 2 calculates the signal power and the feature parameters for each unit time from the signal input at speech input terminal 1. The power results are passed to SN ratio calculation section 4 and the feature parameters to input speech buffer 10.

The analysis method is the same as in the conventional example. SN ratio calculation section 4 treats the portion of the signal at or above the temporary section detection threshold set earlier as a temporary speech section and calculates, as the SN ratio, the ratio of the peak value within that section to the average noise power calculated earlier. If the SN ratio is 24 dB or less (the predetermined value in this example), the speaker is instructed to register again, and the registration process above is repeated from the beginning.
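SN ratio calculation section 4 and the registration check can be sketched as below. Working in dB, the ratio of the peak to the average noise power becomes a difference. The function names are hypothetical, and the original is ambiguous about exactly 24 dB (it uses both "24 dB or less" and "24 dB or more"); this sketch treats 24 dB as acceptable.

```python
def sn_ratio_db(power_db, noise_avg_db, temp_threshold_db):
    """SN ratio (dB) = peak power in the temporary speech section
    minus the average noise power.

    The temporary section is every frame at or above the temporary
    threshold; returns None when no frame qualifies.
    """
    provisional = [p for p in power_db if p >= temp_threshold_db]
    if not provisional:
        return None
    return max(provisional) - noise_avg_db


def registration_accepted(sn_db, min_sn_db=24.0):
    """First embodiment: ask the speaker to register again unless SN >= 24 dB."""
    return sn_db is not None and sn_db >= min_sn_db
```

With a 40 dB noise average (temporary threshold 48 dB) and a 70 dB peak, the SN ratio is 30 dB and the utterance is accepted.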

If the SN ratio is 24 dB or more, threshold setting section 5 sets the peak value minus a predetermined value (18 dB in this embodiment) as the detection threshold. Section detection section 6 compares the signal power of each unit time with the detection threshold to detect the speech section, using the same method as in the conventional example. The feature parameters for the speech section are then read from input speech buffer 10 and registered in registered speech buffer 9. The processing from speech input onward is repeated for every word in the recognition vocabulary.

At recognition, a temporary section detection threshold is set from the background noise as at registration; the input speech is then analyzed and the results are passed to SN ratio calculation section 4 and input speech buffer 10. SN ratio calculation section 4 calculates the SN ratio as at registration and passes the result to threshold setting section 5.

If the SN ratio is 24 dB or more, threshold setting section 5 sets the peak value minus 18 dB as the threshold; otherwise it sets the temporary section detection threshold as the threshold. Section detection section 6 then detects the speech section using this threshold, by the same method as at registration.
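The choice made by threshold setting section 5 at recognition reduces to a two-way branch. A minimal sketch with a hypothetical name, using the 24 dB and 18 dB values of this embodiment as defaults; it also returns a flag recording which branch was taken, because section correction section 7 depends on it.

```python
def select_threshold_emb1(sn_db, peak_db, temp_threshold_db,
                          sn_limit_db=24.0, peak_margin_db=18.0):
    """First-embodiment threshold choice at recognition.

    Returns (threshold_db, high_sn). High SN: peak minus 18 dB.
    Otherwise the noise-based temporary threshold is kept.
    """
    if sn_db >= sn_limit_db:
        return peak_db - peak_margin_db, True
    return temp_threshold_db, False
```

With a 48 dB temporary threshold, a 70 dB peak (SN 30 dB) gives a 52 dB threshold, while a 60 dB peak (SN 20 dB) keeps the 48 dB temporary threshold.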

Section correction section 7 then leaves the registered speech section unchanged when the SN ratio is 24 dB or more; only when the SN ratio is below 24 dB does it redo section detection on the registered speech using the above threshold. Matching section 8 then matches the registered speech against the input speech and outputs the word with the shortest distance as the recognition result from output terminal 11. Switch 12 routes the analysis results to temporary threshold setting section 3 when noise is input immediately before speech input, and to SN ratio calculation section 4 and input speech buffer 10 during speech input.

Switch 13 routes the feature parameters to registered speech buffer 9 at registration and to section correction section 7 at recognition.
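Section correction section 7 can be sketched as below. This assumes, hypothetically, that the per-frame power of each registered utterance is stored alongside its feature parameters (the text mentions only the features in registered speech buffer 9), and for brevity it re-detects the registered section by trimming to the first and last frames at or above the threshold rather than applying the full 60 msec rule. `correct_registered` is an illustrative name.

```python
def correct_registered(reg_feats, reg_power_db, threshold_db, high_sn):
    """Section correction for one registered word (first embodiment).

    Hypothetical: reg_power_db[i] is the stored power of frame i of the
    registered utterance. Simplified re-detection: keep the span from the
    first to the last frame at or above the recognition-time threshold.
    """
    if high_sn:
        return list(reg_feats)  # high SN: the registered section is kept
    keep = [i for i, p in enumerate(reg_power_db) if p >= threshold_db]
    if not keep:
        return []  # nothing of the registered word survives the threshold
    return list(reg_feats[keep[0]:keep[-1] + 1])
```

The effect is that, in noisy recognition conditions, the edges of the registered word that would now be buried in noise are dropped, so both utterances are compared over comparable sections.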

As described above, in this embodiment SN ratio calculation section 4 calculates the ratio of the signal peak value to the average noise power; threshold setting section 5 sets as the threshold the noise power plus a predetermined value when the SN ratio is below a given value, and the peak value minus a predetermined value when the SN ratio is at or above that value; section detection section 6 detects the section of the input speech using this threshold; section correction section 7 corrects the registered speech using this threshold; and matching section 8 matches the registered speech against the input speech. This prevents misalignment of speech sections between the registered and input speech and reduces misrecognition caused by differences in conditions.

FIG. 2 is a block diagram of a registered-word speech recognition device according to a second embodiment of the present invention (corresponding to the invention of claim 2).

In the figure, 1 is a speech input terminal, 2 is an analysis section, 3 is a temporary threshold setting section, 14 is a threshold setting section, 6 is a section detection section, 7 is a section correction section, 8 is a matching section, 9 is a buffer for registered speech, 10 is a buffer for input speech, 11 is a recognition result output terminal, and 12 and 15 are switches. Components identical to those of the preceding embodiment are given the same reference numerals.

Temporary threshold setting section 3 averages the background noise power and sets the average plus a fixed value (6 dB in this embodiment) as the temporary section detection threshold.

When a word to be registered is spoken, analysis section 2 calculates the signal power and the feature parameters for each unit time from the signal input at speech input terminal 1. The power results are passed to threshold setting section 14 and the feature parameters to input speech buffer 10. The analysis method is the same as in the preceding embodiment. Threshold setting section 14 treats the portion of the signal at or above the temporary section detection threshold set earlier as a temporary speech section and compares the peak value within that section minus a predetermined value (18 dB in this embodiment) with the temporary section detection threshold. If the latter is larger, the speaker is instructed to register again, and the registration process above is repeated from the beginning.
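The registration check of threshold setting section 14 can be sketched as below (hypothetical name; equality, which the text does not address, is treated here as a reason to register again). With the values of this embodiment, peak minus 18 dB exceeds noise average plus 6 dB exactly when the peak exceeds the noise average by more than 24 dB, so the check is essentially the 24 dB SN test of the first embodiment without computing the SN ratio explicitly.

```python
def registration_threshold_emb2(power_db, temp_threshold_db, peak_margin_db=18.0):
    """Second-embodiment registration check.

    temp_threshold_db is the average noise power + 6 dB. The peak is taken
    over the temporary speech section (frames at or above the temporary
    threshold). Returns the detection threshold (peak - 18 dB) if it is
    larger than the temporary threshold, else None (register again).
    """
    provisional = [p for p in power_db if p >= temp_threshold_db]
    if not provisional:
        return None
    candidate = max(provisional) - peak_margin_db
    return candidate if candidate > temp_threshold_db else None
```

For a 40 dB noise average (temporary threshold 46 dB), a 70 dB peak yields a 52 dB detection threshold, while a 60 dB peak yields 42 dB, which is below 46 dB, so registration is repeated.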

If the former is larger, that value (the peak value minus 18 dB) is set as the detection threshold, and section detection section 6 compares the signal power of each unit time with it to detect the speech section, using the same method as in the conventional example. The feature parameters for the speech section are then read from input speech buffer 10 and registered in registered speech buffer 9. The processing from speech input onward is repeated for every word in the recognition vocabulary.

At recognition, a temporary section detection threshold is set from the background noise as at registration; the input speech is then analyzed and the results are passed to threshold setting section 14 and input speech buffer 10. Threshold setting section 14 treats the portion of the signal at or above the temporary section detection threshold as a temporary speech section, compares the peak value within that section minus 18 dB with the temporary section detection threshold, and sets the larger of the two as the threshold. Section detection section 6 then detects the speech section using this threshold, by the same method as at registration.

Section correction section 7 then leaves the registered speech section unchanged when the threshold was set to the peak value minus 18 dB; only when the threshold was set to the temporary section detection threshold does it redo section detection on the registered speech using that threshold. Matching section 8 then matches the registered speech against the input speech and outputs the word with the shortest distance as the recognition result from output terminal 11. Switch 15 routes the analysis results to temporary threshold setting section 3 when noise is input immediately before speech input, and to threshold setting section 14 and input speech buffer 10 during speech input. Switch 13 routes the feature parameters to registered speech buffer 9 at registration and to section correction section 7 at recognition.

As described above, in this embodiment threshold setting section 14 compares the noise power plus 6 dB with the peak value minus 18 dB and sets the larger as the threshold; section detection section 6 detects the section of the input speech using this threshold; the section correction section corrects the registered speech section using this threshold; and the matching section matches the registered speech against the input speech. This prevents misalignment of speech sections between the registered and input speech and reduces misrecognition caused by differences in conditions. Compared with the first embodiment, this embodiment can be expected to achieve the same effect without the effort of calculating the SN ratio.

Effects of the Invention

In the speech recognition device according to claim 1, the SN ratio calculation section calculates the SN ratio, and the threshold determination section sets as the threshold a value at or above the noise power when the SN ratio is low, and the peak value minus a predetermined value when the SN ratio is high. The section detection section then detects the section of the input speech using this threshold, the section correction section corrects the registered or standard speech section using the same threshold, and the matching section matches the registered or standard speech against the input speech. This prevents misalignment of speech sections between the registered and input speech and reduces misrecognition caused by differences in conditions.

In the speech recognition device according to claim 2, the threshold setting section compares the noise power value plus a predetermined value with the peak value minus a predetermined value and sets the larger as the threshold. The section detection section then detects the section of the input speech using this threshold, the section correction section corrects the registered or standard speech section using the same threshold, and the matching section matches the registered or standard speech against the input speech. This prevents misalignment of speech sections between the registered (or standard) speech and the input speech and reduces misrecognition caused by differences in conditions. Compared with the invention of claim 1, the same effect can be expected without the effort of calculating the SN ratio.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a speech recognition device according to the first embodiment of the present invention, FIG. 2 is a block diagram of a speech recognition device according to the second embodiment, and FIG. 3 is a block diagram of a conventional speech recognition device.

2: analysis section; 4: SN ratio calculation section; 5, 14: threshold setting section; 6: section detection section; 7: section correction section; 8: matching section.

Agent: Shigetaka Awano, patent attorney, and one other.

Claims


(1) A speech recognition device comprising: an analysis section that detects the power of an input signal per unit time; an SN ratio calculation section that calculates the ratio of speech power to noise power (hereinafter, the SN ratio); a threshold determination section that determines a section detection threshold taking the SN ratio into account; a section detection section that detects a speech section of the input signal using the determined threshold; a section correction section that corrects the section of registered speech or standard speech; and a matching section that matches the registered speech or standard speech against the input speech and outputs a recognition result; wherein the SN ratio calculation section calculates the SN ratio; the threshold determination section sets as the threshold, in an environment with a low SN ratio, a value equal to or greater than the noise power and, in an environment with a high SN ratio, the maximum power value of the signal (hereinafter, the peak value) minus a predetermined value; the section detection section detects the section of the input speech using that threshold; the section correction section corrects the registered speech or standard speech section using that threshold; and the matching section matches the registered speech or standard speech against the input speech.

(2) A speech recognition device comprising: an analysis section that detects the power of an input signal per unit time; a threshold setting section that sets a section detection threshold taking into account the peak value of the signal power and the noise power value; a section detection section that detects a speech section of the input signal using the set threshold; a section correction section that corrects the section of registered speech or standard speech; and a matching section that matches the input speech against the registered speech or standard speech and outputs a recognition result; wherein the threshold setting section compares the noise power plus a predetermined value with the peak value of the signal minus a predetermined value and sets the larger of the two as the threshold; the section detection section detects the section of the input speech using that threshold; the section correction section corrects the registered speech or standard speech section using that threshold; and the matching section matches the registered speech or standard speech against the input speech.