JPH02293797A - Voice recognizing device - Google Patents

Voice recognizing device

Info

Publication number
JPH02293797A
JPH02293797A JP1114733A JP11473389A JPH02293797A JP H02293797 A JPH02293797 A JP H02293797A JP 1114733 A JP1114733 A JP 1114733A JP 11473389 A JP11473389 A JP 11473389A JP H02293797 A JPH02293797 A JP H02293797A
Authority
JP
Japan
Prior art keywords
section
speech
threshold
value
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP1114733A
Other languages
Japanese (ja)
Other versions
JPH0754434B2 (en)
Inventor
Yumi Takizawa
滝沢 由実
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd
Priority to JP1114733A
Publication of JPH02293797A
Publication of JPH0754434B2
Anticipated expiration
Expired - Lifetime

Abstract

PURPOSE: To reduce erroneous recognition even in a low S/N environment by calculating the S/N, the ratio of the input voice power to the noise power, determining the section detection threshold according to whether the S/N is high or low, detecting the section of the input voice and correcting the section of the registered voices with that threshold, and then collating. CONSTITUTION: An analyzing section 2 calculates the background noise power for every unit time, and a temporary threshold setting section 3 sets a temporary threshold by adding a prescribed value to the average of that power. The power of the input voice is then calculated, and an S/N calculating section 4 calculates the S/N. When the S/N is at or above a prescribed value, a threshold setting section 5 sets the threshold to the peak value minus a prescribed value; when the S/N is low, the temporary threshold of the setting section 3 is used as the threshold. A section detecting section 6 detects the voice section of the contents of a buffer 10 for input voice by using this threshold. When the S/N is high, no section correction is executed and the contents of a buffer 9 for registered voice are collated with the contents of the buffer 10 as they are. When the S/N is low, the contents of the buffer 9 are corrected in a section correcting section 7 before collating. The deviation between the sections being collated is eliminated in this way, and erroneous recognition is lessened even in a low S/N environment.

Description

【発明の詳細な説明】 産業上の利用分野 本発明は、音声認識装置に関するものである。[Detailed description of the invention] Industrial applications The present invention relates to a speech recognition device.

従来の技術 近年、音声認識技術の発達と共に、音声認識装置が様々
の分野で実用化されようとしているが、実用化するため
には、認識装置を実際に使用する上での様々の問題点を
解消する必要がある。この実用上の問題点の1つに、音
声入力時のSN比が低い場合に、雑音を音声区間として
誤検出してしまい、その結果、誤認識してしまうという
点かある。
Background of the Invention In recent years, with the development of speech recognition technology, speech recognition devices are being put into practical use in various fields. For practical use, however, the various problems that arise when a recognition device is actually used must be resolved. One of these practical problems is that when the SN ratio at the time of voice input is low, noise is erroneously detected as a voice section, resulting in erroneous recognition.

従来の音声認識装置では、上記問題点を解決するために
、あらかじめ音声を入力する直前に背景雑音を入力して
そのパワーを調べ、音声区間を検出するための閾値を上
記パワー以上に設定しておき、設定された閾値を用いて
音声区間を検出する。
To address this problem, a conventional speech recognition device takes in background noise immediately before the speech is input, measures its power, and sets the threshold for detecting speech sections at or above that power. Speech sections are then detected using this threshold.

この方法により、SN比が低い環境でも雑音を音声区間
として誤検出することなく、誤認識率が少なくなる。
With this method, even in an environment with a low SN ratio, noise will not be erroneously detected as a speech section, and the erroneous recognition rate will be reduced.

以下,図面を参照しながら、上述したような従来の音声
認識装置について説明を行う。第3図は、従来の登録型
単語音声認識装置のブロック図である。同図において、
1は音声入力端子,2は分析部、16は閾値設定部、1
7は区間検出部、18は照合部、19は認識結果出力端
子、20は登録音声用バッフハ 21は入力音声用バッ
フハ 22、23はスイッチである。以上のように構成
された音声認識装置について以下その動作について説明
する。
Hereinafter, a conventional speech recognition device as described above will be explained with reference to the drawings. FIG. 3 is a block diagram of a conventional registration-type word speech recognition device. In the figure, 1 is an audio input terminal, 2 is an analysis section, 16 is a threshold setting section, 17 is a section detection section, 18 is a matching section, 19 is a recognition result output terminal, 20 is a buffer for registered speech, 21 is a buffer for input speech, and 22 and 23 are switches. The operation of the speech recognition device configured as described above is explained below.

まず登録時には、音声入力時直前に、音声入力端子1よ
り所定時間分の背景雑音信号が入力され、分析部2で単
位時間ごとの信号のパワーが算出され、算出結果は閾値
設定部16に入力される。閾値設定部16では上記で算
出されたパワーの平均値を求め、左記平均値に所定値(
たとえば6dBとする)を加えた値を区間検出閾値と設
定する。
First, at the time of registration, a background noise signal of a predetermined duration is input from the audio input terminal 1 immediately before the speech is input. The analysis section 2 calculates the power of the signal for each unit time, and the result is input to the threshold setting section 16. The threshold setting section 16 averages the calculated power and sets the section detection threshold to this average plus a predetermined value (for example, 6 dB).
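The threshold rule just described can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name is hypothetical, and it assumes the per-frame noise powers are already expressed in dB and averaged in the dB domain, which the text does not specify.

```python
def noise_threshold_db(noise_frame_powers_db, margin_db=6.0):
    """Section detection threshold = mean background-noise power + fixed margin.

    noise_frame_powers_db: per-unit-time noise powers in dB (assumed format).
    margin_db: the predetermined value added to the mean (6 dB in the example).
    """
    if not noise_frame_powers_db:
        raise ValueError("at least one noise frame is required")
    mean_db = sum(noise_frame_powers_db) / len(noise_frame_powers_db)
    return mean_db + margin_db


# Three noise frames averaging 32 dB give a threshold of 38 dB.
threshold = noise_threshold_db([30.0, 32.0, 34.0])
```

If the powers were held as linear values instead, the average would have to be taken before converting to dB; which of the two the device does is left open here.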

登録単語音声入力時には、音声入力端子1より入力され
た信号にもとづき、分析部2では単位時間毎の信号のパ
ワーと特徴パラメータが算出され、パワー算出結果は区
間検出部17に、特徴パラメータは入力音声用バッファ
21に入力される。分折方法としてたとえばLPCケプ
ストラム法を用いれば所定の個数のケプスドラム係数が
特徴パラメータとして算出される。次に区間検出部17
では単位時間毎の信号のパワーと先に設定した区間検出
閾値とを比較し、信号のパワーが60msec以上連続
して区間検出閾値以上となる部分を音声区間と決定する
。但し、信号パワーが区間検出閾値以下となっても閾値
以下の区間が60msec以上連続しなければ音声区間
とする。次に決定された音声区間分の特徴パラメータを
入力音声用バッファ21より入力し、登録音声用バッフ
ァ20に保管する。以上の音声入力以降の処理を全認識
単語分繰り返す。
When a word to be registered is spoken, the analysis section 2 calculates the power and the feature parameters of the signal for each unit time from the signal input at the audio input terminal 1. The power is input to the section detection section 17, and the feature parameters are input to the input speech buffer 21. If, for example, the LPC cepstrum method is used for the analysis, a predetermined number of cepstrum coefficients are calculated as the feature parameters. Next, the section detection section 17 compares the signal power for each unit time with the section detection threshold set earlier, and determines a portion in which the signal power stays at or above the threshold continuously for 60 msec or more to be a speech section. However, even if the signal power falls to or below the threshold, the portion is still treated as part of the speech section unless it stays at or below the threshold for 60 msec or more. Next, the feature parameters for the determined speech section are read from the input speech buffer 21 and stored in the registered speech buffer 20. The processing from the speech input onward is repeated for all the words to be recognized.
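One possible reading of the 60 msec rule is sketched below. The frame length (10 msec), the function name, and the exact order of the two steps (keep long runs first, then bridge short dips between them) are assumptions made for illustration; the text does not fix them.

```python
import math


def detect_sections(powers_db, threshold_db, frame_ms=10, min_ms=60):
    """Return speech sections as half-open (start, end) frame index pairs."""
    min_frames = math.ceil(min_ms / frame_ms)
    above = [p >= threshold_db for p in powers_db]

    # Collect runs of consecutive frames at or above the threshold.
    runs, start = [], None
    for i, flag in enumerate(above + [False]):  # sentinel closes a final run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None

    # Keep only runs lasting at least min_ms.
    speech = [(s, e) for s, e in runs if e - s >= min_frames]

    # Bridge dips shorter than min_ms between neighbouring speech runs.
    merged = []
    for s, e in speech:
        if merged and s - merged[-1][1] < min_frames:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


# 80 ms of speech, a 20 ms dip, 70 ms of speech, then a 30 ms burst that is too short.
powers = [0] * 3 + [50] * 8 + [0] * 2 + [50] * 7 + [0] * 10 + [50] * 3
sections = detect_sections(powers, threshold_db=40)
```

For a single isolated word, the speech section would be taken from the first start to the last end of the returned list.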

次に認識時には、登録時と同様に背景雑音がら区間検出
閾値を設定した後、入力音声を分析し、音声区間を検出
する。分析方法、区間検出方法共に登録時と同じである
。音声区間検出後、照合部18で登録音声と入力音声と
の照合を行い、最短距離を示す単語を認識結果として認
識結果出力端子19より出力する。なおスイッチ22は
、音声入力直前に雑音を入力する場合には閾値設定部1
6に、音声入力時には区間検出部17と入力音声用バッ
ファ21とに算出結果を入力するように動作する。スイ
ッチ23は、登録時には登録用バッファ20に、認識時
には照合部18に特徴パラメータを入力するように動作
する。
Next, at the time of recognition, the section detection threshold is set from the background noise in the same way as during registration, and then the input speech is analyzed and its speech section is detected. Both the analysis method and the section detection method are the same as at the time of registration. After the speech section is detected, the matching section 18 matches the registered speech against the input speech, and the word with the shortest distance is output from the recognition result output terminal 19 as the recognition result. The switch 22 routes the calculation results to the threshold setting section 16 when noise is input immediately before the speech, and to the section detection section 17 and the input speech buffer 21 when speech is input. The switch 23 routes the feature parameters to the registered speech buffer 20 during registration and to the matching section 18 during recognition.

発明が解決しようとする課題 しかしながら、上記のような構成では、雑音パワーの変
化に無関係に雑音を除去することは可能であるが、雑音
または発声パワーの変化に伴い音声区間の始端及び終端
位置がずれるため、登録音声または標準音声発声時と入
力音声発声時との状況が違うと異なる音声区間で照合さ
れるため、誤認識を起こしやすいという問題点を有して
いた。
Problem to be Solved by the Invention However, with the above configuration, although noise can be excluded regardless of changes in the noise power, the start and end positions of the speech section shift as the noise power or the utterance power changes. If the conditions differ between the utterance of the registered or standard speech and the utterance of the input speech, matching is therefore performed on different speech sections, and misrecognition is likely to occur.

本発明は、上記問題点に鑑み、登録音声または標準音声
と入力音声との音声区間のずれを防ぎ、状況の違いによ
る誤認識を軽減することができる音声認識装置を提供す
るものである。
In view of the above-mentioned problems, the present invention provides a speech recognition device that can prevent deviations in speech intervals between registered speech or standard speech and input speech, and can reduce misrecognition caused by differences in situations.

婢題点を解決するための手段 上記目的を達成するために請求項1記載の音声認識装置
は、入力信号の単位時間毎のパワーを検出する分析部と
、SN比を算出するSN比算出部と、SN比を考慮して
区間検出閾値を決定する閾値設定部と、決定された閾値
により上記入力信号の音声区間を検出する区間検出部と
、登録音声または標準音声区間を修正する区間修正部と
、登録音声または標準音声と入力音声とを照合して認識
結果を出力する照合部とから構成されている。
Means for Solving the Problems In order to achieve the above object, the speech recognition device according to claim 1 includes an analysis section that detects the power of the input signal per unit time, and an SN ratio calculation section that calculates the SN ratio. a threshold setting section that determines a section detection threshold in consideration of the SN ratio; a section detection section that detects a speech section of the input signal using the determined threshold; and a section correction section that corrects the registered speech or standard speech section. and a matching unit that matches the registered speech or standard speech with the input speech and outputs a recognition result.

また請求項2記載の音声認識装置は、入力信号の単位時
間毎のパワーを検出する分析部と、信号パワーのピーク
値と雑音パワー値とを考慮して区間検出閾値を設定する
閾値設定部と、設定された閾値により上記入力信号の音
声区間を検出する区間検出部と、登録音声または標準音
声の区間を修正する区間修正部と、登録音声または標準
音声と入力音声とを照合して認識結果を出力する照合部
とから構成されている。
The speech recognition device according to claim 2 comprises an analysis section that detects the power of the input signal per unit time, a threshold setting section that sets the section detection threshold in consideration of the peak value of the signal power and the noise power value, a section detection section that detects the speech section of the input signal using the set threshold, a section correction section that corrects the section of the registered speech or standard speech, and a matching section that matches the registered speech or standard speech against the input speech and outputs a recognition result.

作   用 請求項1記載の音声認識装置によれば、SN比算出部で
SN比を算出し、閾値決定部でSN比が低い環境では雑
音パワー以上の値を、SN比が高い環iではピーク値か
ら所定値を引いた値を閾値と決定した後、区間検出部で
上記閾値を用いて入力音声の区間検出を行い、さらに区
間修正部で上記閾値にて登録音声または標準音声区間を
修正し、照合部で上記登録音声または標準音声と入力音
声との照合を行う。
Operation According to the speech recognition device of claim 1, the SN ratio calculation section calculates the SN ratio, and the threshold determination section sets the threshold to a value at or above the noise power in an environment where the SN ratio is low, and to the peak value minus a predetermined value in an environment where the SN ratio is high. The section detection section then detects the section of the input speech using this threshold, the section correction section corrects the section of the registered speech or standard speech using the same threshold, and the matching section matches the registered speech or standard speech against the input speech.

また請求項2記載の音声認識装置によれば、閾値設定部
で雑音パワー値に所定値を加えた値と、ピーク値より所
定値を引いた値とを比較して大きい方の値を閾値と設定
した後、区間検出部で上記閾値を用いて入力音声の区間
検出を行い、さらに区間修正部で上記閾値にて登録音声
または標準音声区間を修正し、照合部で上記登録音声ま
たは標準音声と入力音声との照合を行う。
According to the speech recognition device of claim 2, the threshold setting section compares the noise power value plus a predetermined value with the peak value minus a predetermined value and sets the larger of the two as the threshold. The section detection section then detects the section of the input speech using this threshold, the section correction section corrects the section of the registered speech or standard speech using the same threshold, and the matching section matches the registered speech or standard speech against the input speech.

実施例 第1図は、本発明の第1の実施例(請求項1記載の発明
に対応)における登録型単語音声認識装置のブロック図
である。
Embodiment FIG. 1 is a block diagram of a registered word speech recognition device in a first embodiment of the present invention (corresponding to the invention set forth in claim 1).

同図において、1は音声入力端子、2は分析部、3は仮
閾値設定部、4はSN比算出部、5は閾値設定部、6は
区間検出部、7は区間修正部、8は照合部、9は登録音
声用バッフハ 10は入力音声用バッフハ 11は認識
結果出力端子、12、13はスイッチであり、従来例(
第3図参照)と同じものは同一の番号を付与している。
In the figure, 1 is an audio input terminal, 2 is an analysis section, 3 is a temporary threshold setting section, 4 is an SN ratio calculation section, 5 is a threshold setting section, 6 is a section detection section, 7 is a section correction section, 8 is a matching section, 9 is a buffer for registered speech, 10 is a buffer for input speech, 11 is a recognition result output terminal, and 12 and 13 are switches. Elements that are the same as in the conventional example (see FIG. 3) are given the same numbers.

以上のように構成された音声認識装置について以下その
動作について説明する。
The operation of the speech recognition device configured as described above will be explained below.

まず登録時には、音声入力時直前に、音声入力端子1よ
り所定時間分の背景雑音信号が入力され、分析部2で単
位時間ごとの信号のパワーが算出される。算出結果は仮
閾値設定部3に入力される。
First, at the time of registration, a background noise signal for a predetermined period of time is input from the audio input terminal 1 immediately before audio input, and the power of the signal for each unit time is calculated by the analysis section 2. The calculation result is input to the temporary threshold setting section 3.

仮閾値設定部3で上記パワーの平均値を求め、左記平均
値に所定値(本実施例では8dBとする)加えた値を仮
区間検出閾値とする。
The temporary threshold setting unit 3 calculates the average value of the power, and sets the value obtained by adding a predetermined value (8 dB in this embodiment) to the average value as the temporary section detection threshold.

登録単語音声入力時には、音声入力端子1より入力され
た信号にもとづき、分析部2では単位時間毎の信号のパ
ワーと特徴パラメータが算出される。パワー算出結果は
SN比算出部4に、特徴パラメータは入力音声用バッフ
ァ10に入力される。
At the time of voice input of registered words, the analysis section 2 calculates the power and characteristic parameters of the signal for each unit time based on the signal input from the voice input terminal 1. The power calculation result is input to the SN ratio calculation unit 4, and the feature parameters are input to the input audio buffer 10.

なお分析方法は従来例と同じである。SN比算出部4で
は、先に設定された仮区間検出閾値以上の信号部を仮の
音声区間として、仮音声区間内のピーク値と先に算出さ
れた雑音パワーとの平均値の比をSN比として算出し、
SN比が所定値(本例では24dBとする)以下であれ
ば登録を再度やり直すよう話者に指示し、以上の登録処
理を初めからやり直す。
The analysis method is the same as in the conventional example. The SN ratio calculation section 4 treats the portion of the signal at or above the previously set temporary section detection threshold as a temporary speech section, and calculates the SN ratio as the ratio of the peak value within the temporary speech section to the average of the previously calculated noise power. If the SN ratio is at or below a predetermined value (24 dB in this example), the speaker is instructed to redo the registration, and the registration processing described above is restarted from the beginning.
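The SN ratio calculation described here can be sketched as below. It is only an illustration under stated assumptions: the function name is hypothetical, all powers are taken to be in dB (so the ratio becomes a difference), and the noise average is taken in the dB domain.

```python
def estimate_snr_db(speech_powers_db, noise_powers_db, temporary_threshold_db):
    """SN ratio = peak power in the temporary speech section - mean noise power.

    Returns None when no frame reaches the temporary threshold, i.e. when
    no temporary speech section exists.
    """
    temporary_section = [p for p in speech_powers_db if p >= temporary_threshold_db]
    if not temporary_section:
        return None
    peak_db = max(temporary_section)
    mean_noise_db = sum(noise_powers_db) / len(noise_powers_db)
    return peak_db - mean_noise_db


# Peak of 60 dB over a 30 dB mean noise floor gives an SN ratio of 30 dB.
snr = estimate_snr_db([30.0, 45.0, 60.0, 50.0], [28.0, 30.0, 32.0], 38.0)
```

During registration, a result at or below 24 dB would trigger the request to repeat the utterance.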

SN値が24dB以上であれば閾値設定部5で、ピーク
値より所定値(本実施例では18dBとする)を引いた
値を検出閾値として決定する。区間検出部6で単位時間
毎の信号パワーと検出閾値とを比較し、音声区間を検出
する。区間検出方法は、従来例と同じである。次に、音
声区間分の特徴パラメータを入力音声用バッファ10よ
り入力し、登録音声用バッファ9に登録する。以上の登
録音声入力以降の処理を全認識単語分繰り返す。
If the SN ratio is 24 dB or more, the threshold setting section 5 sets the detection threshold to the peak value minus a predetermined value (18 dB in this embodiment). The section detection section 6 compares the signal power for each unit time with the detection threshold to detect the speech section. The section detection method is the same as in the conventional example. Next, the feature parameters for the speech section are read from the input speech buffer 10 and registered in the registered speech buffer 9. The processing from the input of the registered speech onward is repeated for all the words to be recognized.

次に認識時には、登録時と同様に背景雑音から仮区間検
出閾値を設定した後、入力音声を分析し、結果をSN比
算出部4と入力音声用バッファ10とに入力する。SN
比算出部4で登録時同様にSN比を算出し、結果を閾値
設定部5に入カする。
Next, at the time of recognition, the temporary section detection threshold is set from the background noise in the same way as during registration. The input speech is then analyzed, and the results are input to the SN ratio calculation section 4 and the input speech buffer 10. The SN ratio calculation section 4 calculates the SN ratio in the same manner as during registration and inputs the result to the threshold setting section 5.

閾値設定部5で、SN比が24dB以上であればピーク
値から18dBを引いた値を閾値とし、SN比が24d
B以下であれば先の仮区間検出閾値を閾値と設定した後
、区間検出部6で、左記閾値を用いて音声区間を検出す
る。なお区間検出方法は登録時と同様である。
The threshold setting section 5 sets the threshold to the peak value minus 18 dB if the SN ratio is 24 dB or more, and to the previously set temporary section detection threshold if the SN ratio is 24 dB or less. The section detection section 6 then detects the speech section using this threshold. The section detection method is the same as at the time of registration.
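The threshold selection of the first embodiment can be sketched as follows. The function name and the returned flag are hypothetical, and because the text assigns exactly 24 dB to both branches, this sketch arbitrarily treats 24 dB as the high-SN case.

```python
def select_threshold_db(snr_db, peak_db, temporary_threshold_db,
                        snr_limit_db=24.0, peak_offset_db=18.0):
    """Return (threshold_db, needs_section_correction).

    High SN ratio: peak-relative threshold, registered section left as is.
    Low SN ratio: noise-relative temporary threshold, and the registered
    speech section must be re-detected with that threshold.
    """
    if snr_db >= snr_limit_db:  # assumption: exactly 24 dB counts as high
        return peak_db - peak_offset_db, False
    return temporary_threshold_db, True


quiet_room = select_threshold_db(snr_db=30.0, peak_db=60.0, temporary_threshold_db=38.0)
noisy_room = select_threshold_db(snr_db=20.0, peak_db=50.0, temporary_threshold_db=38.0)
```

Returning the flag together with the threshold keeps the later section correction step tied to the same decision.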

次に区間修正部7では、上記SN比が24dB以上の際
には登録された登録音声区間の修正は行なわず、SN比
が24dB以下の場合のみ、上記閾値にて登録音声の区
間検出を再度やり直す。次に照合部8で登録音声七人カ
音声との照合を行い、最短距離を示す単語を認識結果と
して出カ端子11より出力する。なおスイッチ12は、
音声入カ直前に雑音を入力する際には仮閾値設定部3に
、音声入力時にはSN比算出部4と入方音声用バッファ
10とに算出結果を入力するように動作する。
Next, the section correction section 7 leaves the registered speech section unchanged when the SN ratio is 24 dB or more, and only when the SN ratio is 24 dB or less does it redo the section detection of the registered speech using the above threshold. The matching section 8 then matches the registered speech against the input speech, and the word with the shortest distance is output from the output terminal 11 as the recognition result. The switch 12 routes the calculation results to the temporary threshold setting section 3 when noise is input immediately before the speech, and to the SN ratio calculation section 4 and the input speech buffer 10 when speech is input.
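The section correction step can be pictured with the simplified sketch below. It assumes that the per-frame power of the registered speech is kept alongside its feature parameters, which the text does not state, and it trims to the first and last frame at or above the threshold without applying the 60 msec continuity rule. The function name is hypothetical.

```python
def correct_registered_section(frame_powers_db, frame_features, threshold_db):
    """Re-detect the registered speech section with the recognition-time threshold.

    Returns the feature frames between the first and last frame whose power
    is at or above the threshold, or an empty list if none qualifies.
    """
    hits = [i for i, p in enumerate(frame_powers_db) if p >= threshold_db]
    if not hits:
        return []
    return frame_features[hits[0]:hits[-1] + 1]


# With a 40 dB threshold, the weak onset and tail frames are dropped.
trimmed = correct_registered_section(
    [20.0, 35.0, 50.0, 55.0, 36.0, 20.0],
    ["f0", "f1", "f2", "f3", "f4", "f5"],
    threshold_db=40.0,
)
```

Trimming the template with the same threshold used for the noisy input is what brings the two speech sections back into alignment before matching.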

スイッチ13は、登録時には登録用バッファ9に、認識
時には区間修正部7に特徴パラメータを入力するように動作する。
The switch 13 operates to input the feature parameters into the registration buffer 9 during registration and into the section correction unit 7 during recognition.

以上のように,本実施例によれば、SN比算出部4で信
号のピーク値と雑音の平均パワー値との比を算出し、閾
値設定部5で上記SN比が一定値以下の場合は雑音パワ
ー値に所定値を加えた値を、SN比が一定値以上の場合
にはピーク値より所定値を引いた値を閾値と決定し、区
間検出部6で上記閾値を用いて入力音声の区間検出を行
い、区間修正部7で上記閾値にて登録音声を修正し、照
合部8で上記登録音声と入力音声との照合を行うことに
より、登録音声と入力音声との音声区間のずれを防ぎ、
状況の違いによる誤認識を軽減することができる。
As described above, according to this embodiment, the SN ratio calculation section 4 calculates the ratio of the peak value of the signal to the average noise power. The threshold setting section 5 sets the threshold to the noise power value plus a predetermined value when the SN ratio is at or below a fixed value, and to the peak value minus a predetermined value when the SN ratio is at or above that value. The section detection section 6 detects the section of the input speech using this threshold, the section correction section 7 corrects the registered speech using the same threshold, and the matching section 8 matches the registered speech against the input speech. This prevents the speech sections of the registered speech and the input speech from drifting apart and reduces misrecognition caused by differences in conditions.

第2図は、本発明の第2の実施例(請求項2記載の発明
に対応)における登録型単語音声認識装置のブロック図
である。
FIG. 2 is a block diagram of a registered word speech recognition device according to a second embodiment of the present invention (corresponding to the invention set forth in claim 2).

同図において、1は音声入力端子、2は分析部、3は仮
閾値設定部、14は閾値設定部、6は区間検出部、7は
区間修正部、8は照合部、9は登録音声用バッフハ 1
0は入力音声用バッフハ 11は認識結果出力端子、1
2、15はスイッチであり、前記実施例と同じものは,
同一の番号を付与している。
In the figure, 1 is an audio input terminal, 2 is an analysis section, 3 is a temporary threshold setting section, 14 is a threshold setting section, 6 is a section detection section, 7 is a section correction section, 8 is a matching section, 9 is a buffer for registered speech, 10 is a buffer for input speech, 11 is a recognition result output terminal, and 12 and 15 are switches. Elements that are the same as in the previous embodiment are given the same numbers.

以上のように構成された音声認識装置について以下その
動作について説明する。
The operation of the speech recognition device configured as described above will be explained below.

まず登録時には、音声入力時直前に、音声入力端子1よ
り所定時間分の背景雑音信号が入力され、分析部2で単
位時間ごとの信号のパワーが算出される。算出結果は仮
閾値設定部3に入力される。
First, at the time of registration, a background noise signal for a predetermined period of time is input from the audio input terminal 1 immediately before audio input, and the power of the signal for each unit time is calculated by the analysis section 2. The calculation result is input to the temporary threshold setting section 3.

仮閾値設定部3で上記パワーの平均値を求め、左記平均
値に一定値(本実施例では6dBとする)加えた値を仮
区間検出閾値とする。
The temporary threshold setting unit 3 calculates the average value of the power, and sets the value obtained by adding a constant value (6 dB in this embodiment) to the average value as the temporary section detection threshold.

登録単語音声入力時には、音声入力端子1より入力され
た信号にもとづき、分析部2では単位時間毎の信号のパ
ワーと特徴パラメータとが算出される。パワー算出結果
は閾値設定部14に、特徴パラメータは入力音声用バッ
ファ10に入力される。なお分析方法は前記実施例と同
じである。閾値設定部14では、先に設定された仮区間
検出閾値以上の信号部を仮音声区間とし、仮音声区間内
のピーク値から所定値(本実施例では18dBとする)
を加えた値と先に算出された仮区間検出閾値とを比較し
、後者の値が大きければ登録を再度やり直すよう話者に
指示し、以上の登録処理を初めからやり直す。
When a word to be registered is spoken, the analysis section 2 calculates the power and the feature parameters of the signal for each unit time from the signal input at the audio input terminal 1. The power is input to the threshold setting section 14, and the feature parameters are input to the input speech buffer 10. The analysis method is the same as in the previous embodiment. The threshold setting section 14 treats the portion of the signal at or above the previously set temporary section detection threshold as a temporary speech section, and compares the peak value within the temporary speech section minus a predetermined value (18 dB in this embodiment) with the previously calculated temporary section detection threshold. If the latter is larger, the speaker is instructed to redo the registration, and the registration processing described above is restarted from the beginning.

前者の値が大きければ、この前者の値(ピーク値−18
dB)を検出閾値として設定し、区間検出部6で単位時
間毎の信号パワーと検出閾値とを比較し、音声区間を検
出する。区間検出方法は、従来例と同じである。次に、
音声区間分の特徴パラメータを入力音声用バッファ10
より入力し、登録音声用バッファ9に登録する。以上の
登録音声以降の処理を全認識単語分繰り返す。
If the former is larger, this value (peak value - 18 dB) is set as the detection threshold, and the section detection section 6 compares the signal power for each unit time with the detection threshold to detect the speech section. The section detection method is the same as in the conventional example. Next, the feature parameters for the speech section are read from the input speech buffer 10 and registered in the registered speech buffer 9. The processing from the input of the registered speech onward is repeated for all the words to be recognized.

次に認識時には、登録時と同様に背景雑音から仮区間検
出閾値を設定した後、入力音声を分析し、結果を閾値設
定部14と入力音声用バッファ10とに入力する。閾値
設定部14で、区間検出閾値以上の信号部を仮音声区間
とし、仮音声区間内のピーク値から18dBを引いた値
と先に算出された仮区間検出閾値とを比較し、両値の大
きい方を閾値と設定した後、区間検出部6で、左記閾値
を用いて音声区間を検出する。なお区間検出方法は登録
時と同様である。
Next, at the time of recognition, the temporary section detection threshold is set from the background noise in the same way as during registration. The input speech is then analyzed, and the results are input to the threshold setting section 14 and the input speech buffer 10. The threshold setting section 14 treats the portion of the signal at or above the temporary section detection threshold as a temporary speech section, compares the peak value within the temporary speech section minus 18 dB with the previously calculated temporary section detection threshold, and sets the larger of the two as the threshold. The section detection section 6 then detects the speech section using this threshold. The section detection method is the same as at the time of registration.
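The comparison performed by the threshold setting section 14 can be sketched as follows. The function name and the returned flag are hypothetical, values are assumed to be in dB, and a tie is arbitrarily resolved in favour of the temporary threshold.

```python
def select_threshold_by_max(peak_db, temporary_threshold_db, peak_offset_db=18.0):
    """Return (threshold_db, needs_section_correction) for the second embodiment.

    The larger of (peak - offset) and the noise-based temporary threshold is
    used. Section correction of the registered speech is needed only when the
    temporary threshold wins.
    """
    peak_based_db = peak_db - peak_offset_db
    if peak_based_db > temporary_threshold_db:
        return peak_based_db, False
    return temporary_threshold_db, True


loud_utterance = select_threshold_by_max(peak_db=60.0, temporary_threshold_db=38.0)
weak_utterance = select_threshold_by_max(peak_db=50.0, temporary_threshold_db=38.0)
```

No explicit SN ratio appears in this variant; comparing the two candidate thresholds plays the same role.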

次に区間修正部7では、上記閾値がピーク値から18d
Bを引いた値で設定された場合には登録された登録音声
区間の修正は行なわず、閾値が仮区間検出閾値で設定さ
れた場合のみ、上記閾値にて登録音声の区間検出を再度
やり直す。次に照合部8で登録音声と入力音声との照合
を行い、最短距離を示す単語を認識結果として出力端子
11より出力する。なおスイッチ15は、音声入力直前
に雑音を入力する際には仮閾値設定部3に、音声入力時
には閾値設定部14と入力音声用バッファ10とに算出
結果を入力するように動作する。スイッチ13は、登録
時には登録用バッファ9に、認識時には区間修正部7に
特徴パラメータを入力するように動作する。
Next, the section correction section 7 leaves the registered speech section unchanged when the threshold was set to the peak value minus 18 dB, and only when the threshold was set to the temporary section detection threshold does it redo the section detection of the registered speech using that threshold. The matching section 8 then matches the registered speech against the input speech, and the word with the shortest distance is output from the output terminal 11 as the recognition result. The switch 15 routes the calculation results to the temporary threshold setting section 3 when noise is input immediately before the speech, and to the threshold setting section 14 and the input speech buffer 10 when speech is input. The switch 13 routes the feature parameters to the registered speech buffer 9 during registration and to the section correction section 7 during recognition.

以上のように,本実施例によれば、閾値設定部14で雑
音パワー値に6dBを加えた値と、ピーク値より18d
Bを引いた値とを比較して大きい方を閾値と決定した後、区間検出部6で上記閾値を用い
て入力音声の区間検出を行い、区間修正部で上記閾値に
て登録音声区間を修正し、照合部で上記登録音声と入力
音声との照合を行うことにより、登録音声と入力音声と
の音声区間のずれを防ぎ、状況の違いによる誤認識を少
なくすることができる。また本実施例は、第1の実施例
に比べ、SN比を算出する手間をかけずに同じ効果を期
待できる。
As described above, according to this embodiment, the threshold setting section 14 compares the noise power value plus 6 dB with the peak value minus 18 dB and sets the larger of the two as the threshold. The section detection section 6 then detects the section of the input speech using this threshold, the section correction section corrects the registered speech section using the same threshold, and the matching section matches the registered speech against the input speech. This prevents the speech sections of the registered speech and the input speech from drifting apart and reduces misrecognition caused by differences in conditions. Compared with the first embodiment, this embodiment can also be expected to give the same effect without the effort of calculating the SN ratio.

発明の効果 請求項1記載の音声認識装置は、SN比算出部でSN比
を算出し、閾値決定部でSN比が低い環境では雑音パワ
ー以上の値を、SN比が高い環境ではピーク値より所定
値を引いた値を閾値と設定した後、区間検出部で上記閾
値を用いて入力音声の区間検出を行い、さらに区間修正
部で上記閾値にて登録音声または標準音声区間を修正し
、照合部で上記登録音声または標準音声と入力音声との
照合を行うことにより、登録音声と入力音声との音声区
間のずれを防ぎ、状況の違いによる誤認識を少なくすることができる。
Effects of the Invention In the speech recognition device according to claim 1, the SN ratio calculation section calculates the SN ratio, and the threshold determination section sets the threshold to a value at or above the noise power in an environment with a low SN ratio and to the peak value minus a predetermined value in an environment with a high SN ratio. The section detection section then detects the section of the input speech using this threshold, the section correction section corrects the registered speech or standard speech section using the same threshold, and the matching section matches the registered speech or standard speech against the input speech. This prevents the speech sections of the registered speech and the input speech from drifting apart and reduces misrecognition caused by differences in conditions.

また請求項2記載の音声認識装置は、閾値設定部で雑音
パワー値に所定値を加えた値と、ピーク値より所定値を
引いた値とを比較して大きい方を閾値と決定した後、区
間検出部で上記閾値を用いて入力音声の区間検出を行い
、さらに区間修正部で上記閾値にて登録音声または標準
音声区間を修正し、照合部で上記登録音声または標準音
声と入力音声との照合を行うことにより、登録音声また
は標準音声と入力音声との音声区間のずれを防ぎ、状況
の違いによる誤認識を少なくすることができる。また上
記発明に比べ、SN比を算出する手間をかけずに同じ効
果を期待できる。
Further, in the speech recognition device according to claim 2, after the threshold value setting section compares the value obtained by adding the predetermined value to the noise power value and the value obtained by subtracting the predetermined value from the peak value, and determines the larger one as the threshold value, The section detecting section detects sections of the input speech using the above threshold, the section correcting section corrects the registered speech or standard speech section using the above threshold, and the matching section compares the registered speech or standard speech with the input speech. By performing the comparison, it is possible to prevent a shift in the voice section between the registered voice or standard voice and the input voice, and to reduce misrecognition due to differences in situations. Moreover, compared to the above invention, the same effect can be expected without taking the trouble of calculating the SN ratio.

【図面の簡単な説明】 第1図は本発明の第1の実施例における音声認識装置の
ブロック図、第2図は本発明の第2の実施例における音
声認識装置のブロック図、第3図は従来例における音声
認識装置のブロック図である。 .閾値設定部、e...区間検出部、7...区間修正
部、8...照合部。 代理人の氏名 弁理士 粟野重孝 はか1名2...分
析部、4 ...S N比算出部、5、14..第 l 図 第 凶
[Brief Description of the Drawings] FIG. 1 is a block diagram of a speech recognition device in a first embodiment of the present invention, FIG. 2 is a block diagram of a speech recognition device in a second embodiment of the present invention, and FIG. 3 is a block diagram of a conventional speech recognition device. 2: analysis section, 4: SN ratio calculation section, 5, 14: threshold setting section, 6: section detection section, 7: section correction section, 8: matching section. Name of agent: Patent attorney Shigetaka Awano and one other

Claims (2)

[Claims]
(1) A speech recognition device comprising: an analysis section that detects the power of an input signal per unit time; an SN ratio calculation section that calculates the ratio of speech power to noise power (hereinafter referred to as the SN ratio); a threshold determination section that determines a section detection threshold in consideration of the SN ratio; a section detection section that detects the speech section of the input signal using the determined threshold; a section correction section that corrects the section of registered speech or standard speech; and a matching section that matches the registered speech or standard speech against the input speech and outputs a recognition result, wherein the SN ratio calculation section calculates the SN ratio; the threshold determination section sets as the threshold a value equal to or greater than the noise power in an environment with a low SN ratio, and a value obtained by subtracting a predetermined value from the maximum power value of the signal (hereinafter referred to as the peak value) in an environment with a high SN ratio; the section detection section detects the section of the input speech using the threshold; the section correction section corrects the registered speech or standard speech section using the threshold; and the matching section matches the registered speech or standard speech against the input speech.
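As a rough illustration of the threshold selection in claim 1, the following Python sketch switches between the two candidate thresholds according to the SN ratio. Several points are assumptions rather than details from the claim: powers are taken to be in decibels, the SN ratio is estimated as the peak frame power minus the mean noise power, and `select_threshold`, `snr_limit_db`, `noise_margin_db`, and `peak_drop_db` are hypothetical names and values.

```python
def select_threshold(frame_powers_db, noise_powers_db,
                     snr_limit_db=30.0, noise_margin_db=6.0,
                     peak_drop_db=25.0):
    """Pick a section detection threshold based on the SN ratio.

    Returns (threshold_db, high_snr). All parameter values are
    hypothetical stand-ins for the claim's "predetermined values".
    """
    # Mean background-noise power measured before the utterance.
    noise_mean = sum(noise_powers_db) / len(noise_powers_db)
    # Peak value: maximum per-frame power of the input signal.
    peak = max(frame_powers_db)
    # Assumed SN ratio estimate: peak power minus mean noise power, in dB.
    snr = peak - noise_mean
    if snr >= snr_limit_db:
        # High SN ratio: threshold is the peak minus a fixed drop.
        return peak - peak_drop_db, True
    # Low SN ratio: threshold is the noise power plus a margin.
    return noise_mean + noise_margin_db, False


quiet = select_threshold([20.0, 45.0, 70.0, 60.0, 22.0], [20.0, 21.0, 19.0])
noisy = select_threshold([50.0, 58.0, 70.0, 64.0, 51.0], [50.0, 51.0, 49.0])
```

The returned flag indicates which branch was taken, which a caller could use to decide whether the registered speech section needs correcting at all.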
(2) A speech recognition device comprising: an analysis section that detects the power of an input signal per unit time; a threshold setting section that sets a section detection threshold in consideration of the peak value of the signal power and the noise power value; a section detection section that detects the speech section of the input signal using the set threshold; a section correction section that corrects the section of registered speech or standard speech; and a matching section that matches the registered speech or standard speech against the input speech and outputs a recognition result, wherein the threshold setting section compares the value obtained by adding a predetermined value to the noise power with the value obtained by subtracting a predetermined value from the peak value of the signal and sets the larger value as the threshold; the section detection section detects the section of the input speech using the threshold; the section correction section corrects the registered speech or standard speech section using the threshold; and the matching section matches the registered speech or standard speech against the input speech.
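The section detection and section correction steps recited in the claims might be sketched as follows. The claims do not say how the correction is carried out, so this sketch assumes the simplest reading: the same absolute threshold is re-applied to the stored per-frame power of the registered speech, and frames outside the detected span are dropped. The function names and the sample data are hypothetical.

```python
def detect_section(frame_powers_db, threshold_db):
    """Return (start, end) frame indices, end exclusive, spanning the
    first to the last frame whose power exceeds the threshold, or None."""
    above = [i for i, p in enumerate(frame_powers_db) if p > threshold_db]
    if not above:
        return None
    return above[0], above[-1] + 1


def correct_registered_section(registered_powers_db, threshold_db):
    """Trim registered speech to the span found with the input's threshold
    (an assumed interpretation of 'section correction')."""
    section = detect_section(registered_powers_db, threshold_db)
    if section is None:
        return []
    start, end = section
    return registered_powers_db[start:end]


threshold_db = 56.0  # e.g. noise power plus margin in a noisy environment
input_powers = [50.0, 52.0, 58.0, 70.0, 64.0, 51.0]
registered_powers = [18.0, 35.0, 50.0, 70.0, 62.0, 48.0, 30.0, 19.0]

input_section = detect_section(input_powers, threshold_db)
corrected = correct_registered_section(registered_powers, threshold_db)
```

Under this reading, the low-power onset and tail of the registered speech, which would be buried in noise on the input side, are excluded before matching so that both patterns cover comparable spans.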
JP1114733A 1989-05-08 1989-05-08 Voice recognizer Expired - Lifetime JPH0754434B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1114733A JPH0754434B2 (en) 1989-05-08 1989-05-08 Voice recognizer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1114733A JPH0754434B2 (en) 1989-05-08 1989-05-08 Voice recognizer

Publications (2)

Publication Number Publication Date
JPH02293797A true JPH02293797A (en) 1990-12-04
JPH0754434B2 JPH0754434B2 (en) 1995-06-07

Family

ID=14645272

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1114733A Expired - Lifetime JPH0754434B2 (en) 1989-05-08 1989-05-08 Voice recognizer

Country Status (1)

Country Link
JP (1) JPH0754434B2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09212195A (en) * 1995-12-12 1997-08-15 Nokia Mobile Phones Ltd Device and method for voice activity detection and mobile station
JP2003524794A (en) * 1999-02-08 2003-08-19 クゥアルコム・インコーポレイテッド Speech endpoint determination in noisy signals
JP2007304605A (en) * 1994-08-10 2007-11-22 Qualcomm Inc Method and apparatus for selecting a speech encoding rate in a variable rate vocoder
JP2010204266A (en) * 2009-03-02 2010-09-16 Fujitsu Ltd Sound signal converting device, method and program

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007304605A (en) * 1994-08-10 2007-11-22 Qualcomm Inc Method and apparatus for selecting a speech encoding rate in a variable rate vocoder
JP4680957B2 (en) * 1994-08-10 2011-05-11 クゥアルコム・インコーポレイテッド Method and apparatus for speech encoding rate determination in a variable rate vocoder
JPH09212195A (en) * 1995-12-12 1997-08-15 Nokia Mobile Phones Ltd Device and method for voice activity detection and mobile station
JP2008293038A (en) * 1995-12-12 2008-12-04 Nokia Corp Voice activity detection device and mobile station, and voice activity detection method
JP2003524794A (en) * 1999-02-08 2003-08-19 クゥアルコム・インコーポレイテッド Speech endpoint determination in noisy signals
JP2010204266A (en) * 2009-03-02 2010-09-16 Fujitsu Ltd Sound signal converting device, method and program

Also Published As

Publication number Publication date
JPH0754434B2 (en) 1995-06-07

Similar Documents

Publication Publication Date Title
US7050973B2 (en) Speaker recognition using dynamic time warp template spotting
JPH03206499A (en) Voice recognition device
JPH02293797A (en) Voice recognizing device
JP3114757B2 (en) Voice recognition device
JPH02210500A (en) Standard pattern registering system
KR100449912B1 (en) Apparatus and method for detecting topic in speech recognition system
JPH11249688A (en) Device and method for recognizing voice
JP2901976B2 (en) Pattern matching preliminary selection method
JP3065739B2 (en) Voice section detection device
JP2979999B2 (en) Voice recognition device
JP3631020B2 (en) Speaker recognition method
JPH03248268A (en) Audio interactive processing system
JPH0619491A (en) Speech recognizing device
JPS6285393A (en) Pattern recognizing device with rejecting function
JP2844592B2 (en) Discrete word speech recognition device
JPH04152397A (en) Voice recognizing device
JPH0419700A (en) Method for matching voice pattern
JPH0376471B2 (en)
JP3439602B2 (en) Voice recognition device
JPH01222299A (en) Voice recognizing device
KR19980045013A (en) How to Improve Speaker Recognizer by Entering Password
JPS58159598A (en) Monosyllabic voice recognition system
JPS6329279B2 (en)
JPH05257493A (en) Voice recognizing device
JPH02198499A (en) Automatic updating system for dictionary of voice recognizing device