JPS61156100A - Voice recognition equipment - Google Patents
Voice recognition equipment
Info
- Publication number
- JPS61156100A (application JP59277403A / JP27740384A)
- Authority
- JP
- Japan
- Prior art keywords
- voice
- recognition
- recognition result
- speech
- pattern
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
(57) [Abstract] This publication contains application data filed before electronic filing, so no abstract data is recorded.
Description
[Detailed Description of the Invention]
[Field of Industrial Application]
The present invention relates to a speech recognition device.
[Prior Art]
Conventionally, in this type of speech recognition device, the speech section has been detected on the basis of a single predetermined threshold, and when the recognition result was a reject or a misrecognition, the commonly used approach was to have the speaker utter the speech again and repeat the same recognition process on the re-entered input.
[Problems to Be Solved by the Invention]
In the conventional speech recognition device described above, the speech section is detected on the basis of a single predetermined threshold. Consequently, when improper detection of the speech section causes a misrecognition or a reject, the same detection error is likely to occur again even if the speaker is asked to utter the speech once more, so there is a high risk that the misrecognition or reject is simply repeated. In addition, making the speaker repeat the same content increases the burden on the speaker and degrades serviceability.
[Means for Solving the Problems]
In the present invention, the speech power and the feature pattern obtained by analyzing the input speech are stored in a buffer memory, and several thresholds are prepared for speech-section detection. Pattern matching is performed between the feature pattern of the detected speech section and the standard patterns to obtain a similarity, and the similarity is judged. If the recognition result is a reject, the threshold for speech-section detection is changed, the power of the input speech stored in the buffer memory is read out, the speech section is detected again, the feature pattern of the newly found speech section is likewise read from the buffer memory, and pattern matching and judgment of the recognition result are performed again. In this way the speech section of a single utterance can be detected with a plurality of thresholds, which increases the probability of detecting a more appropriate speech section and decreases the probability of a reject or misrecognition caused by faulty speech-section detection, thereby solving the problems of the conventional device described above.
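The flow described above amounts to a retry loop over detection thresholds applied to one buffered utterance. The following Python sketch is only an illustration under stated assumptions: the helper callables `detect_section`, `match`, and `is_reject`, the example threshold values, and the frame-index interface are hypothetical and are not specified in the patent.

```python
def recognize_utterance(power, features, templates,
                        detect_section, match, is_reject,
                        thresholds=(0.30, 0.15)):
    """Sketch of the multi-threshold re-recognition loop (assumptions noted above).

    power     -- buffered frame-wise speech power of one utterance
    features  -- buffered frame-wise feature pattern of the same utterance
    templates -- standard patterns for each word/sentence to be recognized
    detect_section(power, pth) -> (start, end) frame indices or None
    match(segment, templates)  -> (best_item, similarity)
    is_reject(similarity)      -> True when the result should be rejected
    thresholds -- candidate detection thresholds (Pth1, Pth2, ...)
    """
    for pth in thresholds:                    # first pass, then re-recognition passes
        section = detect_section(power, pth)  # re-read the buffered power each time
        if section is None:
            continue                          # no section found at this threshold
        start, end = section
        best, similarity = match(features[start:end], templates)
        if not is_reject(similarity):
            return best                       # accepted recognition result
    return None                               # reject after the final retry
```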
[Embodiment]
An embodiment of the present invention will be described with reference to the drawings.
FIG. 1 is a block diagram of one embodiment of the speech recognition device according to the present invention.
The speech recognition device of this embodiment is composed of a speech analysis means 2, an analysis-result buffer memory 3, a speech-section detection means 4, a standard pattern memory 5, a pattern matching means 6, and a recognition-result judgment means 7.
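For orientation, the numbered blocks of FIG. 1 can be mirrored by a small data structure for the buffered analysis results and a stand-in analysis routine. This is only an illustrative sketch; the class, field, and parameter names (and the 160-sample frame length) are assumptions made for the example, not part of the patent.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AnalysisResult:
    """Analysis-result buffer memory (3): buffered output for one utterance."""
    feature_pattern: List[List[float]]   # feature pattern (8), one vector per frame
    speech_power: List[float]            # speech power (9), one value per frame

def analyze(input_speech: List[float], frame_len: int = 160) -> AnalysisResult:
    """Speech analysis means (2), as a trivial stand-in: power = mean square per
    frame, feature = the raw frame itself (a real device would use e.g. filter-bank
    or LPC features)."""
    frames = [input_speech[i:i + frame_len]
              for i in range(0, len(input_speech) - frame_len + 1, frame_len)]
    power = [sum(x * x for x in f) / frame_len for f in frames]
    return AnalysisResult(feature_pattern=frames, speech_power=power)
```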
The speech analysis means 2 receives the input speech 1 and stores its analysis results, the feature pattern 8 and the speech power 9, in the analysis-result buffer memory 3. The speech-section detection means 4 reads the stored speech power 11 from the analysis-result buffer memory 3, detects the speech section, and outputs the start/end information 12 of the speech section to the pattern matching means 6. The pattern matching means 6 reads the stored feature pattern 10 of the speech section from the analysis-result buffer memory 3 according to the start/end information 12, reads the standard patterns 13 from the standard pattern memory 5 (in which a plurality of standard patterns are stored for each word or sentence to be recognized), performs pattern matching between the feature pattern of the speech section and the standard patterns, and outputs the similarity 14 to the recognition-result judgment means 7. The recognition-result judgment means 7 judges the recognition result from the input similarity 14. If the result is not a reject, it outputs the recognition result 16 and the series of speech recognition processing ends. If the result is a reject, it does not output the reject as a recognition result; instead it outputs a re-recognition instruction 15 to the speech-section detection means 4 and the pattern matching means 6.
On receiving the re-recognition instruction 15, the speech-section detection means 4 changes the threshold used for speech-section detection, reads the stored speech power 11 from the analysis-result buffer memory 3 again, detects a new speech section, and outputs it to the pattern matching means 6. On receiving the re-recognition instruction 15, the pattern matching means 6 reads the stored feature pattern again from the analysis-result buffer memory 3 according to the start/end information 12 of the new speech section, reads the standard patterns 13 from the standard pattern memory 5 as in the first recognition pass, performs pattern matching, and outputs a new similarity 14 to the recognition-result judgment means 7. The recognition-result judgment means 7 judges the new similarity 14: if the result is not a reject, it outputs the recognition result 16 and the re-recognition process ends. If the result is again a reject and the maximum number of re-recognition repetitions is one, it outputs the recognition result 16 of reject and the re-recognition process ends; if the maximum number is two or more, the re-recognition process is repeated, and if the result is still a reject after the maximum number of repetitions, the recognition result 16 of reject is output and the re-recognition process ends.
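The patent does not specify how the similarity 14 is computed or how the judgment means 7 decides on a reject. As one hypothetical instance, the sketch below uses the negative mean frame distance after a crude linear time alignment as the similarity and rejects when the best similarity falls below a fixed threshold; the function names and the `reject_threshold` value are assumptions made for the example.

```python
import numpy as np

def match_and_judge(segment, templates, reject_threshold=-1.5):
    """Hypothetical matching/judgment step (not the patent's specified method).

    segment   -- 2-D array (frames x coefficients) of the detected speech section
    templates -- dict: word -> list of 2-D standard patterns
    Returns (best_word, similarity), or (None, similarity) for a reject.
    """
    seg = np.asarray(segment, dtype=float)
    best_word, best_sim = None, -np.inf
    for word, patterns in templates.items():
        for pat in patterns:
            pat = np.asarray(pat, dtype=float)
            # linearly warp the template onto the segment's time axis
            idx = np.linspace(0, len(pat) - 1, num=len(seg)).round().astype(int)
            dist = np.mean(np.linalg.norm(seg - pat[idx], axis=1))
            sim = -dist                      # treat negative distance as similarity
            if sim > best_sim:
                best_word, best_sim = word, sim
    if best_sim < reject_threshold:          # judgment step: too dissimilar -> reject
        return None, best_sim
    return best_word, best_sim               # accepted recognition result
```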
FIG. 2 shows an example waveform (time variation) of the speech power P. In this embodiment, two thresholds of the speech power P are set for speech-section detection: a first threshold Pth1 and a second threshold Pth2. With the first threshold Pth1, the start-side duration Tb begins at the start point tb1 where the speech power P crosses the threshold Pth1, and ends at the end point te1 where the speech power P crosses the threshold Pth1 again; the end-side duration Te begins at this end point te1. The speech section is then decided from the start-side duration Tb and the end-side duration Te. That is, if the start-side duration Tb is equal to or longer than a set time T1, the interval is regarded as a speech section. If the start-side duration Tb does not reach the set time T1, the decision depends on the end-side duration Te: if the end-side duration Te is equal to or longer than a set time T2, the interval of the start-side duration Tb is regarded as not being a speech section, whereas if the end-side duration Te does not reach the set time T2, the interval is added to the preceding start-side duration Tb and the speech/non-speech decision is made at the next end point. With the first threshold Pth1, the start-side duration Tb is shorter than the set time T1 and the end-side duration Te is longer than the set time T2, so the interval of this start-side duration Tb is not regarded as a speech section. With the second threshold Pth2, the start-side duration Tb begins at the start point tb2 (almost the same instant as the start point tb1); a power-dip interval occurs partway through, but because this interval is shorter than the set time T2 it is added to the start-side duration Tb, which continues to the next end point te2. The start-side duration Tb from this start point tb2 to the end point te2 is longer than the set time T1, so this interval is regarded as a speech section.
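A minimal frame-based sketch of this start-side/end-side duration logic follows. It assumes the power is available as a per-frame sequence and that T1 and T2 are given in frames; the function and parameter names are illustrative, and absorbing dips even after Tb has exceeded T1 is an assumption consistent with, but not stated in, the text.

```python
def detect_speech_section(power, pth, t1, t2):
    """Frame-based sketch of the Tb/Te endpoint logic of FIG. 2 (assumptions above).

    power -- frame powers read from the analysis-result buffer memory
    pth   -- detection threshold (Pth1, Pth2, ...)
    t1    -- set time T1: minimum start-side duration Tb of a speech section
    t2    -- set time T2: end-side duration Te that closes a candidate section
    Returns (start, end) frame indices of the first accepted section, or None.
    """
    n, i = len(power), 0
    while i < n:
        if power[i] < pth:
            i += 1
            continue
        start = i                                   # start point tb: P crosses pth
        end = start
        while i < n:
            while i < n and power[i] >= pth:        # above-threshold run extends Tb
                i += 1
            end = i                                 # candidate end point te
            gap = 0
            while i < n and power[i] < pth:         # below-threshold run measures Te
                gap += 1
                i += 1
                if gap >= t2:                       # Te reached T2: close the candidate
                    break
            if gap >= t2 or i >= n:
                break
            # gap < t2: the power dip is absorbed into Tb and the run continues
        if end - start >= t1:                       # Tb >= T1: accept as speech section
            return (start, end)
        # Tb < T1 with Te >= T2: not a speech section; keep scanning
    return None
```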
Accordingly, even if the speech section is first detected with the first threshold Pth1 and the recognition result is a reject, detecting the speech section again with the second threshold Pth2 makes it possible to detect an appropriate speech section and obtain a correct recognition result.
In this embodiment, for simplicity of explanation, the start point and the end point are taken as the instants at which the speech power P begins to exceed the threshold and begins to fall below it. However, it is also effective to add a hangover to each of the start point and the end point, that is, to take as the start point an instant a predetermined time before the speech power begins to exceed the threshold, and as the end point an instant a predetermined time after the speech power begins to fall below the threshold.
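A minimal sketch of that hangover adjustment, assuming frame indices and hypothetical margin values (the patent gives no concrete times):

```python
def add_hangover(section, n_frames, lead=5, trail=10):
    """Widen a detected (start, end) section by lead/trail hangover frames."""
    start, end = section
    return (max(0, start - lead), min(n_frames, end + trail))
```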
[Effects of the Invention]
As described above, according to the present invention, the power and the feature pattern obtained by analyzing the input speech are stored in a buffer memory, several thresholds are prepared for speech-section detection, pattern matching is performed between the feature pattern of the detected speech section and the standard patterns to obtain a similarity, and the similarity is judged. When the recognition result is a reject, the threshold for speech-section detection is changed, the power of the input speech stored in the buffer memory is read out, the speech section is detected again, the feature pattern of the newly found speech section is likewise read from the buffer memory, and pattern matching and judgment of the recognition result are performed again. As a result, for the input speech of a single utterance, the probability of detecting an appropriate speech section increases and the probability of a reject or misrecognition caused by faulty speech-section detection decreases, which reduces the speaker's burden of re-uttering and enables more efficient speech input.
FIG. 1 is a block diagram of one embodiment of the speech recognition device according to the present invention, and FIG. 2 is a diagram showing the relationship between the thresholds for speech-section detection and the waveform of the speech power.
1: input speech,
2: speech analysis means,
3: analysis-result buffer memory,
4: speech-section detection means,
5: standard pattern memory,
6: pattern matching means,
7: recognition-result judgment means,
8: feature pattern,
9: speech power,
10: stored feature pattern,
11: stored speech power,
12: start/end information of the speech section,
13: standard pattern,
14: similarity,
15: re-recognition instruction,
16: recognition result.
Claims (1)
1. A speech recognition device comprising:
a standard pattern memory in which a plurality of standard patterns are stored in advance for each word or sentence to be recognized;
a speech analysis means for analyzing the speech power and the feature pattern of input speech;
an analysis-result buffer memory for storing the speech power and the feature pattern that are the analysis results of the speech analysis means;
a speech-section detection means for reading out the speech power stored in the analysis-result buffer memory and detecting a speech section from the speech power;
a pattern matching means for reading out, according to the information on the speech section detected by the speech-section detection means, the feature pattern of the speech section stored in the analysis-result buffer memory, reading out the standard patterns from the standard pattern memory, and performing pattern matching between the two to obtain a similarity; and
a recognition-result judgment means for judging the recognition result according to the similarity,
wherein, when the recognition result is not a reject, the recognition result is output and the recognition process ends, and when the recognition result is a reject, the recognition result is not output; instead, a re-recognition process is performed in which the threshold used by the speech-section detection means to determine the speech section is changed, the above processing by the speech-section detection means and the pattern matching means is repeated to obtain a new similarity, and a new recognition result is judged and output according to that similarity.
2. The speech recognition device according to claim 1, wherein a maximum number of repetitions of the re-recognition process is set, and when the recognition result of the re-recognition process is still a reject after the re-recognition process has been repeated the maximum number of times, the recognition result is output as a reject.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP59277403A JPS61156100A (en) | 1984-12-27 | 1984-12-27 | Voice recognition equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP59277403A JPS61156100A (en) | 1984-12-27 | 1984-12-27 | Voice recognition equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
JPS61156100A true JPS61156100A (en) | 1986-07-15 |
Family
ID=17583057
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP59277403A Pending JPS61156100A (en) | 1984-12-27 | 1984-12-27 | Voice recognition equipment |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPS61156100A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002091470A (en) * | 2000-09-20 | 2002-03-27 | Fujitsu Ten Ltd | Voice section detecting device |
US6489248B2 (en) | 1999-10-06 | 2002-12-03 | Applied Materials, Inc. | Method and apparatus for etch passivating and etching a substrate |
JP2009080298A (en) * | 2007-09-26 | 2009-04-16 | Nippon Hoso Kyokai <Nhk> | Hearing aid device |
US8099277B2 (en) | 2006-09-27 | 2012-01-17 | Kabushiki Kaisha Toshiba | Speech-duration detector and computer program product therefor |
US8380500B2 (en) | 2008-04-03 | 2013-02-19 | Kabushiki Kaisha Toshiba | Apparatus, method, and computer program product for judging speech/non-speech |