JPH0432900A

JPH0432900A - Sound recognizing device

Info

Publication number: JPH0432900A
Application number: JP2138879A
Authority: JP
Inventors: Tetsuya Muroi; 室井　哲也
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1990-05-29
Filing date: 1990-05-29
Publication date: 1992-02-04

Abstract

PURPOSE: To prevent repeated failures to input an utterance that is spoken several times, by lowering the values of two thresholds when the first candidate of the preceding utterance coincides with that of the current utterance. CONSTITUTION: In a speech recognition device provided with a collating part 4 that collates an input speech pattern with registered speech patterns and calculates the first candidate and its recognition reliability, a recognition result deciding part 6, and a candidate storage part 5, the recognition-decision thresholds are lowered before the recognition result is decided whenever the first candidate of the current utterance equals that of the preceding utterance. Consequently, a valid recognition result can be obtained for a repeated utterance.

Description

DETAILED DESCRIPTION OF THE INVENTION: The present invention relates to a speech recognition device, and more particularly to a method of judging recognition results in a speech recognition device.

Speech recognition is widely used for voice control of machines and devices, for voice data entry, and so on. In these uses, a misrecognition appears as an erroneous control input or an erroneous data entry, so its consequences can be extremely serious. In particular, when the operation of a machine is commanded by voice, abnormal operation caused by misrecognition must be prevented. Conventionally, in such cases, if the operation instructed by the first-ranked recognition candidate was dangerous, either a candidate with a harmless instruction was selected as the recognition result from among the higher-ranked candidates other than the first, or the speech input was invalidated to prevent abnormal operation of the machine. Alternatively, if outputting the first candidate as the recognition result and commanding the machine accordingly could cause abnormal operation, the user was asked for confirmation, or the first candidate was invalidated, depending on the recognition reliability of the first candidate.

With the prior art described above, however, when the input speech is matched and the recognition reliability is smaller than the second threshold, the recognition result is invalidated even if the first candidate is in fact correct, and the user must repeat the same utterance.

If the low recognition reliability is caused by sudden noise or a temporary change in the voice (coughing, the voice cracking, and so on), repeating the utterance is likely to give a valid recognition result the second time. However, if the cause is steady ambient noise or changes in the voice over time, no valid recognition result is obtained no matter how many times the same utterance is repeated.

The present invention was made in view of the circumstances described above. Its object is to provide a speech recognition device that obtains valid recognition results for repeated utterances while still preventing abnormal machine operation.

To achieve the above object, the present invention provides a speech recognition device comprising: a matching unit that matches an input speech pattern against registered speech patterns and calculates the first candidate and the recognition reliability of that candidate; a recognition result determination unit that takes the first candidate as the recognition result when the recognition reliability is greater than a first threshold, displays a prompt asking for confirmation of the first candidate when the recognition reliability is greater than a second threshold and smaller than the first threshold, and does not take the first candidate as the recognition result when the recognition reliability is smaller than the second threshold; and a candidate storage unit that stores the first candidate of the previous utterance. The device is characterized in that, when the first candidate of the current utterance is equal to the first candidate of the previous utterance, one or both of the first and second thresholds are lowered before the recognition result is determined. The invention is described below with reference to an embodiment.
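The three-way judgment made by the recognition result determination unit can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `Decision` and `judge` are hypothetical, and because the text only specifies strict inequalities, treating a reliability exactly equal to a threshold as falling into the lower band is an assumption.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"    # L > T1: output the first candidate as the recognition result
    CONFIRM = "confirm"  # T2 < L <= T1: prompt the user to confirm the first candidate
    REJECT = "reject"    # L <= T2: do not take the first candidate as the result


def judge(reliability: float, t1: float, t2: float) -> Decision:
    """Hypothetical three-way judgment of the first candidate (assumes t2 < t1).

    Ties at a threshold go to the lower band; the source leaves equality unspecified.
    """
    if t2 >= t1:
        raise ValueError("expected the second threshold to be below the first")
    if reliability > t1:
        return Decision.ACCEPT
    if reliability > t2:
        return Decision.CONFIRM
    return Decision.REJECT
```

For example, with thresholds 0.8 and 0.5, a reliability of 0.6 falls in the confirmation band.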

FIG. 1 is a block diagram illustrating one embodiment of the present invention. In the figure, 1 is a speech input unit, 2 is a speech pattern conversion unit, 3 is a standard pattern storage unit, 4 is a matching unit, 5 is a candidate storage unit, and 6 is a recognition result determination unit. A speech signal input from the speech input unit 1, such as a microphone, is converted into a speech pattern by the speech pattern conversion unit 2. Various conversion methods effective for speech recognition are known; for example, the outputs of a bank of 15 band-pass filter channels taken at a frame period of 10 ms may be used. The standard pattern storage unit 3 stores in advance, as standard patterns, utterances of the words to be recognized after conversion into speech patterns. The matching unit 4 matches the input speech pattern $X$ against the standard patterns $Y_1, Y_2, \ldots, Y_M$, where $M$ is the number of registered standard patterns. DP matching, for example, may be used as the matching method. If $Y_j$ has the smallest distance from $X$ among all the standard patterns, the word name, command name, or the like corresponding to $Y_j$ becomes the first candidate.
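DP matching is commonly realized as dynamic time warping (DTW). The sketch below is one textbook variant, not necessarily the one the patent intends. The step pattern (horizontal, vertical, diagonal), the Euclidean frame distance, the normalization by the sum of the sequence lengths, and the helper names `dtw_distance` and `first_candidate` are all assumptions. Each frame would be, for instance, a 15-channel filter-bank vector.

```python
import math
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]  # one feature vector per frame (e.g. 15 filter-bank outputs)


def frame_distance(a: Frame, b: Frame) -> float:
    # Euclidean local distance between two frames (an assumed choice)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dtw_distance(x: List[Frame], y: List[Frame]) -> float:
    """DTW with steps (i-1, j), (i, j-1), (i-1, j-1); cost normalized by len(x) + len(y)."""
    n, m = len(x), len(y)
    inf = float("inf")
    g = [[inf] * (m + 1) for _ in range(n + 1)]  # accumulated-cost table
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(x[i - 1], y[j - 1])
            g[i][j] = d + min(g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])
    return g[n][m] / (n + m)


def first_candidate(x: List[Frame], standards: Dict[str, List[Frame]]) -> Tuple[str, float]:
    """Return the name of the standard pattern Yj closest to X and its distance D1."""
    distances = {name: dtw_distance(x, pattern) for name, pattern in standards.items()}
    best = min(distances, key=distances.get)
    return best, distances[best]
```

Because DTW warps the time axis, an input spoken more slowly than the stored pattern (a repeated frame, say) can still match it with zero distance.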

Denoting this distance by $D_1$, the recognition reliability $L$ may be set to $L = 1/D_1$. Alternatively, using a normalization coefficient $Z_j$ set for each word, $L = Z_j / D_1$ may be used.
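Both reliability formulas fit in one small function. In this sketch the name `recognition_reliability` is hypothetical. Two behaviours are assumptions not covered by the text: a zero distance (an exact match) is mapped to infinite reliability, and a word missing from the normalization table gets $Z_j = 1$.

```python
import math
from typing import Mapping, Optional


def recognition_reliability(d1: float, word: str,
                            z: Optional[Mapping[str, float]] = None) -> float:
    """L = 1/D1, or L = Zj/D1 when a per-word normalization table z is supplied."""
    if d1 < 0.0:
        raise ValueError("distance must be non-negative")
    if d1 == 0.0:
        return math.inf  # exact match; assumed handling, not stated in the source
    zj = 1.0 if z is None else z.get(word, 1.0)  # assumed default Zj = 1
    return zj / d1
```

A per-word $Z_j$ lets words that naturally match with larger distances (long or acoustically variable words, for example) be put on a comparable reliability scale.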

FIG. 2 is a flowchart explaining the operation of the recognition result determination unit 6. The recognition result determination unit 6 uses two thresholds, $T_{1j}$ and $T_{2j}$, set for each standard pattern or for each word or command. When $L > T_{1j}$, the first candidate is output as the recognition result. (In the flowchart of FIG. 2, $L$ is the recognition reliability, $j$ is the first candidate, $j_{\mathrm{old}}$ is the previous first candidate, $T_{1j}$ and $T_{2j}$ are the thresholds set for each word, and $\alpha$ and $\beta$ are positive constants.) When $T_{2j} < L < T_{1j}$, a prompt is displayed asking the user to confirm whether the first candidate may be output as the recognition result, and the first candidate is output only if the user indicates that it may. The confirmation may be given, for example, by answering yes or no by voice, or by treating the absence of a press of a cancel button within 3 seconds as an OK.
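The second confirmation method ("no cancel press within 3 seconds means OK") maps naturally onto a timed wait. This is a sketch under the assumption that the user interface signals a cancel press by setting a `threading.Event`. The function name `confirmed_by_silence` is hypothetical; `Event.wait(timeout)` is the standard-library call, returning `True` once the event is set and `False` if the timeout expires first.

```python
import threading


def confirmed_by_silence(cancel_pressed: threading.Event, timeout_s: float = 3.0) -> bool:
    """Return True (OK) unless the cancel button is pressed within timeout_s seconds."""
    # wait() returns True as soon as the event is set, or False when the timeout expires
    return not cancel_pressed.wait(timeout_s)
```

A button handler running on another thread would simply call `cancel_pressed.set()`. The caller then outputs the first candidate only when this returns `True`.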

When $L < T_{2j}$, the first candidate is rejected, the recognition result is invalidated, and the device returns to waiting for the next speech input. The candidate storage unit 5 stores the first candidate. When the next utterance is matched, if the word name or command name corresponding to its first candidate $Y_j$ is the same as the one stored in the candidate storage unit 5, $T_{1j}$ and $T_{2j}$ are lowered before the recognition result determination unit 6 performs its processing.

$T_{1j}$ and $T_{2j}$ may be changed as follows:

$T_{1j} \leftarrow T_{1j} - \alpha L'$    (1)
$T_{2j} \leftarrow T_{2j} - \beta L'$    (2)

where $\alpha$ and $\beta$ are positive constants and $L'$ is the previous recognition reliability.
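The candidate storage unit and threshold lowering of equations (1) and (2) can be combined into one stateful judge. This is a minimal sketch. The class name, the example values of $\alpha$ and $\beta$, and the choice to apply the lowered thresholds to the current decision only (rather than overwriting the stored $T_{1j}$, $T_{2j}$) are assumptions; the text does not say whether the lowered values persist. As in the description, the first candidate and its reliability are stored after every decision, including rejections.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RecognitionResultJudge:
    t1: Dict[str, float]                      # first threshold T1j for each word j
    t2: Dict[str, float]                      # second threshold T2j for each word j
    alpha: float = 0.1                        # positive constant α (value assumed)
    beta: float = 0.1                         # positive constant β (value assumed)
    prev_word: Optional[str] = None           # candidate storage unit: j_old
    prev_reliability: Optional[float] = None  # L', reliability of the previous utterance

    def judge(self, word: str, reliability: float) -> str:
        t1, t2 = self.t1[word], self.t2[word]
        if word == self.prev_word and self.prev_reliability is not None:
            # Equations (1) and (2), applied to this decision only (an assumption)
            t1 -= self.alpha * self.prev_reliability
            t2 -= self.beta * self.prev_reliability
        if reliability > t1:
            decision = "accept"
        elif reliability > t2:
            decision = "confirm"
        else:
            decision = "reject"
        # Store the current first candidate for comparison with the next utterance
        self.prev_word, self.prev_reliability = word, reliability
        return decision
```

With $T_1 = 1.0$, $T_2 = 0.5$ and $\alpha = \beta = 0.2$, a first "stop" at $L = 0.45$ is rejected. Repeating "stop" at the same reliability lowers $T_2$ to about $0.41$, so the repeat reaches the confirmation step instead of being rejected again.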

As is clear from the above description, according to the present invention, when the previous first candidate and the current first candidate are equal, the values of the two thresholds $T_{1j}$ and $T_{2j}$ are reduced. Consequently, when the first candidates of the previous and current utterances agree, so that this candidate is likely to be the correct recognition result, the probability that it is output as is increases even if the current recognition reliability is low. This eliminates the drawback that an utterance cannot be input no matter how many times it is repeated.

[Brief explanation of drawings]

FIG. 1 is a block diagram illustrating an embodiment of the speech recognition device according to the present invention, and FIG. 2 is a flowchart explaining the operation of the recognition result determination unit shown in FIG. 1.

1: speech input unit; 2: speech pattern conversion unit; 3: standard pattern storage unit; 4: matching unit; 5: candidate storage unit; 6: recognition result determination unit.

Claims


1. A speech recognition device comprising: a matching unit that matches an input speech pattern against registered speech patterns and calculates a first candidate and the recognition reliability of that candidate; a recognition result determination unit that takes the first candidate as the recognition result if the recognition reliability is greater than a first threshold, displays a prompt asking for confirmation of the first candidate if the recognition reliability is greater than a second threshold and smaller than the first threshold, and does not take the first candidate as the recognition result if the recognition reliability is smaller than the second threshold; and a candidate storage unit that stores the first candidate of the previous utterance; wherein, when the first candidate of the current utterance is equal to the first candidate of the previous utterance, one or both of the first and second thresholds are lowered to determine the recognition result.