JPH075890A - Voice interactive device - Google Patents

Voice interactive device

Info

Publication number
JPH075890A
JPH075890A JP5144739A JP14473993A
Authority
JP
Japan
Prior art keywords
word
likelihood
candidate
candidate word
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP5144739A
Other languages
Japanese (ja)
Inventor
Hiroyuki Nishi
宏之 西
Mikio Kitai
幹雄 北井
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP5144739A priority Critical patent/JPH075890A/en
Publication of JPH075890A publication Critical patent/JPH075890A/en
Pending legal-status Critical Current

Abstract

PURPOSE: To provide a voice interactive device that reduces the user's average number of utterances. CONSTITUTION: This device, which determines the correct candidate word from the speech input of a user responding to a candidate word presented by a candidate word presentation unit 8, is provided with a per-word likelihood-to-correct-rate correspondence data storage unit 6 that stores data indicating the relationship between likelihood and correct rate for each candidate word, and a correct-rate threshold storage unit 7 that stores the correct-rate threshold used to decide whether to confirm the correctness of a candidate word or to instruct the user to utter the word again. The correct rate of the target candidate word is obtained from the likelihood of the candidate word supplied by a likelihood calculation unit 2 and the data stored in the unit 6. The device is further provided with a confirmation control unit 5 that compares this correct rate with the threshold stored in the unit 7, drives the unit 8 when the correct rate is larger than the threshold, and drives a re-utterance instruction unit 9 when the correct rate is smaller than the threshold.

Description

[Detailed Description of the Invention]

[0001]

[Field of Industrial Application] The present invention relates to a voice interactive device, and more particularly to a voice interactive device that confirms, rationally and with high accuracy, whether the recognition result of a word uttered by the user of the device is correct.

[0002]

[Prior Art] A conventional example will be described with reference to FIG. 1. In a voice interactive device, the likelihood between an input word utterance and every word registered in memory in advance is computed, and, as shown in FIG. 1, candidate words are presented to the user for confirmation in descending order of likelihood, for example "Ito" (the highest likelihood), then "Sato", then "Kato". When the likelihood of the word to be confirmed is extremely small, the probability that it is the correct answer is judged to be low, and the confirmation process is not performed; instead the user is asked to utter the word again, as in "Your name once more, please" shown in FIG. 2. This approach is commonly adopted.

[0003] Conventionally, when carrying out such a confirmation dialogue, a fixed threshold is set on the likelihood of the word to be confirmed, as shown in FIG. 3, and whether to select the confirmation process or to instruct re-utterance is decided according to whether the likelihood used to determine the candidate ranking exceeds this threshold. Setting a single fixed threshold on the likelihood means that the same threshold is always used in deciding between confirmation and re-utterance, regardless of the uttered word or the candidate word. However, some words yield a high recognition likelihood and a large likelihood margin over other words, while other words yield a low recognition likelihood and little likelihood margin over other words, because, among other reasons, they are comparatively difficult to pronounce accurately.

[0004] For a word whose likelihood is statistically high, a small observed likelihood most likely means that the utterance was mistaken, so it is considered more efficient to set the threshold relatively high and thereby select re-utterance more often than confirmation. Conversely, for a word whose likelihood is statistically low, accurate pronunciation of the word is inherently somewhat difficult, so a small observed likelihood is more likely a characteristic peculiar to the word than an error in the utterance; it is therefore considered more efficient to set the threshold relatively low and select confirmation more often than re-utterance.

[0005]

[Problems to Be Solved by the Invention] In the decision processing described above, however, no consideration is given to the likelihood characteristics peculiar to each word, and the decision is made against a single fixed threshold. Consequently, for words with statistically high likelihood, the threshold ends up set low even though setting it high would be more efficient, so the confirmation process continues even when instructing re-utterance would be more efficient. Conversely, for words with statistically low likelihood, the threshold ends up set high even though setting it low would be more efficient, so re-utterance is instructed when the confirmation process ought to continue.

[0006] The present invention provides a voice interactive device that solves the problems described above.

[0007]

[Means for Solving the Problems] The voice interactive device of the present invention comprises: a speech analysis unit 1 that analyzes an input speech signal and computes its feature parameters; a registered word parameter storage unit 3 that stores the feature parameters of a plurality of pre-registered words; a likelihood calculation unit 2 that computes the likelihood of the input speech for each registered word, either from the feature parameters of the input speech obtained from the speech analysis unit 1 and the feature parameters of the registered words stored in the registered word parameter storage unit 3, or from the state transition probabilities and symbol output probabilities obtained by statistically processing the feature parameters under the assumption that each word is a hidden Markov model; a candidate word output unit 4 that outputs candidate word numbers and their likelihoods in descending order of likelihood; a candidate word presentation unit 8 that presents the candidate words output by the candidate word output unit 4 to the user in order of likelihood; and a re-utterance instruction unit 9 that instructs the user to utter the word again. In this voice interactive device, which determines the correct candidate word from the user's spoken response to the candidate word presented by the candidate word presentation unit 8, there are further provided: a per-word likelihood-to-correct-rate correspondence data storage unit 6 that stores, for each candidate word, data indicating the correspondence between its likelihood and its correct rate; a correct-rate threshold storage unit 7 that stores a correct-rate threshold used to decide whether to confirm the correctness of a candidate word or to instruct re-utterance of the word; and a confirmation control unit 5 that obtains the correct rate of the target candidate word from the candidate word's likelihood supplied by the likelihood calculation unit 2 and the data in the per-word likelihood-to-correct-rate correspondence data storage unit 6, compares this correct rate with the threshold stored in the correct-rate threshold storage unit 7, drives the candidate word presentation unit 8 when the correct rate exceeds the threshold, and drives the re-utterance instruction unit 9 when the correct rate is below the threshold.
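
A minimal sketch of the decision made by the confirmation control unit 5 follows (Python). The names correct_rate_table and threshold stand in for the contents of storage units 6 and 7, and the (likelihood lower bound, correct rate) table format is an assumption introduced for illustration, not something specified in the patent.

    # Minimal sketch of the confirmation control unit (unit 5).
    # correct_rate_table plays the role of storage unit 6: it maps each word to a
    # list of (likelihood_lower_bound, correct_rate) entries sorted by likelihood.
    # threshold plays the role of storage unit 7.

    def correct_rate(word, likelihood, correct_rate_table):
        """Look up the correct rate of 'word' at the given likelihood."""
        rate = 0.0
        for lower_bound, r in correct_rate_table[word]:
            if likelihood >= lower_bound:
                rate = r            # keep the rate of the highest bin reached so far
            else:
                break
        return rate

    def confirmation_control(candidates, correct_rate_table, threshold):
        """candidates: (word, likelihood) pairs in descending order of likelihood."""
        word, likelihood = candidates[0]
        if correct_rate(word, likelihood, correct_rate_table) > threshold:
            return ("confirm", word)     # drive the candidate word presentation unit 8
        return ("reutter", None)         # drive the re-utterance instruction unit 9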

[0008]

[Embodiments] Embodiments of the present invention will now be described. Instead of the likelihood conventionally used in voice interactive devices, the present invention recomputes, from the likelihood, the correct rate of the candidate word being processed, and decides between the confirmation process and the re-utterance instruction on the basis of this computed correct rate. As described in detail later, the correct rate is obtained by substituting the likelihood into a formula, determined in advance for each candidate word, that characterizes the distribution of that word's likelihoods.

[0009] The correct rate obtained in this way reflects not only the likelihood of the corresponding candidate word taken alone, but also its discriminability from other words. As a result, even when the likelihood distributions of different candidate words differ greatly, the decision between confirmation and re-utterance can be made on the basis of a common correct rate. By making this decision on the basis of the correct rate, the above-mentioned drawback of the conventional voice interactive device, which decides against a single fixed threshold without any consideration of word-specific likelihoods, can be eliminated.

[0010] The correct rate is now explained. First, when a candidate word O_k with word number k (k = 1, 2, ..., N, where N is the number of words to be recognized) is observed in the i-th rank with likelihood L, the probability P_{ikL} that this observation is the correct answer is expressed using the frequency distribution of the likelihoods of recognition results. Here, the true uttered word is denoted w_j (j = 1, 2, ..., N, where j is a word number), and the occurrence frequency (prior probability) of the uttered word is denoted P(w_j).

[0011] The per-uttered-word appearance likelihood distribution, that is, the probability distribution D_{j,ikL} that the candidate word O_k appears in the i-th rank with likelihood L when the word w_j is uttered, is assumed to have been obtained in advance:

    D_{j,ikL} = P(i, O_k, L | w_j),    i, j, k = 1, 2, ..., N    (1)

D_{j,ikL} can be obtained at the evaluation stage before the system goes into operation. On the other hand, the data available to the system during actual recognition processing is the list of candidate words O_k and the likelihood L corresponding to each k, received from the recognition processing unit.

[0012] Therefore, if the probability P_{ikL,j} that the true word is w_j can be computed from the information available to the system, namely the list of word numbers k and likelihoods L as shown in FIG. 4, then setting j = k in P_{ikL,j} gives the probability (correct rate) P_{ikL} that the observation is the correct answer, and the system can be controlled, that is, it can decide whether the confirmation process or the re-utterance process should be selected. P_{ikL,j} means

    P_{ikL,j} = P(w_j | i, O_k, L)    (2)

and, by Bayes' theorem, it can be expressed as follows.

[0013]

[Equation 1]

    P_{ikL,j} = P(i, O_k, L | w_j) P(w_j) / P(i, O_k, L)    (3)

[0014] Replacing the denominator of the above equation with the sum, over all words, of the product of the prior and the conditional probability, and substituting equation (1), gives

[0015]

[Equation 2]

    P_{ikL,j} = D_{j,ikL} P(w_j) / Σ_{l=1..N} D_{l,ikL} P(w_l)    (4)

[0016] whence the correct rate P_{ikL} is

[0017]

[Equation 3]

    P_{ikL} = P_{ikL,j=k} = D_{k,ikL} P(w_k) / Σ_{l=1..N} D_{l,ikL} P(w_l)    (5)

[0018] Thus the correct rate of a candidate can be obtained from the probability distribution D_{j,ikL} determined in advance. The present scheme presupposes that the distribution D_{j,ikL} has been obtained beforehand at the design and evaluation stage of the system. Although the distributions for correct candidates can be collected in a comparatively short time, obtaining the likelihood distributions of incorrect candidates, especially low-ranked candidates, requires a large number of evaluation experiments, so the above presupposition is not necessarily realistic. A realistic approach with few drawbacks is therefore to determine an initial distribution by the method described next during the early stage of system operation, and to replace it once sufficient data has accumulated after operation begins. As the condition for determining the initial distribution, we set the condition that "the correct rate of a recognition candidate is determined by the likelihood alone and does not depend on the rank." That is, the assumption

    P_{ikL,j} = P_{kL,j} = P(w_j | O_k, L)    (6)

is introduced in order to set the initial distribution.

[0019] With this assumption, the per-uttered-word appearance distribution is simplified as follows:

    D_{j,kL} = P(O_k, L | w_j),    j = 1, 2, ..., N    (7)

In this case, a single utterance yields the likelihoods of all candidate words, so a few dozen utterances per word are sufficient to acquire adequate initial data. Recomputing the correct rate P_{kL} under the above assumption gives the following.
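
As an illustration of how such initial data might be collected, the following sketch (Python) builds a histogram estimate of D_{j,kL} from evaluation utterances, using the fact noted above that a single utterance yields a likelihood for every candidate word. The data layout and the bin width are assumptions made for illustration, not part of the patent.

    from collections import defaultdict

    def estimate_initial_distribution(evaluation_data, bin_width=0.1):
        """Estimate D_{j,kL} = P(O_k, L | w_j) as normalized histograms.

        evaluation_data: list of (true_word, {candidate_word: likelihood}) pairs,
        one entry per evaluation utterance.  Returns a dict
        D[true_word][candidate_word][likelihood_bin] -> relative frequency.
        """
        counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
        totals = defaultdict(int)
        for true_word, likelihoods in evaluation_data:
            totals[true_word] += 1
            for candidate, likelihood in likelihoods.items():
                likelihood_bin = int(likelihood / bin_width)   # quantize the likelihood axis
                counts[true_word][candidate][likelihood_bin] += 1
        # Normalize by the number of utterances of each true word.
        D = {}
        for true_word, per_candidate in counts.items():
            D[true_word] = {
                candidate: {b: c / totals[true_word] for b, c in bins.items()}
                for candidate, bins in per_candidate.items()
            }
        return D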

[0020]

[Equation 4]

    P_{kL} = D_{k,kL} P(w_k) / Σ_{l=1..N} D_{l,kL} P(w_l)    (8)

[0021] The numerator of this correct rate P_{kL}, D_{k,kL} P(w_k), is the probability that word k itself appears with likelihood L when the word w_k is uttered. The denominator is the sum, over all words w_l including w_k itself, of the probability that word k appears with likelihood L when w_l is uttered. The probability that word k appears with likelihood L when a word other than w_k is uttered is therefore taken into account. By deciding between confirmation and re-utterance on the basis of the common correct rate P_{kL}, which incorporates this consideration, the decision ultimately corresponds to the likelihood characteristics peculiar to each word, and the drawback of the conventional voice interactive device, which decides against a single fixed threshold without any consideration of word-specific likelihoods, can be eliminated.
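
A minimal sketch of equation (8) follows (Python), reusing the histogram layout assumed in the previous sketch; the priors dictionary and bin width are likewise illustrative assumptions.

    def correct_rate_from_distribution(candidate, likelihood, D, priors, bin_width=0.1):
        """Correct rate P_{kL} per equation (8): D_{k,kL} P(w_k) / sum_l D_{l,kL} P(w_l).

        D: per-word likelihood histograms, D[true_word][candidate][likelihood_bin].
        priors: dict mapping each word to its prior probability P(w_j).
        """
        likelihood_bin = int(likelihood / bin_width)
        numerator = D.get(candidate, {}).get(candidate, {}).get(likelihood_bin, 0.0) * priors[candidate]
        denominator = sum(
            D.get(true_word, {}).get(candidate, {}).get(likelihood_bin, 0.0) * priors[true_word]
            for true_word in priors
        )
        return numerator / denominator if denominator > 0 else 0.0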

[0022] An embodiment of the present invention will now be described concretely with reference to FIGS. 4, 5 and 6. First, the voice interactive device outputs a prompt such as "Please say +++." The user responds, for example, "OOO, please." The speech signal corresponding to this utterance undergoes analog-to-digital conversion and feature parameter extraction in the speech analysis unit 1. As feature parameters, a cepstrum sequence, a delta cepstrum sequence, a power sequence of the registered words, or a vector-quantized (VQ) sequence obtained by vector quantization of these parameters is generally adopted; any of these feature parameters can be used in the voice interactive device of the present invention. The extracted feature parameters are sent to the likelihood calculation unit 2. The likelihood calculation unit 2 computes the likelihood between the input speech and each registered word, either from the feature parameters of the input speech extracted by the speech analysis unit 1 and the feature parameters of the plurality of registered words stored in advance in the registered word parameter storage unit 3, or from the state transition probabilities and symbol output probabilities obtained by statistically processing the feature parameters under the assumption that each word is a hidden Markov model (HMM). The computed likelihoods are sent to the candidate word output unit 4, where the candidate word numbers and their likelihoods are sorted in descending order of likelihood, as shown in FIG. 4.
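
As one possible illustration of this likelihood calculation and sorting (the patent allows several kinds of feature parameters and models), the sketch below scores a sequence of vector-quantized symbols against a discrete HMM per registered word with the standard forward algorithm and sorts the candidates. The model layout and all names are assumptions made for illustration.

    import math

    def log_forward_likelihood(symbols, initial, transition, emission):
        """Log-likelihood of a VQ symbol sequence under a discrete HMM.

        initial[i]: initial state probabilities, transition[i][j]: state transition
        probabilities, emission[i][v]: symbol output probabilities.
        """
        n_states = len(initial)
        alpha = [initial[i] * emission[i][symbols[0]] for i in range(n_states)]
        log_likelihood = 0.0
        for symbol in symbols[1:]:
            alpha = [
                sum(alpha[i] * transition[i][j] for i in range(n_states)) * emission[j][symbol]
                for j in range(n_states)
            ]
            # Rescale to avoid underflow, accumulating the log of the scale factor.
            scale = sum(alpha)
            log_likelihood += math.log(scale)
            alpha = [a / scale for a in alpha]
        log_likelihood += math.log(sum(alpha))
        return log_likelihood

    def rank_candidates(symbols, word_models):
        """word_models: dict word -> (initial, transition, emission).
        Returns (word, log_likelihood) pairs in descending order (candidate word output unit 4)."""
        scored = [(w, log_forward_likelihood(symbols, *m)) for w, m in word_models.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)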

[0023] First, for the candidate word ranked first in likelihood, the correct rate of the target candidate word is obtained from its likelihood and the data in the per-word likelihood-to-correct-rate correspondence data storage unit 6. Next, the obtained correct rate is compared with the threshold stored in the correct-rate threshold storage unit 7; if the correct rate is the larger, the word number of that candidate word is passed to the candidate word presentation unit 8. The candidate word presentation unit 8 asks the user whether the candidate word is correct, to be answered "yes" or "no". Example: "Is it XX?"

[0024] If the result of recognizing the user's response is "yes", the candidate word ranked first in likelihood is the correct word; if it is "no", the same processing is repeated for the candidate ranked second in likelihood. If the correct rate is the smaller, the re-utterance instruction unit 9 is driven to instruct the user to utter the word again, for example "Once more, please."
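
The confirmation dialogue described in the two preceding paragraphs might be organized as in the following sketch (Python), which reuses correct_rate from the earlier sketch; ask_user and request_reutterance stand in for the spoken interaction and are, like the other names, illustrative assumptions.

    def confirmation_dialogue(candidates, correct_rate_table, threshold, ask_user, request_reutterance):
        """candidates: (word, likelihood) pairs in descending order of likelihood.

        ask_user(word) returns True for "yes" and False for "no";
        request_reutterance() asks the user to say the word again.
        Returns the confirmed word, or None if re-utterance was requested.
        """
        for word, likelihood in candidates:
            rate = correct_rate(word, likelihood, correct_rate_table)
            if rate <= threshold:
                request_reutterance()          # "Once more, please."
                return None
            if ask_user(word):                 # "Is it XX?" -> yes
                return word
        request_reutterance()                  # all presented candidates were rejected
        return None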

[0025]

[Effects of the Invention] As described above, the interactive device of the present invention can decide between confirmation and re-utterance on the basis of a common correct rate even when the likelihood distributions of candidate words differ greatly. The dialogue can therefore proceed without wasteful re-utterance requests or unnecessary confirmation processing, with the effect that the average number of utterances required of the user is reduced.

[0026] In the interactive device of the present invention, the processing speed can be improved by tabulating the correct-rate computation results for each word in advance, storing the table in the per-word likelihood-to-correct-rate correspondence data storage unit, and referring to this table during the actual dialogue processing.
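
Such a table might be built as in the sketch below (Python), reusing the histogram layout and correct_rate_from_distribution assumed in the earlier sketches, so that the dialogue-time lookup reduces to the binned table consulted by correct_rate above.

    def build_correct_rate_table(D, priors, bin_width=0.1, max_bin=100):
        """Precompute, for every word, (likelihood_lower_bound, correct_rate) entries
        from the estimated distribution D and the priors (storage unit 6)."""
        table = {}
        for word in priors:
            entries = []
            for b in range(max_bin + 1):
                lower_edge = b * bin_width
                representative = (b + 0.5) * bin_width     # bin center, avoids edge rounding issues
                rate = correct_rate_from_distribution(word, representative, D, priors, bin_width)
                entries.append((lower_edge, rate))
            table[word] = entries
        return table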

[Brief Description of the Drawings]

[FIG. 1] A diagram illustrating an example of a dialogue for confirming whether a candidate word is correct.

[FIG. 2] A diagram illustrating an example of a dialogue instructing re-utterance.

[FIG. 3] A diagram illustrating a conventional example of confirmation dialogue processing using a fixed likelihood threshold.

[FIG. 4] A diagram showing an example of the information passed from the recognition processing unit.

[FIG. 5] A diagram showing an embodiment of the present invention.

[FIG. 6] A diagram illustrating an example of the confirmation dialogue processing of the present invention.

[Explanation of Symbols]

1 Speech analysis unit
2 Likelihood calculation unit
3 Registered word parameter storage unit
4 Candidate word output unit
5 Confirmation control unit
6 Per-word likelihood-to-correct-rate correspondence data storage unit
7 Correct-rate threshold storage unit
8 Candidate word presentation unit
9 Re-utterance instruction unit

Claims (1)

[Claims]

[Claim 1] A voice interactive device comprising: a speech analysis unit that analyzes an input speech signal and computes its feature parameters; a registered word parameter storage unit that stores the feature parameters of a plurality of pre-registered words; a likelihood calculation unit that computes the likelihood of the input speech for each registered word, either from the feature parameters of the input speech obtained from the speech analysis unit and the feature parameters of the registered words stored in the registered word parameter storage unit, or from the state transition probabilities and symbol output probabilities obtained by statistically processing the feature parameters under the assumption that each word is a hidden Markov model; a candidate word output unit that outputs candidate word numbers and the likelihoods of the candidate words in descending order of likelihood; a candidate word presentation unit that presents the candidate words output from the candidate word output unit to the user in order of likelihood; and a re-utterance instruction unit that instructs the user to utter the word again; the voice interactive device determining the correct candidate word from the user's speech input in response to the candidate word presented by the candidate word presentation unit, and being characterized by further comprising: a per-word likelihood-to-correct-rate correspondence data storage unit that stores, for each candidate word, data indicating the correspondence between its likelihood and its correct rate; a correct-rate threshold storage unit that stores a correct-rate threshold for deciding whether to confirm the correctness of a candidate word or to instruct re-utterance of the word; and a confirmation control unit that obtains the correct rate of the target candidate word from the likelihood of the candidate word obtained from the likelihood calculation unit and the data in the per-word likelihood-to-correct-rate correspondence data storage unit, compares this correct rate with the threshold stored in the correct-rate threshold storage unit, drives the candidate word presentation unit when the correct rate is larger than the threshold, and drives the re-utterance instruction unit when the correct rate is smaller than the threshold.
JP5144739A 1993-06-16 1993-06-16 Voice interactive device Pending JPH075890A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP5144739A JPH075890A (en) 1993-06-16 1993-06-16 Voice interactive device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP5144739A JPH075890A (en) 1993-06-16 1993-06-16 Voice interactive device

Publications (1)

Publication Number Publication Date
JPH075890A true JPH075890A (en) 1995-01-10

Family

ID=15369232

Family Applications (1)

Application Number Title Priority Date Filing Date
JP5144739A Pending JPH075890A (en) 1993-06-16 1993-06-16 Voice interactive device

Country Status (1)

Country Link
JP (1) JPH075890A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09106296A (en) * 1995-07-31 1997-04-22 At & T Corp Apparatus and method for speech recognition
JP2000250585A (en) * 1999-02-25 2000-09-14 Nippon Telegr & Teleph Corp <Ntt> Interactive database retrieving method and device and recording medium recorded with interactive database retrieving program
JP2009157050A (en) * 2007-12-26 2009-07-16 Hitachi Omron Terminal Solutions Corp Uttering verification device and uttering verification method
US8977547B2 (en) 2009-01-30 2015-03-10 Mitsubishi Electric Corporation Voice recognition system for registration of stable utterances


Similar Documents

Publication Publication Date Title
US8315870B2 (en) Rescoring speech recognition hypothesis using prosodic likelihood
EP2309489B1 (en) Methods and systems for considering information about an expected response when performing speech recognition
JP4657736B2 (en) System and method for automatic speech recognition learning using user correction
US5995928A (en) Method and apparatus for continuous spelling speech recognition with early identification
US5794196A (en) Speech recognition system distinguishing dictation from commands by arbitration between continuous speech and isolated word modules
US6317711B1 (en) Speech segment detection and word recognition
EP0965978B1 (en) Non-interactive enrollment in speech recognition
US5963903A (en) Method and system for dynamically adjusted training for speech recognition
EP1647970B1 (en) Hidden conditional random field models for phonetic classification and speech recognition
US6629073B1 (en) Speech recognition method and apparatus utilizing multi-unit models
US6801892B2 (en) Method and system for the reduction of processing time in a speech recognition system using the hidden markov model
JP2003316386A (en) Method, device, and program for speech recognition
EP1225567B1 (en) Method and apparatus for speech recognition
US7272560B2 (en) Methodology for performing a refinement procedure to implement a speech recognition dictionary
JPH075890A (en) Voice interactive device
JP3633254B2 (en) Voice recognition system and recording medium recording the program
JP4296290B2 (en) Speech recognition apparatus, speech recognition method and program
JP2003345388A (en) Method, device, and program for voice recognition
EP0987681B1 (en) Speech recognition method and apparatus
JPH09114482A (en) Speaker adaptation method for voice recognition
JPH10133686A (en) Nonnative language speech recognition device
KR100404852B1 (en) Speech recognition apparatus having language model adaptive function and method for controlling the same
JP3291073B2 (en) Voice recognition method
JPH0981177A (en) Voice recognition device, dictionary for work constitution elements and method for learning imbedded markov model
JPH06250689A (en) Voice recognition device