JPH075890A - Voice interactive device - Google Patents

Voice interactive device

Info

Publication number
JPH075890A
JPH075890A JP5144739A JP14473993A
Authority
JP
Japan
Prior art keywords
word
likelihood
candidate
candidate word
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP5144739A
Other languages
Japanese (ja)
Inventor
Hiroyuki Nishi
宏之 西
Mikio Kitai
幹雄 北井
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP5144739A priority Critical patent/JPH075890A/en
Publication of JPH075890A publication Critical patent/JPH075890A/en
Pending legal-status Critical Current

Abstract

PURPOSE: To provide a voice interactive device that reduces the user's average number of utterances. CONSTITUTION: This device, which determines the correct candidate word from the speech input of a user responding to a candidate word presented by a candidate word presentation unit 8, is provided with a per-word likelihood-to-correct-rate correspondence data storage unit 6 that stores data indicating the relationship between likelihood and correct rate for each candidate word, and a correct-rate threshold storage unit 7 that stores the correct-rate threshold used to decide whether to confirm the correctness of a candidate word or to instruct the user to utter the word again. The correct rate of the target candidate word is obtained from the likelihood of the candidate word supplied by a likelihood calculation unit 2 and the data stored in the unit 6. The device is further provided with a confirmation control unit 5 that compares this correct rate with the threshold stored in the unit 7, drives the unit 8 when the correct rate is larger than the threshold, and drives a re-utterance instruction unit 9 when the correct rate is smaller than the threshold.

Description

[Detailed Description of the Invention]

[0001]

[Field of Industrial Application] The present invention relates to a voice interactive device, and more particularly to a voice interactive device that confirms, rationally and with high accuracy, whether the recognition result of a word uttered by the user of the device is correct.

[0002]

[Prior Art] A conventional example will be described with reference to FIG. 1. In a voice interactive device, the likelihood between an input word utterance and every word registered in memory in advance is computed, and, as shown in FIG. 1, candidate words are presented to the user for confirmation in descending order of likelihood, for example "Ito" (the highest likelihood), then "Sato", then "Kato". When the likelihood of the word to be confirmed is extremely small, the probability that it is the correct answer is judged to be low, and the confirmation process is not performed; instead the user is asked to utter the word again, as in "Your name once more, please" shown in FIG. 2. This approach is commonly adopted.

[0003] Conventionally, when carrying out such a confirmation dialogue, a fixed threshold is set on the likelihood of the word to be confirmed, as shown in FIG. 3, and whether to select the confirmation process or to instruct re-utterance is decided according to whether the likelihood used to determine the candidate ranking exceeds this threshold. Setting a single fixed threshold on the likelihood means that the same threshold is always used in deciding between confirmation and re-utterance, regardless of the uttered word or the candidate word. However, some words yield a high recognition likelihood and a large likelihood margin over other words, while other words yield a low recognition likelihood and little likelihood margin over other words, because, among other reasons, they are comparatively difficult to pronounce accurately.

[0004] For a word whose likelihood is statistically high, a small observed likelihood most likely means that the utterance was mistaken, so it is considered more efficient to set the threshold relatively high and thereby select re-utterance more often than confirmation. Conversely, for a word whose likelihood is statistically low, accurate pronunciation of the word is inherently somewhat difficult, so a small observed likelihood is more likely a characteristic peculiar to the word than an error in the utterance; it is therefore considered more efficient to set the threshold relatively low and select confirmation more often than re-utterance.

[0005]

[Problems to Be Solved by the Invention] In the decision processing described above, however, no consideration is given to the likelihood characteristics peculiar to each word, and the decision is made against a single fixed threshold. Consequently, for words with statistically high likelihood, the threshold ends up set low even though setting it high would be more efficient, so the confirmation process continues even when instructing re-utterance would be more efficient. Conversely, for words with statistically low likelihood, the threshold ends up set high even though setting it low would be more efficient, so re-utterance is instructed when the confirmation process ought to continue.

[0006] The present invention provides a voice interactive device that solves the problems described above.

[0007]

[Means for Solving the Problems] The voice interactive device of the present invention comprises: a speech analysis unit 1 that analyzes an input speech signal and computes its feature parameters; a registered word parameter storage unit 3 that stores the feature parameters of a plurality of pre-registered words; a likelihood calculation unit 2 that computes the likelihood of the input speech for each registered word, either from the feature parameters of the input speech obtained from the speech analysis unit 1 and the feature parameters of the registered words stored in the registered word parameter storage unit 3, or from the state transition probabilities and symbol output probabilities obtained by statistically processing the feature parameters under the assumption that each word is a hidden Markov model; a candidate word output unit 4 that outputs candidate word numbers and their likelihoods in descending order of likelihood; a candidate word presentation unit 8 that presents the candidate words output by the candidate word output unit 4 to the user in order of likelihood; and a re-utterance instruction unit 9 that instructs the user to utter the word again. In this voice interactive device, which determines the correct candidate word from the user's spoken response to the candidate word presented by the candidate word presentation unit 8, there are further provided: a per-word likelihood-to-correct-rate correspondence data storage unit 6 that stores, for each candidate word, data indicating the correspondence between its likelihood and its correct rate; a correct-rate threshold storage unit 7 that stores a correct-rate threshold used to decide whether to confirm the correctness of a candidate word or to instruct re-utterance of the word; and a confirmation control unit 5 that obtains the correct rate of the target candidate word from the candidate word's likelihood supplied by the likelihood calculation unit 2 and the data in the per-word likelihood-to-correct-rate correspondence data storage unit 6, compares this correct rate with the threshold stored in the correct-rate threshold storage unit 7, drives the candidate word presentation unit 8 when the correct rate exceeds the threshold, and drives the re-utterance instruction unit 9 when the correct rate is below the threshold.
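
A minimal sketch of the decision made by the confirmation control unit 5 follows (Python). The names correct_rate_table and threshold stand in for the contents of storage units 6 and 7, and the (likelihood lower bound, correct rate) table format is an assumption introduced for illustration, not something specified in the patent.

    # Minimal sketch of the confirmation control unit (unit 5).
    # correct_rate_table plays the role of storage unit 6: it maps each word to a
    # list of (likelihood_lower_bound, correct_rate) entries sorted by likelihood.
    # threshold plays the role of storage unit 7.

    def correct_rate(word, likelihood, correct_rate_table):
        """Look up the correct rate of 'word' at the given likelihood."""
        rate = 0.0
        for lower_bound, r in correct_rate_table[word]:
            if likelihood >= lower_bound:
                rate = r            # keep the rate of the highest bin reached so far
            else:
                break
        return rate

    def confirmation_control(candidates, correct_rate_table, threshold):
        """candidates: (word, likelihood) pairs in descending order of likelihood."""
        word, likelihood = candidates[0]
        if correct_rate(word, likelihood, correct_rate_table) > threshold:
            return ("confirm", word)     # drive the candidate word presentation unit 8
        return ("reutter", None)         # drive the re-utterance instruction unit 9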

[0008]

[Embodiments] Embodiments of the present invention will now be described. Instead of the likelihood conventionally used in voice interactive devices, the present invention recomputes, from the likelihood, the correct rate of the candidate word being processed, and decides between the confirmation process and the re-utterance instruction on the basis of this computed correct rate. As described in detail later, the correct rate is obtained by substituting the likelihood into a formula, determined in advance for each candidate word, that characterizes the distribution of that word's likelihoods.

[0009] The correct rate obtained in this way reflects not only the likelihood of the corresponding candidate word taken alone, but also its discriminability from other words. As a result, even when the likelihood distributions of different candidate words differ greatly, the decision between confirmation and re-utterance can be made on the basis of a common correct rate. By making this decision on the basis of the correct rate, the above-mentioned drawback of the conventional voice interactive device, which decides against a single fixed threshold without any consideration of word-specific likelihoods, can be eliminated.

[0010] The correct rate is now explained. First, when a candidate word O_k with word number k (k = 1, 2, ..., N, where N is the number of words to be recognized) is observed in the i-th rank with likelihood L, the probability P_{ikL} that this observation is the correct answer is expressed using the frequency distribution of the likelihoods of recognition results. Here, the true uttered word is denoted w_j (j = 1, 2, ..., N, where j is a word number), and the occurrence frequency (prior probability) of the uttered word is denoted P(w_j).

[0011] The per-uttered-word appearance likelihood distribution, that is, the probability distribution D_{j,ikL} that the candidate word O_k appears in the i-th rank with likelihood L when the word w_j is uttered, is assumed to have been obtained in advance:

    D_{j,ikL} = P(i, O_k, L | w_j),    i, j, k = 1, 2, ..., N    (1)

D_{j,ikL} can be obtained at the evaluation stage before the system goes into operation. On the other hand, the data available to the system during actual recognition processing is the list of candidate words O_k and the likelihood L corresponding to each k, received from the recognition processing unit.

[0012] Therefore, if the probability P_{ikL,j} that the true word is w_j can be computed from the information available to the system, namely the list of word numbers k and likelihoods L as shown in FIG. 4, then setting j = k in P_{ikL,j} gives the probability (correct rate) P_{ikL} that the observation is the correct answer, and the system can be controlled, that is, it can decide whether the confirmation process or the re-utterance process should be selected. P_{ikL,j} means

    P_{ikL,j} = P(w_j | i, O_k, L)    (2)

and, by Bayes' theorem, it can be expressed as follows.

[0013]

[Equation 1]

    P_{ikL,j} = P(i, O_k, L | w_j) P(w_j) / P(i, O_k, L)    (3)

[0014] Replacing the denominator of the above equation with the sum, over all words, of the product of the prior and the conditional probability, and substituting equation (1), gives

[0015]

[Equation 2]

    P_{ikL,j} = D_{j,ikL} P(w_j) / Σ_{l=1..N} D_{l,ikL} P(w_l)    (4)

[0016] whence the correct rate P_{ikL} is

[0017]

[Equation 3]

    P_{ikL} = P_{ikL,j=k} = D_{k,ikL} P(w_k) / Σ_{l=1..N} D_{l,ikL} P(w_l)    (5)

[0018] Thus the correct rate of a candidate can be obtained from the probability distribution D_{j,ikL} determined in advance. The present scheme presupposes that the distribution D_{j,ikL} has been obtained beforehand at the design and evaluation stage of the system. Although the distributions for correct candidates can be collected in a comparatively short time, obtaining the likelihood distributions of incorrect candidates, especially low-ranked candidates, requires a large number of evaluation experiments, so the above presupposition is not necessarily realistic. A realistic approach with few drawbacks is therefore to determine an initial distribution by the method described next during the early stage of system operation, and to replace it once sufficient data has accumulated after operation begins. As the condition for determining the initial distribution, we set the condition that "the correct rate of a recognition candidate is determined by the likelihood alone and does not depend on the rank." That is, the assumption

    P_{ikL,j} = P_{kL,j} = P(w_j | O_k, L)    (6)

is introduced in order to set the initial distribution.

[0019] With this assumption, the per-uttered-word appearance distribution is simplified as follows:

    D_{j,kL} = P(O_k, L | w_j),    j = 1, 2, ..., N    (7)

In this case, a single utterance yields the likelihoods of all candidate words, so a few dozen utterances per word are sufficient to acquire adequate initial data. Recomputing the correct rate P_{kL} under the above assumption gives the following.
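
As an illustration of how such initial data might be collected, the following sketch (Python) builds a histogram estimate of D_{j,kL} from evaluation utterances, using the fact noted above that a single utterance yields a likelihood for every candidate word. The data layout and the bin width are assumptions made for illustration, not part of the patent.

    from collections import defaultdict

    def estimate_initial_distribution(evaluation_data, bin_width=0.1):
        """Estimate D_{j,kL} = P(O_k, L | w_j) as normalized histograms.

        evaluation_data: list of (true_word, {candidate_word: likelihood}) pairs,
        one entry per evaluation utterance.  Returns a dict
        D[true_word][candidate_word][likelihood_bin] -> relative frequency.
        """
        counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
        totals = defaultdict(int)
        for true_word, likelihoods in evaluation_data:
            totals[true_word] += 1
            for candidate, likelihood in likelihoods.items():
                likelihood_bin = int(likelihood / bin_width)   # quantize the likelihood axis
                counts[true_word][candidate][likelihood_bin] += 1
        # Normalize by the number of utterances of each true word.
        D = {}
        for true_word, per_candidate in counts.items():
            D[true_word] = {
                candidate: {b: c / totals[true_word] for b, c in bins.items()}
                for candidate, bins in per_candidate.items()
            }
        return D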

[0020]

[Equation 4]

    P_{kL} = D_{k,kL} P(w_k) / Σ_{l=1..N} D_{l,kL} P(w_l)    (8)

[0021] The numerator of this correct rate P_{kL}, D_{k,kL} P(w_k), is the probability that word k itself appears with likelihood L when the word w_k is uttered. The denominator is the sum, over all words w_l including w_k itself, of the probability that word k appears with likelihood L when w_l is uttered. The probability that word k appears with likelihood L when a word other than w_k is uttered is therefore taken into account. By deciding between confirmation and re-utterance on the basis of the common correct rate P_{kL}, which incorporates this consideration, the decision ultimately corresponds to the likelihood characteristics peculiar to each word, and the drawback of the conventional voice interactive device, which decides against a single fixed threshold without any consideration of word-specific likelihoods, can be eliminated.
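
A minimal sketch of equation (8) follows (Python), reusing the histogram layout assumed in the previous sketch; the priors dictionary and bin width are likewise illustrative assumptions.

    def correct_rate_from_distribution(candidate, likelihood, D, priors, bin_width=0.1):
        """Correct rate P_{kL} per equation (8): D_{k,kL} P(w_k) / sum_l D_{l,kL} P(w_l).

        D: per-word likelihood histograms, D[true_word][candidate][likelihood_bin].
        priors: dict mapping each word to its prior probability P(w_j).
        """
        likelihood_bin = int(likelihood / bin_width)
        numerator = D.get(candidate, {}).get(candidate, {}).get(likelihood_bin, 0.0) * priors[candidate]
        denominator = sum(
            D.get(true_word, {}).get(candidate, {}).get(likelihood_bin, 0.0) * priors[true_word]
            for true_word in priors
        )
        return numerator / denominator if denominator > 0 else 0.0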

[0022] An embodiment of the present invention will now be described concretely with reference to FIGS. 4, 5 and 6. First, the voice interactive device outputs a prompt such as "Please say +++." The user responds, for example, "OOO, please." The speech signal corresponding to this utterance undergoes analog-to-digital conversion and feature parameter extraction in the speech analysis unit 1. As feature parameters, a cepstrum sequence, a delta cepstrum sequence, a power sequence of the registered words, or a vector-quantized (VQ) sequence obtained by vector quantization of these parameters is generally adopted; any of these feature parameters can be used in the voice interactive device of the present invention. The extracted feature parameters are sent to the likelihood calculation unit 2. The likelihood calculation unit 2 computes the likelihood between the input speech and each registered word, either from the feature parameters of the input speech extracted by the speech analysis unit 1 and the feature parameters of the plurality of registered words stored in advance in the registered word parameter storage unit 3, or from the state transition probabilities and symbol output probabilities obtained by statistically processing the feature parameters under the assumption that each word is a hidden Markov model (HMM). The computed likelihoods are sent to the candidate word output unit 4, where the candidate word numbers and their likelihoods are sorted in descending order of likelihood, as shown in FIG. 4.
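
As one possible illustration of this likelihood calculation and sorting (the patent allows several kinds of feature parameters and models), the sketch below scores a sequence of vector-quantized symbols against a discrete HMM per registered word with the standard forward algorithm and sorts the candidates. The model layout and all names are assumptions made for illustration.

    import math

    def log_forward_likelihood(symbols, initial, transition, emission):
        """Log-likelihood of a VQ symbol sequence under a discrete HMM.

        initial[i]: initial state probabilities, transition[i][j]: state transition
        probabilities, emission[i][v]: symbol output probabilities.
        """
        n_states = len(initial)
        alpha = [initial[i] * emission[i][symbols[0]] for i in range(n_states)]
        log_likelihood = 0.0
        for symbol in symbols[1:]:
            alpha = [
                sum(alpha[i] * transition[i][j] for i in range(n_states)) * emission[j][symbol]
                for j in range(n_states)
            ]
            # Rescale to avoid underflow, accumulating the log of the scale factor.
            scale = sum(alpha)
            log_likelihood += math.log(scale)
            alpha = [a / scale for a in alpha]
        log_likelihood += math.log(sum(alpha))
        return log_likelihood

    def rank_candidates(symbols, word_models):
        """word_models: dict word -> (initial, transition, emission).
        Returns (word, log_likelihood) pairs in descending order (candidate word output unit 4)."""
        scored = [(w, log_forward_likelihood(symbols, *m)) for w, m in word_models.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)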

[0023] First, for the candidate word ranked first in likelihood, the correct rate of the target candidate word is obtained from its likelihood and the data in the per-word likelihood-to-correct-rate correspondence data storage unit 6. Next, the obtained correct rate is compared with the threshold stored in the correct-rate threshold storage unit 7; if the correct rate is the larger, the word number of that candidate word is passed to the candidate word presentation unit 8. The candidate word presentation unit 8 asks the user whether the candidate word is correct, to be answered "yes" or "no". Example: "Is it XX?"

[0024] If the result of recognizing the user's response is "yes", the candidate word ranked first in likelihood is the correct word; if it is "no", the same processing is repeated for the candidate ranked second in likelihood. If the correct rate is the smaller, the re-utterance instruction unit 9 is driven to instruct the user to utter the word again, for example "Once more, please."
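
The confirmation dialogue described in the two preceding paragraphs might be organized as in the following sketch (Python), which reuses correct_rate from the earlier sketch; ask_user and request_reutterance stand in for the spoken interaction and are, like the other names, illustrative assumptions.

    def confirmation_dialogue(candidates, correct_rate_table, threshold, ask_user, request_reutterance):
        """candidates: (word, likelihood) pairs in descending order of likelihood.

        ask_user(word) returns True for "yes" and False for "no";
        request_reutterance() asks the user to say the word again.
        Returns the confirmed word, or None if re-utterance was requested.
        """
        for word, likelihood in candidates:
            rate = correct_rate(word, likelihood, correct_rate_table)
            if rate <= threshold:
                request_reutterance()          # "Once more, please."
                return None
            if ask_user(word):                 # "Is it XX?" -> yes
                return word
        request_reutterance()                  # all presented candidates were rejected
        return None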

[0025]

[Effects of the Invention] As described above, the interactive device of the present invention can decide between confirmation and re-utterance on the basis of a common correct rate even when the likelihood distributions of candidate words differ greatly. The dialogue can therefore proceed without wasteful re-utterance requests or unnecessary confirmation processing, with the effect that the average number of utterances required of the user is reduced.

[0026] In the interactive device of the present invention, the processing speed can be improved by tabulating the correct-rate computation results for each word in advance, storing the table in the per-word likelihood-to-correct-rate correspondence data storage unit, and referring to this table during the actual dialogue processing.
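
Such a table might be built as in the sketch below (Python), reusing the histogram layout and correct_rate_from_distribution assumed in the earlier sketches, so that the dialogue-time lookup reduces to the binned table consulted by correct_rate above.

    def build_correct_rate_table(D, priors, bin_width=0.1, max_bin=100):
        """Precompute, for every word, (likelihood_lower_bound, correct_rate) entries
        from the estimated distribution D and the priors (storage unit 6)."""
        table = {}
        for word in priors:
            entries = []
            for b in range(max_bin + 1):
                lower_edge = b * bin_width
                representative = (b + 0.5) * bin_width     # bin center, avoids edge rounding issues
                rate = correct_rate_from_distribution(word, representative, D, priors, bin_width)
                entries.append((lower_edge, rate))
            table[word] = entries
        return table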

[Brief Description of the Drawings]

[FIG. 1] A diagram illustrating an example of a dialogue for confirming whether a candidate word is correct.

[FIG. 2] A diagram illustrating an example of a dialogue instructing re-utterance.

[FIG. 3] A diagram illustrating a conventional example of confirmation dialogue processing using a fixed likelihood threshold.

[FIG. 4] A diagram showing an example of the information passed from the recognition processing unit.

[FIG. 5] A diagram showing an embodiment of the present invention.

[FIG. 6] A diagram illustrating an example of the confirmation dialogue processing of the present invention.

[Explanation of Symbols]

1 Speech analysis unit
2 Likelihood calculation unit
3 Registered word parameter storage unit
4 Candidate word output unit
5 Confirmation control unit
6 Per-word likelihood-to-correct-rate correspondence data storage unit
7 Correct-rate threshold storage unit
8 Candidate word presentation unit
9 Re-utterance instruction unit

Claims (1)

[Claims]

[Claim 1] A voice interactive device comprising: a speech analysis unit that analyzes an input speech signal and computes its feature parameters; a registered word parameter storage unit that stores the feature parameters of a plurality of pre-registered words; a likelihood calculation unit that computes the likelihood of the input speech for each registered word, either from the feature parameters of the input speech obtained from the speech analysis unit and the feature parameters of the registered words stored in the registered word parameter storage unit, or from the state transition probabilities and symbol output probabilities obtained by statistically processing the feature parameters under the assumption that each word is a hidden Markov model; a candidate word output unit that outputs candidate word numbers and the likelihoods of the candidate words in descending order of likelihood; a candidate word presentation unit that presents the candidate words output from the candidate word output unit to the user in order of likelihood; and a re-utterance instruction unit that instructs the user to utter the word again; the voice interactive device determining the correct candidate word from the user's speech input in response to the candidate word presented by the candidate word presentation unit, and being characterized by further comprising: a per-word likelihood-to-correct-rate correspondence data storage unit that stores, for each candidate word, data indicating the correspondence between its likelihood and its correct rate; a correct-rate threshold storage unit that stores a correct-rate threshold for deciding whether to confirm the correctness of a candidate word or to instruct re-utterance of the word; and a confirmation control unit that obtains the correct rate of the target candidate word from the likelihood of the candidate word obtained from the likelihood calculation unit and the data in the per-word likelihood-to-correct-rate correspondence data storage unit, compares this correct rate with the threshold stored in the correct-rate threshold storage unit, drives the candidate word presentation unit when the correct rate is larger than the threshold, and drives the re-utterance instruction unit when the correct rate is smaller than the threshold.
JP5144739A 1993-06-16 1993-06-16 Voice interactive device Pending JPH075890A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP5144739A JPH075890A (en) 1993-06-16 1993-06-16 Voice interactive device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP5144739A JPH075890A (en) 1993-06-16 1993-06-16 Voice interactive device

Publications (1)

Publication Number Publication Date
JPH075890A true JPH075890A (en) 1995-01-10

Family

ID=15369232

Family Applications (1)

Application Number Title Priority Date Filing Date
JP5144739A Pending JPH075890A (en) 1993-06-16 1993-06-16 Voice interactive device

Country Status (1)

Country Link
JP (1) JPH075890A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09106296A (en) * 1995-07-31 1997-04-22 At & T Corp Apparatus and method for speech recognition
JP2000250585A (en) * 1999-02-25 2000-09-14 Nippon Telegr & Teleph Corp <Ntt> Interactive database retrieving method and device and recording medium recorded with interactive database retrieving program
JP2009157050A (en) * 2007-12-26 2009-07-16 Hitachi Omron Terminal Solutions Corp Uttering verification device and uttering verification method
US8977547B2 (en) 2009-01-30 2015-03-10 Mitsubishi Electric Corporation Voice recognition system for registration of stable utterances


Similar Documents

Publication Publication Date Title
US8315870B2 (en) Rescoring speech recognition hypothesis using prosodic likelihood
EP2309489B1 (en) Methods and systems for considering information about an expected response when performing speech recognition
JP4657736B2 (en) System and method for automatic speech recognition learning using user correction
US5995928A (en) Method and apparatus for continuous spelling speech recognition with early identification
US5794196A (en) Speech recognition system distinguishing dictation from commands by arbitration between continuous speech and isolated word modules
US6317711B1 (en) Speech segment detection and word recognition
EP0965978B1 (en) Non-interactive enrollment in speech recognition
US5963903A (en) Method and system for dynamically adjusted training for speech recognition
EP1647970B1 (en) Hidden conditional random field models for phonetic classification and speech recognition
US6629073B1 (en) Speech recognition method and apparatus utilizing multi-unit models
US6801892B2 (en) Method and system for the reduction of processing time in a speech recognition system using the hidden markov model
JP2003316386A (en) Method, device, and program for speech recognition
EP1225567B1 (en) Method and apparatus for speech recognition
US7272560B2 (en) Methodology for performing a refinement procedure to implement a speech recognition dictionary
JPH075890A (en) Voice interactive device
JP3633254B2 (en) Voice recognition system and recording medium recording the program
JP4296290B2 (en) Speech recognition apparatus, speech recognition method and program
JP2003345388A (en) Method, device, and program for voice recognition
EP0987681B1 (en) Speech recognition method and apparatus
JPH09114482A (en) Speaker adaptation method for voice recognition
JPH10133686A (en) Nonnative language speech recognition device
KR100404852B1 (en) Speech recognition apparatus having language model adaptive function and method for controlling the same
JP3291073B2 (en) Voice recognition method
JPH0981177A (en) Voice recognition device, dictionary for work constitution elements and method for learning imbedded markov model
JPH06250689A (en) Voice recognition device