JPH09114493A

JPH09114493A - Interaction controller

Info

Publication number: JPH09114493A
Application number: JP27106595A
Authority: JP
Inventors: Otoya Shirotsuka; 音也城塚
Original assignee: N T T DATA TSUSHIN KK; NTT Data Communications Systems Corp
Current assignee: N T T DATA TSUSHIN KK; NTT Data Corp
Priority date: 1995-10-19
Filing date: 1995-10-19
Publication date: 1997-05-02

Abstract

PROBLEM TO BE SOLVED: To reduce the number of user utterances required after a misrecognition and to make the conversation smoother, in a dialogue control device that has means for recognizing the content of a user's utterance. SOLUTION: The device has a recognized-word estimation unit 12. The unit 12 comprises a similar word table 122, which stores information distinguishing whether the words stored in a recognition word dictionary 121 are similar or dissimilar to one another; a word narrowing table 123, which stores information indicating whether each word in the dictionary 121 has a misrecognition history; a similar word search unit 124, which retrieves similar words; and a narrowing word search unit 125, which performs narrowing retrieval. On a misrecognition, the device does not ask the user to speak again; instead it takes the similar word retrieved from the table 122 as the next estimated word and asks the user whether it is correct. When the words with no misrecognition history have been narrowed down to two or fewer, they are taken as the recognized word one after another and the user is asked to confirm each. The user answers with only 'yes' or 'no'.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a voice service system that converses with a user by means of speech recognition and speech synthesis and provides a predetermined voice service according to the content of that conversation, and more particularly to a dialogue control device that controls the dialogue between the user and the system.

[0002]

[Description of the Related Art] Voice service systems that provide a predetermined voice service while conversing with a user by voice are known. FIG. 13 is a general block diagram of such a system. Speech uttered by the user is input to the speech recognition unit 31 of the voice interface 3, recognized there, and sent to the dialogue control device 4. The dialogue control device 4 generates a voice output sentence (character codes) for the next exchange, either after a topic has been determined or in response to voice input from the user, and sends it to the speech synthesis unit 32. When the dialogue control device 4 has obtained necessary and sufficient information from the dialogue with the user, it sends that information to the application processing unit 2, which executes the service the user wants.

[0003] The dialogue between the voice service system and the user proceeds by repeating dialogue pairs: for each topic the system outputs a voice message such as an instruction or question, and the user responds to it. The dialogue control device 4 governs these dialogue pairs. FIG. 14 shows the schematic configuration of the conventional dialogue control device 4. It comprises a topic determination unit 41 that determines the topic; a recognized-word estimation unit 42 that estimates the recognized word using a word search unit 421 and a recognition word dictionary 422 storing a plurality of recognition words for each topic; a voice output sentence generation unit 43 that generates, from the estimation result, the voice output sentence to be synthesized and sends it to the speech synthesis unit 32; a recognition processing unit 44 that serves as the interface to the speech recognition unit 31 and the application processing unit 2; and a correctness determination unit 45 that judges whether the estimated recognized word is correct on the basis of the recognition result sent from the speech recognition unit 31.

[0004] The operation of the dialogue control device 4 is described using a conference room reservation service as an example. Reserving a conference room requires five topics: the name of the person making the reservation, the date, the start time, the end time, and the name of the conference room to be reserved. For each of these topics, the dialogue control device 4 controls the dialogue pair exchanged between the application processing unit 2 and the user.

[0005] FIG. 15 illustrates the control flow that the dialogue control device 4 executes for the topic of the conference room name, and FIG. 12 shows the dialogue that actually takes place between the voice service system as a whole and the user in this case. Referring to FIGS. 15 and 12, the topic determination unit 41 first determines the current topic, "conference room name" (S301). The voice output sentence generation unit 43 then generates the initial voice output sentence ("Please say the conference room name") needed to tell the user what information the application processing unit 2 wants spoken (S302). When the user, having heard the synthesized speech for this initial sentence, says "Corner A" and this is recognized by the speech recognition unit 31 (S303: Yes), the recognized-word estimation unit 42 estimates the recognized word by referring to the recognition word dictionary 422, which stores the recognition words for "conference room name" (S304). Assuming that the recognized word "Corner B" is estimated, the voice output sentence generation unit 43 generates a voice output sentence ("Is it Corner B?") asking the user whether the estimate is correct (S305). The user, hearing the corresponding synthesized speech, answers "No" because the recognition result is wrong.

[0006] When this "No" is recognized (S306: Yes), the correctness determination unit 45 judges that the recognized word is wrong (S307). In response, the voice output sentence generation unit 43 generates a voice output sentence ("Please say it again") prompting the user to speak the information again (S308). Steps S303 to S308 are repeated, and when the user answers "Yes" (S307: Yes), the recognized word is deemed correct and the dialogue continues with the next topic (S309). If there is no next topic, dialogue control ends. It is known that the user's "Yes" and "No" answers can be recognized with close to 100% accuracy.
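The conventional loop of S303 to S308 can be sketched as follows to show why every misrecognition costs the user another full utterance. This is a minimal illustration under stated assumptions: the function name `conventional_confirm`, the callback parameters, and the `max_rounds` safety limit are hypothetical and do not appear in the patent, which describes the loop as simply repeating.

```python
def conventional_confirm(recognize, get_utterance, ask_yes_no, max_rounds=10):
    """Conventional flow: re-prompt for the full utterance after every "No".

    recognize(speech) -> estimated word (S304), hypothetical callback
    get_utterance()   -> the user's spoken input (S303), hypothetical callback
    ask_yes_no(word)  -> True if the user answers "Yes" (S305 to S307)
    max_rounds is an assumed safety limit, not part of the described flow.
    Returns (confirmed word or None, number of full utterances requested).
    """
    for round_no in range(1, max_rounds + 1):
        guess = recognize(get_utterance())   # S303, S304
        if ask_yes_no(guess):                # S305 to S307
            return guess, round_no
        # S308: "Please say it again", then back to S303
    return None, max_rounds


# Illustrative run: the recognizer is wrong twice before getting it right.
guesses = iter(["Corner B", "Corner B", "Corner A"])
word, utterances = conventional_confirm(
    recognize=lambda speech: next(guesses),
    get_utterance=lambda: "Corner A",
    ask_yes_no=lambda g: g == "Corner A",
)
```

In this run the user has to say the room name three times, in addition to the yes/no answers.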

[0007]

[Problems to Be Solved by the Invention] As described above, the conventional dialogue control device 4 estimates a recognized word from the user's utterance, asks the user whether it is correct, and, when the estimate proves to be a misrecognition, asks the user to speak again ("Please say it again"). The burden on the user therefore grows with the number of misrecognitions. In particular, when the same piece of information is misrecognized repeatedly, the user may become reluctant to use the system.

[0008] One conceivable way to solve this problem is, instead of prompting the user to speak again on a misrecognition, to obtain several recognized-word candidates in advance through recognition processing or the like and have the user confirm them with "Yes" or "No" in order, starting from the most probable candidate. However, ranking the candidates by likelihood requires a considerable amount of computation and a large memory space. Moreover, if the truly correct candidate ranks low, the user must answer many confirmation questions before it is reached.

[0009] The object of the present invention is therefore, in a dialogue control device having means for recognizing the content of a user's utterance, to identify the recognized word more quickly, to reduce the number of times the user must speak after a misrecognition, and to shorten what the user must say, thereby making the dialogue with the user smoother.

[0010]

[Means for Solving the Problems] To solve the above problems, the present invention provides a dialogue control device having a recognition word dictionary storing a plurality of recognition words; a recognized-word estimation unit that estimates the recognized word corresponding to input speech by referring to the recognition word dictionary; a voice output sentence generation unit that generates a voice output sentence asking whether the estimated recognized word is correct; and a correctness determination unit that judges, from the input speech, whether the estimated recognized word is correct. The device is characterized in that the recognized-word estimation unit comprises a similar word table, in which information distinguishing similarity from dissimilarity between the recognition words stored in the recognition word dictionary is held in the identification area of each recognition word, and a first word search unit that, when the estimated recognized word is wrong, refers to the similar word table to identify the identification area of another recognition word similar to it; the recognition word corresponding to the identified identification area is estimated as the next candidate.

[0011] Whether two words are similar or dissimilar can be established, for example, by computing and recording in advance the similarity between the recognition models of the recognition words, or by statistically deriving easily confused pairs of recognition words from the actual usage history and recording them.

[0012] The similar word table can be created, for example, by arranging the identification areas of the recognition words contained in the recognition word dictionary in a matrix and storing in each corresponding cell a binary value, one value denoting "similar" and the other "dissimilar". Alternatively, multi-valued information with three or more levels, expressing the degree of similarity between recognition words, is stored in the corresponding cells of the matrix. In the latter case, the first word search unit is configured to judge recognition words as similar to each other when their multi-valued information exceeds a predetermined threshold.
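The binary matrix form of the similar word table can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `make_binary_table` and `similar_words`, the use of nested Python lists, and the symmetric treatment of similarity are assumptions. The five-word size and the single similar pair (words 1 and 2) follow the conference-room example used in the embodiment.

```python
N_WORDS = 5  # size of the example dictionary; word numbers run from 1 to 5


def make_binary_table(similar_pairs, n_words=N_WORDS):
    """Build an n x n matrix; True marks "similar", False marks "dissimilar"."""
    table = [[False] * n_words for _ in range(n_words)]
    for a, b in similar_pairs:
        table[a - 1][b - 1] = True
        table[b - 1][a - 1] = True  # assumption: similarity is symmetric
    return table


def similar_words(table, word_no):
    """Return the word numbers marked similar to word_no (1-based)."""
    row = table[word_no - 1]
    return [i + 1 for i, flag in enumerate(row) if flag and i + 1 != word_no]


# Only words 1 and 2 are similar; every other pair is dissimilar.
binary_table = make_binary_table([(1, 2)])
next_candidates = similar_words(binary_table, 2)
```

Looking up a row is a constant-size scan, which is the point of precomputing the table rather than evaluating similarity at dialogue time.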

[0013] With the dialogue control device configured in this way, when the estimate of one recognized word proves wrong, the identification area of another recognition word similar to it can be identified immediately by referring to the corresponding area of the similar word table. Accordingly, if the identification areas are made to correspond one-to-one with the identification information held in the recognition word dictionary, the recognition word that becomes the next candidate can be identified and retrieved quickly. Because the next recognized word is identified immediately, the user can be asked whether it is correct and need only answer "Yes" or "No", which makes the dialogue smoother than prompting for the speech input again.

[0014] Another configuration of the present invention that solves the above problems is a dialogue control device having a recognition word dictionary storing a plurality of recognition words; a recognized-word estimation unit that estimates the recognized word corresponding to input speech by referring to the recognition word dictionary; a voice output sentence generation unit that generates a voice output sentence asking whether the estimated recognized word is correct; and a correctness determination unit that judges, from the input speech, whether the estimated recognized word is correct. The device is characterized in that the recognized-word estimation unit comprises a word narrowing table, in which information indicating whether each recognition word has a misrecognition history is stored in the identification areas of all the recognition words in the recognition word dictionary, and a second word search unit that refers to the word narrowing table to detect the identification areas of recognition words with no misrecognition history; one of the recognition words corresponding to the detected identification areas is estimated as the next candidate.

[0015] Because the history of a recognition word that has once been misrecognized is stored in the word narrowing table, the same recognition word is not offered in error again, and the dialogue becomes smoother.

[0016] When the second word search unit detects two identification areas, that is, when two recognition words have no misrecognition history, one of the two is the correct answer. In this case the recognized-word estimation unit estimates the word corresponding to one of the identification areas as the recognized word, and the voice output sentence generation unit generates a voice output sentence asking whether that word is correct. The correct word can thus be identified merely by prompting the user for a "Yes" or "No", and the dialogue becomes smoother.

[0017]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram of a voice service system to which the dialogue control device of the present invention is applied; its configuration is basically the same as that of the conventional system shown in FIG. 13. Components that are exactly the same are given the same reference numerals in FIG. 1.

[0018] Speech uttered by the user is input to the speech recognition unit 31, recognized there, and sent to the dialogue control device 1 of the present invention. Once a topic has been determined, the dialogue control device 1 generates a voice output sentence for the next exchange and sends it to the speech synthesis unit 32. When the dialogue control device 1 has obtained necessary and sufficient information from the dialogue with the user, it sends that information to the application processing unit 2, which executes the service the user wants.

[0019] FIG. 2 is a schematic diagram of the dialogue control device 1 of the present invention. The device has a recognized-word estimation unit 12, to which a recognition word dictionary 121, a similar word table 122, and a word narrowing table 123 are connected, together with a topic determination unit 11, a voice output sentence generation unit 13, and a correctness determination unit 15 that have the same functions as in the conventional device shown in FIG. 14. The recognized-word estimation unit 12 further comprises a similar word search unit 124, which refers to the similar word table 122 to retrieve similar words from the recognition word dictionary 121, and a narrowing word search unit 125, which refers to the word narrowing table 123 to check whether each recognition word in the dictionary 121 has been misrecognized and retrieves those with no misrecognition history. The two search units 124 and 125 may be used independently or in combination.

[0020] FIG. 3 illustrates the structure of the recognition word dictionary 121 used in this embodiment. It stores, in a predetermined format, a recognition dictionary number identifying which topic the dictionary is for, the names of the words that are recognition candidates, and a word number assigned to each word for convenience in searching. In the illustrated example, the topic "conference room name" corresponds to recognition dictionary number "3", and five words such as "Corner A", together with their word numbers (1) to (5), are stored for it.
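The dictionary layout just described (dictionary number, topic, and numbered words) might be modelled as follows. This is a hedged sketch: the class name `RecognitionDictionary`, its field and method names, and the storage format are assumptions, since the text only says "a predetermined format". Only "Corner A" and "Corner B" are named in the description; the remaining three entries are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognitionDictionary:
    """One recognition word dictionary, selected by topic."""
    dictionary_no: int   # identifies which topic the dictionary is for
    topic: str
    words: dict          # word number -> word name

    def word(self, word_no):
        """Retrieve the word name for a word number."""
        return self.words[word_no]

    def number_of(self, name):
        """Find the word number of a word name (linear search)."""
        for word_no, word_name in self.words.items():
            if word_name == name:
                return word_no
        raise KeyError(name)


ROOM_DICT = RecognitionDictionary(
    dictionary_no=3,
    topic="conference room name",
    words={
        1: "Corner A",
        2: "Corner B",
        3: "(placeholder word 3)",  # not named in the description
        4: "(placeholder word 4)",  # not named in the description
        5: "(placeholder word 5)",  # not named in the description
    },
)
```

Keying everything on the word number lets the similar word table and the word narrowing table refer to words without storing their names.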

[0021] FIG. 4 is an example of the similar word table 122. Binary values "○" and "×", indicating whether words in the recognition word dictionary shown in FIG. 3 (recognition dictionary number "3") are similar or dissimilar, are stored in matrix form in the areas corresponding to the word numbers, that is, the identification areas. In the illustrated example, "Corner A" and "Corner B" are similar to each other, and all other pairs of words are dissimilar.

[0022] The similarity relation between words need not be binary; it may be expressed as multi-valued information. FIG. 5 expresses the degree of similarity between words as multi-level values L1 to L5 (which may be discrete or analog), ordered from lowest to highest. In this case a predetermined threshold is set and compared with each similarity: a pair is treated as similar if its similarity is above the threshold and as dissimilar if it is below. The advantage of binary information is that the similar word search unit 124 is simpler; the advantage of multi-valued information is that similarity is expressed in steps, so the relations among the words in the recognition word dictionary 121 can be specified more finely. If several recognition words exceed the threshold, the candidates are taken in descending order of similarity.
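The threshold comparison and descending-order selection for multi-valued similarity can be sketched as follows. The function name `ranked_similar`, the pair-keyed dictionary, the mapping of levels L1 to L5 onto the integers 1 to 5, and the sample values are all assumptions made for illustration; the contents of FIG. 5 are not reproduced here.

```python
def ranked_similar(similarity, word_no, threshold):
    """Return words similar to word_no, most similar first.

    similarity maps frozenset({a, b}) -> level (higher means more similar).
    A pair counts as similar only when its level exceeds the threshold.
    """
    scored = []
    for pair, level in similarity.items():
        if word_no in pair and level > threshold:
            (other,) = pair - {word_no}   # the other word of the pair
            scored.append((level, other))
    # Descending similarity; ties broken by ascending word number (assumption).
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [other for _, other in scored]


# Assumed sample levels: 5 stands for L5 (most similar), 1 for L1.
levels = {
    frozenset({1, 2}): 5,
    frozenset({2, 3}): 4,
    frozenset({2, 4}): 1,
}
candidates = ranked_similar(levels, word_no=2, threshold=3)
```

With these sample values, word 1 is offered before word 3, and word 4 is never offered because its level is below the threshold.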

[0023] FIG. 6 is an example of the word narrowing table 123. In the areas corresponding to the word numbers of all the words stored in the recognition word dictionary 121 shown in FIG. 3, that is, the identification areas, it stores binary information indicating whether each recognition word has a misrecognition history (no misrecognition history: ○; misrecognition history: ×). Initially "○" is stored in every area; each time a misrecognition occurs, the area of the misrecognized word is updated to "×" and the word is removed from the subsequent recognized-word candidates. Whenever the recognition word dictionary 121 in use changes, a new word narrowing table 123 is created with as many identification areas as there are words in the new dictionary.
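A word narrowing table of this kind can be sketched as a per-dictionary set of flags. The class name `NarrowingTable` and its method names are hypothetical; `True` plays the role of "○" (still a candidate) and `False` the role of "×" (already misrecognized).

```python
class NarrowingTable:
    """One flag per word number; created anew for each dictionary in use."""

    def __init__(self, word_numbers):
        # Initial state: every identification area holds "o" (True).
        self._candidate = {word_no: True for word_no in word_numbers}

    def mark_misrecognized(self, word_no):
        """Update the area of a misrecognized word to "x" (False)."""
        self._candidate[word_no] = False

    def candidates(self):
        """Word numbers with no misrecognition history, in ascending order."""
        return sorted(no for no, ok in self._candidate.items() if ok)


table = NarrowingTable(range(1, 6))   # five-word example dictionary
table.mark_misrecognized(2)           # e.g. "Corner B" was rejected by the user
remaining = table.candidates()
```

Because a rejected word is never returned by `candidates()`, the same wrong word cannot be proposed twice within a topic.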

[0024] Next, the operation of the voice service system and the dialogue control device 1 of this embodiment is described with reference to FIGS. 7 to 11. For convenience, the conference room reservation example is used, as in the conventional case.

[0025] FIG. 7 illustrates the control flow of the dialogue control device 1 when the similar word search unit 124 and the narrowing word search unit 125 are used together, and FIG. 9 shows the dialogue that actually takes place between the voice service system as a whole and the user in this case. As a precondition, all the identification areas of the word narrowing table 123 are assumed to be in the initial state (all "○").

[0026] In this example, the topic determination unit 11 first determines the topic "conference room name" (S101). The voice output sentence generation unit 13 then generates the initial voice output sentence ("Please say the conference room name") (S102). When the user, having heard the synthesized speech for this initial sentence, says "Corner A" and this is recognized by the speech recognition unit 31 (S103: Yes), the recognized-word estimation unit 12 estimates the recognized word by referring to the recognition word dictionary 121 with recognition dictionary number "3", which corresponds to "conference room name" (S104). Assuming that the recognized word "Corner B" is estimated, the voice output sentence generation unit 13 generates a voice output sentence ("Is it Corner B?") asking the user whether the estimate is correct (S105). The user, hearing the corresponding synthesized speech, answers "No" because the recognition result is wrong.

[0027] When this "No" is recognized (S106: Yes), the correctness determination unit 15 judges that the recognized word is wrong (S107: No). In response, the recognized-word estimation unit 12 identifies the word number of the misrecognized word from the recognition word dictionary 121 and updates the corresponding identification area in the word narrowing table 123 to "×" (S108). The recognized-word estimation unit 12 also counts the candidate words, that is, the number of identification areas holding "○" in the word narrowing table 123 (S109); if there are three or more (S109: Yes), it performs a similar word search (S110). Specifically, it first refers to the similar word table 122 to identify the word number of a word similar to the misrecognized word ("Corner B", word number (2)). In the example of FIG. 4 or FIG. 5, word number (1), "Corner A", is identified. It then retrieves the word corresponding to word number (1) from the recognition word dictionary 121, estimates the retrieved word ("Corner A") as the next recognized word (S111), and returns to S105.

[0028] If in S109 the number of candidate words is two or fewer (S109: No) and at least one candidate word is confirmed to exist (S112: Yes), the corresponding words are retrieved from the recognition word dictionary 121 in ascending order of word number, without referring to the similar word table 122, and are estimated as the recognized word one after another (S113); processing then returns to S105.

[0029] If the recognition result is judged correct in S107 (S107: Yes), that is, the user answered "Yes", or if S112 confirms that no candidate word remains (S112: No), the device checks whether there is a next topic (S114). If there is, processing returns to S101; if not, dialogue control ends.
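The confirmation loop of S105 to S113 for a single topic can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the function name `confirm_word` and its callback parameters are hypothetical, similar words already marked as misrecognized are skipped, and when three or more candidates remain but no untried similar word exists the sketch falls back to the lowest-numbered candidate. The text does not specify behaviour for that last case.

```python
def confirm_word(first_guess, word_numbers, similar_of, ask_yes_no):
    """Confirm one word using only yes/no answers after the first utterance.

    first_guess      word number estimated from the utterance (S104)
    word_numbers     all word numbers in the dictionary in use
    similar_of(n) -> word numbers similar to n (similar word table lookup)
    ask_yes_no(n) -> True if the user answers "Yes" (S105 to S107)
    Returns the confirmed word number, or None if no candidate remains.
    """
    untried = {no: True for no in word_numbers}   # narrowing table, all "o"
    guess = first_guess
    while guess is not None:
        if ask_yes_no(guess):                     # S107: Yes
            return guess
        untried[guess] = False                    # S108: mark as "x"
        remaining = sorted(no for no, ok in untried.items() if ok)  # S109
        if len(remaining) >= 3:                   # S109: Yes -> S110, S111
            similar = [no for no in similar_of(guess) if untried.get(no, False)]
            # Assumed fallback when no untried similar word is left.
            guess = similar[0] if similar else remaining[0]
        elif remaining:                           # S112: Yes -> S113
            guess = remaining[0]                  # ascending word number
        else:                                     # S112: No
            guess = None
    return None


# Illustrative run: the user said word 1 but word 2 was estimated first.
similar_table = {1: [2], 2: [1]}
asked = []
confirmed = confirm_word(
    first_guess=2,
    word_numbers=range(1, 6),
    similar_of=lambda n: similar_table.get(n, []),
    ask_yes_no=lambda n: asked.append(n) or n == 1,
)
```

In this run the system asks about word 2, then word 1, so the user speaks the room name once and then answers only "No" and "Yes".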

[0030] FIG. 8 shows a variant of the control flow of FIG. 7 in which, instead of generating the initial voice output sentence, word narrowing is performed immediately after the topic is determined. In this case, once the topic is determined and the corresponding recognition word dictionary 121 is identified (S201), the recognized-word estimation unit 12 refers to the word narrowing table 123 to count the candidate words (S202, S203). If there are three or more candidate words, it retrieves an arbitrary word (for example "Corner B") from the recognition word dictionary 121 and estimates it as the recognized word (S204). The subsequent steps S205 to S214 are the same as S105 to S114 in FIG. 7.

[0031] If in S203 the number of candidate words is two or fewer (S203: No), processing jumps to S212. That is, if there are two candidate words they are taken one at a time in ascending order of word number, and if there is one candidate word that word is taken, as the estimated recognized word, and processing returns to S205 (S213); if no candidate word exists, the device moves on to the next topic. FIG. 10 shows the dialogue that actually takes place between the voice service system as a whole and the user when there are two candidate words, and FIG. 11 shows it when there is one.
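The branch taken when two or fewer candidates remain (S212, S213, then back to S205) reduces to asking about each remaining word in ascending word-number order. A minimal sketch, with the hypothetical names `confirm_in_order` and `ask_yes_no`:

```python
def confirm_in_order(candidates, ask_yes_no):
    """Offer the remaining candidates in ascending word-number order.

    candidates       word numbers with no misrecognition history (at most two
                     in the branch described above)
    ask_yes_no(n) -> True if the user answers "Yes"
    Returns the confirmed word number, or None to move on to the next topic.
    """
    for word_no in sorted(candidates):
        if ask_yes_no(word_no):
            return word_no
    return None


# Illustrative run with two candidates, where the user wants word 2.
asked = []
chosen = confirm_in_order([2, 1], lambda n: asked.append(n) or n == 2)
```

Here the system asks about word 1 and then word 2, and the user never has to say the word itself.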

[0032] Using the similar word table 122 in this way makes it quick to identify the next recognized-word candidate after a misrecognition. Moreover, as is clear from a comparison with FIG. 12, which shows a conventional dialogue on the same topic, the user answers the system by voice fewer times after saying the desired conference room name, and those answers are limited to "Yes" or "No", which have a high recognition rate, so the dialogue becomes smoother.

[0033] In addition, the history of a word that has once been misrecognized is stored in the word narrowing table 123 so that the word is excluded from the candidates for the next recognized word, which prevents the same misrecognition from being repeated. Furthermore, by performing the word narrowing process using the word narrowing table 123 before the user's voice input, the system can ask, for example, "Is it Corner A?" or "It is Corner B, isn't it?", so that the correct word can be recognized with the user answering only "yes" or "no" from the outset, and the number of user utterances is reduced further. This reduces the burden on the user and smooths the dialogue at the same time.
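
How the two tables might work together after a "no" answer can be illustrated with the short sketch below. It is a hedged example under stated assumptions: the nested-dictionary layout of the similar word table, the set used for the misrecognition history, the function name `next_candidate`, and the word "Corner C" are all hypothetical and are not taken from the specification.

```python
def next_candidate(rejected, similar_table, misrecognized):
    """Propose the next word after the user rejects `rejected`.

    similar_table: hypothetical layout {word: {other_word: is_similar}}.
    misrecognized: set standing in for the word narrowing table's history.
    """
    # Record the rejected word so it is never proposed again.
    misrecognized.add(rejected)
    # Look up similar words directly in the table; no similarity is computed here.
    for word, is_similar in similar_table[rejected].items():
        if is_similar and word not in misrecognized:
            return word
    return None


# Assumed example data: "Corner C" is invented for illustration.
similar_table = {
    "Corner A": {"Corner B": True, "Corner C": False},
    "Corner B": {"Corner A": True, "Corner C": True},
    "Corner C": {"Corner A": False, "Corner B": True},
}
history = set()
first = next_candidate("Corner A", similar_table, history)
second = next_candidate("Corner B", similar_table, history)
third = next_candidate("Corner C", similar_table, history)
```

Because every rejected word is added to the history before the lookup, a word that was already refused cannot come back as a candidate, which is the repetition the narrowing table is meant to prevent.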

[0034] In this embodiment, "○" and "×" are used as the binary information stored in the similar word table 122 shown in FIG. 4 and in the word narrowing table 123 shown in FIG. 6. Of course, any other kind of information, such as logic 1 and logic 0, may be used as long as the two values can be distinguished from each other.
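
As a small illustration of this point, the sketch below treats the mark "○" and logic 1 as interchangeable encodings of the same flag. The names `SET_MARKS`, `is_set`, and `normalize_row` are assumptions introduced only for this example.

```python
# Values treated as the "set" state; any other value counts as "not set".
SET_MARKS = {"○", 1}


def is_set(cell):
    """Return True when a table cell holds the 'set' value in either encoding."""
    return cell in SET_MARKS


def normalize_row(row):
    """Convert a row of marks or logic values into plain booleans."""
    return [is_set(cell) for cell in row]
```

With this helper, a row written as marks and the same row written as logic values normalize to the same list of booleans, so the lookup logic does not depend on the encoding.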

[0035]

[Effects of the Invention] As is clear from the above description, according to the present invention, the similarity relationships between words in the recognized word dictionary are looked up in the similar word table when a misrecognition occurs, so the time needed to identify the next candidate word is far shorter than when similarity is computed. In addition, because the next candidate word can be identified quickly, it becomes easy to immediately issue a synthesized voice prompt asking the user whether the estimated word is correct, and the user's voice input can be limited to "yes" or "no".

[0036] For a word that has been misrecognized, the misrecognition history is stored in the identification area in the word narrowing table, so the same misrecognition is not repeated. Further, when the words with no misrecognition history have been narrowed down to two or fewer, one of them is estimated as the recognized word and the user is asked whether it is correct, which reduces the user's voice input burden.

[0037] In this way, a dialogue control device is realized that facilitates dialogue by minimizing the number of user utterances and by keeping each utterance to a short response such as "yes" or "no".

[Brief description of the drawings]

FIG. 1 is a block diagram of a voice service system to which the dialogue control device of the present invention is applied.

FIG. 2 is a block diagram of an embodiment of the dialogue control device of the present invention.

FIG. 3 is an explanatory diagram showing an example of the contents of the recognized word dictionary.

FIG. 4 is an explanatory diagram showing an example in which binary information is stored in the similar word table.

FIG. 5 is an explanatory diagram showing an example in which multi-valued information is stored in the similar word table.

FIG. 6 is an explanatory diagram showing an example of the word narrowing table.

FIG. 7 is a diagram showing a control flow of the dialogue control device according to the present embodiment.

FIG. 8 is a diagram showing another control flow of the dialogue control device according to the present embodiment.

FIG. 9 is a diagram showing the dialogue actually carried out between the entire voice service system and the user under the control flow of FIG. 7.

FIG. 10 is a diagram showing the dialogue actually carried out between the entire voice service system and the user under the control flow of FIG. 8.

FIG. 11 is a diagram showing the dialogue when there is one candidate word in the control flow of FIG. 8.

FIG. 12 is a diagram showing the dialogue actually carried out between the entire voice service system and the user with a conventional dialogue control device.

FIG. 13 is a block diagram of a voice service system to which a conventional dialogue control device is applied.

FIG. 14 is a block diagram of a conventional dialogue control device.

FIG. 15 is a diagram showing a control flow of a conventional dialogue control device.

[Explanation of symbols]

1 Dialogue control device
11 Topic determination unit
12 Recognized word estimation unit
121 Recognized word dictionary
122 Similar word table
123 Word narrowing table
124 Similar word search unit (first word search unit)
125 Narrowing word search unit (second word search unit)
13 Voice output sentence generation unit
14 Recognition control unit
15 Correctness determination unit
2 Application processing unit
3 Voice interface
31 Voice recognition unit
32 Voice synthesis unit

Claims

[Claims]

1. A dialogue control device comprising: a recognized word estimation unit that estimates a recognized word corresponding to an input voice by referring to a recognized word dictionary storing a plurality of words; a voice output sentence generation unit that generates a voice output sentence for asking whether the estimated recognized word is correct; and a correctness determination unit that determines the correctness of the estimated recognized word based on the input voice, wherein the recognized word estimation unit comprises a similar word table that stores, in the identification area of each word, information on the similarity or dissimilarity between the words stored in the recognized word dictionary, and a first word search unit that, when the estimated recognized word is incorrect, refers to the similar word table and identifies the identification area of a word similar to that recognized word, and wherein the recognized word estimation unit estimates the word corresponding to the identified identification area as the next candidate recognized word.

2. The dialogue control device according to claim 1, wherein the similar word table arranges the identification areas of the words included in the recognized word dictionary in matrix form and stores, in the corresponding areas of the matrix, binary information in which one value denotes similar and the other denotes dissimilar.

3. The dialogue control device according to claim 1, wherein the similar word table arranges the identification areas of the words included in the recognized word dictionary in matrix form and stores, in the corresponding areas of the matrix, multi-valued information of three or more values representing the similarity between the words, and wherein the first word search unit is configured to determine that words whose multi-valued information exceeds a predetermined threshold are similar to each other.

4. A dialogue control device comprising: a recognized word estimation unit that estimates a recognized word corresponding to an input voice by referring to a recognized word dictionary storing a plurality of words; a voice output sentence generation unit that generates a voice output sentence for asking whether the estimated recognized word is correct; and a correctness determination unit that determines the correctness of the estimated recognized word based on the input voice, wherein the recognized word estimation unit comprises a word narrowing table that stores, in the identification area of each word stored in the recognized word dictionary, information indicating the presence or absence of a misrecognition history for that word, and a second word search unit that refers to the word narrowing table and detects the identification areas of words having no misrecognition history, and wherein the recognized word estimation unit estimates any one of the words corresponding to the detected identification areas as the next candidate recognized word.

5. The dialogue control device according to claim 4, wherein the recognized word estimation unit estimates the word corresponding to one of the identification areas as the recognized word when the number of identification areas detected by the second word search unit is two.