JP2004355226A - Inter-different language dialog equipment and method

Info

Publication number: JP2004355226A
Application number: JP2003150816A
Authority: JP
Inventors: Naoki Asanoma; 直樹麻野間; Kura Furuse; 蔵古瀬; Yamato Takahashi; 大和高橋; Akira Kataoka; 明片岡
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-05-28
Filing date: 2003-05-28
Publication date: 2004-12-16

Abstract

PROBLEM TO BE SOLVED: To achieve practical-level mutual understanding across different languages using technologies such as speech recognition and machine translation at their current level.
SOLUTION: This cross-language dialogue apparatus is provided with dialogue processing means that executes one of: a first process of determining the response content for a first interlocutor by referring to a dialogue scenario stored in advance, and conveying the response content in a first language via a first terminal unit using the fixed-form dialogue sentence designated for each response content; a second process of conveying, to a second interlocutor different from the first interlocutor via a second terminal unit different from the first terminal unit, a sentence that combines a fixed-form dialogue sentence with the conversion result obtained by converting the utterance understanding result into a second language different from the first language; and a third process of ending the dialogue.
COPYRIGHT: (C)2005, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、複数の端末部のそれぞれにおいて、互いに異なる言語を使用する対話者の間の意思疎通を可能にする異言語間対話装置および異言語間対話方法に関する。
【０００２】
【従来の技術】
従来の異言語間対話装置は、一方の話者が所定の言語で発話した全ての内容を、音声認識した後に、その音声認識結果の全てを、機械翻訳によって別の言語に変換し、さらに、音声合成装置等によって音声に変換し、伝える装置である（たとえば、非特許文献１参照）
上記非特許文献１には、異言語間対話システムである自動翻訳電話の技術について、詳しく記載されている。
【０００３】
【非特許文献１】
国際電気通信基礎技術研究所編「自動翻訳電話」オーム社、ＡＴＲ先端テクノロジーシリーズ、１９９４年１月
【０００４】
【発明が解決しようとする課題】
しかし、従来の異言語間対話装置では、音声認識の性能が充分でないので、入力した発話内容とは異なる音声認識結果を得ることが多い。また、機械翻訳の性能が充分でないので、発話内容とは異なる別の言語へ翻訳した結果になることも多い。
【０００５】
このような状況であるので、話者が発話した内容を、正確に別の言語へ変換することができない場合が多く、従来の技術によって、異なる言語による意思疎通や、異なる言語への発話の変換を、実用レベルで実現することは困難である。
【０００６】
本発明は、現状の技術レベルの音声認識や機械翻訳等の技術を使って、異なる言語で意思疎通する場合、実用レベルで意思疎通することができる異言語間対話装置および異言語間対話方法を提供することを目的とするものである。
【０００７】
また、本発明は、現状の技術レベルの音声認識や機械翻訳等の技術を使って、異なる言語へ機械翻訳する場合、実用レベルで使用することができる機械翻訳装置および機械翻訳方法を提供することを目的とするものである。
【０００８】
【課題を解決するための手段】
本発明は、予め記憶されている対話シナリオを参照することによって、第１の対話者への応答内容を決定し、応答内容毎に指定されている定型対話文を使用し、第１の端末部を介して、上記応答内容を、第１の言語によって伝達する第１の処理、上記発話理解の結果である発話理解結果を上記第１の言語とは異なる第２の言語に変換した変換結果と、定型対話文とを組み合わせた文を、上記第１の端末部とは異なる第２の端末部を介して、上記第１の対話者とは異なる第２の対話者に伝達する第２の処理、対話を終了する第３の処理のうちのいずれかの処理を実行する対話処理手段を有する異言語間対話装置である。
【０００９】
【発明の実施の形態および実施例】
図１は、本発明の第１の実施例である異言語間対話装置１００を示すブロック図である。
【００１０】
異言語間対話装置１００は、端末部１０と、端末部２０と、対話制御部３０と、対話シナリオ記憶部４０と、定型対話文データ記憶部５０と、構文パタン記憶部６０とによって構成されている。
【００１１】
端末部１０は、音声認識部１１と結合され、端末部２０は、音声認識部２１と結合されている。
【００１２】
対話制御部３０は、対話シナリオ記憶部４０と定型対話文データ記憶部５０と構文パタン記憶部６０とを参照可能であり、機械翻訳部３１と結合されている。また、対話制御部３０は、対話シナリオ記憶部４０を参照し、対話端末部１０を介して、端末部１０に関連する対話者Ａに発話を求め、または、端末部２０を介して、端末部２０に関連する対話者Ｂに発話を求める。
【００１３】
発話を求められた対話者は、端末部１０を介して、発話を入力し、端末部１０と結合した音声認識部２１を使い、その発話に対する音声認識結果を得る。
【００１４】
対話制御部３０は、上記音声認識結果に基づいて、構文パタン記憶部６０を参照することによって、発話を理解する。そして、対話シナリオ記憶部４０を参照することによって、上記発話理解結果に対応する「次の処理」を決定し、実行する。
【００１５】
対話制御部３０における上記「次の処理」の選択肢は、第１の処理、第２の処理、第３の処理のうちのいずれかの処理である。
【００１６】
上記「第１の処理」は、入力した対話者Ａ側の端末部１０を介して、対話者Ａ側の端末部１０で扱う言語で音声合成し、予め記憶されている対話シナリオを参照することによって、第１の対話者Ａへの応答内容を決定し、応答内容毎に指定されている定型対話文（定型対話文データ記憶部５０に記載されている）を使用し、第１の端末部１０を介して、上記応答内容を、第１の言語によって伝達し、対話者Ａに次の発話を求める処理である。
【００１７】
上記「第２の処理」は、上記発話理解の結果である発話理解結果を上記第１の言語とは異なる第２の言語に変換した変換結果と、定型対話文とを組み合わせた文を、上記第１の端末部とは異なる第２の端末部２０を介して、上記第１の対話者Ａとは異なる第２の対話者Ｂに伝達する処理である。
【００１８】
上記「第３の処理」は、対話を終了する処理である。
【００１９】
対話制御部３０において、次の処理が対話の終了であると決定されるまで、発話を求められた対話者Ａが、その対話者Ａ側の端末部１０を介して、発話を入力し、対話を続ける。
【００２０】
端末部１０での発話入力において、音声認識だけではなく、対話者が文字認識装置で入力し、システムが文字認識する方法を併用してもよく、また、対話者がキーボード入力した文字列を端末部１０がそのまま受理する方法を併用するようにしてもよい。
【００２１】
端末部１０での、定型対話文を使った発話内容の伝達において、音声合成装置で音声出力する方法のほかに、画面表示による方法、または音声合成装置での音声出力と画面表示とを併用する方法を採用するようにしてもよい。
【００２２】
なお、構文パタン記憶部６０は、従来例における構文パタン記憶部と基本的には同じであるが、オプション的に「疑問詞」を指定する点が、従来の構文パタン記憶部とは異なる。
【００２３】
図２は、上記実施例における構文パタン記憶部６０が記憶している構文パタンの例を示す図である。
【００２４】
構文パタン記憶部６０は、キーフレーズ、必須要素の項目を持つ表形式で構成され、入力発話にキーフレーズが含まれていた場合に、必要な語句として、どのような要素があるかを必須要素に示している。
【００２５】
上記「キーフレーズ」は、キーワードに対応するフレーズであり、名詞または名詞句で構成されている。上記「必須要素」は、所定の文において、主語、目的語等、上記所定の文が完全な文を構成するに必要な要素である。
【００２６】
たとえば、図２に示す構文パタンの例において、「つないでほしい」というキーフレーズ（語句）が発話に含まれていたときに、上記キーフレーズに対して、「へ」か「と」が後ろに続く語句と、「を」が後ろに続く語句とが、必須要素として必要であることを示している。
【００２７】
また、図２に示す例において、必須要素の欄の括弧内には、必須要素に対応する疑問詞が記述され、たとえば、「Ｎ１へ」に対応する疑問詞は、「どちら」であることを示している。なお、上記「Ｎ」は、名詞または名詞句である。
【００２８】
構文パタン記憶部６０には、「名詞＋へ」というように、必須要素の品詞を規定することによって、または、「人物」の意味を持つ“Ｎ１”というように、外部の辞書等から語句の意味を規定することによって、他の必須要素を規定する条件を記述するようにしてもよい。
【００２９】
また、必須要素に対応する疑問詞は、本実施例のように、予め記述するようにしてもよく、また、必須要素の意味制約等に基づいて、自動的に選ぶようにしてもよい。
【００３０】
さらに、構文パタン記憶部６０は、想定する対話の内容に応じて予め用意することによって、構文パタンを提供するようにしてもよく、また、機械翻訳システムにおいて利用される辞書やルールそのものを流用することによって、または、それを修正したものを使用することによって、構文パタンを提供するようにしてもよい。
【００３１】
図３は、上記実施例において、対話シナリオ記憶部４０が記憶している対話シナリオの例を示す図である。
【００３２】
対話シナリオ記憶部４０は、対話シナリオを記憶し、上記「対話シナリオ」は、対話状態、発話理解結果、応答内容、応答先、次の対話状態の項目を持つ表形式のシナリオで構成されている。つまり、対話シナリオは、図３に示す対話状態ＳＴＡＴＥ０から始まり、対話状態ＳＴＡＴＥ９に進むまで、対話が継続する間の対話状態の推移を、それぞれの対話状態で得られた端末部の発話理解結果のそれぞれに対して、端末部が、どの対話者を応答先とし、どのような応答内容で応答し、その結果、どの次対話状態に遷移するかについて、規定してあるシナリオである。
【００３３】
図４は、上記実施例において、定型対話文データ記憶部５０が記憶している定型対話文データの例を示す図であり、上記定型方対話文データ中の応答内容と、言語毎に、対応する定型対話文とが用意されている定型対話文データの例を示す図である。
【００３４】
応答内容に基づいて、日本語や英語等の対話文を具体的に生成する場合、図３に示すような対話シナリオが指定する応答内容に対応する定型対話文を、図４に示す定型文対話データから求め、求めた定型対話文を使って対話文を生成する。
【００３５】
図４に示す定型対話文データの例における「日本語定型対話文」において、＜発話理解内容＞には、その処理までに得られた発話理解内容を埋め込み、また、＜不足する必須要素を疑問詞に置換した発話理解内容＞には、それまで得られている発話理解内容と、キーフレーズに対して不足している必須要素を問う表現とを組み合わせた文を埋め込むことによって、定型対話文を生成する。
【００３６】
なお、ここで、定型対話文データを参照する代わりに、自然言語文生成の従来技術を利用し、応答内容に基づいて、日本語、英語等の対話文を生成するようにしてもよい。
【００３７】
以下では、異言語間対話装置１００において、対話者Ａは、端末部１０を介して、日本語で対話し、対話者Ｂは、端末部２０を介して、英語で対話内容を受ける場合について説明する。
【００３８】
次に、上記実施例において、図２に示す構文パタンの例と、図３に示す対話シナリオの例と、図４に示す定型対話文データの例とに従って、対話者Ａが、端末部１０を介して、日本語で依頼し、対話者Ｂが、端末部２０を介して、英語で依頼内容を受ける異言語間の対話の動作について説明する。
【００３９】
図５は、上記実施例において、図２に示す構文パタンの例と、図３に示す対話シナリオの例と、図４に示す定型対話文データの例とに従って、対話者Ａが、端末部１０を介して、日本語で依頼し、対話者Ｂが、端末部２０を介して、英語で依頼内容を受ける異言語間の対話の例を示す図である。
【００４０】
図６は、本実施例における全体的処理を示すフローチャートである。
【００４１】
対話は、初期状態である対話状態ＳＴＡＴＥ０（図３参照）から開始され、対話制御部３０は、応答先に指定された第１端末部である端末部１０に対して、図３に示す応答内容として、発話要求を出力する。発話要求という応答内容から、図４に示す日本語定型対話文データを参照し、「相手へお伝えする内容をおっしゃってください」という日本語文で、端末部１０に応答し、図３に示す「次対話状態」の指定によって、対話状態ＳＴＡＴＥ１へ遷移する（Ｓ１）。
【００４２】
図５に示す対話例では、「相手へお伝えする内容をおっしゃってください」という応答に対して、対話者Ａが、「ちょっと電話をつないでほしいんですけど」と発話し、システムの音声認識結果も同じ文で得られたとする（Ｓ３）。
【００４３】
この発話に対して、図２に示す構文パタンの例から、キーフレーズを検索する。この発話では、図５に下線が付されている「つないでほしい」が、キーフレーズとして検索され、必須要素は「Ｎ１（？）へ／と」、「Ｎ２を」であることが得られる。
【００４４】
さらに、音声認識結果の文中に、「Ｎ２を」に対応する「電話を」（図５の二重線）があり、必須要素「Ｎ１へ／と」に対応する語句が存在していない（誰に電話を掛けるかが明示されていない）ことがわかる。これによって、図３を参照し、発話理解結果は、「キーフレーズがあるが、キーフレーズに対する必須要素が不足している」である（Ｓ４）。
【００４５】
なお、図５に示す例において、たとえば、対話者Ａが話した「ちょっと電話をつないでほしいんですけど」を構成する語句のそれぞれを、所定のキーフレーズ構文パタンに登録されている語句と比較し、一致するものがあれば、その一致した語句をキーフレーズとして認識する。このようにすることによって、「つないでほしい」等が、キーフレーズとして認識される。
【００４６】
これに対して、図３で指定される応答内容「不足する必須要素の要求」の日本語定型応答文を、図４に示す定型対話文データの例を参照し、システムの応答文を生成する。「不足する必須要素の要求」の日本語定型対話文は、「＜不足する必須要素を疑問詞に置換した発話理解内容＞ですか？」であり、＜不足する必須要素を疑問詞に置換した発話理解内容＞の部分には、それまでの発話理解内容から生成された文が埋め込まれる。
【００４７】
ここで、上記「埋め込まれる文」は、図４に示す定型対話文データから、不足する必須要素「Ｎ１へ／と」に対する疑問詞「どちらへ」、残りの要素「電話を」、キーフレーズ「つないでほしい」、の３つを連結させたものである。
【００４８】
すなわち、ここでの応答文は、「どちらへ電話をつないでほしいのですか？」であり、この文を、端末部１０を通じて、対話者Ａに応答する。さらに、図３に従って、対話状態ＳＴＡＴＥ１から、次対話状態ＳＴＡＴＥ２に遷移する（Ｓ１）。
【００４９】
次に、図５に示すように、対話者Ａは、「２０４号室へお願いします」と答え、音声認識結果は、「２０５号室」と誤って認識したとする（Ｓ３）。音声認識結果に、前状態で不足していた必須要素「Ｎ１へ／と」に対応する「２０５号室へ」（図５の二重線）が含まれる。これによって、図３から、発話理解結果は「以前の発話の必須要素がある」になる（Ｓ４）。
【００５０】
これに対する応答内容「発話内容の確認」の日本語定型応答文を、図４に示す定型対話文データの例を参照し、その日本語定型応答文「＜発話理解内容＞、とお伝えすればよいですか？」から、システムの応答文を生成する。それまでの発話理解内容によって生成される文が、＜発話理解内容＞の部分に埋め込まれる。
【００５１】
ここで、この埋め込まれる文は、必須要素「２０５号室へ」、「電話を」と、キーフレーズ「つないでほしい」とを連結させた文「２０５号室へ電話をつないでほしい」である。
【００５２】
したがって、ここでの応答文は、「２０５号室へ電話をつないでほしい、とお伝えすればよいですか？」になり、この文を、端末部１０を通じて、対話者Ａに応答する。さらに、図３に従い、対話状態ＳＴＡＴＥ２から次対話状態ＳＴＡＴＥ３に遷移する（Ｓ１）。
【００５３】
次に、図５に示すように、対話者Ａは、「２０４号室へですよ」と答え、今回は音声認識結果も誤りなく得られたと認識する（Ｓ３）。この音声認識結果は、既に足りている必須要素「Ｎ１へ／と」と対応する「２０４号室へ」（図５の二重線）が含まれている。
【００５４】
これによって、図３に示す対話シナリオの例に基づいて、発話理解結果として、「理解内容の修正」と判断し（Ｓ４）、必須要素「Ｎ１へ／と」に対しては、「２０４号室へ」に更新される。これに対する応答内容「発話内容の確認」の日本語定型応答文を、図４に示す定型対話文データの例を参照し、その日本語定型応答文「＜発話理解内容＞、とお伝えすればよいですか？」から、システムの応答文を生成する。
【００５５】
それまでの発話理解内容によって生成された文が、＜発話理解内容＞の部分に埋め込まれる。ここで、この埋め込まれる文は、必須要素「２０４号室へ」、「電話を」と、キーフレーズ「つないでほしい」とを連結させた文「２０４号室へ電話をつないでほしい」にする。すなわち、ここでの応答文は、「２０４号室へ電話をつないでほしい、とお伝えすればよいですか？」であり、この文を、端末部１０を通じて、対話者Ａに応答する。さらに、図３に従って、対話状態ＳＴＡＴＥ３から、次対話状態ＳＴＡＴＥ３（修正）に自己遷移する（Ｓ１）。
【００５６】
対話者Ａの次の発話「そうです」（Ｓ３）を、対話制御部３０が、「肯定」であると発話理解し（Ｓ４）、図３の応答内容と応答先の指定に従って、第２端末部である対話者Ｂの端末部２０に、「発話内容を伝達」、さらに、対話者Ａの端末部１０に、「発話要求」を応答する（Ｓ１）。
【００５７】
端末部２０側の言語は、英語であり、応答内容「発話内容を伝達」の英語定型応答文は、図４から「Ｈｅ／Ｓｈｅｓａｙｓ，“＜発話理解内容＞”」となる。この応答文中の＜発話理解内容＞の部分には、上記「２０４号室へ電話をつないでほしい」の英語に変換したものが埋め込まれる。この埋め込み文は、機械翻訳部３１で日本語から英語へ変換した結果「ＩｗｏｕｌｄｌｉｋｅｙｏｕｔｏｐｕｔｍｅｔｈｒｏｕｇｈｔｏＲｏｏｍ２０４」で得られ、これを埋め込んだ応答文「Ｈｅ／Ｓｈｅｓａｙｓ，“ＩｗｏｕｌｄｌｉｋｅｙｏｕｔｏｐｕｔｍｅｔｈｒｏｕｇｈｔｏＲｏｏｍ２０４”」を、対話者Ｂの端末部２０に伝達する。機械翻訳部３１で使用する機械翻訳として、従来の機械翻訳技術を使用するようにしてもよい。
【００５８】
また、応答内容「発話要求」に対応する「相手へお伝えする内容をおっしゃってください」という日本語定型対話文で、対話者Ａの端末部１０に応答し、次の発話を促す。その後、図３の次対話状態の指定によって、ＳＴＡＴＥ１へ対話状態が遷移する（Ｓ１）。
【００５９】
図５に示すように、対話者Ａの次の発話「結構です、終わってください」の音声認識結果から（Ｓ３）、システムは、図３から「終了」と発話理解し（Ｓ４）、端末部１０を介して、「了承」に対する定型対話文「かしこまりました」を、対話者Ａに伝える（Ｓ１）。ここで、次対話状態が、ＳＴＡＴＥ９、すなわち終了状態である（Ｓ２）ので、対話は終了する。
【００６０】
上記方法によって、音声認識誤りを対話的に修正し、機械翻訳の入力文として必要となる単語を、対話者から得ることによって、対話システム内で処理する発話内容がより正確になり、結果的に、より円滑な異言語間の意思疎通を実現することができる。
【００６１】
なお、図５、図６に示す例は、本実施例以外でも、一般企業等における電話取次ぎの場合に適用することができる。
【００６２】
本発明の第２の実施例は、上記実施例の対話システムを、双方向に組み合わせて使う実施例である。
【００６３】
つまり、上記実施例である対話システムを、対話者Ｂが、端末部２０を介して、対話者Ｂに発話を求め英語で返答した対話内容を、対話者Ａが、端末部１０を介して、英語で受ける場合に適用することができる。上記のように、異言語間対話装置１００を、双方向に同時に用いることによって、言語の異なる２対話者間で、双方向の意思疎通が可能になる。
【００６４】
図７は、本発明の第３の実施例である機械翻訳装置２００を示すブロック図である。
【００６５】
機械翻訳装置２００は、異言語間対話装置１００から、端末部２０と音声認識部２１とを削除したものである。
【００６６】
すなわち、機械翻訳装置２００は、対話制御部３０が、端末部１０を介して、対話者Ａに発話を求め、第１言語で返答した対話内容を、端末部１０を介して、英語で受け取る場合である。機械翻訳装置２００は、異言語間対話装置１００において、図３に示す対話シナリオの応答先を全て同じ「第１端末部１０」とするだけで足りる。
【００６７】
つまり、機械翻訳装置２００は、第１の言語から第２の言語へ発話内容を変換する機械翻訳装置において、上記第１の言語によって端末部を介して、第１の対話者が発話した文を音声認識し、発話文の構成要素のパタンが記載されしかも予め記憶されている構文パタンと照合することによって、上記発話文の内容を理解する発話理解手段と、予め記憶した対話シナリオを参照することによって、上記第１の対話者に対する応答内容を決定し、応答内容毎に指定されている定型対話文を使用し、上記端末部を介して、上記応答内容を第１の言語によって伝達する第１の処理、上記発話理解結果を上記第１の言語とは異なる上記第２の言語に変換した変換結果を、機械翻訳による変換結果と定型対話文とを組み合わせることによって、上記端末部を介して、上記第１の対話者に伝達する第２の処理、対話を終了する第３の処理のうちのいずれかを実行する対話処理手段とを有する機械翻訳装置の例である。
【００６８】
なお、上記実施例は、１度の発話に、キーフレーズが１つだけ含まれる例であるが、複数のキーフレーズが含まれる例の場合、それぞれのキーフレーズに対して、必須要素の要求処理が繰り返されるように、対話シナリオ記憶部４０の対話シナリオを記述すればよい。
【００６９】
さらに、上記実施例は、上記実施例に限定されることなく、言語の種類、文字コード、対話シナリオ、構文パタン、定型対話文等を、変更、応用するようにしてもよい。
【００７０】
【発明の効果】
請求項１、２記載の発明によれば、異言語間の対話をコンピュータが補助する際に、対話者の発話に対して、音声認識誤りを対話的に修正することができ、また、機械翻訳の入力文として必要となる単語を対話的に得ることができ、発話理解結果を確認しながら、対話を制御することによって、現状の技術レベルの音声認識や機械翻訳を利用した場合でも、円滑な意思疎通を実現することができるという効果を奏する。
【００７１】
請求項３、４記載の発明によれば、現状の技術レベルの音声認識や機械翻訳等の技術を使って、異なる言語へ機械翻訳する場合、実用レベルで使用することができるという効果を奏する。
【図面の簡単な説明】
【図１】本発明の第１の実施例である異言語間対話装置１００を示す図である。
【図２】上記実施例における構文パタン記憶部６０が記憶している構文パタンの例を示す図である。
【図３】上記実施例において、対話シナリオ記憶部４０が記憶している対話シナリオの例を示す図である。
【図４】上記実施例において、定型対話文データ記憶部５０が記憶している定型対話文データの例を示す図であり、定型方対話文データ中の応答内容と、言語毎に、対応する定型対話文とが用意されている定型対話文データの例を示す図である。
【図５】上記実施例において、図２に示す構文パタンの例と、図３に示す対話シナリオの例と、図４に示す定型対話文データの例とに従って、対話者Ａが、端末部１０を介して、日本語で依頼し、対話者Ｂが、端末部２０を介して、英語で依頼内容を受ける異言語間の対話の例を示す図である。
【図６】本実施例における全体的処理を示すフローチャートである。
【図７】本発明の第３の実施例である機械翻訳装置２００を示すブロック図である。
【符号の説明】
１００…異言語間対話装置、
１０、２０…端末部、
１１、２１…音声認識部、
３０…対話制御部、
４０…対話シナリオ記憶部、
５０…定型対話文データ記憶部、
６０…構文パタン記憶部、
２００…機械翻訳装置。
[0001]
[Technical field of the invention]
The present invention relates to a cross-language dialogue apparatus and a cross-language dialogue method that enable communication between interlocutors who use mutually different languages at respective ones of a plurality of terminal units.
[0002]
[Prior art]
A conventional cross-language dialogue apparatus performs speech recognition on everything one speaker utters in a given language, converts the entire speech recognition result into another language by machine translation, converts that result into speech with a speech synthesizer or the like, and conveys it to the other party (see, for example, Non-Patent Document 1).
Non-Patent Document 1 describes in detail the technology of the automatic translation telephone, which is a cross-language dialogue system.
[0003]
[Non-patent document 1]
Advanced Telecommunications Research Institute International (ed.), "Automatic Translation Telephone," Ohmsha, ATR Advanced Technology Series, January 1994
[0004]
[Problems to be solved by the invention]
However, because speech recognition performance is insufficient, a conventional cross-language dialogue apparatus often produces a speech recognition result that differs from the input utterance. Likewise, because machine translation performance is insufficient, the translation into the other language often differs in meaning from the utterance.
[0005]
Under these circumstances, the content uttered by a speaker often cannot be converted accurately into another language, so it is difficult for the conventional technology to achieve, at a practical level, communication across different languages or conversion of utterances into a different language.
[0006]
An object of the present invention is to provide a cross-language dialogue apparatus and a cross-language dialogue method that allow communication at a practical level when people communicate in different languages using speech recognition, machine translation, and similar technologies at their current level.
[0007]
Another object of the present invention is to provide a machine translation apparatus and a machine translation method that can be used at a practical level when translating into a different language using speech recognition, machine translation, and similar technologies at their current level.
[0008]
[Means for Solving the Problems]
The present invention is a cross-language dialogue apparatus having dialogue processing means that executes one of the following: a first process of determining the response content for a first interlocutor by referring to a dialogue scenario stored in advance, and conveying that response content in a first language via a first terminal unit using the fixed-form dialogue sentence designated for each response content; a second process of conveying, to a second interlocutor different from the first interlocutor via a second terminal unit different from the first terminal unit, a sentence that combines a fixed-form dialogue sentence with the conversion result obtained by converting the utterance understanding result (the result of utterance understanding) into a second language different from the first language; and a third process of ending the dialogue.
[0009]
[Embodiments and examples of the invention]
FIG. 1 is a block diagram showing a cross-language dialogue apparatus 100 according to a first embodiment of the present invention.
[0010]
The cross-language dialogue apparatus 100 comprises a terminal unit 10, a terminal unit 20, a dialogue control unit 30, a dialogue scenario storage unit 40, a fixed-form dialogue sentence data storage unit 50, and a syntax pattern storage unit 60.
[0011]
The terminal unit 10 is connected to a speech recognition unit 11, and the terminal unit 20 is connected to a speech recognition unit 21.
[0012]
The dialogue control unit 30 can refer to the dialogue scenario storage unit 40, the fixed-form dialogue sentence data storage unit 50, and the syntax pattern storage unit 60, and is connected to a machine translation unit 31. Referring to the dialogue scenario storage unit 40, the dialogue control unit 30 requests an utterance from the interlocutor A associated with the terminal unit 10 via the terminal unit 10, or requests an utterance from the interlocutor B associated with the terminal unit 20 via the terminal unit 20.
[0013]
The interlocutor who has been asked to speak inputs an utterance via the terminal unit 10, and a speech recognition result for that utterance is obtained using the speech recognition unit 11 coupled to the terminal unit 10.
[0014]
Based on the speech recognition result, the dialogue control unit 30 understands the utterance by referring to the syntax pattern storage unit 60. It then refers to the dialogue scenario storage unit 40 to determine and execute the "next process" corresponding to the utterance understanding result.
[0015]
The "next process" in the dialogue control unit 30 is one of the first process, the second process, and the third process.
[0016]
The "first process" determines the response content for the first interlocutor A by referring to the dialogue scenario stored in advance, uses the fixed-form dialogue sentence designated for each response content (held in the fixed-form dialogue sentence data storage unit 50), conveys the response content in the first language via the first terminal unit 10 by speech synthesis in the language handled by the terminal unit 10 on the interlocutor A side where the utterance was input, and asks the interlocutor A for the next utterance.
[0017]
The "second process" conveys, to the second interlocutor B different from the first interlocutor A via the second terminal unit 20 different from the first terminal unit, a sentence that combines a fixed-form dialogue sentence with the conversion result obtained by converting the utterance understanding result into the second language different from the first language.
[0018]
The “third process” is a process for ending the conversation.
[0019]
Until the dialogue control unit 30 determines that the next process is the end of the dialogue, the interlocutor A who has been asked to speak inputs utterances via the terminal unit 10 on the interlocutor A side, and the dialogue continues.
[0020]
For utterance input at the terminal unit 10, speech recognition may be combined with a method in which the interlocutor enters characters through a character recognition device and the system recognizes them, or with a method in which the terminal unit 10 accepts a character string typed by the interlocutor on a keyboard as it is.
[0021]
For conveying utterance content with fixed-form dialogue sentences at the terminal unit 10, besides speech output by a speech synthesizer, a screen display may be used, or speech output by the speech synthesizer may be combined with a screen display.
[0022]
The syntax pattern storage unit 60 is basically the same as the syntax pattern storage unit in the conventional example, but differs from the conventional syntax pattern storage unit in that a “question word” is optionally specified.
[0023]
FIG. 2 is a diagram illustrating an example of the syntax pattern stored in the syntax pattern storage unit 60 in the above embodiment.
[0024]
The syntax pattern storage unit 60 is organized as a table with the columns "key phrase" and "essential elements". When an input utterance contains a key phrase, the essential elements column shows which elements are required as accompanying phrases.
[0025]
The "key phrase" is a phrase corresponding to a keyword, and is composed of a noun or a noun phrase. The "essential elements" are the elements, such as the subject and the object, that a given sentence needs in order to form a complete sentence.
[0026]
For example, the syntax pattern in FIG. 2 indicates that when the key phrase "つないでほしい" ("I want you to connect") appears in an utterance, that key phrase requires two essential elements: a phrase followed by the particle "へ" or "と", and a phrase followed by the particle "を".
[0027]
In the example of FIG. 2, the question word corresponding to each essential element is written in parentheses in the essential elements column. For example, the question word corresponding to "N1へ" ("to N1") is "どちら" ("which"). Here "N" denotes a noun or a noun phrase.
[0028]
In the syntax pattern storage unit 60, further conditions on an essential element may be described, either by specifying its part of speech, as in "noun + へ", or by specifying the meaning of the phrase from an external dictionary or the like, as in "N1" having the meaning "person".
[0029]
The question words corresponding to the essential elements may be described in advance as in the present embodiment, or may be automatically selected based on the semantic restrictions of the essential elements.
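As an illustration only, the syntax pattern table of FIG. 2 could be held in memory roughly as follows. This is a minimal Python sketch, not the implementation disclosed in the patent: the names `EssentialElement`, `SyntaxPattern`, and `find_pattern` are hypothetical, the key phrase and particles are taken from the Japanese original (つないでほしい, へ/と, を), and the question word for N2 is an assumption because the text gives only the one for N1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EssentialElement:
    slot: str            # slot label, e.g. "N1"
    particles: tuple     # particles that may follow the noun phrase
    question_word: str   # question word used when the slot is missing


@dataclass(frozen=True)
class SyntaxPattern:
    key_phrase: str
    essential_elements: tuple


# One row modeled on the FIG. 2 example described in the text.
SYNTAX_PATTERNS = [
    SyntaxPattern(
        key_phrase="つないでほしい",
        essential_elements=(
            EssentialElement("N1", ("へ", "と"), "どちら"),
            # Assumption: the text does not state the question word for N2.
            EssentialElement("N2", ("を",), "何"),
        ),
    ),
]


def find_pattern(utterance):
    """Return the first pattern whose key phrase occurs in the utterance, else None."""
    for pattern in SYNTAX_PATTERNS:
        if pattern.key_phrase in utterance:
            return pattern
    return None
```

A plain substring test is used for key phrase lookup here; a real system would more likely match on morphologically analyzed phrases.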
[0030]
Further, the syntax patterns in the syntax pattern storage unit 60 may be prepared in advance according to the expected content of the dialogue, or may be obtained by reusing the dictionaries and rules of a machine translation system, either as they are or in modified form.
[0031]
FIG. 3 is a diagram illustrating an example of the dialog scenario stored in the dialog scenario storage unit 40 in the above embodiment.
[0032]
The dialogue scenario storage unit 40 stores a dialogue scenario, which is a table with the columns dialogue state, utterance understanding result, response content, response destination, and next dialogue state. That is, the dialogue scenario begins at the dialogue state STATE0 shown in FIG. 3 and, for as long as the dialogue continues until it reaches the dialogue state STATE9, specifies for each utterance understanding result obtained in each dialogue state which interlocutor is the response destination, what response content is returned, and which next dialogue state follows.
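The five-column scenario table can be pictured as a list of rows with a lookup keyed on the current dialogue state and the utterance understanding result. The sketch below is an assumption-laden illustration: FIG. 3 itself is not reproduced in the text, so the rows are reconstructed from the walk-through in paragraphs [0041] to [0059], and all identifiers (`ScenarioRow`, `lookup`, the snake_case labels, `terminal_10`, `terminal_20`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScenarioRow:
    state: str
    understanding: Optional[str]   # None for the initial state, which has no input yet
    responses: tuple               # ((response content, response destination), ...)
    next_state: str


# Rows reconstructed from the walk-through, not copied from FIG. 3.
ROWS = [
    ScenarioRow("STATE0", None,
                (("utterance_request", "terminal_10"),), "STATE1"),
    ScenarioRow("STATE1", "missing_essential_element",
                (("request_missing_element", "terminal_10"),), "STATE2"),
    ScenarioRow("STATE1", "end",
                (("acknowledge", "terminal_10"),), "STATE9"),
    ScenarioRow("STATE2", "previous_element_supplied",
                (("confirm_utterance", "terminal_10"),), "STATE3"),
    ScenarioRow("STATE3", "correction",
                (("confirm_utterance", "terminal_10"),), "STATE3"),
    ScenarioRow("STATE3", "affirmative",
                (("convey_utterance", "terminal_20"),
                 ("utterance_request", "terminal_10")), "STATE1"),
]


def lookup(state, understanding):
    """Return the scenario row for a (state, understanding result) pair."""
    for row in ROWS:
        if row.state == state and row.understanding == understanding:
            return row
    raise KeyError((state, understanding))
```

Keeping the scenario as data rather than code matches the description's point that the scenario can be changed without touching the dialogue control logic.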
[0033]
FIG. 4 shows an example of the fixed-form dialogue sentence data stored in the fixed-form dialogue sentence data storage unit 50 in the above embodiment, in which a fixed-form dialogue sentence is provided for each response content and each language.
[0034]
When a dialogue sentence in Japanese, English, or another language is actually generated from a response content, the fixed-form dialogue sentence corresponding to the response content specified by the dialogue scenario of FIG. 3 is looked up in the fixed-form dialogue sentence data of FIG. 4, and the dialogue sentence is generated from it.
[0035]
In the "Japanese fixed-form dialogue sentence" column of the example data in FIG. 4, the placeholder <utterance understanding content> is filled with the utterance understanding content obtained up to that point, and the placeholder <utterance understanding content with the missing essential element replaced by a question word> is filled with a sentence that combines the utterance understanding content obtained so far with an expression asking for the essential element that the key phrase still lacks. The fixed-form dialogue sentence is generated in this way.
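The placeholder filling described above amounts to template substitution. The following Python sketch shows one way to express it, assuming the fixed-form sentences quoted elsewhere in the description; the dictionary `TEMPLATES`, the function `render`, the response content labels, and the placeholder names `content` and `content_with_question_word` are hypothetical stand-ins for the two angle-bracket placeholders of FIG. 4.

```python
# (response content, language) -> fixed-form dialogue sentence.
# The sentences are the ones quoted in the description; the keys are hypothetical.
TEMPLATES = {
    ("utterance_request", "ja"): "相手へお伝えする内容をおっしゃってください",
    ("request_missing_element", "ja"): "{content_with_question_word}ですか？",
    ("confirm_utterance", "ja"): "{content}、とお伝えすればよいですか？",
    ("convey_utterance", "en"): 'He/She says, "{content}"',
    ("acknowledge", "ja"): "かしこまりました",
}


def render(response_content, language, **slots):
    """Fill the fixed-form sentence for a response content with the given slot values."""
    return TEMPLATES[(response_content, language)].format(**slots)
```

For example, `render("confirm_utterance", "ja", content="２０５号室へ電話をつないでほしい")` yields the confirmation sentence used later in the walk-through.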
[0036]
Instead of referring to the fixed-form dialogue sentence data, conventional natural language generation technology may be used to generate the dialogue sentence in Japanese, English, or another language from the response content.
[0037]
The following describes the case in which, in the cross-language dialogue apparatus 100, the interlocutor A converses in Japanese via the terminal unit 10 and the interlocutor B receives the dialogue content in English via the terminal unit 20.
[0038]
Next, the operation of a cross-language dialogue in the above embodiment is described, following the syntax pattern example of FIG. 2, the dialogue scenario example of FIG. 3, and the fixed-form dialogue sentence data example of FIG. 4, in which the interlocutor A makes a request in Japanese via the terminal unit 10 and the interlocutor B receives the request in English via the terminal unit 20.
[0039]
FIG. 5 shows an example of such a cross-language dialogue in the above embodiment, following the syntax pattern example of FIG. 2, the dialogue scenario example of FIG. 3, and the fixed-form dialogue sentence data example of FIG. 4, in which the interlocutor A makes a request in Japanese via the terminal unit 10 and the interlocutor B receives the request in English via the terminal unit 20.
[0040]
FIG. 6 is a flowchart showing the overall processing in this embodiment.
[0041]
The dialogue starts from the dialogue state STATE0 (see FIG. 3), which is the initial state, and the dialogue control unit 30 outputs an utterance request, the response content shown in FIG. 3, to the terminal unit 10, which is the first terminal unit designated as the response destination. From the response content "utterance request", the Japanese fixed-form dialogue sentence data of FIG. 4 is consulted, the Japanese sentence "相手へお伝えする内容をおっしゃってください" ("Please say what you would like to convey to the other party") is returned to the terminal unit 10, and the state transitions to the dialogue state STATE1 as designated by the "next dialogue state" in FIG. 3 (S1).
[0042]
In the dialogue example of FIG. 5, suppose that in response to "Please say what you would like to convey to the other party", the interlocutor A utters "ちょっと電話をつないでほしいんですけど" ("I'd like you to put a call through, please") and the system's speech recognition result is the same sentence (S3).
[0043]
The syntax pattern example of FIG. 2 is searched for a key phrase in this utterance. Here "つないでほしい" ("want you to connect"), underlined in FIG. 5, is found as the key phrase, and its essential elements are found to be "N1 (?) へ/と" and "N2を".
[0044]
The speech recognition result contains "電話を" ("a call", double line in FIG. 5), which corresponds to "N2を", but no phrase corresponding to the essential element "N1へ/と" (whom to call is not stated). Referring to FIG. 3, the utterance understanding result is therefore "a key phrase is present, but essential elements for the key phrase are missing" (S4).
[0045]
In the example of FIG. 5, each phrase making up the utterance "ちょっと電話をつないでほしいんですけど" spoken by the interlocutor A is compared with the phrases registered in the key phrase syntax patterns, and any phrase that matches is recognized as a key phrase. In this way "つないでほしい" and the like are recognized as key phrases.
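The utterance understanding step (S4) for this first utterance can be sketched as below. This is an illustrative Python example under stated assumptions, not the patented method: the noun and particle pairs are assumed to come from an upstream morphological analyzer that is outside the sketch, and the function name `understand` and the result labels `no_key_phrase`, `missing_essential_element`, and `complete` are hypothetical.

```python
KEY_PHRASE = "つないでほしい"
# slot -> particles that may follow the noun phrase filling it (from the FIG. 2 example)
ESSENTIAL = {"N1": ("へ", "と"), "N2": ("を",)}


def understand(recognized_text, noun_particle_pairs):
    """Return (understanding result, filled slots).

    noun_particle_pairs is assumed to be produced by a morphological
    analyzer, e.g. [("電話", "を")]; that analyzer is not part of this sketch.
    """
    if KEY_PHRASE not in recognized_text:
        return "no_key_phrase", {}
    filled = {}
    for noun, particle in noun_particle_pairs:
        for slot, particles in ESSENTIAL.items():
            # Fill each slot at most once, with the first matching phrase.
            if particle in particles and slot not in filled:
                filled[slot] = noun + particle
    missing = [slot for slot in ESSENTIAL if slot not in filled]
    result = "missing_essential_element" if missing else "complete"
    return result, filled
```

Working on analyzed noun and particle pairs rather than raw characters avoids false matches such as the final と of ちょっと being taken for the particle と.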
[0046]
In response, the system's response sentence is generated from the Japanese fixed-form response sentence for the response content "request for missing essential element" designated in FIG. 3, by referring to the fixed-form dialogue sentence data example of FIG. 4. The Japanese fixed-form dialogue sentence for "request for missing essential element" is "<utterance understanding content with the missing essential element replaced by a question word>ですか？", and the placeholder is filled with a sentence generated from the utterance understanding content so far.
[0047]
Here, the embedded sentence is obtained, with reference to the fixed-form dialogue sentence data of FIG. 4, by concatenating three parts: the question word "どちらへ" ("to where") for the missing essential element "N1へ/と", the remaining element "電話を" ("a call"), and the key phrase "つないでほしい" ("want you to connect").
[0048]
That is, the response sentence here is "どちらへ電話をつないでほしいのですか？" ("Where would you like the call put through to?"), and this sentence is returned to the interlocutor A through the terminal unit 10. Then, in accordance with FIG. 3, the state transitions from the dialogue state STATE1 to the next dialogue state STATE2 (S1).
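The concatenation of question word, remaining element, and key phrase can be sketched as follows. This is a hedged Python illustration: `QUESTION_WORDS`, `SLOT_ORDER`, and `embedded_sentence` are hypothetical names, the choice of particle attached to the question word is an assumption, and the N2 entry is invented for completeness because the text gives only the question word for N1.

```python
# slot -> (question word, particle attached to it)
QUESTION_WORDS = {
    "N1": ("どちら", "へ"),   # from the FIG. 2 example
    "N2": ("何", "を"),       # assumption: not given in the text
}
SLOT_ORDER = ["N1", "N2"]


def embedded_sentence(filled, key_phrase):
    """Concatenate slots in order, substituting a question word for each missing slot."""
    parts = []
    for slot in SLOT_ORDER:
        if slot in filled:
            parts.append(filled[slot])
        else:
            word, particle = QUESTION_WORDS[slot]
            parts.append(word + particle)
    return "".join(parts) + key_phrase


sentence = embedded_sentence({"N2": "電話を"}, "つないでほしい")
question = sentence + "ですか？"   # literal fill of the fixed-form sentence
```

A literal template fill gives どちらへ電話をつないでほしいですか？, whereas the response quoted in the walk-through ends in のですか？; the sketch does not attempt to reproduce that extra の.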
[0049]
Next, as shown in FIG. 5, suppose the interlocutor A answers "２０４号室へお願いします" ("To room 204, please") and speech recognition misrecognizes it as "room 205" (S3). The speech recognition result contains "２０５号室へ" ("to room 205", double line in FIG. 5), which corresponds to the essential element "N1へ/と" that was missing in the previous state. From FIG. 3, the utterance understanding result is therefore "an essential element of the earlier utterance is present" (S4).
[0050]
For the corresponding response content "confirmation of utterance content", the system's response sentence is generated from the Japanese fixed-form response sentence "<utterance understanding content>、とお伝えすればよいですか？" ("Shall I convey: <utterance understanding content>?"), found by referring to the fixed-form dialogue sentence data example of FIG. 4. The sentence generated from the utterance understanding content so far is embedded in the <utterance understanding content> part.
[0051]
Here, the embedded sentence is "２０５号室へ電話をつないでほしい" ("I want you to put a call through to room 205"), formed by concatenating the essential elements "２０５号室へ" and "電話を" with the key phrase "つないでほしい".
[0052]
The response sentence is therefore "２０５号室へ電話をつないでほしい、とお伝えすればよいですか？" ("Shall I convey that you want a call put through to room 205?"), and this sentence is returned to the interlocutor A through the terminal unit 10. Then, in accordance with FIG. 3, the state transitions from the dialogue state STATE2 to the next dialogue state STATE3 (S1).
[0053]
Next, as shown in FIG. 5, suppose the interlocutor A answers "２０４号室へですよ" ("To room 204, I said") and this time the speech recognition result is obtained without error (S3). This speech recognition result contains "２０４号室へ" ("to room 204", double line in FIG. 5), which corresponds to the essential element "N1へ/と" that has already been filled.
[0054]
Based on the dialogue scenario example of FIG. 3, the utterance understanding result is therefore judged to be "correction of understood content" (S4), and the essential element "N1へ/と" is updated to "２０４号室へ". For the corresponding response content "confirmation of utterance content", the system's response sentence is generated from the Japanese fixed-form response sentence "<utterance understanding content>、とお伝えすればよいですか？", found by referring to the fixed-form dialogue sentence data example of FIG. 4.
[0055]
The sentence generated from the utterance understanding content so far is embedded in the <utterance understanding content> part. Here the embedded sentence is "２０４号室へ電話をつないでほしい" ("I want you to put a call through to room 204"), formed by concatenating the essential elements "２０４号室へ" and "電話を" with the key phrase "つないでほしい". The response sentence is thus "２０４号室へ電話をつないでほしい、とお伝えすればよいですか？" ("Shall I convey that you want a call put through to room 204?"), and this sentence is returned to the interlocutor A through the terminal unit 10. Then, in accordance with FIG. 3, the state makes a self-transition from the dialogue state STATE3 to the next dialogue state STATE3 (correction) (S1).
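The difference between supplying a missing essential element and correcting one that is already filled can be expressed as a small slot update. The Python sketch below is illustrative only; `apply_reply` and the result labels are hypothetical, and it assumes the reply has already been reduced to a slot name and a phrase.

```python
def apply_reply(filled, slot, value):
    """Fill or overwrite one essential element and classify the reply.

    A reply for a slot that is already filled counts as a correction;
    otherwise it supplies an element that was missing from the earlier utterance.
    """
    result = "correction" if slot in filled else "previous_element_supplied"
    updated = dict(filled)   # leave the caller's dictionary unchanged
    updated[slot] = value
    return result, updated


# Replaying the two replies from the walk-through.
state_after_first = {"N2": "電話を"}
r1, slots1 = apply_reply(state_after_first, "N1", "２０５号室へ")   # misrecognized reply
r2, slots2 = apply_reply(slots1, "N1", "２０４号室へ")              # corrected reply
confirmed = slots2["N1"] + slots2["N2"] + "つないでほしい"
```

Because the misrecognized value is simply overwritten, the confirmation loop can repeat as many times as needed before the content is passed to machine translation.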
[0056]
The dialogue control unit 30 understands the next utterance of the interlocutor A, "そうです" ("That's right") (S3), as "affirmation" (S4). In accordance with the response content and response destination designated in FIG. 3, it returns "convey utterance content" to the terminal unit 20 of the interlocutor B, which is the second terminal unit, and then returns "utterance request" to the terminal unit 10 of the interlocutor A (S1).
[0057]
The language on the terminal unit 20 side is English, and from FIG. 4 the English fixed-form response sentence for the response content "convey utterance content" is "He/She says, “<utterance understanding content>”". The <utterance understanding content> part of this response sentence is filled with the English conversion of "２０４号室へ電話をつないでほしい". This embedded sentence is obtained by the machine translation unit 31 converting from Japanese to English, giving "I would like you to put me through to Room 204", and the response sentence with it embedded, "He/She says, “I would like you to put me through to Room 204”", is conveyed to the terminal unit 20 of the interlocutor B. Conventional machine translation technology may be used in the machine translation unit 31.
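The second process (translate the confirmed content, embed it in the English fixed-form sentence, and deliver it to the second terminal) might look like the following Python sketch. The translator is a stub lookup table standing in for the machine translation unit 31, and `convey_to_second_terminal`, `stub_translate`, `outbox`, and the destination label `terminal_20` are hypothetical names introduced for illustration.

```python
from typing import Callable


def convey_to_second_terminal(
    content_ja: str,
    translate: Callable[[str], str],
    send: Callable[[str, str], None],
) -> str:
    """Translate the confirmed content, embed it in the English template, and send it."""
    translated = translate(content_ja)
    message = f'He/She says, "{translated}"'
    send("terminal_20", message)
    return message


def stub_translate(text: str) -> str:
    # Stand-in for the machine translation unit; only knows the example sentence.
    table = {
        "２０４号室へ電話をつないでほしい": "I would like you to put me through to Room 204",
    }
    return table[text]


outbox = []
message = convey_to_second_terminal(
    "２０４号室へ電話をつないでほしい",
    stub_translate,
    lambda destination, text: outbox.append((destination, text)),
)
```

Passing the translator and the sender in as functions keeps the sketch independent of any particular machine translation engine or terminal interface.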
[0058]
In addition, the Japanese fixed-form dialogue sentence "相手へお伝えする内容をおっしゃってください" ("Please say what you would like to convey to the other party"), corresponding to the response content "utterance request", is returned to the terminal unit 10 of the interlocutor A to prompt the next utterance. The dialogue state then transitions to STATE1 as designated by the next dialogue state in FIG. 3 (S1).
[0059]
As shown in FIG. 5, from the speech recognition result of the interlocutor A's next utterance, "結構です、終わってください" ("That's all, please finish") (S3), the system understands the utterance as "end" from FIG. 3 (S4) and conveys the fixed-form dialogue sentence for "acknowledgment", "かしこまりました" ("Certainly"), to the interlocutor A via the terminal unit 10 (S1). Since the next dialogue state is STATE9, that is, the end state (S2), the dialogue ends.
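The overall S1 to S4 loop of the walk-through can be replayed with a scripted sequence of utterance understanding results. The Python sketch below is a simplified illustration: the scenario entries are reconstructed from the walk-through rather than taken from FIG. 3, speech recognition and utterance understanding (S3, S4) are replaced by the script, and every identifier is hypothetical.

```python
# (state, understanding result) -> (responses, next state), reconstructed from the walk-through
SCENARIO = {
    ("STATE0", None): ([("utterance_request", "terminal_10")], "STATE1"),
    ("STATE1", "missing_essential_element"): ([("request_missing_element", "terminal_10")], "STATE2"),
    ("STATE1", "end"): ([("acknowledge", "terminal_10")], "STATE9"),
    ("STATE2", "previous_element_supplied"): ([("confirm_utterance", "terminal_10")], "STATE3"),
    ("STATE3", "correction"): ([("confirm_utterance", "terminal_10")], "STATE3"),
    ("STATE3", "affirmative"): (
        [("convey_utterance", "terminal_20"), ("utterance_request", "terminal_10")],
        "STATE1",
    ),
}


def run_dialogue(understandings):
    """Drive the dialogue with scripted understanding results and log every response."""
    state, understanding = "STATE0", None
    script = iter(understandings)
    log = []
    while True:
        responses, state = SCENARIO[(state, understanding)]   # S1: respond and transition
        log.extend(responses)
        if state == "STATE9":                                  # S2: end state reached
            return log
        understanding = next(script)                           # S3 and S4, scripted here


log = run_dialogue([
    "missing_essential_element",
    "previous_element_supplied",
    "correction",
    "affirmative",
    "end",
])
```

The resulting log lists each response content with its destination, so the single message sent to the second terminal stands out among the clarification exchanges with the first.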
[0060]
By the method described above, speech recognition errors are corrected interactively, and the words required as input sentences for machine translation are obtained from the interlocutor. The utterance content processed in the dialogue system therefore becomes more accurate, and as a result smoother communication between different languages can be realized.
[0061]
Note that the examples shown in FIGS. 5 and 6 can also be applied to cases other than the present embodiment, such as telephone handling at a general company.
[0062]
The second embodiment of the present invention uses the dialogue system of the above embodiment in both directions in combination.
[0063]
That is, the dialogue system according to the above-described embodiment is also applicable when an utterance is requested from the interlocutor B via the terminal unit 20 and the reply is received in English via the terminal unit 20. By using the cross-language dialogue device 100 in both directions simultaneously in this way, two-way communication between two different languages becomes possible.
[0064]
FIG. 7 is a block diagram showing a machine translation apparatus 200 according to a third embodiment of the present invention.
[0065]
The machine translation device 200 is obtained by removing the terminal unit 20 and the speech recognition unit 21 from the cross-language dialogue device 100.
[0066]
That is, the machine translation device 200 may be configured so that the dialogue control unit 30 requests an utterance from the interlocutor A via the terminal unit 10, and the content uttered in the first language is returned in English via the same terminal unit 10. For this purpose, the machine translation device 200 need only use the same "first terminal unit 10" for all of the response destinations in the dialogue scenario shown in FIG. 3 for the cross-language dialogue device 100.
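The change of response destinations described here can be sketched as a simple rewrite of the scenario table. This is only an illustration under stated assumptions: the dictionary layout, the terminal labels, the state name `STATE_CONFIRM`, and the function `to_single_terminal` are hypothetical and are not taken from FIG. 3.

```python
def to_single_terminal(scenario: dict, terminal: str = "terminal_10") -> dict:
    """Return a copy of the scenario in which every response goes to one terminal."""
    return {
        key: {
            "responses": [(content, terminal) for content, _ in entry["responses"]],
            "next_state": entry["next_state"],
        }
        for key, entry in scenario.items()
    }


# Hypothetical two-terminal scenario fragment of the cross-language dialogue device 100.
two_terminal = {
    ("STATE_CONFIRM", "yes"): {
        "responses": [
            ("transmit_utterance", "terminal_20"),
            ("utterance_request", "terminal_10"),
        ],
        "next_state": "STATE1",
    },
}

# Scenario as it would be used by the machine translation device 200.
single_terminal = to_single_terminal(two_terminal)
print(single_terminal[("STATE_CONFIRM", "yes")]["responses"])
```

The dialogue states and response contents are left untouched; only the destination column changes, which matches the statement that the rest of the device can be reused as-is.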
[0067]
That is, the machine translation device 200 is an example of a machine translation device that converts utterance content from a first language into a second language and that comprises: utterance understanding means for understanding the content of an uttered sentence by speech-recognizing a sentence uttered by a first interlocutor in the first language via a terminal unit and collating it with pre-stored syntax patterns in which patterns of the components of uttered sentences are described; and dialogue processing means for executing any one of a first process of determining the content of a response to the first interlocutor by referring to a pre-stored dialogue scenario and transmitting the response content in the first language via the terminal unit using a fixed-form dialogue sentence specified for each response content, a second process of transmitting, to the first interlocutor via the terminal unit, a combination of the fixed-form dialogue sentence and a conversion result obtained by converting the utterance understanding result into the second language, different from the first language, by machine translation, and a third process of ending the dialogue.
[0068]
The above embodiment is an example in which only one key phrase is included in one utterance. When a plurality of key phrases are included, the dialogue scenario in the dialogue scenario storage unit 40 may be written so that the request processing for essential elements is repeated for each key phrase.
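Repeating the essential-element request for each key phrase can be sketched as a loop over the key phrases found in one utterance. Everything in this sketch is a hypothetical illustration: the key phrases, the element names, the table `ESSENTIAL_ELEMENTS`, and the callback `ask` (which stands in for an utterance request to the interlocutor) are assumptions and do not appear in the description.

```python
from typing import Callable

# Hypothetical mapping from key phrase to the essential elements it requires.
ESSENTIAL_ELEMENTS = {
    "connect_call": ["room_number"],
    "leave_message": ["recipient", "message"],
}


def collect_elements(
    key_phrases: list[str],
    known: dict[str, dict[str, str]],
    ask: Callable[[str, str], str],
) -> dict[str, dict[str, str]]:
    """Repeat the essential-element request for every key phrase in one utterance."""
    result = {}
    for phrase in key_phrases:
        filled = dict(known.get(phrase, {}))
        for element in ESSENTIAL_ELEMENTS[phrase]:
            if element not in filled:
                # One request per missing element, e.g. an utterance request to the interlocutor.
                filled[element] = ask(phrase, element)
        result[phrase] = filled
    return result


# Usage with a scripted interlocutor instead of real speech input.
answers = {("leave_message", "message"): "I will arrive late"}
filled = collect_elements(
    ["connect_call", "leave_message"],
    {"connect_call": {"room_number": "204"}, "leave_message": {"recipient": "front desk"}},
    lambda phrase, element: answers[(phrase, element)],
)
print(filled)
```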
[0069]
Furthermore, the present invention is not limited to the above-described embodiments, and the language types, character codes, dialogue scenarios, syntax patterns, fixed-form dialogue sentences, and the like may be changed as appropriate.
[0070]
[Effect of the invention]
According to the first and second aspects of the present invention, when a computer assists a dialogue between different languages, speech recognition errors in an interlocutor's utterance can be corrected interactively, and the words required as input sentences for machine translation can be obtained interactively. By controlling the dialogue while confirming the utterance understanding results, communication between different languages can be realized even when speech recognition and machine translation at the current technology level are used.
[0071]
According to the third and fourth aspects of the invention, machine translation into a different language can be achieved at a practical level even when technologies such as speech recognition and machine translation at the current technical level are used.
[Brief description of the drawings]
FIG. 1 is a diagram showing a cross-language interactive device 100 according to a first embodiment of the present invention.
FIG. 2 is a diagram illustrating an example of a syntax pattern stored in a syntax pattern storage unit 60 in the embodiment.
FIG. 3 is a diagram showing an example of a dialog scenario stored in a dialog scenario storage unit 40 in the embodiment.
FIG. 4 is a diagram showing an example of the fixed-form dialogue data stored in the fixed-form dialogue data storage unit 50 in the embodiment, in which a fixed-form dialogue sentence corresponding to each response content is prepared for each language.
FIG. 5 is a diagram showing an example of a dialogue between different languages that uses the syntax pattern example shown in FIG. 2, the dialogue scenario example shown in FIG. 3, and the fixed-form dialogue data example shown in FIG. 4, in which a request made by the interlocutor A via the terminal unit 10 is conveyed in English to the interlocutor B via the terminal unit 20.
FIG. 6 is a flowchart illustrating overall processing in the present embodiment.
FIG. 7 is a block diagram illustrating a machine translation apparatus 200 according to a third embodiment of the present invention.
[Explanation of symbols]
100: Cross-language dialogue device
10, 20: Terminal unit
11, 21: Speech recognition unit
30: Dialogue control unit
40: Dialogue scenario storage unit
50: Fixed-form dialogue data storage unit
60: Syntax pattern storage unit
200: Machine translation device

Claims

A cross-language dialogue device in which a dialogue is conducted via a plurality of terminal units, a different language being used at each terminal unit, the device comprising:
utterance understanding means for understanding the content of an uttered sentence by speech-recognizing a sentence uttered by a first interlocutor in a first language via a first terminal unit and collating it with pre-stored syntax patterns in which patterns of the components of uttered sentences are described; and
dialogue processing means for executing any one of: a first process of determining the content of a response to the first interlocutor by referring to a pre-stored dialogue scenario and transmitting the response content in the first language via the first terminal unit using a fixed-form dialogue sentence specified for each response content; a second process of transmitting, to a second interlocutor different from the first interlocutor via a second terminal unit different from the first terminal unit, a sentence that combines the fixed-form dialogue sentence with a conversion result obtained by converting an utterance understanding result, which is the result of the utterance understanding, into a second language different from the first language; and a third process of ending the dialogue.

A cross-language dialogue method in which a dialogue is conducted via a plurality of terminal units, a different language being used at each terminal unit, the method comprising:
an utterance understanding step of understanding the content of an uttered sentence by speech-recognizing a sentence uttered by a first interlocutor in a first language via a first terminal unit and collating it with pre-stored syntax patterns in which patterns of the components of uttered sentences are described; and
a dialogue processing step of executing any one of: a first process of determining the content of a response to the first interlocutor by referring to a pre-stored dialogue scenario and transmitting the response content in the first language via the first terminal unit using a fixed-form dialogue sentence specified for each response content; a second process of transmitting, to a second interlocutor different from the first interlocutor via a second terminal unit different from the first terminal unit, a sentence that combines the fixed-form dialogue sentence with a conversion result obtained by converting an utterance understanding result, which is the result of the utterance understanding, into a second language different from the first language; and a third process of ending the dialogue.

A machine translation device that converts utterance content from a first language into a second language, the device comprising:
utterance understanding means for understanding the content of an uttered sentence by speech-recognizing a sentence uttered by a first interlocutor in the first language via a terminal unit and collating it with pre-stored syntax patterns in which patterns of the components of uttered sentences are described; and
dialogue processing means for executing any one of: a first process of determining the content of a response to the first interlocutor by referring to a pre-stored dialogue scenario and transmitting the response content in the first language via the terminal unit using a fixed-form dialogue sentence specified for each response content; a second process of transmitting, to the first interlocutor via the terminal unit, a combination of the fixed-form dialogue sentence and a conversion result obtained by converting the utterance understanding result into the second language, different from the first language, by machine translation; and a third process of ending the dialogue.

A machine translation method for converting utterance content from a first language into a second language, the method comprising:
an utterance understanding step of understanding the content of an uttered sentence by speech-recognizing a sentence uttered by a first interlocutor in the first language via a terminal unit and collating it with pre-stored syntax patterns in which patterns of the components of uttered sentences are described; and
a dialogue processing step of executing any one of: a first process of determining the content of a response to the first interlocutor by referring to a pre-stored dialogue scenario and transmitting the response content in the first language via the terminal unit using a fixed-form dialogue sentence specified for each response content; a second process of transmitting, to the first interlocutor via the terminal unit, a combination of the fixed-form dialogue sentence and a conversion result obtained by converting the utterance understanding result into the second language, different from the first language, by machine translation; and a third process of ending the dialogue.