JP3566977B2

JP3566977B2 - Natural language processing apparatus and method

Info

Publication number: JP3566977B2
Application number: JP33286093A
Authority: JP
Inventors: 哲朗知野; 靖太森岡; 宏之坪井
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1993-12-27
Filing date: 1993-12-27
Publication date: 2004-09-15
Anticipated expiration: 2019-09-15
Also published as: JPH07191687A

Abstract

PURPOSE: To enable natural and smooth dialogue by letting the processor interrupt the user when necessary while it is processing the user's natural language input, start a dialogue that points out ambiguities and errors and resolves them, and give responses indicating that a complicated input has been understood. CONSTITUTION: The natural language processor consists of a language analysis part 1, a domain knowledge storage part 2, a response content determination part 3, a language generation part 4, and a discourse management part 5. The discourse management part 5 controls the language analysis part 1, the response content determination part 3, and the language generation part 4 according to the analysis operation state information, the response determination operation state information, and the generation operation state information output by those three parts, respectively.

Description

【０００１】
【産業上の利用分野】
本発明は、自然言語によって利用者との対話をおこなう自然言語処理装置に関する。
【０００２】
【従来の技術】
近年、計算機装置の発展と普及に伴い専門家以外の一般の利用者が、計算機装置を利用する機会がとみに増加してきている。
【０００３】
しかし、計算機装置の操作には専門的な知識が必要であり、一般の利用者が自由に利用することは困難である。
【０００４】
このような背景の下、利用者が通常用いている日本語あるいは英語等といった自然言語によって、計算機装置を利用するための自然言語対話装置の研究開発が盛んになってきている。
【０００５】
このような自然言語対話装置では、言語解析、意味解釈を経て利用者の要求内容を理解し、その要求に対する問題解決を行なって応答内容を決定し、言語生成を行なうことによってその応答内容を利用者に提示する。
【０００６】
また、音声を利用した自然言語対話装置では、上述の処理過程の先頭で音声認識を行ない、処理の最後の段階で音声合成処理を行なう。
【０００７】
ところが、従来の自然言語対話装置では、利用者と対話装置の交互発話を仮定しているため、自然な対話を行なうことが出来ないという問題があった。
【０００８】
例えば、利用者の発話の一部に誤りや矛盾が存在したり、利用者が対話装置の解析能力の限界を超える発話をおこなっていたとしても、発話の途中でこれらの問題点を対話装置が利用者に指摘することは不可能であった。
【０００９】
そのため利用者は発話が全て完了した時点まで誤りに気がつくことが出来ず、行なった発話が無駄となり結果として利用者の負担を増加させてしまうという問題があった。
【００１０】
従来、特開平３−２４８２６８号の「音声対話処理方式」では、入力音声の信号レベルを検知して、信号レベルが一定時間連続して閾値以下の場合や入力音声の母音が一定時間連続した場合に、対話装置からあいづちを送出するものがあった。しかし、これはあくまでも（音声）信号処理のみによるあいづちの送出処理であるため、利用者の発話の内容を考慮して適切な割り込みを行なうものではなかった。
【００１１】
以上に示した通り、従来の自然言語対話装置では、あいづち、わりこみ、といった同時発話に対する充分な考慮がなされておらず、そのため自然で効率的な対話を行なうことが出来ないという問題があった。
【００１２】
また、近年、計算機装置の処理能力向上に伴って、人間にとっての計算機装置の操作上の使い勝手、つまりユーザインタフェースを向上させることが重要になってきている。
【００１３】
ユーザインタフェースの手段としては、キーボード、マウス、電子ペン等様々なものが提案され実用に供されている。その中でも音声を用いた自然言語による手段は、人間にとって自然で使いやすく、また非常に効率的な手段である。そのため、数々の音声入力による計算機装置への命令、音声による計算機出力方法といったものが提案されている。
【００１４】
そして、対話装置は、データベース等の計算機装置のフロントエンドとして、自然言語による利用者の入力を理解し、それを処理し、結果を自然言語で出力する計算機装置で、入力理解、処理、出力の流れを円滑にするための制御機構が付属したものである。
【００１５】
音声対話装置は、入力手段、出力手段に音声を用いる対話装置であり、音声言語に多用される省略、代名詞参照等、情報の伝達効率を高くする一方で、曖昧性の高い現象を、的確に処理できる機能を持つ必要がある。
【００１６】
音声による情報伝達に特有の問題として、同音異義語や、発音が似た単語への聞き違いといった問題がある。
【００１７】
最近の音声対話装置では、入力の理解では、文脈情報を制約に用いることで関連のある候補を選択するようにしたり、利用者に問い直すようにしている。また、認識候補の信頼度がいずれも低い場合等は、その個所について問い直したり、あるいは全文の再入力を要請する手法を採っている。
【００１８】
しかし、その問い直し文の出力、あるいは一般的な出力において、同音異義語や聞き誤られやすい発音の単語を、誤解が生じないように、どう出力するかについては、解決されていない。
【００１９】
【発明が解決しようとする課題】
このように従来の自然言語対話装置では、対話装置と利用者の交互の発話を仮定していたため、利用者の発話が完全に終るまで、対話装置は応答を開始することができなかった。
【００２０】
例えば、利用者の発話の一部に誤りや矛盾が存在したり、利用者が対話装置の解析能力の限界を超える発話をおこなっていたとしても、発話の途中でこれらの問題点を対話装置が利用者に指摘することは不可能なために、利用者が行なった発話が無駄となって利用者の負担を増加させてしまい、結果として自然で円滑で柔軟な対話を行なうことが出来ないという問題があった。
【００２１】
しかし、人間同士の対話では、あいづち、割り込み、といった同時発話によって円滑で効率的な対話を行なっていることから考えて、自然言語対話装置に同時発話を扱う能力を付与することは重要である。
【００２２】
第１の発明は、このような事情を考慮してなされたもので、自然言語処理装置と利用者との対話において、処理装置からの適切な割り込み及びあいづちの提示を実現することによって、自然で円滑で柔軟性を持った対話を行なうことを可能とする自然言語処理装置を提供する。
【００２３】
また、このように従来は、同音異義語あるいは聞き誤られやすい発音の単語の音声での入出力の場合に、特に出力において、同音異義語あるいは聞き誤られやすい発音の単語を、誤解の生じないあるいは生じ難い形に言い換えて処理を行なうことができないという問題があった。
【００２４】
第２の発明は、このような事情を考慮してなされたもので、状況に応じて誤解を受けないまたは受け難い形への言い換えを行なうか行なわないか、また言い換えを行なう場合はどの変形手段を用いるかの切り換えを可能とし、対象の文の内容を正確かつ効率的な形で扱うことを可能にした自然言語処理装置を提供することにある。
【００２５】
【課題を解決するための手段】
第１の発明に係る自然言語処理装置は、利用者からの自然言語入力に対し、順次解析することにより、統語情報、韻律情報あるいは意味情報の解析結果情報を出力するとともに、解析動作の状況を表す解析動作状況情報を出力する言語解析手段と、この言語解析手段からの解析結果情報に対し、その内容によって応答内容情報を決定するとともに、前記解析結果情報を受け取って前記応答内容情報を決定するまでの決定動作の状況を表す応答決定動作状況情報を出力する応答内容決定手段と、この応答内容決定手段からの応答内容情報に対し、その内容を自然言語出力に変換して、利用者に順次提示するとともに、前記変換による自然言語出力の生成動作の状況を表す生成動作状況情報を出力する言語生成手段と、前記解析動作状況情報、前記応答決定動作状況情報及び前記生成動作状況情報の少なくとも一つの内容に応じて、前記言語生成手段に対し利用者の自然言語入力に対するあいづち、または、割り込みといった同時発話を行うように制御する談話管理手段とを有し、前記言語解析手段は、自然言語入力の解析の過程で生じる解析結果の曖昧性を持つ候補の数に応じて曖昧性に関する情報を含む解析動作状況情報を出力し、前記談話管理手段は、この解析動作状況情報を受けとった時点で、前記言語生成手段を通じて間投詞からなる割り込みを実現する自然言語を出力することにより、利用者の発話にあいづちの応答、あるいは、割り込みをかけるようにしたものである。また、利用者からの自然言語入力に対し、順次解析することにより、統語情報、韻律情報あるいは意味情報の解析結果情報を出力するとともに、解析動作の状況を表す解析動作状況情報を出力する言語解析手段と、この言語解析手段からの解析結果情報に対し、その内容によって応答内容情報を決定するとともに、前記解析結果情報を受け取って前記応答内容情報を決定するまでの決定動作の状況を表す応答決定動作状況情報を出力する応答内容決定手段と、この応答内容決定手段からの応答内容情報に対し、その内容を自然言語出力に変換して、利用者に順次提示するとともに、前記変換による自然言語出力の生成動作の状況を表す生成動作状況情報を出力する言語生成手段と、前記解析動作状況情報、前記応答決定動作状況情報及び前記生成動作状況情報の少なくとも一つの内容に応じて、前記言語生成手段に対し利用者の自然言語入力に対するあいづち、または、割り込みといった同時発話を行うように制御する談話管理手段と、対話の内容に関連する知識である領域知識情報を保持する領域知識記憶手段を有し、前記言語解析手段は、得られた解析結果情報と、領域知識記憶手段に記録されている領域知識情報との比較を順次行ない、両者の間に矛盾が生じた場合に、矛盾した内容を含む解析動作状況情報として出力し、前記談話管理手段は、この解析動作状況情報を受けとった時点で、言語生成手段を通じて間投詞からなる割り込みを実現する自然言語出力を出力して、利用者の発話に対し割り込みをかけるようにしたものである。
【００２６】
第２の発明の自然言語処理装置は、自然言語情報を入力する入力手段と、前記自然言語情報の中で、音声的に同じ発音または似た発音の語句である類音語句を検出する類音検出手段と、この検出された類音語句の言い換え候補を生成する言い換え候補生成手段と、前記検出された類音語句が複数の場合に、前記各類音語句の言い換え候補の中で互いに類音関係にある言い換え候補を削除し、この削除されなかった言い換え候補を前記各類音語句とそれぞれ置換する言い換え候補判定手段とを有するものである。
【００２７】
【作用】
第１の発明の自然言語処理装置の動作状態について説明する。
【００２８】
言語解析手段は、利用者からの音声あるいは文字情報等による自然言語入力を受けとり、受けとった部分から順次解析することにより、統語情報、韻律情報あるいは意味情報の解析結果情報を出力するとともに、解析動作の状況を表す解析動作状況情報を出力する。
【００２９】
応答内容決定手段は、この言語解析手段から解析結果情報を受けとり、その内容によって応答内容情報を決定するとともに、応答決定動作の状況を表す応答決定動作状況情報を出力する。
【００３０】
言語生成手段は、この応答内容決定手段から応答内容情報を受けとり、その内容を音声あるいは自然言語からなる自然言語出力に変換して、利用者に順次提示するとともに、生成動作の状況を表す生成動作状況情報を出力する。
【００３１】
談話管理手段は、前記言語解析手段、前記応答内容決定手段あるいは前記言語生成手段から出力される解析動作状況情報、応答決定動作状況情報及び生成動作状況情報の少なくとも一の内容に応じて、前記言語生成手段に対し、利用者の自然言語入力に対するあいづち、または、割り込みといった同時発話を行うように制御する。
【００３２】
第１の発明の自然言語処理装置を、さらに具体化した場合について、実例を用いて説明する。
【００３３】
▲１▼ 前記言語解析手段は、自然言語入力の解析の過程で生じる解析結果の曖昧性を持つ候補の数に応じて曖昧性に関する情報を含む解析動作状況情報を出力し、前記談話管理手段は、この解析動作状況情報を受けとった時点で、前記言語生成手段を通じて間投詞等からなる割り込みを実現する自然言語出力を出力することにより、利用者の発話にあいづちの応答、あるいは、割り込みをかけるようにした場合は下記のようになる。
【００３４】
利用者から「黒い瞳の髪の長い美しい…」等といった自然言語入力がなされ、この入力の解釈を行なっている過程で、修飾句「黒い」の係り先である被修飾句の候補が多数生成され、曖昧性の数が処理装置の処理能力を超えてしまうような場合に、間投詞等を用いて、利用者の発話の終了を待たずに、必要な時点で利用者の発話に割り込みをかけること、あるいは割り込みに続いて曖昧性解消のための対話を起動する。また、利用者から「黒い瞳の髪の長い…」等といった自然言語入力がなされ、この入力の解析を行なっている過程で、修飾句「黒い」の係り先である被修飾句の候補が多数生成され、曖昧性の数が処理装置の処理能力の限界に近付いた時点で、処理装置が利用者の入力全ては理解できなかったが、とりあえずこの点については保留して、利用者に話を進めてほしいことを表す、例えば、「うーむ」といったような間投詞を出力したり、その後に利用者から例えば「女性が、…」といった利用者からの入力がなされたために、曖昧性が解消した場合に、利用者の入力を処理装置が理解したことを表現する、例えば「えぇ」といった間投詞を利用者に提示する。
【００３５】
これにより、無駄な発話を続けることによる利用者の負担を解消することができる。
【００３６】
▲２▼ 対話の内容に関連する知識である領域知識情報を保持する領域知識記憶手段を有し、前記言語解析手段は、得られた解析結果情報と、領域知識記憶手段に記録されている領域知識情報との比較を順次行ない、両者の間に矛盾が生じた場合に、矛盾した内容を含む解析動作状況情報として出力し、前記談話管理手段は、この解析動作状況情報を受けとった時点で、言語生成手段を通じて間投詞等からなる割り込みを実現する自然言語出力を出力して、利用者の発話に対し割り込みをかけるようにした場合は下記のようになる。
【００３７】
利用者から「アメリカにあるＡ社の支店に勤務している従業員数を示せ」といった入力がなされつつあり、かつ処理装置の保持している知識によるとアメリカにはＡ社の支店が存在しないことが判っており、利用者の発話に前提の誤りが存在するか、あるいは利用者の入力に対する処理装置の認識結果に誤りがあるか等といった対話コミュニケーション上の問題が生じているような場合に、必要に応じて利用者の発話が終了する以前に、間投詞等を用いて利用者の発話に割り込みをかける。
【００３８】
これにより、無駄な発話を続けることによる利用者の負担を解消することができる。
【００３９】
▲３▼ 前記談話管理手段は、言語解析手段から、利用者が割り込みを受理したことを表す解析動作状況情報を受けとった場合に、割り込みの原因となった利用者からの自然言語入力の解析時の曖昧性や利用者の自然言語入力の意味内容の矛盾事項の解消のための対話を行なうようにした場合は下記のようになる。
【００４０】
▲２▼の割り込みに続いて、利用者の発話の前提の誤りや利用者の入力に対する処理装置の認識結果の誤り等の解消のための対話を起動して解決する。
【００４１】
これにより、利用者と処理装置の間の誤解を無くし、柔軟で円滑な対話を行なうことができる。
【００４２】
▲４▼ 前記言語解析手段は、前記談話管理手段からの制御によって間投詞等による利用者への割り込みを行なった際に、利用者が発話を中断したり、あるいは、了解の間投詞等を発声して処理装置の割り込みを利用者が受理したかどうかを検知し、その結果を表す解析動作状況情報を出力し、前記談話管理手段は、この解析動作状況情報に応じて曖昧解消や矛盾解消の対話を起動するように制御するようにした場合は下記のようになる。
【００４３】
「あのー」といった間投詞によって割り込みを行なった場合に、例えば、利用者が「はい」と応答したりあるいは行なっていた発話を中断する等して、処理装置からの割り込みを受理した場合には、割り込み処理を継続し、例えば、利用者が行なっていた発話を継続した場合には、割り込み処理を中止する等というように、処理装置からの割り込みに対する利用者の応答の状況に基づいて割り込み以降の処理を制御する。
【００４４】
これにより、利用者が発話を中断したりあるいは了解の間投詞等を発声して処理装置の割り込みを利用者が受理した場合にのみ曖昧解消や矛盾解消の対話を起動することができる。
【００４５】
▲５▼ 自然言語対話における発話の終端位置、割り込みが可能な位置の直前の相手発話の統語的特徴、あるいは、前記相手発話の韻律的特徴に関する情報をあらかじめ記憶している割り込み可能位置特徴記憶手段を有し、前記談話管理手段は、割り込みが必要な場合に、前記言語解析手段から得られる解析結果情報と、この割り込み可能位置特徴記憶手段の内容とを比較し、割り込み可能な位置を検出した時点で前記言語生成手段を通じて間投詞等によって割り込みを行なうように制御するようにした場合は下記のようになる。
【００４６】
処理装置から利用者の発話に対する割り込みを実施する際に、利用者の行なっている発話と、例えば、「〜でございます」といった文末の敬語表現や、ポーズ（無音区間）等の韻律的な特徴等あらかじめ用意した割り込み可能位置での統語的あるいは韻律的な特徴の情報の比較に基づいて割り込みを行なうことによって、適切な位置で割り込みを行なう。
【００４７】
これにより、自然言語対話において発話の終端位置あるいは割り込みが可能な位置を判断して、適切なタイミングで割り込みをかけることができる。
【００４８】
▲６▼ 利用者との対話の履歴を保持する対話履歴記憶手段を有し、この対話履歴記憶手段の内容から発話の交替が起こった位置や無音区間位置を、発話終端位置、あるいは、割り込みが可能な位置として抽出し、該位置の直前の統語的な特徴、あるいは、韻律的な特徴を抽出し、該特徴を前記割り込み可能位置特徴記憶手段に追加記録する割り込み可能位置抽出手段を有したようにした場合は下記のようになる。
【００４９】
処理装置と利用者が行なっている対話の履歴を解析することによって、例えば、「〜どうですか？↑（イントネーションのライズ：音声の基本周波数の上昇）」といった表現や、あるいは「〜ですけどぉー」といった表現を利用者が行なった直後に発話の交替や、行なった割り込みが受理される傾向にあること等を抽出し、割り込み可能位置に関する情報を抽出する。
【００５０】
これにより、利用者ごとの発話の癖等に応じた柔軟な割り込みを行なうことができる等の実用上多大な効果が奏せられる。
【００５１】
▲７▼ 前記割り込み可能位置特徴記憶手段は、割り込み可能位置情報の信頼性に関する情報を保持し、前記言語解析手段は、この割り込み可能位置特徴記憶手段に記録されている情報に基づいた割り込みを行ない、該割り込みに対して利用者が発話を中断したり応答の間投詞を発声する等して割り込みを受理したかどうかを判定し、その結果を含む解析動作状況情報を出力し、前記談話管理手段は、この解析動作状況情報に応じて割り込み可能位置特徴記憶手段に記憶されている該割り込み可能位置情報の信頼性を調整するようにした場合は下記のようになる。
【００５２】
処理装置から利用者の発話に対する割り込みを実施した際に、行なった割り込みが利用者に受け入れられたか否かによって、利用した割り込み可能位置での統語的あるいは韻律的な特徴の情報の信頼性を調整する。
【００５３】
これにより、利用者ごとの発話の癖等に応じた柔軟な割り込みを行なうことができる等の実用上多大な効果が奏せられる。
【００５４】
第２の発明に係る自然言語処理装置について説明する。
【００５５】
類音検出手段が、音声的に同じ発音または似た発音の語句である類音語句を検出する。
【００５６】
言い換え候補生成手段が、この類音検出手段で検出された類音語句と音声的に同じ発音または似た発音の語句を言い換え候補として生成する。
【００５７】
言い換え候補判定手段が、この言い換え候補生成手段で生成された言い換え候補の中から目的の言い換え候補を判定し、前記類音語句を置換規則に従ってこの目的の言い換え候補と置換する。
【００５８】
これにより、同音異義語あるいは聞き誤られやすい発音の単語を持つ単語に、誤認されない形への言い換え処理を施すことによって、明確で効率の良い自然言語出力を得ることが可能となる。
【００５９】
【実施例】
第１の実施例
以下、第１の発明の自然言語処理装置の実施例につき図１〜図９に基づいて説明する。
【００６０】
図１は、本実施例の自然言語処理装置の構成を表し、言語解析部１、領域知識記憶部２、応答内容決定部３、言語生成部４及び談話管理部５よりなる。
【００６１】
（１）言語解析部１
言語解析部１は、音声等による利用者からの自然言語入力情報を受けとり、音声認識、言語解析、意味解釈等を施すことによって、該自然言語入力情報の解析結果を表す解析情報を出力するとともに、言語解析部１の動作状況を表す解析動作状況情報を出力する。
【００６２】
なお、言語解析部１では、利用者からの通常の応答の解析に加え、対話システムの自然言語出力中に行なわれる、利用者からのあいづちや割り込みといった同時発話の解析も合わせて行なう。
【００６３】
図２は、言語解析部１の構成の例を示している。
【００６４】
利用者からの音声入力が、マイク１０を通じて音声認識部１２へと取り込まれる。
【００６５】
音声認識部１２によって、音韻辞書１４等の音声認識のための知識を参照して音声認識が行なわれる。中間結果を含む解析の結果が、解析結果情報として言語解析部１から出力されるとともに、解析動作の成功不成功等の情報が解析動作状況情報として言語解析部１から出力される。
【００６６】
統語解析部１６によって、音声認識部１２の出力に対して、言語辞書１８及び文法規則２０等を参照した解析が行なわれる。統語解析結果と統語解析動作の状況が、同様にそれぞれ解析結果情報及び解析動作状況情報として言語解析部１から出力される。
【００６７】
意味解釈部２２によって、統語解析部１６の出力に対して、解釈規則２４等を参照した解析が行なわれる。意味解釈結果と意味解釈動作の状況が同様に、それぞれ解析結果情報及び解析動作状況情報として言語解析部１から出力される。
【００６８】
図３（ａ）〜（ｅ）は、言語解析部１によって出力される解析動作状況情報の例を表している。
【００６９】
（ａ）は、言語解析部１における利用者からの自然言語入力の音声認識、統語解析、意味解釈及び解析結果と領域知識記憶部２の内容との整合性の全てが成功した場合に出力される例を示している。
【００７０】
（ｂ）は、統語解析において、係り受けの曖昧性を持つ解析結果候補があらかじめ設定した閾値Ｔ１より多くなったことを表している。
【００７１】
（ｃ）は、処理装置が行なった利用者の発話への割り込みに対して、その受理を表す応答がなされたことを表している。
【００７２】
（ｄ）は、処理装置から利用者へ行なった質問に対して、利用者から了解の応答がなされたことを表している。
【００７３】
（ｅ）は、本実施例を拡張した場合に利用される情報の例であって、詳細については、後述する。
【００７４】
（２）領域知識記憶部２
領域知識記憶部２は、応答内容決定部３及び談話管理部５等から参照される領域知識情報を記録する。
【００７５】
図４は、領域知識記憶部２の内容の例を示している。
【００７６】
領域知識記憶部２の各エントリにおいて、知識内容情報Ａは処理装置が保持している領域知識を表しており、Ｂは格納アドレス情報である。
【００７７】
例えば、図４の格納アドレスＰ１１のエントリでは、知識内容情報Ａの内容が「（品番ワープロＡＪＷＰ−ＸＸ）」であるが、これは、「ワープロＡ」の「品番」が「ＪＷＰ−ＸＸ」であるという領域知識を表している。
【００７８】
（３）応答内容決定部３
応答内容決定部３は、言語解析部１から出力される解析情報の内容及び領域知識記憶部２から得られる領域知識情報等に基づいて、利用者への応答の内容を決定し、応答内容を表す応答内容情報を出力するとともに、応答内容決定部３の動作状況を表す応答決定動作状況情報を合わせて出力する。
【００７９】
（４）言語生成部４
言語生成部４は、応答内容決定部３から出力される応答内容情報にもとづいて、文法規則、言語辞書及び音韻規則等を参照した言語生成と音声合成等によって自然言語による利用者への応答を出力するとともに、言語生成部４の動作の状況を表す生成動作状況情報をあわせて出力する。
【００８０】
なお、言語生成部４では、利用者へのあいづちの提示や割り込み等が行なわれるが、この過程での処理装置の発話意図を表現するために提示する間投詞等の発話音声の生成において、イントネーションやアクセント等の韻律を適宜制御する。
【００８１】
図５は、言語生成部４の構成の例を示している。
【００８２】
統語構造生成部２６において、応答内容決定部３から出力される応答内容情報を受けとり、文法規則２８を参照して応答内容情報の内容を実現するための応答文の構文構造を決定する。また、構文構造生成の成功不成功等の動作の状況を表す情報を生成動作状況情報として言語生成部４から出力する。
【００８３】
表層文生成部３０において、決定された応答文の構文構造と応答内容情報に基づき、言語辞書３２等を参照して応答の表層文を生成するとともに、表層文生成の成功不成功等の動作の状況を表す情報を生成動作状況情報として言語生成部４から出力する。
【００８４】
音声合成器３４において、決定された応答の表層文と応答内容情報に基づき、音韻規則３６等を参照して音声合成によって応答である音声出力を生成し利用者へ提示するとともに、応答音声出力の生成の成功不成功等の動作の状況を表す情報を生成動作状況情報として言語生成部４から出力する。
【００８５】
なお、言語生成部４は、談話管理部５からの制御に応じて、出力中の応答出力の中断や、利用者の自然言語による入力に対する割り込みやあいづち等の同時発話もあわせて行なう。
【００８６】
（５）談話管理部５
談話管理部５は、言語解析部１から出力される解析動作状況情報、応答内容決定部３から出力される応答決定動作状況情報及び言語生成部４から出力される生成動作状況情報の全てあるいは一部に基づいて、言語解析部１、応答内容決定部３及び言語生成部４の全てあるいは一部を制御する。
【００８７】
この談話管理部５が、本実施例において中心的な役割を演じる構成要素であるため、続いてさらに詳しく説明する。
【００８８】
図６は、談話管理部５の構成の例を示している。
【００８９】
談話制御部５ａは、その内部に対話状態を保持する談話状態レジスタＲＳを持ち、解析動作状況情報、生成動作状況及び応答決定動作状況情報を受けとり、談話制御規則記憶部５ｂの内容を参照することによって、言語解析部制御情報、言語生成部制御情報及び応答内容決定部制御情報を通じて、言語解析部１、言語生成部４及び応答内容決定部３を制御する。
【００９０】
談話制御規則記憶部５ｂは、あらかじめ用意した談話制御部５ａの動作を決定するための規則が記録されており、談話制御部５ａから参照される。
【００９１】
図７は、談話制御規則記憶部５ｂの内容の例を示している。
【００９２】
談話制御規則記憶部５ｂの各エントリには、該エントリＱｉに対応する談話制御規則に関する情報が、談話状態情報Ａ、注目対象情報Ｂ、状況条件情報Ｃ、談話状態遷移情報Ｄ、制御内容情報Ｅ等と分類され記録される。Ｆは格納アドレス情報である。そして、エントリＱｉのＡ，Ｂ，Ｃ，Ｄ，Ｅ，Ｆの情報を、まとめて談話制御規則Ｒｉという。
【００９３】
談話状態情報Ａの欄には、エントリＱｉに対応する談話制御規則Ｒｉを適用する場合の談話状態レジスタＲＳの内容への制限が記録されている。
【００９４】
注目対象情報Ｂの欄には、談話制御規則Ｒｉを適用する条件となる動作状況情報Ｘｊの種類を表す「解析」あるいは「生成」あるいは「応答」が記録されている。例えば、図７のＱ１１のエントリでは、注目対象情報Ｂの内容が「解析」であることから、Ｑ１１に格納されている談話制御規則Ｒｉを適用する場合の条件が解析動作状況情報に関するものであることが示されている。
【００９５】
状況条件情報Ｃの欄には、該談話制御規則Ｒｉを適用する場合の該動作状況情報Ｘｊの内容に関する条件が記録されている。例えば、図７のＱ１１のエントリでは、状況条件情報Ｃの内容が「（曖昧性＞閾値Ｔ１）」となっていることから、エントリＱ１１に対応する談話制御規則Ｒ１１を適用するためには、解析動作状況情報の内容が、曖昧性が閾値Ｔ２を越えたことを示すものでなくてはいけないことが示されている。
【００９６】
談話状態遷移情報Ｄの欄には、エントリＱｉに対応する談話制御規則Ｒｉを適用した後の談話状態が記録されている。例えば、エントリＱ１１では、談話状態遷移情報Ｄの内容が「Ｓ１」であることから、エントリＱ１１に対応する談話制御規則Ｒ１１を適用した後の談話状態をＳ１として、談話制御部の談話状態レジスタＲＳの値を「Ｓ１」とすべきであることが判る。
【００９７】
制御内容情報Ｅの欄には、対応する談話制御規則Ｒｉの適用が決定された場合に行なうべき制御の手順に関する情報が記録されている。
【００９８】
個々の手順の内容の例を以下の制御手順例Ｐ１〜Ｐ７に示す。
【００９９】
制御手順例Ｐ１〜Ｐ７
Ｐ１制御内容情報Ｅの内容が、「手順（あいづち）」である談話制御規則の適用が決定された場合
談話管理部５から言語生成部４を制御する。例えば、「はい」あるいは「えぇ」等といった了解を表す間投詞を利用者に提示する。
【０１００】
Ｐ２制御内容情報Ｅの内容が、「手順（割り込み）」である談話制御規則の適用が決定された場合
談話管理部５から言語生成部４を制御する。例えば、「あのー」あるいは「えっ」等といった間投詞を利用者に提示し、利用者の自然言語入力発話に対する割り込みを実行する。
【０１０１】
Ｐ３制御内容情報Ｅの内容が、「手順（曖昧解消対話）」である談話制御規則の適用が決定された場合
係り受けに関して曖昧性を持つ自然言語入力の曖昧性解消のための対話と同様の処理を行なうことによって、曖昧性の解消を図る（特願平５−１００７１号明細書参照）。
【０１０２】
Ｐ４制御内容情報Ｅの内容が、「手順（矛盾確認対話）」である談話制御規則の適用が決定された場合
利用者からの自然言語入力の解析結果と矛盾する領域知識記憶部２の内容Ｋの真偽を利用者に確認するための応答内容情報「（確認（真偽（Ｋ）））」を言語生成部４へ送付し、該応答内容情報の内容を表す、例えば「Ｋではないですか？」といった自然言語表現の質問を利用者に提示することによって、利用者あるいは処理装置の誤りを解消する。
【０１０３】
Ｐ５制御内容情報Ｅの内容が、「手順（対話再開）」である談話制御規則の適用が決定された場合
一連の割り込み、あるいは、あいづちに関する談話の制御を終了し、処理装置の制御を通常の対話へと戻す。
【０１０４】
Ｐ６制御内容情報Ｅの内容が、「手順（割り込み中断）」である談話制御規則の適用が決定された場合
一連の割り込み、あるいは、あいづちに関する談話の制御を中断し、対話システムの制御を通常の対話へと戻す。
【０１０５】
続いて、談話制御部５ａにおいて、解析動作状況情報、生成動作状況情報及び応答決定動作状況情報と上述の談話制御規則記憶部５ｂの内容とに基づいて行なわれる処理について説明する。
【０１０６】
談話制御部５ａの基本動作は、下記の談話制御手順Ａに基づいてなされる。
【０１０７】
談話制御手順Ａ
Ａ１　談話制御部５ａの談話状態レジスタＲＳに初期値として「Ｓ０」を記録する。
【０１０８】
Ａ２処理装置と利用者の対話の進行にともなって、言語解析部１、応答内容決定部３及び言語生成部４から出力される解析動作状況情報、応答決定動作状況情報及び生成動作状況情報の内容と、談話制御部５ａの談話状態レジスタＲＳの内容とに基づき、談話制御規則記憶部５ｂに記録されている談話管理規則の中から適用可能な規則を検索する。
【０１０９】
Ａ３該当する規則を発見し次第、該規則を適用する。
【０１１０】
Ａ４Ａ２へ進む。
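The rule layout (fields A to E) and discourse control procedure A described above can be pictured as a small table-driven state machine. The following is only a minimal sketch under stated assumptions: the class `Rule`, the functions `find_rule` and `run`, and the English condition labels are hypothetical names introduced for illustration, and the three rules merely follow the [state, target, condition, next state, procedure] pattern rather than reproducing the contents of Fig. 7.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    state: str       # A: required content of the discourse state register RS
    target: str      # B: kind of status information the rule looks at
    condition: str   # C: condition on the content of that status information
    next_state: str  # D: value written to RS after the rule is applied
    procedure: str   # E: control procedure to carry out


# Illustrative rules in the [A, B, C, D, E] layout; the labels are assumptions.
RULES = [
    Rule("S0", "analysis", "ambiguity>T1", "S3", "interrupt"),
    Rule("S3", "analysis", "interrupt_accepted", "S5", "disambiguation_dialogue"),
    Rule("S5", "analysis", "confirmed", "S0", "resume_dialogue"),
]


def find_rule(state: str, target: str, status: str) -> Optional[Rule]:
    """Step A2: search the rule store for a rule matching RS and the status."""
    for rule in RULES:
        if (rule.state, rule.target, rule.condition) == (state, target, status):
            return rule
    return None


def run(events: List[Tuple[str, str]]) -> Tuple[str, List[Tuple[str, str]]]:
    state = "S0"  # Step A1: initialise RS
    trace = []
    for target, status in events:
        rule = find_rule(state, target, status)
        if rule is not None:  # Step A3: apply the rule as soon as it is found
            state = rule.next_state
            trace.append((rule.procedure, state))
        # Step A4: go back to A2 for the next piece of status information
    return state, trace
```

Feeding `run` a sequence of (kind, content) pairs stands in for the status information that the analysis, response determination, and generation parts emit as the dialogue proceeds; status information that matches no rule is simply skipped.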
【０１１１】
以上が本装置の構成とその機能である。
【０１１２】
具体例
上記自然言語処理装置の動作について、ワープロに関するデータベースと推論に基づき利用者の質問に自然言語で対話を行なう対話式情報処理装置を例に挙げ、図を参照して更に詳しく説明する。
【０１１３】
まず、利用者が「転写式のプリンタつきの携帯型のワープロの型番」を知るために、処理装置に向かって、「転写式のプリンタつきの携帯型のワープロの型番を答えよ」を処理装置の自然言語入力としようとしている場合を例として説明を行なう。
【０１１４】
図８に、処理装置での対話処理例として、各時点（Ｔ０〜Ｔ９）ごとの利用者（Ｕ）と処理装置（Ｓ）の発話の内容を示している。
【０１１５】
Ｔ０談話制御手順ＡのＡ１に従い、談話制御部５ａの談話状態レジスタＲＳが値「Ｓ０」で初期化される。
【０１１６】
Ｔ１利用者からのＵ１の発話が開始され、言語解析部１によって順次解析が行なわれる。
【０１１７】
しかし、発話を行いつつあるＵ１の発話には、図９に示すような係り受けに関する曖昧性が存在する。
【０１１８】
そのために、利用者が、例えば、発話Ｕ１の「〜ワープロの」の部分まで発話を行なった時点で、言語解析部１での言語解析において係り受けに関する曖昧性を持つ候補の数が、あらかじめ設定した閾値Ｔ１を超過し、図３の（ｂ）に示したような解析動作状況情報Ｘ１が出力されたとする。なお、この時点では、解析における曖昧性を持つ解析結果の候補が言語解析部１に保持されているものとする。
【０１１９】
解析動作状況情報Ｘ１：
［統語解析：（曖昧性＞閾値Ｔ１）］
Ｔ２談話管理部５が、解析動作状況情報Ｘ１を受ける。
【０１２０】
談話管理部５は、談話制御部５ａの談話状態レジスタＲＳ内に記録されている談話状態の値「Ｓ０」、解析動作状況情報Ｘ１の内容及び談話制御規則記憶部５ｂの内容の比較を行なう。
【０１２１】
図７の格納アドレスＱ２１のエントリの談話状態情報Ａの内容が「Ｓ０」、注目対象情報Ｂの内容が「解析」、状況条件情報Ｃの内容が「曖昧性＞閾値Ｔ１」であって、現在注目している解析動作状況情報Ｘ１の内容と合致する。
【０１２２】
したがって、この格納アドレスＱ２１に記録されている談話管理規則Ｒ２１が適用可能であることを検出する。
【０１２３】
談話管理規則Ｒ２１：
［Ｓ０、解析、（曖昧性＞閾値Ｔ１）、Ｓ３、手順（割り込み）］
Ｔ３談話制御部５ａは、談話管理規則Ｒ２１の制御内容情報Ｅの内容が「手順（割り込み）」で制御手順例Ｐ２に従った処理を実施し、言語生成部４を通じて利用者の発話への割り込みを実現する。例えば、発話Ｓ１の「あのー」等と言った間投詞が利用者に提示されるとともに、談話管理規則Ｒ２１の談話状態遷移情報Ｄに基づき談話状態レジスタＲＳに値「Ｓ３」が記録される。
【０１２４】
Ｔ４　Ｔ３での処理装置からの割り込みに対し、利用者が処理装置の割り込みに対する受理を示す、例えば、発話Ｕ２の「はい」といった了解を表す間投詞の発話を行ない、これが言語解析部１によって検出され、図３（ｃ）に示すような割り込みの受理を表す解析動作状況情報Ｘ２が出力される。
【０１２５】
解析動作状況情報Ｘ２：
［統語解析：（割り込み受理＝Ｔ）］
Ｔ５　談話管理部５では、時点Ｔ２と同様の処理によって解析動作状況情報Ｘ２と、談話状態レジスタＲＳの内容「Ｓ３」に基づき、図７の格納アドレスＱ３１のエントリに対応する談話制御規則Ｒ３１が検出される。
【０１２６】
談話管理規則Ｒ３１：
［Ｓ３、解析、（割り込み受理＝Ｔ）、Ｓ５、手順（曖昧解消対話）］
Ｔ６　時点Ｔ３と同様の処理によって、談話制御規則Ｒ３１の制御内容情報Ｅの値が「手順（曖昧解消対話）」であることから、制御手順Ｐ３に従って、係り受け曖昧性解消のための対話と同様の処理が行なわれ（特開平５−１００７１号参照）、適用した談話制御規則Ｒ３１に基づいて談話状態レジスタＲＳに値「Ｓ５」が記録される。
【０１２７】
時点Ｔ６での処理装置からの質問の発話例
「”携帯型の”の係り先は、”ワープロ”でよろしいですか？」
Ｔ７Ｔ６での処理装置からの質問に対し、例えば「はい」といったような利用者の確認の応答を表す入力がなされる。
【０１２８】
発話Ｓ２から発話Ｕ３の確認の対話によって、言語解析部１で生じていた係り受け解析における曖昧性が解決される。
【０１２９】
言語解析部１から、図３（ｄ）に示すような解析動作状況情報Ｘ３が出力される。
【０１３０】
時点Ｔ２と同様の処理によって図７の格納アドレスＱ５１に格納されている談話制御規則Ｒ５１が検出される。
【０１３１】
時点Ｔ３と同様の処理によって、制御手順Ｐ５が起動され、一連の割り込み、あるいは、あいづちに関する談話を終了し、通常の対話へと制御を戻す。
【０１３２】
解析動作状況情報Ｘ３：
［統語解析：（確認完了）］
談話管理規則Ｒ５１：
［Ｓ５、解析、（確認完了）、Ｓ０、手順（対話再開）］
Ｔ８通常の対話に戻り、つづいて行なわれる「そのワープロの型番を示せ」といった利用者からの入力に対し、例えば最も最近現れた同一名詞によって照応解決を行なう手法等によって下記のように行う。なお、括弧は、係り受け関係を表す。
【０１３３】
表現「そのワープロ」が、上述の処理によって曖昧性を解消された
「（（（転写式の（プリンタ））つきの）（携帯型の（ワープロ）））」
であることを解析すること等によって、一連の対話によって示された利用者の要求が、例えば
「（要求（出力（型番（（（転写式の（プリンタ））つきの）（携帯型の（ワープロ））））））」
といった意味表現で記述される、
該表現で示される「ワープロＡの型番」の情報提供を意味するものであることが解析される。
【０１３４】
Ｔ９　図４に示した領域知識記憶部２の内容の例の格納アドレスＰ１１〜Ｐ２２の内容と、応答内容決定部３での推論処理によって、応答すべき内容が「ＪＷＰ−ＸＸ」であることが決定される。
【０１３５】
言語生成部４を通じて、「型番は、ＪＷＰ−ＸＸです」といった出力応答が利用者に提示される。
【０１３６】
このように構成された本装置によれば、下記のような効果がある。
【０１３７】
利用者からの自然言語入力の処理中に、必要に応じて処理装置からの割り込みを行なうことができる。
【０１３８】
利用者からの自然言語入力の処理中に、処理装置の解析能力を上回る曖昧性が生じた場合に、即座に間投詞等を出力することによって、利用者の発話に割り込みをかけ、無駄な発話を続けることによる利用者の負担を解消することができる。
【０１３９】
利用者からの自然言語入力の処理中に、処理装置の解析能力を上回る曖昧性が生じて割り込みをかけた後に、曖昧性の解消のための対話を行なうことによって、利用者と処理装置の間の誤解を無くし、柔軟で円滑な対話を行なうことができる。
【０１４０】
利用者の発話に対して、間投詞等によって処理装置が割り込みをかけた場合に、利用者が発話を中断したり、あるいは、了解の間投詞等を発声して処理装置の割り込みを利用者が受理した場合にのみ曖昧解消や矛盾解消の対話を起動することができる。
【０１４１】
上記実施例は、利用者からの言語解析部１での自然言語入力の解析において生じる曖昧性の数を監視し、あらかじめ設定した閾値との比較によって曖昧性の過多を検出し、利用者の発話に対する割り込みと曖昧性解消のための確認対話を行なっている。しかし、これに代えて、下記のような変更例が実施可能である。
【０１４２】
変更例１
利用者の発話に含まれる誤りや矛盾の解消に利用できるように拡張することが可能である。
【０１４３】
例えば、応答内容決定部３において、言語解析部１から得られる利用者の発話の（部分の）解析結果と領域知識記憶部２に記録されている情報との整合性の確認処理を随時行ない、両者の間に矛盾が生じた場合に利用者の発話への割り込み処理を実施するように構成できる。
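The consistency check in this modification can be sketched as a function that is called on each partial analysis result and reports a contradiction as soon as one can be established. This is a minimal sketch, not the embodiment's data format: `BRANCH_COUNTRIES`, its contents, the dictionary keys, and `check_partial` are all hypothetical and chosen only to mirror the "branch of company A in America" example given earlier in the description.

```python
from typing import Dict, Optional, Set

# Hypothetical domain knowledge: countries in which each company has a branch.
BRANCH_COUNTRIES: Dict[str, Set[str]] = {"A": {"Japan", "Germany"}}


def check_partial(partial: Dict[str, str]) -> Optional[str]:
    """Return a description of the contradiction, or None if none is found yet."""
    company = partial.get("company")
    country = partial.get("branch_country")
    if company is None or country is None:
        return None  # too little of the utterance has been analysed so far
    known = BRANCH_COUNTRIES.get(company)
    if known is not None and country not in known:
        return f"no branch of company {company} in {country}"
    return None


def first_contradiction(partials):
    """Scan successive partial results; stop at the first contradiction."""
    for index, partial in enumerate(partials):
        problem = check_partial(partial)
        if problem is not None:
            return index, problem  # the point at which an interruption is due
    return None
```

Because the check runs on every partial result, the position returned by `first_contradiction` is the earliest point at which the discourse management part could decide to interrupt, rather than the end of the utterance.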
【０１４４】
変更例２
利用者からの自然言語入力の解析における曖昧性の数の検査において、第２の閾値Ｔ２及び第３の閾値Ｔ３を、「閾値Ｔ３＜閾値Ｔ２＜閾値Ｔ１」となるよう設定する。談話状態Ｓ０で、曖昧性が「閾値Ｔ２＜曖昧性＜閾値Ｔ１」となった場合に、談話状態をＳ１とし、その後の利用者からの後続の自然言語入力の解析によって曖昧性が解消し「曖昧性＝０」となった場合、あるいは、曖昧性が充分に小さくなり「曖昧性＜閾値Ｔ３」となった時点で、言語生成部４を通じて「えぇ」等といったあいづちを利用者に提示する。
【０１４５】
これにより、利用者から、処理装置の解析能力を上回る曖昧性を生じる可能性のある複雑な自然言語入力がなされ、かつ、後続の入力の処理等によって処理装置が該入力の内容を解析できた場合に、解析失敗時の利用者の再度の言い直しによる負担増加への不安を解消できる。
【０１４６】
なお、この機能は、図７の談話制御規則記憶部５ｂの内容の例の格納アドレスＱ１１及びＱ１２に記録されている談話制御規則によって実現できる。
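The three-threshold behaviour of this modification can be sketched as follows. The function name `backchannel_events`, the event labels, and the decision to stop at the first interruption are assumptions made for illustration; the sequence of ambiguity counts stands in for the values the language analysis part would report as the utterance proceeds.

```python
from typing import List, Tuple


def backchannel_events(counts: List[int], t1: int, t2: int, t3: int) -> List[Tuple[int, str]]:
    """Map a sequence of ambiguity counts to backchannel/interrupt events."""
    if not (t3 < t2 < t1):
        raise ValueError("thresholds must satisfy T3 < T2 < T1")
    state = "S0"
    events = []
    for index, n in enumerate(counts):
        if n > t1:
            # Ambiguity exceeds what the device can handle: interrupt the user.
            events.append((index, "interrupt"))
            break  # assumption: what follows is handled by the interrupt dialogue
        if state == "S0" and t2 < n < t1:
            state = "S1"  # getting close to the limit; keep listening
        elif state == "S1" and (n == 0 or n < t3):
            # Later input resolved the ambiguity: signal understanding.
            events.append((index, "backchannel"))
            state = "S0"
    return events
```

In this sketch a backchannel is produced only after the count has first entered the band between T2 and T1, so simple inputs that never approach the limit do not trigger one.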
【０１４７】
変更例３
上記実施例においては、制御手順Ｐ１によるあいづちの提示や、制御手順Ｐ２の割り込みの言語生成部４での実施について、談話制御部５ａが実施を決定した時点ですぐに行なうようにしていた。
【０１４８】
これを拡張し、自然言語発話での割り込みが可能な時点の直前に現れる言葉（例えば、「〜ね」といった文末や文節末を表す終助詞や、「ございます」といった文末表現等の言葉）を言語解析部１に新設した割り込み可能位置記憶部にあらかじめ用意しておく。談話制御部５ａから割り込み、あるいは、あいづちの発話の要求があった場合に、言語解析部１が、利用者が現在行なっている発話の解析結果と、割り込み可能位置記憶部に記録されている情報との比較によって、割り込み可能な時点を抽出した時点で、例えば図３（ｅ）に示すような解析動作状況情報を出力するようにしておく。談話管理部５が、この解析動作状況情報を受けとった時点で、言語生成部４を通じてあいづち、あるいは、割り込みを実施するように構成する。
【０１４９】
これにより、利用者の発話に対して処理装置からの割り込みが必要な場合に、自然言語対話において発話の終端位置あるいは割り込みが可能な位置を判断して、適切なタイミングで割り込みをかけることが可能である。
【０１５０】
変更例４
利用者と処理装置の対話の履歴を記録するとともに、記録された対話の履歴から、利用者の発話終端位置の直前の発話の部分を抽出し、割り込み可能位置記録部に追加するように構成できる。
【０１５１】
変更例５
割り込み可能位置記録部の各エントリに、そのエントリに記録されている情報の信頼性を表す数値である信頼性ポイントを追加し、処理装置が行なった割り込みが利用者から受理されたかどうかによって、対応する信頼性ポイントの値を調整し、信頼性ポイントの値によって割り込み可能位置情報記憶部に記録されている情報の適用を制御することによって、利用者ごとの発話の癖等に応じた柔軟な割り込みを行なうことができる。
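One way to picture the reliability points of this modification is a small store that maps each interruptible-position feature to an integer score, raises the score when an interruption at that feature is accepted, and lowers it otherwise. The class name `InterruptiblePositions`, the starting score, the step size, and the usage threshold are all assumptions for illustration; the description does not fix concrete values.

```python
class InterruptiblePositions:
    """Interruptible-position features with adjustable reliability points."""

    def __init__(self, threshold: int = 5, step: int = 1, max_points: int = 10):
        self.threshold = threshold    # minimum points for a feature to be used
        self.step = step              # amount added or removed per feedback
        self.max_points = max_points  # upper bound on reliability points
        self.points = {}              # feature -> reliability points

    def add(self, feature: str, points: int = 5) -> None:
        # Keep the existing score if the feature is already recorded.
        self.points.setdefault(feature, points)

    def usable(self, feature: str) -> bool:
        # Unknown features are never used as interruption points.
        return self.points.get(feature, 0) >= self.threshold

    def feedback(self, feature: str, accepted: bool) -> None:
        if feature not in self.points:
            return
        delta = self.step if accepted else -self.step
        new_value = self.points[feature] + delta
        self.points[feature] = max(0, min(self.max_points, new_value))
```

A feature whose interruptions are repeatedly rejected by a particular user drops below the threshold and stops being used, which is how the store adapts to that user's speaking habits.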
【０１５２】
第２の実施例
以下、第２の発明の自然言語処理装置の一実施例である規則音声合成装置１００につき図１０〜図２５を参照して説明する。
【０１５３】
図１０は、本実施例の規則音声合成装置１００の構成図である。
【０１５４】
規則音声合成装置１００は、テキストを構文解析し、その結果の音韻情報を、予め用意した規則に従って適用したものを、音声データとして音声合成を行なう装置である。すなわち、規則音声合成装置１００は、下記のものよりなる。
【０１５５】
構文解析部１０１は、入力文を予め用意した文法規則に従って、形態素解析と統語解析を行ない、構文木を生成する。
【０１５６】
類音検索部１０２は、入力文中の各名詞の、同音異義語あるいは聞き誤りやすい発音の名詞を、アクセント・パタン及び聞き誤りやすい音素のリストを元に検索する。
【０１５７】
類音判定部１０３は、文中の他の単語との共起情報によって、類音検索部１０２の結果を制約する。
【０１５８】
言い換え候補生成部１０４は、同音異義語あるいは聞き誤りやすい発音の名詞を持つ名詞の言い換え候補を生成する。
【０１５９】
言い換え候補判定部１０５は、異なる名詞に対する言い換え候補同士が聞き誤りやすい関係にないように、各名詞に対する言い換え候補を選択し、選択した言い換え候補に従って構文木を書き換える。
【０１６０】
音韻処理部１０６は、構文木に対して予め用意した韻律規則に従って音韻情報を適用して音声データを生成する。
【０１６１】
合成器１０７は、音韻処理部１０６で生成された音声データを出力する。
【０１６２】
語調変化選択部１０８は、言い換え候補に置換された部分の語調を変化させるか否かの選択を保持する。
【０１６３】
ここで、本実施例で使用する言葉の定義及びアクセントの規則等について説明する。
【０１６４】
▲１▼単語間の発音距離
「単語間の発音距離」を、類音リスト１０９に含まれる音素を交換することによって、ある単語が別の単語に変換される交換回数と定義する。
【０１６５】
例えば、「洗剤」は音素に分解すると「ｓｅｎｚａｉ」となるが、類音リスト中の「ｚ→ｓ」によって、一個の音が入れ代わった「ｓｅｎｓａｉ」は、発音距離は１であるとする。
【０１６６】
類音リスト１０９は、図１１に示すように、ある音とその音が変化しやすい音を並べたものである。例えば、“ｓ→ｚ”は、子音「ｓ」が「ｚ」に（「サ（ｓａ）」が「ザ（ｚａ）」に）、“ｐｊ→ｈｊ”は「ｐｊ」が「ｈｊ」に（「ピュ（ｐｊｕ）」が「ヒュ（ｈｊｕ）」に）聞き誤られやすいことを示す。
【０１６７】
▲２▼「類音語」、「類音名詞」
任意の二単語間でアクセント・パタンが同一で、発音距離が予め定められた閾値以下の場合、二つの単語は類音関係にあると呼び、一方の単語は、他方の単語の「類音語」、名詞の場合は「類音名詞」であると定義する。
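The pronunciation distance and the similar-sound relation defined above can be sketched as follows. This is a minimal sketch under stated assumptions: words are given as romanized strings in which each character is treated as one phoneme (so multi-letter phonemes such as "pj" are not handled), the set `CONFUSABLE` contains only the s/z pairs mentioned in the text, and the accent labels passed to `is_similar` are opaque placeholders rather than the notation of the accent dictionary.

```python
from typing import Optional

# Directed pairs (heard-as) taken from the examples in the text: s -> z, z -> s.
CONFUSABLE = {("s", "z"), ("z", "s")}


def pronunciation_distance(a: str, b: str) -> Optional[int]:
    """Number of confusable-phoneme swaps turning a into b, or None if impossible."""
    if len(a) != len(b):
        return None  # swaps cannot change the number of phonemes
    swaps = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        if (x, y) not in CONFUSABLE:
            return None  # differs by a sound that is not easily misheard
        swaps += 1
    return swaps


def is_similar(a: str, b: str, accent_a: str, accent_b: str, threshold: int = 1) -> bool:
    """Similar-sounding: same accent pattern and distance within the threshold."""
    distance = pronunciation_distance(a, b)
    return accent_a == accent_b and distance is not None and distance <= threshold
```

With a threshold of 1, "senzai" and "sensai" are similar-sounding when their accent patterns match, while "senzai" and "zensai" are not because two swaps are needed.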
【０１６８】
▲３▼アクセントの規則
アクセントは、図１２のように表現される。「国体」では「くたい」の部分を高く発音し、また「意義」では「い」の次の音で低くなることを示している。
【０１６９】
アクセント辞書１１０は、図１３に示すように、単語とアクセント・パタンを並べたものであり、「国体」のアクセント・パタンは「こＨくたい」であり、「こ」以後の音から高くなることを示し、「意義」のアクセント・パタン「いＬぎ」は、「い」の次の音から低くなることを示す。
【０１７０】
▲４▼その他
言い換え候補を置換した部分の音声出力の語調を変化させるか否かの選択は、利用者によって予め入力として与えられている。その状態は、語調変化選択部１０８に保持されている。語調を変化させるよう選択された場合は、言い換え部分の開始と終了を示すよう言い換え候補判定部１０５が言い換え部分の始点と終点にマーカーを付加している。音韻処理部１０６は、言い換え候補判定部１０５で付加されたマーカーをもとに語調を変化させて出力する。
【０１７１】
次に各構成の機能について順番に説明する。
【０１７２】
（１）構文解析部１０１
構文解析部１０１は、キーボードや対話装置によって入力された入力文を入力とし、予め用意した文法規則に従って、形態素解析と統語解析を行ない、出力として構文木を生成する。
【０１７３】
また、この入力文の入力時に、言い換えられた部分の出力音声を変化させるか否かの語調変化選択入力も入力され、その状態は語調変化選択部１０８に保持される。
【０１７４】
（２）類音検索部１０２
類音検索部１０２は、図１９の処理フローにあるように、構文木中の各名詞について、類音リスト１０９と辞書１１３及びアクセント辞書１１０を参照し、各名詞に対する類音名詞の有無を検索する。すなわち、各名詞について、名詞中の各音を類音リスト１０９に含まれる聞き誤られやすい音に閾値以内の個数分だけ置換した音の並びが、辞書中に名詞として含まれかつアクセント・パタンが同じ場合（つまり、類音名詞である場合）に、該当の名詞とその類音名詞の対を、言い換え対象リスト１１４に登録する。この時、発音が同じであっても、アクセント・パタンが異なる名詞や発音距離が閾値を超える名詞は、誤認されることはないと判定する。
【０１７５】
辞書１１３は、図１７に示すように、品詞と同義語や類義語、同音の語と比較しての出現頻度を格納している。例えば、「洗剤」は読みが「せんざい」で、品詞が「普通名詞」であり、「洗浄剤」と同義、出現頻度は０．７であることを示している。
【０１７６】
言い換え対象リスト１１４は、入力文中の名詞で類音名詞を持つものと、その類音名詞の対を格納したもので、以降の処理は、この言い換え対象リスト１１４に含まれる名詞に対して行なわれる。
【０１７７】
（３）類音判定部１０３
類音判定部１０３は、類音検索部１０２の出力を受けて、図２０の処理フローにあるように、該名詞に対して共起辞書１１１を参照し、共起関係がある単語が構文木中に存在すれば、共起関係から誤認されることはないと判定し、該名詞とその類音名詞の対を言い換え対象リスト１１４から削除する。
【０１７８】
共起辞書１１１は、図１４に示すように、ある単語とある単語の共起関係、つまり同時に出現しやすい関係にあることを示したものである。例えば、「国会」と「国対」、「競技場」と「国体」、「洗剤」と「洗濯」、「前栽」と「植木屋」は共起関係である。
【０１７９】
しかし、辞書１１３を参照しての該名詞の出現頻度が閾値以下の場合、該名詞は変形の必要ありとして、言い換え対象リストからの削除は行なわない。
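The decision made by the similar-sound judging part 103 can be sketched as a filter over the paraphrase target list: a noun is dropped when a co-occurring word in the sentence disambiguates it, unless the noun is rare. The function name `filter_targets`, the dictionary shapes, and the treatment of nouns with no recorded frequency as rare are assumptions; the co-occurrence pairs and frequency values in the usage note come from the examples in the description.

```python
from typing import Dict, List, Set, Tuple


def filter_targets(
    targets: List[Tuple[str, str]],       # (noun, similar-sounding noun) pairs
    sentence_nouns: List[str],            # nouns found in the parse tree
    cooccurrence: Dict[str, Set[str]],    # noun -> words it tends to occur with
    frequency: Dict[str, float],          # noun -> relative frequency of occurrence
    min_frequency: float,
) -> List[Tuple[str, str]]:
    kept = []
    for noun, similar in targets:
        partners = cooccurrence.get(noun, set())
        has_context = any(w in partners for w in sentence_nouns if w != noun)
        # Assumption: a noun with no recorded frequency is treated as rare.
        rare = frequency.get(noun, 0.0) <= min_frequency
        if has_context and not rare:
            continue  # context makes mishearing unlikely: no paraphrase needed
        kept.append((noun, similar))
    return kept
```

For the sentence used in the worked example, neither 「国体」 nor 「意義」 has a co-occurring word present, so both pairs stay on the list and go on to paraphrase candidate generation.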
【０１８０】
（４）言い換え候補生成部１０４
言い換え候補生成部１０４は、図２１の処理フローにあるように、言い換え対象リスト１１４に含まれる入力文中の名詞に対して、予め設定された言い換え候補の最大生成個数回だけ、以下の処理によって言い換え候補を生成する。まず、該名詞について辞書の品詞情報を参照し、次の３通りの各々の場合について言い換え候補生成処理を行なう。元の名詞と言い換え候補の対は、言い換え候補リスト１１５に登録される。
【０１８１】
１．何らかの名詞を省略した略語の場合
辞書を参照して、その名詞の省略前の名詞（正式名称等）を言い換え候補とする。または、省略前の名詞をその名詞に前置あるいは後置しての併記を言い換え候補とする。
【０１８２】
また、英語等の外国語名詞の略語は、上記の処理と同様の処理に省略前の名詞の日本語訳、日本語訳での略語、あるいは上記の処理と同様に併記を言い換え候補とする。
【０１８３】
「国体」 → 「国民体育大会」
「国対」 → 「国会対策委員会」
「ＪＡＳ」 → 「日本食品規格」「日本エアシステム」
「ＷＡＳＰ」 → 「プロテスタントのアングロサクソンの白人」
２．固有名詞の場合
固有名詞は、人名、会社名、地名等、一つのものに限った名称を表わす名詞である。このような固有名詞に対する言い換え生成手段は、シソーラス１１２を参照しての、該固有名詞の上位、下位に位置する語を付加した形式を言い換え候補とする。
【０１８４】
シソーラス１１２は、図１５に示すように、単語間の概念的な上下関係を表現したもので、図１５の場合は、地方−都道府県−市町村といった地名のつながりに関する情報を表現している。この図では、「北町」「喜多町」「木田町」が「きたまち」、「南町」「皆実町」「美並町」が「みなみまち」、「中町」「那珂町」が「なかまち」と各々同じ読みである。
【０１８５】
ここで、図中の「喜多町」について考えると、その上位に位置する「Ｂ市」「Ｘ県」といった情報を付加した形、「Ｂ市喜多町」、「Ｘ県Ｂ市喜多町」といった形を言い換え候補とする。
【０１８６】
「北町」 → 「Ａ市北町」「Ｘ県Ａ市北町」
「喜多町」 → 「Ｂ市喜多町」「Ｘ県Ｂ市喜多町」
「南町」 → 「Ａ市南町」「Ｘ県Ａ市南町」
「皆実町」 → 「Ｃ市皆実町」「Ｙ県Ｃ市皆実町」
３．普通名詞の場合
普通名詞は、一つの種類に属する個体に対して用いられる。普通名詞に対する言い換え候補生成手段は、一つには固有名詞に対する手段と同じシソーラス１１２を参照しての方法がある。例えば、「こＨうえん」という同じ読み同じアクセントの「公園」と「公演」の各々に対するシソーラスが図１６のように用意されているとする。この時の言い換え候補は、
「公園」 → 「場所の公園」
「公演」 → 「イベントの公演」
のように生成される。
【０１８７】
もう一つ、普通名詞に対する言い換え候補生成手段は、辞書１１３を参照して、その単語の同義語・類義語を言い換え候補とすることである。
【０１８８】
「洗剤」 → 「石鹸」
「前栽」 → 「庭の植込」
（５）言い換え候補判定部１０５
言い換え候補判定部１０５は、図２２の処理フローにあるように、言い換え候補リスト１１５を参照し、異なる名詞に対する言い換え候補が同一である場合、または類音関係にある場合、それらの言い換え候補を言い換え候補リスト１１５から削除する。
【０１８９】
そして、以下の規則を順に適用して、構文木中の名詞を言い換え候補に置換する。
【０１９０】
１．置換処理の方法は、左から右、つまり先頭から終端に向かって行なう。
【０１９１】
２．略語の置換は最初の一回のみで、二回目以降出現しても置換しない。該略語の出現が一度のみの場合は、正式名称の言い換え候補に置換し、複数回数出現する場合は、正式名称併記の言い換え候補に置換する。
【０１９２】
３．固有名詞の置換は最初の一回のみで、二回目以降出現しても置換しない。
【０１９３】
４．普通名詞の、付加情報型言い換え候補への置換は、最初の一回のみで、二回目以降出現しても置換しない。
【０１９４】
５．普通名詞の、同義語または類義語への置換は、全ての出現に対して行なう。
【０１９５】
６．置換部分の語調を変化させるモードが選択されている場合には、その言い換え候補の前後に置換部分であることを示すマーカーを付加する。
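Two of the steps above lend themselves to a short sketch: deleting paraphrase candidates that collide across different nouns, and rule 2 for abbreviations (full name when the abbreviation occurs once, full name with the abbreviation alongside when it occurs several times, first occurrence only). The function names, the token-list representation, and the `is_similar` predicate passed in by the caller are assumptions made for illustration.

```python
from typing import Callable, Dict, List


def prune_colliding(
    candidates: Dict[str, List[str]],
    is_similar: Callable[[str, str], bool],
) -> Dict[str, List[str]]:
    """Drop candidates that are identical or similar-sounding across nouns."""
    doomed = set()
    nouns = list(candidates)
    for i, a in enumerate(nouns):
        for b in nouns[i + 1:]:
            for ca in candidates[a]:
                for cb in candidates[b]:
                    if ca == cb or is_similar(ca, cb):
                        doomed.add((a, ca))
                        doomed.add((b, cb))
    return {n: [c for c in cs if (n, c) not in doomed] for n, cs in candidates.items()}


def replace_abbreviation(tokens: List[str], abbr: str, full_name: str) -> List[str]:
    """Rule 2: replace only the first occurrence, scanning left to right."""
    count = tokens.count(abbr)
    # One occurrence: full name; several: full name with the abbreviation noted.
    first = full_name if count == 1 else f"{full_name}（{abbr}）"
    out, done = [], False
    for token in tokens:
        if token == abbr and not done:
            out.append(first)
            done = True
        else:
            out.append(token)
    return out
```

Passing the similar-sound test in as a predicate keeps the pruning step independent of how pronunciation distance and accent patterns are represented.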
【０１９６】
（６）音韻処理部１０６
音韻処理部１０６は、言い換え候補判定部１０５にて、言い換え候補への置換がなされた構文木を、音韻規則に従って、音声データに変換する。この時、置換部分の語調を変化させるモードが選択されている場合には、言い換え候補判定部の置換操作にて付加されたマーカーに従って、置換部分を切り出し、その直後に付属語が続く場合には、付属語を含めて語調変化区間として、置換部分の語調が変化するよう音声データに変換する。
【０１９７】
（７）合成器１０７
合成器１０７は、例えば、予め与えられたホルマント合成器を用いることで、音声データを実際の音声として出力する。
【０１９８】
以上が本装置の構成とその機能である。
【０１９９】
具体例
次に、具体的入力例に基づいて、本実施例の動作を説明する。具体例の入力文を、「今日、国会で国体の意義についての議論が行なわれた」とする。
【０２００】
（１）構文解析部１０１
構文解析部１０１は、上記入力文を形態素解析、統語解析を行ない、構文木を出力する。構文解析部１０１の結果である構文木は、構文解析部１０１が用いる文法の内容によって、大きく変化するが、いずれの場合も、言い換え処理の対象となる、文中の名詞は、「今日」「国会」「国体」「意義」「議論」の５つであると、解析・出力される。
【０２０１】
（２）類音検索部１０２
類音検索部１０２にこの構文木が入力される。類音検索部１０２では、図１８のように、「国体」と「国対」は、同音同アクセント、「意義」と「異議」は、同音同アクセント、として類音の名詞が検索され、言い換え対象リスト１１４に登録される。
【０２０２】
また、例えば類音関係の発音距離の閾値を１とした場合、「洗剤（せんざい）」と発音距離が１で同アクセントの「戦災（せんさい）」は、「洗剤」の類音名詞として検索される。「前菜（ぜんさい）」は、発音距離が２なので、類音名詞ではない。また、「天地」「てＬんち」と「転地」「てＨんち」は同音だが、アクセント・パタンが異なるので、類音名詞ではない。
【０２０３】
この時の言い換え対象リスト１１４の内容を図２３に示す。
【０２０４】
（３）類音判定部１０３
類音判定部１０３では、類音検索部１０２で、類音名詞が存在するとされた「国体」と「意義」について、共起関係にある単語を文中より検索する。本例では、「国体」「意義」は各々、文中に共起関係を持たない。また、出現頻度の閾値を０．５とした場合に、辞書情報から、「国体」の出現頻度は０．４、「意義」の出現頻度は０．６であるとする。この場合、どちらも共起関係を持たないことから、言い換え対象リスト１１４からの削除はなされず、次の言い換え候補生成部１０４において言い換え候補が生成されることになる。
【０２０５】
（４）言い換え候補生成部１０４
言い換え候補生成部１０４では、該名詞及び類音名詞の種類によって言い換え候補生成処理を行なう。本例では、
「国体」 → 「国民体育大会」「国民体育大会（国体）」「国体（国民体育大会）」
「国対」 → 「国会対策委員会」「国会対策委員会（国対）」「国対（国会対策委員会）」
「意義」 → 「意味」「価値」
「異議」 → 「異論」「不服」「反対意見」
といった言い換え候補が生成される。このときの言い換え候補リスト１１５の様子を図２４に示す。
【０２０６】
（５）言い換え候補判定部１０５
言い換え候補判定部１０５では、異なる単語に対する各々の言い換え候補の内で、互いに類音関係にある言い換え候補を削除する。
【０２０７】
一例を挙げると、「意味」、「不服」各々が類音名詞を持ち、
「意味」 → 「意義」
「不服」 → 「異議」
という言い換え候補が生成された場合、言い換え候補「意義」「異議」は同音同アクセントで類音関係にあるから、言い換え候補リスト１１５から削除される。この様子を図２５に示す。
【０２０８】
そして、置換規則に従って構文木中の名詞を言い換え候補に置換する。本例では、「国体」は一度しか出現しないことから「国民体育大会」が、「意義」は同義語・類義語の先頭である「意味」がそれぞれ言い換え候補として選択され、置換が行なわれる。置換部分の語調を変化させるモードが選択されている場合には、置換部分の先頭と終端の各々に、置換部分の先頭と最後を示すマーカー’△’を付加する。
【０２０９】
以上の結果、言い換え／置換操作後、入力文は、「今日、国会で△国民体育大会△の△意味△についての議論が行なわれた」に言い換えられる。’△’は置換部分の先頭と終端を示す。
【０２１０】
（６）音韻処理部１０６
音韻処理部１０６で音韻規則に従って音声データに変換されるが、置換部分の直後の付属語を語調変化部分に含め、入力文を「今日、国会で『国民体育大会の』『意味についての』議論が行なわれた」として（『…』内は語調変化部分を示す）、音声データに変換される。
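The way the tone-change section is widened to take in the function words that directly follow a replaced portion can be sketched as below. The token representation, a list of (surface, replaced, function word) triples, and the name `mark_tone_spans` are assumptions for illustration; in the device the replaced portions are delimited by the markers added during replacement, and part-of-speech information would come from the parse tree.

```python
from typing import List, Tuple

Token = Tuple[str, bool, bool]  # (surface form, was replaced, is a function word)


def mark_tone_spans(tokens: List[Token]) -> str:
    """Wrap each replaced token plus its trailing function words in 『…』."""
    out = []
    i = 0
    while i < len(tokens):
        surface, replaced, _ = tokens[i]
        i += 1
        if not replaced:
            out.append(surface)
            continue
        span = surface
        # Absorb function words that immediately follow the replaced portion.
        while i < len(tokens) and tokens[i][2] and not tokens[i][1]:
            span += tokens[i][0]
            i += 1
        out.append("『" + span + "』")
    return "".join(out)
```

Applied to the worked example, the replaced nouns 「国民体育大会」 and 「意味」 each pull in the particles that follow them, giving the two tone-change sections shown in the text.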
【０２１１】
（７）合成器１０７
合成器１０７によって音声データは実際の音声に変換される。
【０２１２】
かくして、このように構成された本装置によれば、音声的にアクセントが同一で発音が同一あるいは類似した複数の名詞が存在することで、聞き誤られやすい名詞を、情報付加あるいは同義語類義語への置換を行なうことで、誤って理解されることの少ない音声出力を生成する規則音声合成が可能となる。
【０２１３】
なお、各辞書の内部構造、各閾値の具体的数値も上述した例に限定されるものではない。
【０２１４】
変更例１
規則音声合成装置１００は、例えば「佐藤さん」がどの「佐藤さん」を指すのかを明示して出力したい場合に、装置の入力である文から、どの「佐藤さん」を指すのかについての情報を得ることができなかった。
【０２１５】
しかし、音声対話装置としての実施例では、音声出力部分の入力として対話装置からの出力用知識表現が与えられ、また音声対話装置の知識情報も参照可能であるので、音声出力において対話装置が、どの「佐藤さん」を指しているのかを知ることができ、例えば、「神戸の佐藤さん」等の言い換えが可能になる。
【０２１６】
変更例２
各単語はより細かい概念（以下、マイクロ・フィーチャーという）の集合によって表現し、言い換え候補の生成において、類音名詞各々を構成するマイクロ・フィーチャーを比較し、その差分を明確にするような情報の付加、同義語類義語への言い換えを行なうことによって、言い換え候補生成の効率を向上させ、より的確な言い換えを生成できるような方法に変更することが挙げられる。
【０２１７】
例えば、同じ「はさみ」という読みの、「挟み（書類挟み）」と「鋏（裁ち鋏）」では、以下のようなマイクロ・フィーチャーの集合で記述される。
【０２１８】
【表１】

となり、「挟み」を出力する場合には、差分のマイクロ・フィーチャーを明示して「固定する挟み」、「鋏」の場合には、「刃物の鋏」または「切断する鋏」といった、同音異義語との違いを最小限の付加情報によって、明確な言い換え候補の生成が可能となる。
【０２１９】
変更例３
本実施例は、音声対話装置の音声出力部分と捉えることが可能であり、図２６に示すように音声対話装置の音声出力部分を本実施例に置き換えることが可能である。
【０２２０】
このことによって、音声対話装置で、利用者からの音声入力の認識において、認識率が同程度の複数の認識候補が存在して、対話処理部において、利用者に対して問い直しを行なうことを決定し、実際に問い直しを行なう際、認識候補中で互いに類音関係にある候補について、言い換え生成部、言い換え処理部での処理を加えることで、類音関係にある認識候補を類音関係のない別の語句に言い換えして、質問することが可能である。
【０２２１】
動作例：百科事典データベースのインタフェース手段としての音声対話装置
利用者：「センザイについて知りたい」
「センザイ」の認識候補として、同音かつ同アクセントの「洗剤」と「前栽」と、発音距離が１の「戦災」が挙がる。装置は、これらの「洗剤」「前栽」「戦災」の内、利用者はどれを指しているのか問い直しを行なうことを決定する。
【０２２２】
そして、「洗剤」「前栽」「戦災」は類音関係なので、そのまま問い直しては利用者に誤解を与えやすいので、言い換え候補として各々「洗浄剤」「庭」「戦争災害」を生成選択し、これらの内で利用者はどれを指しているのかを問い直す。
【０２２３】
問い直し手段として、認識候補の内、一番確率が高いものを選択して問い直すとすると、
装置：「洗浄剤ですか？」
あるいは、各認識候補を並立させて問い直すとすると、
装置：「洗浄剤ですか、庭ですか、戦争災害ですか？」
という装置の問い直し出力が生成される。
【０２２４】
【発明の効果】
第１の発明の自然言語処理装置によれば、利用者からの自然言語入力の処理中に、必要に応じて処理装置からの割り込みを行なうことができる。これにより、自然で円滑で柔軟性を持った対話を行なうことができる。
【０２２５】
第２の発明の自然言語処理装置によれば、同音異義語あるいは聞き誤られやすい発音の単語を持つ単語は、誤認されない形への言い換え処理を施されることで、自然な音声出力を得ることができる等の実用上多大な効果が奏せられる。
【図面の簡単な説明】
【図１】第１の実施例の自然言語処理装置のブロック図である。
【図２】言語解析部１のブロック図である。
【図３】解析動作状況情報の例である。
【図４】領域知識記憶部２の内容の例である。
【図５】言語生成部４のブロック図である。
【図６】談話管理部５のブロック図である。
【図７】談話制御規則記憶部５ｂの内容の例である。
【図８】自然言語処理装置での対話処理例である。
【図９】例文の係り受けに関する曖昧性をもつ解析結果候補の例である。
【図１０】第２の実施例の規則音声合成装置１００のブロック図である。
【図１１】類音リスト１０９の例である。
【図１２】一般アクセントの表記例である。
【図１３】アクセント辞書１１０の例である。
【図１４】共起辞書１１１の例である。
【図１５】シソーラス１１２の例である。
【図１６】シソーラス１１２の他の例である。
【図１７】辞書１１３の例である。
【図１８】類音検索部１０２と類音判定部１０３での類音の検索と判定の例である。
【図１９】類音検索部１０２の処理フローである。
【図２０】類音判定部１０３の処理フローである。
【図２１】言い換え候補生成部１０４の処理フローである。
【図２２】言い換え候補判定部１０５の処理フローである。
【図２３】言い換え対象リスト１１４の例である。
【図２４】言い換え候補リスト１１５の例である。
【図２５】言い換え候補リスト１１５からの削除の例である。
【図２６】音声対話装置の例のブロック図である。
【符号の説明】
１言語解析部
２領域知識記憶部
３応答内容決定部
４言語生成部
５談話管理部
１０１構文解析部
１０２類音検索部
１０３類音判定部
１０４言い換え候補生成部
１０５言い換え候補判定部
１０６音韻処理部
１０７合成器
１０８語調変化選択部
１０９類音リスト
１１０アクセント辞書
１１１共起辞書
１１２シソーラス
１１３辞書
１１４言い換え対象リスト
１１５言い換え候補リスト
[0001]
[Industrial applications]
The present invention relates to a natural language processing device that interacts with a user in a natural language.
[0002]
[Prior art]
In recent years, as computers have developed and spread, general users who are not experts have had rapidly increasing opportunities to use them.
[0003]
However, the operation of the computer device requires specialized knowledge, and it is difficult for a general user to freely use the computer device.
[0004]
Against this background, research and development has been active on natural language dialogue devices that let users operate a computer in the natural language they ordinarily use, such as Japanese or English.
[0005]
Such a natural language dialogue device understands the content of the user's request through language analysis and semantic interpretation, solves the problem posed by the request to determine the response content, and presents that response content to the user through language generation.
[0006]
Also, in a natural language dialogue apparatus using speech, speech recognition is performed at the beginning of the above-described processing, and speech synthesis processing is performed at the last stage of the processing.
[0007]
However, the conventional natural language dialogue device has a problem that natural dialogue cannot be performed because the user and the dialogue device are assumed to speak alternately.
[0008]
For example, even if part of the user's utterance contains an error or a contradiction, or the user makes an utterance that exceeds the analysis capability of the dialogue device, the device could not point out these problems to the user in the middle of the utterance.
[0009]
As a result, the user cannot notice the error until the entire utterance has been completed, the utterance already made is wasted, and the burden on the user increases.
[0010]
Conventionally, the "voice dialogue processing method" of Japanese Patent Application Laid-Open No. 3-248268 detects the signal level of the input voice and has the dialogue device emit a backchannel response (aizuchi) when the signal level stays below a threshold for a predetermined time or when a vowel of the input voice continues for a predetermined time. However, because the backchannel is triggered by (audio) signal processing alone, the device does not interrupt appropriately in light of the content of the user's utterance.
[0011]
As described above, conventional natural language dialogue devices do not give sufficient consideration to simultaneous utterances such as backchannel responses (aizuchi) and interruptions, and therefore cannot carry out natural and efficient dialogue.
[0012]
In recent years, as the processing capability of computers has improved, it has also become important to improve how easy they are for people to operate, that is, to improve the user interface.
[0013]
Various means such as a keyboard, a mouse, and an electronic pen have been proposed and put to practical use as user interfaces. Among them, natural language by voice is natural, easy to use, and very efficient for people. For this reason, many methods have been proposed for giving commands to a computer by voice input and for producing computer output by voice.
[0014]
A dialogue device is a computer device that, as a front end to a computer system such as a database, understands a user's input in natural language, processes the input, and outputs the result in natural language. It is also equipped with a control mechanism for keeping the flow of the dialogue smooth.
[0015]
A speech dialogue device is a dialogue device that uses voice for both input and output. It must be able to process, while accurately resolving ambiguity, phenomena that are frequent in spoken language and raise the efficiency of information transmission, such as omission and pronoun reference.
[0016]
A problem peculiar to information transmission by voice is the mishearing of homonyms or of words with similar pronunciations.
[0017]
In recent spoken dialogue apparatuses, context information is used as a constraint when understanding input, either to select an appropriate candidate or to ask the user again. In addition, when the reliability of the recognition candidates is low, methods such as asking again about the relevant portion or requesting re-input of the entire sentence are adopted.
[0018]
However, no solution has been given for how to output homonyms or words with easily confused pronunciations, whether in such a re-inquiry or in output in general, so that they are not misheard.
[0019]
[Problems to be solved by the invention]
As described above, a conventional natural language dialogue device assumes that the dialogue device and the user speak alternately, so the dialogue device cannot start a response until the user's utterance is completely finished.
[0020]
For example, even if part of the user's utterance contains an error or inconsistency, or the user makes an utterance that exceeds the analysis capability of the dialogue device, the dialogue device cannot point out these problems to the user during the utterance. The utterance made by the user is therefore wasted, the burden on the user increases, and as a result a natural, smooth, and flexible dialogue cannot be performed.
[0021]
In human-to-human dialogue, however, smooth and efficient dialogue is achieved through simultaneous utterances such as back-channel responses and interruptions. It is therefore important to give a natural language dialogue device the ability to handle simultaneous utterances.
[0022]
The first invention has been made in consideration of such circumstances. Its object is to provide a natural language processing apparatus that realizes appropriate interruption and presentation of messages from the processing device in a dialogue between the natural language processing device and a user, and can thereby perform a smooth and flexible dialogue.
[0023]
Further, as described above, when speech containing homonyms or easily misheard words is input or output, and particularly when it is output, there has conventionally been the problem that such words cannot be paraphrased into a form in which mishearing does not occur or is unlikely to occur.
[0024]
The second invention has been made in view of such circumstances. Its object is to provide a natural language processing apparatus that can switch, according to the situation, whether to paraphrase into a form that is not misheard or is hardly misheard and, if so, which transformation means to use, and that can thereby convey the content of the target sentence accurately and efficiently.
[0025]
[Means for Solving the Problems]
The natural language processing apparatus according to the first invention comprises: language analysis means for sequentially analyzing a natural language input from a user, outputting analysis result information consisting of syntactic information, prosodic information, or semantic information, and outputting analysis operation status information representing the status of the analysis operation; response content determination means for determining response content information based on the content of the analysis result information from the language analysis means, and outputting response determination operation status information representing the status of the determination operation from receipt of the analysis result information until the response content information is determined; language generation means for converting the content of the response content information from the response content determination means into a natural language output, sequentially presenting it to the user, and outputting generation operation status information representing the status of the generation operation of the natural language output; and discourse management means for controlling the language generation means, according to at least one of the analysis operation status information, the response determination operation status information, and the generation operation status information, so as to make simultaneous utterances such as back-channel responses or interruptions to the user's natural language input. The language analysis means outputs analysis operation status information including information on ambiguity, according to the number of ambiguous candidates in the analysis result generated in the process of analyzing the natural language input. On receiving this analysis operation status information, the discourse management means outputs, through the language generation means, a natural language output that realizes an interruption consisting of an interjection or the like, thereby giving a back-channel response to the user's utterance or interrupting it.
In another form, the apparatus comprises the language analysis means, the response content determination means, the language generation means, and the discourse management means described above, and further comprises area knowledge storage means for holding area knowledge information, which is knowledge related to the content of the dialogue. The language analysis means sequentially compares the obtained analysis result information with the area knowledge information recorded in the area knowledge storage means and, when a contradiction arises between the two, outputs analysis operation status information including the contradictory content. On receiving this analysis operation status information, the discourse management means outputs, through the language generation means, a natural language output that realizes an interruption consisting of an interjection or the like, thereby interrupting the user's utterance.
[0026]
According to the second invention, there is provided a natural language processing apparatus comprising: input means for inputting natural language information; similar sound detecting means for detecting, from the natural language information, similar sound phrases, that is, phrases whose pronunciation is phonetically identical or similar; paraphrase candidate generating means for generating paraphrase candidates for each detected similar sound phrase; and paraphrase candidate determining means for, when a plurality of similar sound phrases are detected, deleting from the paraphrase candidates of the respective similar sound phrases those candidates that stand in a similar sound relation to one another, and setting the candidates that were not deleted as the paraphrases that replace the respective similar sound phrases.
[0027]
[Operation]
An operation state of the natural language processing device of the first invention will be described.
[0028]
The language analysis means receives natural language input such as voice or character information from a user, sequentially analyzes the portions received, outputs analysis result information consisting of syntactic information, prosodic information, or semantic information, and outputs analysis operation status information indicating the status of the analysis operation.
[0029]
The response content determination means receives the analysis result information from the language analysis means, determines response content information based on the content, and outputs response determination operation status information indicating the status of the response determination operation.
[0030]
The language generation means receives the response content information from the response content determination means, converts the content into a natural language output consisting of voice or text, sequentially presents the output to the user, and outputs generation operation status information indicating the status of the generation operation.
[0031]
According to at least one of the analysis operation status information, the response determination operation status information, and the generation operation status information output from the language analysis means, the response content determination means, and the language generation means, the discourse management means controls the operation of at least the language generation means so that simultaneous utterances, such as back-channel responses or interruptions, are made to the user's natural language input.
[0032]
More specific forms of the natural language processing apparatus of the first invention are described below using concrete examples.
[0033]
① Suppose that the language analysis means outputs analysis operation status information including information on ambiguity, according to the number of ambiguous candidates in the analysis result generated in the process of analyzing the natural language input, and that the discourse management means, on receiving this analysis operation status information, outputs through the language generation means a natural language output that realizes an interruption consisting of an interjection or the like, thereby giving a back-channel response to the user's utterance or interrupting it. The operation is then as follows.
[0034]
Suppose the user makes a natural language input such as "Beautiful black eyes with long long hair ...", and in the process of interpreting this input a large number of candidates are generated for the modified phrase to which the modifier "black" relates. When the number of ambiguities exceeds the processing capability of the processing device, the device interrupts the user's utterance at the necessary point, using an interjection or the like, without waiting for the utterance to end, and starts a disambiguation dialogue following the interruption. Alternatively, suppose the user inputs "long black eyes", many candidates are generated for the modified phrase to which the modifier "black" relates, and the number of ambiguities approaches the processing capacity of the processing device. The device then outputs an interjection such as "Umm" to indicate that, although it has not yet understood all of the user's input, it is holding that point open for the time being and wants the user to continue. When the ambiguity is later resolved, for example by a further input from the user such as "Women are ...", the device presents an interjection such as "Eh" to indicate that it has understood the user's input.
[0035]
As a result, the burden on the user caused by continuing useless utterances can be eliminated.
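The two reactions described in ① can be sketched as a simple threshold policy. This is only a minimal illustration under stated assumptions: the names `Action` and `decide_action`, and the parameters `capacity` and `margin`, are hypothetical and do not appear in the specification, which does not say how "approaching" the processing capacity is measured.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"    # keep listening silently
    HOLD = "hold"            # filler such as "Umm": not understood yet, please go on
    INTERRUPT = "interrupt"  # break in and start a disambiguation dialogue


def decide_action(num_candidates: int, capacity: int, margin: int = 2) -> Action:
    """Choose a reaction from the number of ambiguous attachment candidates.

    capacity and margin are hypothetical parameters, not values from the text.
    """
    if num_candidates > capacity:
        return Action.INTERRUPT
    if num_candidates >= capacity - margin:
        return Action.HOLD
    return Action.CONTINUE
```

With `capacity=10`, three candidates leave the device silent, nine produce a holding interjection, and eleven trigger an interruption.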
[0036]
② Suppose that the apparatus has area knowledge storage means for holding area knowledge information, which is knowledge related to the content of the dialogue; that the language analysis means sequentially compares the obtained analysis result information with the area knowledge information recorded in the area knowledge storage means and, when a contradiction arises between the two, outputs analysis operation status information including the contradictory content; and that the discourse management means, on receiving this analysis operation status information, outputs through the language generation means a natural language output that realizes an interruption consisting of an interjection or the like, thereby interrupting the user's utterance. The operation is then as follows.
[0037]
Suppose the user is inputting "Show the number of employees working at the branch of Company A in the United States", while according to the knowledge held by the processing device, Company A has no branch in the United States. When such a problem in dialogue communication arises, whether it is an error in the premise of the user's utterance or an error in the processing device's recognition of the user's input, the device interrupts the user's utterance as necessary, using an interjection or the like, before the utterance ends.
[0038]
As a result, the burden on the user caused by continuing useless utterances can be eliminated.
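The sequential comparison in ② can be illustrated as follows. This is a sketch under assumptions: the knowledge table `BRANCHES`, the dictionary-shaped analysis fragments, and both function names are hypothetical, chosen only to mirror the "Company A" example.

```python
from typing import Optional

# Hypothetical area knowledge: (company, country) pairs for which a branch exists.
BRANCHES = {("Company A", "Japan"), ("Company B", "United States")}


def find_contradiction(partial: dict, branches=BRANCHES) -> Optional[str]:
    """Return a description of the contradiction, or None if none is known yet."""
    company = partial.get("company")
    country = partial.get("country")
    if company is None or country is None:
        return None  # not enough of the utterance has been analysed yet
    if (company, country) not in branches:
        return f"no branch of {company} in {country}"
    return None


def process_incrementally(fragments):
    """Merge analysis fragments one by one and stop at the first contradiction."""
    partial = {}
    for index, fragment in enumerate(fragments):
        partial.update(fragment)
        problem = find_contradiction(partial)
        if problem is not None:
            return index, problem  # interrupt here, before the utterance ends
    return None
```

Because the check runs after every fragment, the contradiction is reported as soon as both the company and the country are known, not at the end of the input.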
[0039]
③ Suppose that the discourse management means, on receiving from the language analysis means analysis operation status information indicating that the user has accepted the interruption, conducts a dialogue for resolving the ambiguity in the analysis of the user's natural language input that caused the interruption, or the contradiction in the semantic content of that input. The operation is then as follows.
[0040]
Following the interruption of ② above, a dialogue is started to resolve the error in the premise of the user's utterance or the error in the processing device's recognition of the user's input.
[0041]
Thereby, a misunderstanding between the user and the processing device can be eliminated, and a flexible and smooth conversation can be performed.
[0042]
④ Suppose that, when the user has been interrupted by an interjection or the like under the control of the discourse management means, the language analysis means detects whether the user has accepted the interruption of the processing device, by breaking off the utterance or by uttering an interjection or the like indicating acknowledgment, and outputs analysis operation status information indicating the result; and that the discourse management means controls the start of the dialogue for ambiguity resolution or contradiction resolution according to this analysis operation status information. The operation is then as follows.
[0043]
Suppose the device interrupts with an interjection such as "Ao". If the user accepts the interruption from the processing device, for example by responding "Yes" or by breaking off the utterance in progress, the interruption processing is continued. If the user does not accept it, for example by continuing the utterance in progress, the interruption processing is stopped. In this way, the processing after an interruption is controlled according to how the user responds to the interruption from the processing device.
[0044]
Thus, the dialogue for ambiguity resolution or contradiction resolution is started only when the user accepts the interruption of the processing device, either by breaking off the utterance or by uttering an interjection or the like indicating acknowledgment.
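The acceptance check in ④ might look like the following sketch. The shape of the `reaction` dictionary, the set of acknowledgment words, and the returned strings are all assumptions made for illustration; the specification only states that breaking off the utterance or uttering an acknowledging interjection counts as acceptance.

```python
# Hypothetical set of acknowledging interjections.
ACKNOWLEDGMENTS = {"yes", "uh-huh"}


def user_accepted_interrupt(reaction: dict) -> bool:
    """True if the user stopped speaking or answered with an acknowledgment."""
    stopped = reaction.get("stopped_speaking", False)
    said_ack = reaction.get("text", "").strip().lower() in ACKNOWLEDGMENTS
    return stopped or said_ack


def after_interrupt(reaction: dict, pending_problem: str) -> str:
    """Continue with a resolution dialogue only when the interrupt was accepted."""
    if user_accepted_interrupt(reaction):
        return f"start dialogue: {pending_problem}"
    return "cancel interrupt"
```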
[0045]
⑤ Suppose that the apparatus has interruptible position feature storage means that stores in advance information on utterance end positions in natural language dialogue, or on the syntactic or prosodic features of the partner's utterance immediately before an interruptible position; and that the discourse management means, when an interruption is required, compares the analysis result information obtained from the language analysis means with the contents of the interruptible position feature storage means and, at the time an interruptible position is detected, causes an interruption by an interjection or the like to be made through the language generation means. The operation is then as follows.
[0046]
When the processing device interrupts the user's utterance, it compares the syntactic features of the utterance in progress, such as a polite sentence-final expression like “-is”, and its prosodic features, such as a pause (silent interval), with the information prepared in advance on syntactic or prosodic features at interruptible positions, and thereby interrupts at an appropriate position.
[0047]
This makes it possible to determine the end position of the utterance or the position where interruption is possible in the natural language dialogue, and to apply an interruption at an appropriate timing.
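One way to picture the comparison in ⑤ is a lookup against stored feature records. This is a minimal sketch, not the method of the specification: the record layout `PositionFeature`, the romanized endings, and the pause thresholds are hypothetical, and requiring both a syntactic and a prosodic match is a simplifying assumption (the text allows either kind of feature).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionFeature:
    ending: str        # syntactic cue: sentence-final expression (hypothetical)
    min_pause_ms: int  # prosodic cue: minimum silent interval (hypothetical)


# Hypothetical contents of the interruptible position feature storage.
STORED = [PositionFeature("desu", 200), PositionFeature("masu", 200)]


def is_interruptible(last_word: str, pause_ms: int, stored=STORED) -> bool:
    """True when the latest word and pause match any stored feature record."""
    return any(
        last_word.endswith(f.ending) and pause_ms >= f.min_pause_ms
        for f in stored
    )
```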
[0048]
⑥ Suppose that the apparatus further has dialogue history storage means for holding a history of dialogues with the user, and interruptible position extracting means that extracts from the contents of the dialogue history storage means, as utterance end positions or interruptible positions, the positions at which a change of speaker occurs or silent intervals occur, extracts the syntactic or prosodic features immediately before those positions, and additionally records them in the interruptible position feature storage means. The operation is then as follows.
[0049]
By analyzing the history of the dialogue between the processing device and the user, the device finds, for example, that immediately after the user uses expressions such as “How about? ↑” (↑: rise of intonation, that is, an increase in the fundamental frequency of the voice) or “is it ぉ”, a change of speaker tends to occur and an interruption made there tends to be accepted, and it extracts this as information on interruptible positions.
[0050]
As a result, practically significant effects can be obtained, such as a flexible interrupt that can be performed according to the utterance habit of each user.
[0051]
⑦ Suppose that the interruptible position feature storage means also holds information on the reliability of each item of interruptible position information; that the language analysis means determines whether the user has accepted an interruption made on the basis of the information recorded in the interruptible position feature storage means, by breaking off the utterance or by uttering a back-channel response to the interruption, and outputs analysis operation status information including the result; and that the discourse management means adjusts the reliability of the interruptible position information stored in the interruptible position feature storage means according to this analysis operation status information. The operation is then as follows.
[0052]
When the processing device interrupts the user's utterance, the reliability of the syntactic or prosodic feature information for the interruptible position that was used is adjusted depending on whether the user accepted the interruption.
[0053]
As a result, practically significant effects can be obtained, such as a flexible interrupt that can be performed according to the utterance habit of each user.
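The reliability adjustment in ⑦ can be sketched as a bounded increment. The specification does not give an update formula, so the fixed step size and the clamping to the range 0 to 1 are assumptions, as is the function name.

```python
def update_reliability(reliability: float, accepted: bool, step: float = 0.1) -> float:
    """Raise the reliability after an accepted interrupt, lower it otherwise.

    The result is clamped to the range [0, 1]; step is a hypothetical parameter.
    """
    delta = step if accepted else -step
    return min(1.0, max(0.0, reliability + delta))
```

Features whose interruptions are repeatedly rejected drift toward zero and can then be ignored when searching for an interruptible position.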
[0054]
A natural language processing device according to the second invention will be described.
[0055]
The similar sound detecting means detects similar sound phrases, that is, words whose pronunciation is phonetically identical or similar.
[0056]
The paraphrase candidate generating means generates paraphrase candidates for the similar sound phrases, that is, the words with phonetically identical or similar pronunciations, detected by the similar sound detecting means.
[0057]
The paraphrase candidate determining means selects the paraphrase candidate to be used from the paraphrase candidates generated by the paraphrase candidate generating means, and replaces the similar sound phrase with that candidate according to a substitution rule.
[0058]
This makes it possible to obtain a clear and efficient natural language output by paraphrasing words that have homonyms, or whose pronunciation is likely to be misheard, so that they are not misheard.
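The candidate determination step of the second invention (deleting paraphrase candidates that are themselves similar in sound to one another) can be sketched as below. This is only an illustration under assumptions: "similar sound" is reduced to identical romanized readings, the function name and data layout are hypothetical, and the phrase and candidate labels are toy data with no linguistic claim.

```python
def choose_paraphrases(candidates: dict) -> dict:
    """Pick one paraphrase per similar sound phrase.

    candidates maps each phrase to a list of (paraphrase, reading) pairs.
    A candidate is deleted when its reading also occurs among the candidates
    of another phrase; the first surviving candidate is used. If none survive,
    the phrase is left unchanged (an assumption, not stated in the text).
    """
    chosen = {}
    for phrase, options in candidates.items():
        other_readings = {
            reading
            for other, opts in candidates.items()
            if other != phrase
            for _, reading in opts
        }
        kept = [text for text, reading in options if reading not in other_readings]
        chosen[phrase] = kept[0] if kept else phrase
    return chosen


# Toy data: "alt_a" and "alt_c" share the reading "kouen", so both are deleted.
EXAMPLE = {
    "phrase_1": [("alt_a", "kouen"), ("alt_b", "hiroba")],
    "phrase_2": [("alt_c", "kouen"), ("alt_d", "enzetsu")],
}
```

For `EXAMPLE`, the function keeps `alt_b` for `phrase_1` and `alt_d` for `phrase_2`, since those are the only candidates whose readings do not collide.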
[0059]
[Embodiments]
First embodiment
Hereinafter, an embodiment of the natural language processing apparatus of the first invention will be described with reference to the drawings.
[0060]
FIG. 1 shows the configuration of the natural language processing apparatus according to the present embodiment, which comprises a language analysis unit 1, an area knowledge storage unit 2, a response content determination unit 3, a language generation unit 4, and a discourse management unit 5.
[0061]
(1) Language analysis unit 1
The language analysis unit 1 receives natural language input information from a user by voice or the like, performs speech recognition, language analysis, semantic interpretation, and the like, outputs analysis result information representing the result of analyzing the natural language input information, and also outputs analysis operation status information indicating the operation status of the language analysis unit 1.
[0062]
In addition, the language analysis unit 1 analyzes not only normal responses from the user but also simultaneous utterances, such as back-channel responses or interruptions, made by the user while the dialogue system is producing natural language output.
[0063]
FIG. 2 shows an example of the configuration of the language analysis unit 1.
[0064]
Voice input from the user is taken into the voice recognition unit 12 through the microphone 10.
[0065]
The speech recognition unit 12 performs speech recognition with reference to knowledge for speech recognition such as the phoneme dictionary 14. The analysis result including the intermediate result is output from the language analysis unit 1 as analysis result information, and information such as success or failure of the analysis operation is output from the language analysis unit 1 as analysis operation status information.
[0066]
The syntactic analysis unit 16 analyzes the output of the speech recognition unit 12 with reference to the language dictionary 18, the grammar rules 20, and the like. The syntactic analysis result and the status of the syntactic analysis operation are similarly output from the language analysis unit 1 as analysis result information and analysis operation status information, respectively.
[0067]
The semantic interpretation unit 22 analyzes the output of the syntactic analysis unit 16 with reference to the interpretation rules 24 and the like. Similarly, the semantic interpretation result and the state of the semantic interpretation operation are output from the language analysis unit 1 as analysis result information and analysis operation state information, respectively.
[0068]
FIGS. 3A to 3E show examples of the analysis operation status information output by the language analysis unit 1.
[0069]
(A) shows an example of the output when speech recognition, syntactic analysis, and semantic interpretation of the natural language input from the user in the language analysis unit 1, and the check of consistency between the analysis result and the contents of the domain knowledge storage unit 2, have all succeeded.
[0070]
(B) indicates that, in the syntactic analysis, the number of analysis result candidates having dependency ambiguity has become larger than a preset threshold T1.
[0071]
(C) indicates that the user made a response indicating acceptance of an interruption that the processing device made to the user's utterance.
[0072]
(D) indicates that the user responded to the question asked by the processing device to the user.
[0073]
(E) is an example of information used when this embodiment is extended, and details will be described later.
[0074]
(2) Area knowledge storage unit 2
The domain knowledge storage unit 2 records domain knowledge information referred to by the response content determination unit 3, the discourse management unit 5, and the like.
[0075]
FIG. 4 shows an example of the contents of the area knowledge storage unit 2.
[0076]
In each entry of the area knowledge storage unit 2, the knowledge content information A represents the area knowledge held by the processing device, and B is the storage address information.
[0077]
For example, in the entry at storage address P11 in FIG. 4, the content of the knowledge content information A is “(part number word processor A JWP-XX)”, which represents the domain knowledge that the “part number” of “word processor A” is “JWP-XX”.
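The entry structure described above (knowledge content A keyed by storage address B) can be modelled as a small table. Only the P11 entry comes from the text; the triple layout (attribute, object, value) is one reading of “(part number word processor A JWP-XX)”, and the names `AREA_KNOWLEDGE` and `lookup` are hypothetical.

```python
# storage address (B) -> knowledge content (A) as (attribute, object, value)
AREA_KNOWLEDGE = {
    "P11": ("part number", "word processor A", "JWP-XX"),
}


def lookup(attribute: str, obj: str, store=AREA_KNOWLEDGE):
    """Return the stored value for (attribute, obj), or None if unknown."""
    for _address, (attr, target, value) in store.items():
        if attr == attribute and target == obj:
            return value
    return None
```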
[0078]
(3) Response content determination unit 3
The response content determination unit 3 determines the content of a response to the user based on the content of the analysis result information output from the language analysis unit 1 and the domain knowledge information obtained from the domain knowledge storage unit 2. It outputs response content information representing the content of the response, and also outputs response determination operation status information representing the operation status of the response content determination unit 3.
[0079]
(4) Language generation unit 4
Based on the response content information output from the response content determination unit 3, the language generation unit 4 generates a response to the user in natural language by language generation and speech synthesis with reference to grammar rules, language dictionaries, phonological rules, and the like. In addition to outputting this response, it also outputs generation operation status information indicating the operation status of the language generation unit 4.
[0080]
The language generation unit 4 also presents messages to the user, interrupts the user, and so on. In doing so, when generating the speech of interjections and the like that are presented to express the utterance intention of the processing device, it appropriately controls prosody such as intonation and accent.
[0081]
FIG. 5 shows an example of the configuration of the language generation unit 4.
[0082]
The syntactic structure generation unit 26 receives the response content information output from the response content determination unit 3 and determines the syntax structure of the response sentence for realizing the content of the response content information with reference to the grammar rule 28. Further, the language generating unit 4 outputs information indicating the status of the operation such as the success / failure of the syntax structure generation as generation operation status information.
[0083]
The surface sentence generation unit 30 generates a response surface sentence by referring to the language dictionary 32 or the like based on the determined syntax structure of the response sentence and the response content information, and performs operations such as success / failure of the surface sentence generation. Information indicating the situation is output from the language generation unit 4 as generation operation situation information.
[0084]
Based on the determined response surface sentence and the response content information, the voice synthesizer 34 generates the voice output of the response by voice synthesis with reference to the phonological rules 36 and the like, and presents it to the user. Information indicating the operation status, such as the success or failure of generating the response voice output, is output from the language generation unit 4 as generation operation status information.
[0085]
In addition, under control from the discourse management unit 5, the language generation unit 4 also suspends a response output that is in progress and makes simultaneous utterances, such as interruptions or back-channel responses, to the user's natural language input.
[0086]
(5) Discourse management unit 5
The discourse management unit 5 controls all or part of the language analysis unit 1, the response content determination unit 3, and the language generation unit 4, based on all or part of the analysis operation status information output from the language analysis unit 1, the response determination operation status information output from the response content determination unit 3, and the generation operation status information output from the language generation unit 4.
[0087]
Since the discourse management unit 5 is a component that plays a central role in the present embodiment, it will be described in further detail below.
[0088]
FIG. 6 shows an example of the configuration of the discourse management unit 5.
[0089]
The discourse control section 5a has an internal discourse state register RS for holding the state of the dialogue. It receives the analysis operation status information, the generation operation status information, and the response determination operation status information and, referring to the contents of the discourse control rule storage section 5b, controls the language analysis unit 1, the language generation unit 4, and the response content determination unit 3 through the language analysis unit control information, the language generation unit control information, and the response content determination unit control information, respectively.
[0090]
The discourse control rule storage unit 5b stores rules for determining the operation of the discourse control unit 5a prepared in advance, and is referred to by the discourse control unit 5a.
[0091]
FIG. 7 shows an example of the contents of the discourse control rule storage unit 5b.
[0092]
In each entry Qi of the discourse control rule storage unit 5b, the information on the corresponding discourse control rule is recorded, classified into discourse state information A, attention target information B, situation condition information C, discourse state transition information D, control content information E, and so on. F is storage address information. The information A, B, C, D, E, and F of the entry Qi is collectively referred to as the discourse control rule Ri.
[0093]
In the column of the discourse state information A, restrictions on the contents of the discourse state register RS when the discourse control rule Ri corresponding to the entry Qi is applied are recorded.
[0094]
In the column of the attention target information B, “analysis”, “generation”, or “response” is recorded, representing the type of the operation status information Xj that is a condition for applying the discourse control rule Ri. For example, in the entry Q11 in FIG. 7, the content of the attention target information B is "analysis", which shows that the condition for applying the discourse control rule R11 stored in Q11 relates to the analysis operation status information.
[0095]
In the column of the situation condition information C, a condition on the content of the operation status information Xj for applying the discourse control rule Ri is recorded. For example, in the entry Q11 in FIG. 7, the content of the situation condition information C is "(ambiguity > threshold T1)", which shows that, in order to apply the discourse control rule R11 corresponding to the entry Q11, the content of the operation status information must indicate that the ambiguity has exceeded the threshold value T1.
[0096]
In the column of the discourse state transition information D, the discourse state after applying the discourse control rule Ri corresponding to the entry Qi is recorded. For example, in entry Q11, the content of the discourse state transition information D is “S1”, so the discourse state after applying the discourse control rule R11 is S1, and the value “S1” is to be set in the discourse state register RS of the discourse control unit 5a.
[0097]
In the column of control content information E, information on a control procedure to be performed when application of the corresponding discourse control rule Ri is determined is recorded.
[0098]
Examples of the contents of the individual procedures are shown in the following control procedure examples P1 to P7.
[0099]
Control procedure examples P1 to P7
P1 When application of a discourse control rule whose control content information E is “procedure (aizuchi)” is determined
The discourse management unit 5 controls the language generation unit 4 to present a back-channel response (aizuchi) to the user, for example an interjection indicating acknowledgment such as “Yes” or “Eh”.
[0100]
P2 When application of a discourse control rule whose control content information E is “procedure (interruption)” is determined
The discourse management unit 5 controls the language generation unit 4 to present an interjection such as "Ah" or "Eh" to the user, thereby interrupting the user's natural language input utterance.
[0101]
P3 When application of a discourse control rule whose control content information E is “procedure (disambiguation dialogue)” is determined
The ambiguity is resolved by performing the same processing as the dialogue for resolving dependency ambiguity in an ambiguous natural language input (see Japanese Patent Application No. 5-10071).
[0102]
P4 When application of a discourse control rule whose control content information E is “procedure (contradiction confirmation dialogue)” is determined
Response content information "(confirmation (true / false (K)))", which asks the user to confirm the truth of the content K of the area knowledge storage unit 2 that contradicts the analysis result of the user's natural language input, is sent to the language generation unit 4. For example, by presenting to the user a natural language question such as "Is it not K?", the error on the part of the user or the processing device is resolved.
[0103]
P5 When application of a discourse control rule whose control content information E is “procedure (resumption of dialogue)” is determined
The series of discourse control relating to the interruption or back-channel response is ended, and control of the processing device is returned to the normal dialogue.
[0104]
P6 When application of a discourse control rule whose control content information E is “procedure (abort interruption)” is determined
The series of discourse control relating to the interruption or back-channel response is aborted partway, and control of the processing device is returned to the normal dialogue.
[0105]
Next, processing performed by the discourse control unit 5a based on the analysis operation status information, the generation operation status information, the response determination operation status information, and the contents of the discourse control rule storage unit 5b will be described.
[0106]
The basic operation of the discourse control unit 5a is performed based on the discourse control procedure A described below.
[0107]
Discourse control procedure A
A1 “S0” is recorded as an initial value in the discourse state register RS of the discourse control unit 5a.
[0108]
A2 As the dialogue between the processing device and the user progresses, an applicable rule is searched for among the discourse control rules recorded in the discourse control rule storage unit 5b, based on the contents of the analysis operation status information, the response determination operation status information, and the generation operation status information output from the language analysis unit 1, the response content determination unit 3, and the language generation unit 4, and on the content of the discourse state register RS of the discourse control unit 5a.
[0109]
A3 As soon as an applicable rule is found, it is applied.
[0110]
A4 Go to A2.
[0111]
The above is the configuration of the present apparatus and its functions.
[0112]
Concrete example
The operation of the natural language processing device will be described in further detail with reference to the drawings, taking as an example an interactive information processing device that answers a user's questions in natural language dialogue based on a database about word processors.
[0113]
First, a description will be given taking as an example the case where the user, wanting to know "the model number of a portable word processor with a transfer type printer", inputs to the processing device "Please answer the model number of the portable word processor with a transfer type printer."
[0114]
FIG. 8 shows the contents of the utterance of the user (U) and the utterance of the processing device (S) at each time point (T0 to T9) as an example of the interactive processing in the processing device.
[0115]
T0 In accordance with step A1 of the discourse control procedure A, the discourse state register RS of the discourse control unit 5a is initialized to the value “S0”.
[0116]
T1 The user starts utterance U1, and the language analysis unit 1 analyzes it sequentially.
[0117]
However, the utterance U1 in progress has an ambiguity regarding dependency, as shown in FIG.
[0118]
Accordingly, it is assumed that when the user has spoken, for example, up to the “to word processor” portion of the utterance U1, the number of candidates with dependency ambiguity in the language analysis unit 1 exceeds the preset threshold value T1, and the analysis operation status information X1 as shown in FIG. 3B is output. At this point, the ambiguous analysis result candidates are held in the language analysis unit 1.
[0119]
Analysis operation status information X1:
[Syntactic analysis: (ambiguity > threshold T1)]
T2 The discourse management unit 5 receives the analysis operation status information X1.
[0120]
The discourse management unit 5 compares the discourse state value “S0” recorded in the discourse state register RS of the discourse control unit 5a and the content of the analysis operation status information X1 against the contents of the discourse control rule storage unit 5b.
[0121]
The content of the discourse state information A of the entry at the storage address Q21 in FIG. 7 is “S0”, the content of the attention target information B is “analysis”, and the content of the situation condition information C is “ambiguity > threshold T1”. These match the current discourse state and the content of the analysis operation status information X1 of interest.
[0122]
Therefore, it is detected that the discourse control rule R21 recorded at the storage address Q21 is applicable.
[0123]
Discourse control rule R21:
[S0, analysis, (ambiguity > threshold T1), S3, procedure (interruption)]
T3 Since the content of the control content information E of the discourse control rule R21 is “procedure (interruption)”, the discourse control unit 5a executes processing according to the control procedure example P2 and interrupts the user's utterance through the language generation unit 4. For example, an interjection such as “Ah” is presented to the user as the utterance S1, and the value “S3” is recorded in the discourse state register RS based on the discourse state transition information D of the discourse control rule R21.
[0124]
T4 In response to the interruption from the processing device at T3, the user indicates acceptance of the interruption. For example, the user utters an interjection indicating acknowledgment, such as "Yes" in the utterance U2. This is detected by the language analysis unit 1, and analysis operation status information X2 indicating acceptance of the interruption, as shown in FIG. 4C, is output.
[0125]
Analysis operation status information X2:
[Syntactic analysis: (interrupt acceptance = T)]
T5 By the same processing as at the time T2, the discourse control rule R31 of the entry at the storage address Q31 in FIG. 7 is detected in the discourse control rule storage unit 5b, based on the analysis operation status information X2 and the content "S3" of the discourse state register RS.
[0126]
Discourse control rule R31:
[S3, analysis, (interrupt acceptance = T), S5, procedure (disambiguation dialogue)]
T6 By the same processing as at the time T3, since the content of the control content information E of the discourse control rule R31 is “procedure (disambiguation dialogue)”, a dialogue similar to the dialogue for dependency disambiguation (see JP-A-5-10071) is started according to the control procedure P3, and the value "S5" is recorded in the discourse state register RS based on the applied discourse control rule R31.
[0127]
Example of the question uttered by the processing device at time T6
[Does "portable" modify "word processor"?]
T7 In response to the question from the processing device at T6, the user makes an input indicating confirmation, such as "Yes".
[0128]
Through the confirmation dialogue from the utterance S2 to the utterance U3, the dependency ambiguity that had arisen in the language analysis unit 1 is resolved.
[0129]
The language analysis unit 1 outputs analysis operation status information X3 as shown in FIG.
[0130]
The discourse control rule R51 stored at the storage address Q51 in FIG. 7 is detected by the same processing as at the time T2.
[0131]
By the same processing as at the time T3, the control procedure P5 is started, the series of discourse control relating to the interruption or back-channel response is ended, and control is returned to the normal dialogue.
[0132]
Analysis operation status information X3:
[Syntactic analysis: (Confirmation completed)]
Discourse control rule R51:
[S5, analysis, (confirmation completed), S0, procedure (resumption of dialogue)]
T8 The dialogue returns to normal, and a subsequent input from the user such as "show the model number of that word processor" is processed as follows, for example by resolving the anaphor with the most recently appearing identical noun. Note that parentheses indicate dependency relationships.
[0133]
The expression "that word processor" is analyzed as referring to
"(((transfer-type (printer))) (portable (word processor)))",
which was disambiguated by the above processing. The user request expressed by the series of dialogues is therefore described by a semantic expression such as
"(Request (output (model number (((transfer-type (printer))) (portable (word processor))))))",
and is analyzed as meaning "provide the model number of word processor A", the word processor indicated by that expression.
[0134]
T9 From the contents of the storage addresses P11 to P22 in the example of the contents of the area knowledge storage unit 2 shown in FIG. 4 and the inference processing in the response content determination unit 3, the response content is determined to be “JWP-XX”.
[0135]
An output response such as “the model number is JWP-XX” is presented to the user through the language generation unit 4.
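The exchange from T0 to T7 can be replayed with a small sketch of discourse control procedure A. This is only an illustration under stated assumptions: the class and function names, the dictionary form of the operation status information, and the numeric value of threshold T1 are hypothetical, and only the three rules R21, R31, and R51 quoted above are modeled.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class DiscourseRule:
    state: str                          # A: required content of the discourse state register RS
    target: str                         # B: "analysis", "generation", or "response"
    condition: Callable[[dict], bool]   # C: test on the operation status information
    next_state: str                     # D: discourse state after the rule is applied
    procedure: str                      # E: control procedure to execute

def find_rule(rules, rs, target, status) -> Optional[DiscourseRule]:
    """Step A2: return the first rule whose A, B and C fields all match."""
    for rule in rules:
        if rule.state == rs and rule.target == target and rule.condition(status):
            return rule
    return None

def run(rules, events):
    rs = "S0"                           # step A1: initialize the register
    executed = []
    for target, status in events:       # steps A2 to A4, once per status report
        rule = find_rule(rules, rs, target, status)
        if rule is not None:            # step A3: apply the rule as soon as it is found
            executed.append(rule.procedure)
            rs = rule.next_state
    return rs, executed

T1 = 3  # hypothetical value; the text does not give a number for threshold T1

RULES = [
    DiscourseRule("S0", "analysis", lambda s: s.get("ambiguity", 0) > T1,
                  "S3", "interruption"),                 # R21
    DiscourseRule("S3", "analysis", lambda s: s.get("interrupt_accepted", False),
                  "S5", "disambiguation dialogue"),      # R31
    DiscourseRule("S5", "analysis", lambda s: s.get("confirmed", False),
                  "S0", "resumption of dialogue"),       # R51
]

EVENTS = [
    ("analysis", {"ambiguity": 5}),               # X1
    ("analysis", {"interrupt_accepted": True}),   # X2
    ("analysis", {"confirmed": True}),            # X3
]

final_state, procedures = run(RULES, EVENTS)
```

In this sketch the register moves S0, S3, S5 and back to S0, and the procedures run in the order interruption, disambiguation dialogue, resumption of dialogue.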
[0136]
According to the present device configured as described above, the following effects can be obtained.
[0137]
During processing of a natural language input from a user, an interrupt from the processing device can be performed as needed.
[0138]
During processing of a natural language input from the user, when ambiguity exceeding the analysis capability of the processing device arises, an interjection or the like is output immediately to interrupt the user's utterance, which removes the burden on the user of continuing an utterance to no purpose.
[0139]
During processing of a natural language input from the user, after ambiguity exceeding the analysis capability of the processing device has arisen and the interruption has been made, a dialogue for resolving the ambiguity is carried out, so that the user and the processing device can conduct a flexible and smooth dialogue.
[0140]
When the processing device interrupts the user's utterance with an interjection or the like, the dialogue for resolving the ambiguity or contradiction can be started only if the user stops speaking or accepts the interruption by uttering a back-channel response or the like.
[0141]
In the above embodiment, the language analysis unit 1 monitors the number of ambiguities arising in the analysis of the natural language input from the user, detects excessive ambiguity by comparison with a preset threshold, interrupts the user's utterance, and conducts a confirmation dialogue to resolve the ambiguity. Alternatively, the following modifications can be made.
[0142]
Modification example 1
The embodiment can be extended so that it also resolves errors and contradictions contained in the user's utterance.
[0143]
For example, the response content determination unit 3 checks, as needed, the consistency between the (partial) analysis result of the user's utterance obtained from the language analysis unit 1 and the information recorded in the area knowledge storage unit 2. When a contradiction arises between the two, the user's utterance can be interrupted.
[0144]
Modification example 2
In checking the number of ambiguities in the analysis of the natural language input from the user, a second threshold value T2 and a third threshold value T3 are set so as to satisfy “threshold value T3 < threshold value T2 < threshold value T1”. In the discourse state S0, when the ambiguity satisfies “threshold T2 < ambiguity < threshold T1”, the discourse state is set to S1. When the ambiguity is subsequently resolved by analysis of the user's continuing natural language input so that ambiguity = 0, or becomes sufficiently small so that “ambiguity < threshold T3”, a message such as “Eh” is presented to the user through the language generation unit 4.
[0145]
As a result, when the user makes a complex natural language input that could cause ambiguity exceeding the analysis capability of the processing device, and the processing device succeeds in analyzing it by processing the subsequent input or the like, the user is relieved of the worry that the analysis might fail and the input would have to be restated.
[0146]
This function can be realized by the discourse control rules recorded at the storage addresses Q11 and Q12 in the example of the contents of the discourse control rule storage unit 5b in FIG.
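The three-threshold behavior of Modification 2 can be sketched as a small state function. This is an illustration only: the function name, the integer ambiguity counts, and the return to state S0 after the message are assumptions that the text does not state.

```python
def modification2_step(state, ambiguity, t1, t2, t3):
    """Return (new_state, action) for one ambiguity measurement.

    action is "message" when a message such as "Eh" should be presented
    through the language generation unit, otherwise None.
    """
    if not (t3 < t2 < t1):
        raise ValueError("thresholds must satisfy T3 < T2 < T1")
    if state == "S0" and t2 < ambiguity < t1:
        # Ambiguity is high but still below the interruption threshold T1.
        return "S1", None
    if state == "S1" and (ambiguity == 0 or ambiguity < t3):
        # Ambiguity was resolved by later input: signal understanding.
        # Returning to S0 here is an assumption of this sketch.
        return "S0", "message"
    return state, None

# Hypothetical threshold values for illustration only.
T1, T2, T3 = 8, 4, 2
trace = []
state = "S0"
for measured in [1, 5, 3, 1]:
    state, action = modification2_step(state, measured, T1, T2, T3)
    trace.append((state, action))
```

With these values the device enters S1 when the ambiguity reaches 5, stays there at 3, and presents the message once the ambiguity falls to 1.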
[0147]
Modification 3
In the above-described embodiment, the presentation of the message by the control procedure P1 and the execution of the interruption of the control procedure P2 by the language generation unit 4 are performed immediately when the discourse control unit 5a determines the execution.
[0148]
By extending this, words that appear immediately before points where an interruption is possible in a natural language utterance (for example, sentence-final particles such as "-ne" that mark the end of a sentence or phrase, or sentence-final expressions such as "a") are prepared in advance in an interruptible position storage unit newly provided in the language analysis unit 1. When the discourse control unit 5a requests an interruption or the presentation of a message, the language analysis unit 1 compares the utterance the user is currently making with the information recorded in the interruptible position storage unit and, on finding an interruptible position, outputs to the discourse management unit 5 the analysis operation status information as shown in, for example, FIG. At this point, the message is sent through the language generation unit 4 or the interruption is executed.
[0149]
This makes it possible to determine the end position of an utterance or a position where interruption is possible in a natural language dialogue, and to interrupt at an appropriate timing when the processing device needs to interrupt the user's utterance.
[0150]
Modification 4
The dialogue history between the user and the processing device can be recorded, the portion of each utterance immediately before the user's utterance end position can be extracted from the recorded history, and that portion can be added to the interruptible position storage unit.
[0151]
Modification 5
A reliability point, a numerical value representing the reliability of the recorded information, is added to each entry of the interruptible position storage unit. The reliability point is adjusted according to whether the user accepted the interruption made by the processing device, and the use of the information recorded in the interruptible position storage unit is controlled according to its reliability point, so that interruptions can be adapted flexibly to the speaking habits of each user.
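One way to picture the reliability points of Modification 5 is the following sketch. The integer scale, the step size of 1, the usage threshold, and all names are hypothetical; the text specifies only that the point is adjusted according to whether the interruption was accepted and that it controls whether an entry is used.

```python
from dataclasses import dataclass

@dataclass
class InterruptiblePosition:
    word: str          # word that appears just before an interruptible point
    reliability: int   # reliability point of this entry (hypothetical integer scale)

def record_outcome(entry, accepted, step=1):
    """Raise the point when the user accepted the interruption, lower it otherwise."""
    entry.reliability += step if accepted else -step

def usable_words(entries, threshold):
    """Only entries whose reliability point reaches the threshold are applied."""
    return [e.word for e in entries if e.reliability >= threshold]

entries = [InterruptiblePosition("-ne", 5), InterruptiblePosition("a", 5)]
record_outcome(entries[0], accepted=True)    # interruption after "-ne" was accepted
record_outcome(entries[1], accepted=False)   # interruption after "a" was not accepted
active = usable_words(entries, threshold=5)
```

After these two outcomes only "-ne" remains in use for this user, which is the per-user adaptation the modification describes.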
[0152]
Second embodiment
Hereinafter, a rule-based speech synthesizer 100 as one embodiment of the natural language processing device of the second invention will be described with reference to FIGS.
[0153]
FIG. 10 is a configuration diagram of the rule-based speech synthesizer 100 of the present embodiment.
[0154]
The rule-based speech synthesizer 100 is a device that performs a syntax analysis of a text and performs speech synthesis using speech data obtained by applying phoneme information to the analysis result according to rules prepared in advance. The rule-based speech synthesizer 100 includes the following units.
[0155]
The syntax analysis unit 101 performs a morphological analysis and a syntactic analysis on the input sentence according to a grammar rule prepared in advance, and generates a syntax tree.
[0156]
The similar sound search unit 102 searches, for each noun in the input sentence, for homophones and for nouns whose pronunciation is easily misheard as that noun, based on the accent pattern and a list of easily confused phonemes.
[0157]
The similar sound determination unit 103 narrows down the results of the similar sound search unit 102 based on co-occurrence information with other words in the sentence.
[0158]
The paraphrase candidate generation unit 104 generates paraphrase candidates for nouns that have a homophone or an easily misheard noun.
[0159]
The paraphrase candidate determination unit 105 selects paraphrase candidates for each noun so that the candidates chosen for different nouns are not themselves easily confused with one another, and rewrites the syntax tree according to the selected paraphrase candidates.
[0160]
The phoneme processing unit 106 generates speech data by applying phoneme information to a syntax tree according to a prosody rule prepared in advance.
[0161]
The synthesizer 107 outputs the audio data generated by the phoneme processing unit 106.
[0162]
The tone change selection unit 108 selects whether or not the tone of the part replaced by a paraphrase candidate is changed when the synthesizer 107 generates actual voice from the voice data.
[0163]
Here, the definition of words used in this embodiment, the rules for accents, and the like will be described.
[0164]
(1) Pronunciation distance between words
The “pronunciation distance between words” is defined as the number of phoneme replacements, each taken from the similar sound list 109, needed to convert one word into another.
[0165]
For example, “detergent” decomposed into phonemes is “senzai”, and “sensai”, obtained from it by replacing one sound according to “z → s” in the similar sound list, is at a pronunciation distance of 1.
[0166]
As shown in FIG. 11, the similar sound list 109 is a list of sounds paired with the sounds they are easily mistaken for. For example, “s → z” means that the consonant “s” is easily misheard as “z” (“sa” as “za”), and “pj → hj” means that “pj” is easily misheard as “hj” (“pju” as “hju”).
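The definition of pronunciation distance can be sketched as follows. This is a minimal illustration, assuming that words are already split into phoneme lists, that only one-for-one replacements are counted, and that the pairs of the similar sound list apply in both directions (the text lists “s → z” but uses “z → s” in its example); the function name and the two-entry list are hypothetical stand-ins for the similar sound list 109.

```python
# The two pairs quoted in the text; the full list of FIG. 11 is not reproduced here.
SIMILAR_SOUNDS = {("s", "z"), ("pj", "hj")}

def pronunciation_distance(a, b, similar=SIMILAR_SOUNDS):
    """Count the replacements from the similar sound list that turn a into b.

    Returns None when the words cannot be converted into each other using
    only replacements from the list (different length or unlisted difference).
    """
    if len(a) != len(b):
        return None
    distance = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        if (x, y) in similar or (y, x) in similar:
            distance += 1
        else:
            return None
    return distance

senzai = ["s", "e", "n", "z", "a", "i"]   # "detergent"
sensai = ["s", "e", "n", "s", "a", "i"]
zensai = ["z", "e", "n", "s", "a", "i"]   # "appetizer"

d1 = pronunciation_distance(senzai, sensai)
d2 = pronunciation_distance(senzai, zensai)
```

Here "senzai" and "sensai" differ by one listed replacement, while "senzai" and "zensai" differ by two, which matches the distances used in the concrete example of this embodiment.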
[0167]
(2) “Similar-sound words” and “similar-sound nouns”
If two words have the same accent pattern and their pronunciation distance is equal to or less than a predetermined threshold, the two words are said to be in a similar-sound relationship; each is called a "similar-sound word" of the other, and a similar-sound word that is a noun is called a "similar-sound noun."
[0168]
(3) Accent rules
The accent is represented as shown in FIG. For “kokutai”, the notation indicates that “kutai” is pronounced high; for “igi” (“significance”), it indicates that the pitch falls from the sound following “i”.
[0169]
As shown in FIG. 13, the accent dictionary 110 is a list of words and their accent patterns. The accent pattern “Ko H Kutai” of “kokutai” means that the pitch is high from the sound after “Ko”, and the accent pattern “I Lgi” of “igi” (“significance”) means that the pitch is low from the sound after “i”.
[0170]
(4) Others
Whether or not to change the tone of the voice output for the portion replaced by a paraphrase candidate is given in advance by the user as an input, and the setting is held in the tone change selection unit 108. If a tone change is selected, the paraphrase candidate determination unit 105 adds markers at the start and end points of the paraphrase. The phoneme processing unit 106 changes the tone based on the markers added by the paraphrase candidate determination unit 105 and outputs the result.
[0171]
Next, the function of each component will be described in order.
[0172]
(1) Syntax analysis unit 101
The syntax analysis unit 101 receives an input sentence from a keyboard or an interactive device, performs morphological analysis and syntactic analysis according to a grammar rule prepared in advance, and generates a syntax tree as an output.
[0173]
When the input sentence is input, a tone change selection input specifying whether or not to change the tone of the output voice for the paraphrased part is also input, and the setting is held by the tone change selection unit 108.
[0174]
(2) Similar sound search unit 102
As shown in the processing flow of FIG. 19, the similar sound search unit 102 refers to the similar sound list 109, the dictionary 113, and the accent dictionary 110 and searches for similar-sound nouns for each noun in the syntax tree. That is, for each noun, if a sound sequence obtained by replacing sounds in the noun with easily confused sounds from the similar sound list 109, up to the threshold number of replacements, is contained in the dictionary as a noun and has the same accent pattern (that is, if it is a similar-sound noun), the pair of the original noun and the similar-sound noun is registered in the paraphrase target list 114. Nouns whose accent patterns differ, even if their sounds are the same, and nouns whose pronunciation distance exceeds the threshold are judged not to be misheard.
[0175]
As shown in FIG. 17, the dictionary 113 stores, for each word, its reading, part of speech, synonyms, near-synonyms, and appearance frequency. For example, “detergent” has the reading “senzai”, the part of speech “common noun”, the synonym “soap”, and an appearance frequency of 0.7.
[0176]
The paraphrase target list 114 stores pairs of a noun in the input sentence that has a similar-sound noun and that similar-sound noun, and the subsequent processing is performed on the nouns included in the paraphrase target list 114.
[0177]
(3) Similar sound determination unit 103
The similar sound determination unit 103 receives the output of the similar sound search unit 102 and refers to the co-occurrence dictionary 111 for each noun, as shown in the processing flow of FIG. If a word that co-occurs with the noun is present in the sentence, it is judged that the co-occurrence relationship prevents misunderstanding, and the pair of the noun and its similar-sound noun is deleted from the paraphrase target list 114.
[0178]
As shown in FIG. 14, the co-occurrence dictionary 111 indicates co-occurrence relationships between words, that is, which words are likely to appear together. For example, “Diet” and “kokutai” (Diet Measures Committee), “stadium” and “kokutai” (National Athletic Meet), “detergent” and “washing”, and “garden planting” (“senzai”) and “garden shop” are in co-occurrence relationships.
[0179]
However, if the appearance frequency of the noun in the dictionary 113 is equal to or lower than a threshold, the noun is judged to need paraphrasing and is not deleted from the paraphrase target list.
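The narrowing-down performed by the similar sound determination unit 103 can be sketched as a filter over the paraphrase target list. The data shapes and names below are assumptions made for illustration; in particular, treating a noun with no recorded frequency as rare is a choice of this sketch, not something the text specifies.

```python
def filter_paraphrase_targets(targets, sentence_words, cooccurrence,
                              frequency, freq_threshold):
    """Keep only the (noun, similar-sound noun) pairs that still need paraphrasing.

    A pair is dropped when a co-occurring word of the noun is in the sentence,
    unless the noun's appearance frequency is at or below the threshold.
    """
    kept = []
    for noun, similar_noun in targets:
        partners = cooccurrence.get(noun, set())
        has_cooccurring_word = any(w in partners for w in sentence_words)
        is_rare = frequency.get(noun, 0.0) <= freq_threshold
        if has_cooccurring_word and not is_rare:
            continue  # context already disambiguates the noun
        kept.append((noun, similar_noun))
    return kept

# Hypothetical dictionary contents based on the examples in the text.
COOCCURRENCE = {"detergent": {"washing"}}
TARGETS = [("detergent", "sensai")]

with_context = filter_paraphrase_targets(
    TARGETS, ["washing", "detergent"], COOCCURRENCE, {"detergent": 0.7}, 0.5)
rare_word = filter_paraphrase_targets(
    TARGETS, ["washing", "detergent"], COOCCURRENCE, {"detergent": 0.4}, 0.5)
no_context = filter_paraphrase_targets(
    TARGETS, ["buy", "detergent"], COOCCURRENCE, {"detergent": 0.7}, 0.5)
```

Only the first call removes the pair: the co-occurring word "washing" is present and the noun is frequent enough, so no paraphrase is judged necessary.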
[0180]
(4) Paraphrase candidate generation unit 104
As shown in the processing flow of FIG. 21, the paraphrase candidate generation unit 104 generates paraphrase candidates for each noun of the input sentence included in the paraphrase target list 114, up to a preset maximum number of candidates, by the following processing. First, the part-of-speech information in the dictionary is consulted for the noun, and paraphrase candidates are generated according to the following three cases. Each pair of the original noun and a paraphrase candidate is registered in the paraphrase candidate list 115.
[0181]
1. For abbreviations, in which part of a noun is omitted
With reference to the dictionary, the unabbreviated noun (formal name or the like) is a paraphrase candidate. Alternatively, a form in which the unabbreviated noun is written alongside the abbreviation, before or after it, is a paraphrase candidate.
[0182]
For abbreviations of nouns from a foreign language such as English, in addition to the above, the Japanese translation of the unabbreviated noun, an abbreviation of that Japanese translation, and forms written alongside the abbreviation as described above are paraphrase candidates.
[0183]
"kokutai" → "National Athletic Meet"
"kokutai" → "Diet Measures Committee"
"JAS" → "Japan Food Standard" "Japan Air System"
"WASP" → "Protestant Anglo-Saxon Caucasians"
2. For proper nouns
A proper noun is a noun that names exactly one thing, such as a personal name, a company name, or a place name. The paraphrase generation means for proper nouns refers to the thesaurus 112 and forms paraphrase candidates by adding words located above or below the proper noun in the hierarchy.
[0184]
As shown in FIG. 15, the thesaurus 112 expresses conceptual hierarchical relationships between words. In the case of FIG. 15, the thesaurus 112 expresses the relationships between place names such as regions, prefectures, and municipalities. In this figure, "Kitamachi", "Kitacho" and "Kidacho" are all read "Kitamachi"; "Minamicho", "Minamicho" and "Minamicho" are all read "Minamimachi"; and "Nakamachi" and "Nakamachi" are both read "Nakamachi".
[0185]
Here, taking "Kita-cho" in the figure as an example, forms to which higher-level information such as "B city" and "X prefecture" is added, such as "Kita-cho of B city" and "Kita-cho of B city, X prefecture", are paraphrase candidates.
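Generating candidates for a proper noun by adding higher-level place names can be sketched as a walk up the thesaurus. The parent map, the comma-separated output form, and the function name are hypothetical; the text only states that higher-level information such as the city and prefecture is added.

```python
def proper_noun_candidates(noun, parent, max_levels=2):
    """Build paraphrase candidates by appending successively higher thesaurus nodes."""
    candidates = []
    chain = [noun]
    node = noun
    for _ in range(max_levels):
        node = parent.get(node)
        if node is None:
            break  # reached the top of the hierarchy
        chain.append(node)
        candidates.append(", ".join(chain))
    return candidates

# Hypothetical fragment of the place-name hierarchy of FIG. 15.
PARENT = {"Kita-cho": "B city", "B city": "X prefecture"}

kita_candidates = proper_noun_candidates("Kita-cho", PARENT)
```

For "Kita-cho" this sketch yields one candidate qualified by the city and one qualified by both the city and the prefecture.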
[0186]
"Kitamachi" → "Kitamachi, A City" "Kitamachi, A City, X Prefecture"
"Kita-cho" → "Kita-cho, B City" "Kita-cho, B City, X Prefecture"
"Minamicho" → "Minamicho, A City" "Minamicho, A City, X Prefecture"
"Minami Town" → "Minami Town, C City" "Minami Town, C City, Y Prefecture"
3. For common nouns
Common nouns are used for individuals belonging to a class. One means of generating paraphrase candidates for common nouns is to refer to the thesaurus 112, as for proper nouns. For example, suppose that a thesaurus is prepared, as shown in FIG., for “park” and “performance”, which have the same reading and the same accent “Ko H”. The paraphrase candidates in this case are generated as follows.
"park" → "park, the place"
"performance" → "performance, the event"
[0187]
Another means of generating paraphrase candidates for a common noun is to refer to the dictionary 113 and take a synonym or near-synonym of the word as a paraphrase candidate.
[0188]
"Detergent" → "soap"
"Pre-planting" → "Gardening"
(5) Paraphrase candidate determination unit 105
As shown in the processing flow of FIG., the paraphrase candidate determination unit 105 refers to the paraphrase candidate list 115 and, if paraphrase candidates for different nouns are identical or in a similar-sound relationship, deletes those candidates from the paraphrase candidate list 115.
[0189]
Then, the following rules are applied in order to replace the noun in the syntax tree with a paraphrase candidate.
[0190]
1. The replacement process is performed from left to right, that is, from the beginning to the end.
[0191]
2. An abbreviation is replaced only at its first occurrence and is not replaced at the second or later occurrences. If the abbreviation appears only once, it is replaced with a paraphrase candidate with a formal name, and if it appears more than once, it is replaced with a paraphrase candidate with a formal name.
[0192]
3. A proper noun is replaced only at its first occurrence and is not replaced at the second or later occurrences.
[0193]
4. A common noun is replaced with a paraphrase candidate of the additional-information type only at its first occurrence and is not replaced at the second or later occurrences.
[0194]
5. Substitution of common nouns with synonyms or synonyms is performed for all occurrences.
[0195]
6. When the mode for changing the tone of the replacement part is selected, a marker indicating the replacement part is added before and after the paraphrase candidate.
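Rules 1 to 6 can be sketched as a single left-to-right pass over the nouns of the sentence. The token list, the candidate table, the kind labels, and the "<<" and ">>" marker strings are all hypothetical; the sketch does not model the choice between different abbreviation candidates in rule 2.

```python
def replace_nouns(tokens, candidates, mark=False):
    """Apply the replacement rules from the beginning of the sentence to the end.

    candidates maps a noun to (kind, replacement), where kind is one of
    "abbreviation", "proper", "common_additional", or "synonym".
    """
    seen = set()
    out = []
    for tok in tokens:                               # rule 1: left to right
        entry = candidates.get(tok)
        if entry is not None:
            kind, replacement = entry
            # Rule 5: synonyms replace every occurrence.
            # Rules 2 to 4: the other kinds replace the first occurrence only.
            if kind == "synonym" or tok not in seen:
                seen.add(tok)
                # Rule 6: optionally mark the replaced part for the tone change.
                out.append(f"<<{replacement}>>" if mark else replacement)
                continue
        out.append(tok)
    return out

CANDIDATES = {
    "kokutai": ("abbreviation", "National Athletic Meet"),
    "detergent": ("synonym", "soap"),
}
TOKENS = ["kokutai", "and", "kokutai", "detergent", "detergent"]

plain = replace_nouns(TOKENS, CANDIDATES)
marked = replace_nouns(TOKENS, CANDIDATES, mark=True)
```

In this sketch the second "kokutai" is left as it is, while both occurrences of "detergent" are replaced by the synonym.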
[0196]
(6) Phoneme processing unit 106
The phoneme processing unit 106 converts the syntax tree, in which nouns have been replaced with paraphrase candidates by the paraphrase candidate determination unit 105, into speech data according to the phoneme rules. At this time, if the mode for changing the tone of the replaced part is selected, the replaced part is extracted according to the markers added by the replacement operation of the paraphrase candidate determination unit 105; if an attached word immediately follows, the tone change section is extended to include the attached word, and the speech data is generated so that the tone changes over that section.
[0197]
(7) Synthesizer 107
The synthesizer 107 outputs the audio data as actual audio by using, for example, a formant synthesizer given in advance.
[0198]
The above is the configuration of the present apparatus and its functions.
[0199]
Concrete example
Next, the operation of the present embodiment will be described based on a specific input example. It is assumed that the input sentence in the specific example is "Today, the National Assembly has discussed the significance of the national polity."
[0200]
(1) Syntax analyzer 101
The syntax analyzer 101 performs a morphological analysis and a syntactic analysis on the input sentence, and outputs a syntax tree. The syntax tree as a result of the syntax analysis unit 101 greatly changes depending on the content of the grammar used by the syntax analysis unit 101. In any case, the nouns in the sentence to be paraphrased are “today”, “diet” It is analyzed and output if there are five types: "national body", "significance", and "discussion".
[0201]
(2) Similar sound search unit 102
This syntax tree is input to the similar sound search unit 102. As shown in FIG. 18, the similar sound search unit 102 finds “country pair” as a similar sound noun of “national body” and “opposition” as a similar sound noun of “significance”, each pair having the same sound and the same accent, and registers them in the paraphrase target list 114.
[0202]
Also, for example, when the threshold of the pronunciation distance for similar sounds is 1, “Washing (Sensai)” and “Sensai”, which have the same accent as “Detergent (Sensai)” and a pronunciation distance within the threshold, are retrieved as similar sound nouns of “Detergent”. “Appetizer” is not a similar sound noun because its pronunciation distance is 2. Words read “Tenchi” that have the same sound but differ in accent pattern are likewise not similar sound nouns.
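The search condition (same accent, pronunciation distance within the threshold) can be sketched as follows. The distance measure is an assumption: Levenshtein distance over romanized readings is used here as a stand-in for the pronunciation distance defined elsewhere in this document. The readings and the accent pattern strings in `LEXICON` are likewise hypothetical.

```python
def pronunciation_distance(a, b):
    """Levenshtein distance over romanized readings (assumed stand-in)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def similar_sound_nouns(target, lexicon, threshold=1):
    """Return words with the target's accent and distance <= threshold."""
    reading, accent = lexicon[target]
    hits = []
    for word, (other_reading, other_accent) in lexicon.items():
        if word == target:
            continue
        if (other_accent == accent
                and pronunciation_distance(reading, other_reading) <= threshold):
            hits.append(word)
    return hits


# Hypothetical entries: word -> (romanized reading, accent pattern)
LEXICON = {
    "detergent": ("senzai", "LHHH"),
    "war damage": ("sensai", "LHHH"),
    "appetizer": ("zensai", "LHHH"),
    "same sound, other accent": ("senzai", "HLLL"),
}
```

Under these assumptions only "war damage" is returned for "detergent": "appetizer" is excluded by distance and the last entry by accent.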
[0203]
FIG. 23 shows the contents of the paraphrase target list 114 at this time.
[0204]
(3) Similar sound determination unit 103
The similar sound determination unit 103 searches the sentence for co-occurring words of “national body” and “significance”, the nouns for which the similar sound search unit 102 found similar sound nouns. In this example, neither “national body” nor “significance” has a co-occurring word in the sentence. With an appearance-frequency threshold of 0.5, it is assumed from the dictionary information that the appearance frequency of “national body” is 0.4 and that of “significance” is 0.6. In this case, since neither noun has a co-occurrence relationship, no search is performed on the paraphrase target list 114, and processing passes to the paraphrase candidate generation unit 104, which generates paraphrase candidates.
[0205]
(4) Paraphrase candidate generation unit 104
The paraphrase candidate generation unit 104 generates paraphrase candidates according to the type of each noun and its similar sound noun. In this example, the following candidates are generated:
“national body” → “National Athletic Meet”, “national body (National Athletic Meet)”, “National Athletic Meet (national body)”
“country pair” → “Diet Task Force”, “country pair (Diet Task Force)”, “Diet Task Force (country pair)”
“significance” → “meaning”, “value”
“opposition” → “objection”, “appeal”, “opposition”
FIG. 24 shows the state of the paraphrase candidate list 115 at this time.
[0206]
(5) Paraphrase candidate determination unit 105
The paraphrase candidate determination unit 105 deletes, from among the paraphrase candidates generated for different words, those that have a similar sound relation to each other.
[0207]
For example, suppose that “meaning” and “dissatisfaction” each have a similar sound noun and that the paraphrase candidates
“meaning” → “significance”
“dissatisfaction” → “objection”
are generated. The paraphrase candidates “significance” and “objection” are then deleted from the paraphrase candidate list 115, because they have a similar sound relation (same sound and same accent). This is shown in FIG. 25.
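The deletion step can be sketched as below. It is a minimal illustration under assumptions: the candidate list is a plain dictionary, and the similar sound test is passed in as a predicate (`is_similar`), whose real counterpart would use sound and accent. The extra candidates “sense” and “complaint” are hypothetical and are included only so that something remains after deletion.

```python
def prune_similar_candidates(candidates, is_similar):
    """Delete candidates that are similar in sound to a candidate of another word.

    candidates: dict mapping each word to its list of paraphrase candidates.
    is_similar: predicate on two candidate strings (assumed to be supplied).
    """
    to_delete = set()
    words = list(candidates)
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:  # compare candidates of different words only
            for c1 in candidates[w1]:
                for c2 in candidates[w2]:
                    if is_similar(c1, c2):
                        to_delete.add((w1, c1))
                        to_delete.add((w2, c2))
    return {w: [c for c in cs if (w, c) not in to_delete]
            for w, cs in candidates.items()}


SIMILAR_PAIRS = {frozenset({"significance", "objection"})}
CANDIDATES = {
    "meaning": ["significance", "sense"],
    "dissatisfaction": ["objection", "complaint"],
}


def same_sound(a, b):
    """Toy predicate backed by a fixed table of similar sound pairs."""
    return frozenset({a, b}) in SIMILAR_PAIRS
```

Calling `prune_similar_candidates(CANDIDATES, same_sound)` removes “significance” and “objection” and leaves the other candidates in place.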
[0208]
Then, the nouns in the syntax tree are replaced with paraphrase candidates according to the replacement rules. In this example, since “national body” appears only once, the additional-information candidate “National Athletic Meet” is selected. For “significance”, “meaning”, the first of its synonyms and near-synonyms, is selected as the paraphrase candidate, and the replacements are performed. When the mode for changing the tone of the replacement part is selected, a marker '△' is added at the head and at the tail of each replacement part.
[0209]
As a result, after the paraphrase and replacement operation, the input sentence is paraphrased as "Today, a discussion was held in the Diet regarding the △meaning△ of the △National Athletic Meet△." Here '△' indicates the beginning and end of a replacement part.
[0210]
(6) Phoneme processing unit 106
The phoneme processing unit 106 converts the paraphrased sentence into speech data in accordance with the phoneme rules. The attached word immediately following each replacement part is included in the tone change section, so the sentence is converted into speech data in which each replacement part, together with the attached word that follows it, forms a tone change section.
[0211]
(7) Synthesizer 107
The speech data is converted into actual speech by the synthesizer 107.
[0212]
Thus, according to the present device configured as described above, a noun that is easily misheard because several nouns have the same or a similar pronunciation is converted into a form with added information or into a synonym or near-synonym. This makes possible rule-based speech synthesis that generates speech output less likely to be misunderstood.
[0213]
The internal structure of each dictionary and specific numerical values of each threshold value are not limited to the above examples.
[0214]
Modification example 1
For example, when it is desired to state explicitly which person “Mr. Sato” refers to, the rule-based speech synthesizer 100 cannot obtain, from the sentence given as input to the device, information about which “Mr. Sato” is meant.
[0215]
However, in an embodiment as a spoken dialogue device, the knowledge representation to be output by the spoken dialogue device is given as the input of the speech output part, and the knowledge information of the spoken dialogue device can be referred to. It is therefore possible to know which “Mr. Sato” is meant and to paraphrase it as, for example, “Mr. Sato in Kobe”.
[0216]
Modification example 2
Each word is represented as a set of finer-grained concepts (hereinafter referred to as micro-features). In generating paraphrase candidates, the micro-features that make up each of the similar sound nouns are compared, and paraphrasing is done by adding information that clarifies the difference or by using synonyms and near-synonyms. This improves the efficiency of paraphrase candidate generation and makes it possible to generate more accurate paraphrases.
[0217]
For example, “clamp (for pinching documents)” and “scissors (for cutting)”, which have the same reading, are described by the following sets of micro-features.
[0218]
[Table 1]

When “clamp” is output, the micro-feature that differs is stated explicitly, as in “clamp for fixing”; in the case of “scissors”, forms such as “knife scissors” or “cutting scissors” are used. In this way, a clear paraphrase candidate can be generated with the minimum additional information needed to distinguish the word from its homonym.
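The comparison of micro-features can be sketched as a set difference. The feature sets below are hypothetical, since the contents of Table 1 are not reproduced here; only the idea of picking out the features that one homonym has and the other lacks is taken from the text.

```python
def distinguishing_features(word, homonym, features):
    """Return the micro-features of `word` that `homonym` does not have."""
    return sorted(features[word] - features[homonym])


# Hypothetical micro-feature sets (not the contents of Table 1)
FEATURES = {
    "clamp": {"tool", "fixing", "document"},
    "scissors": {"tool", "cutting", "blade"},
}
```

A paraphrase candidate could then be formed by attaching one of the returned features to the word as additional information; how that feature is chosen and worded is left open in this sketch.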
[0219]
Modification example 3
This embodiment can be regarded as the voice output portion of a voice interaction device, and, as shown in FIG. 26, the voice output portion of the voice interaction device can be replaced with this embodiment.
[0220]
As a result, when the spoken dialogue apparatus recognizes voice input from the user and there are several recognition candidates with the same recognition rate, the dialogue processing unit decides to ask the user again. When the re-inquiry is actually made, the processing of the paraphrase candidate generation unit and the paraphrase candidate determination unit is applied to those recognition candidates that have a similar sound relation to one another, so that the question can be asked with the candidates paraphrased into words that do not have a similar sound relation.
[0221]
Example of operation: spoken dialogue device as an interface to an encyclopedia database
User: “I want to know about Senzai”
The recognition candidates for “Senzai” include “detergent” and “planting”, which have the same sound and accent, and “war damage”, which has a pronunciation distance of 1. The device decides to ask the user again which of “detergent”, “planting”, and “war damage” is meant.
[0222]
Because “detergent”, “planting”, and “war damage” are similar sounds, asking about them as they are would easily be misunderstood by the user. The device therefore generates and selects “cleaning agent”, “garden”, and “war disaster” as paraphrase candidates and asks the user which of these is meant.
[0223]
As the re-inquiry method, if the candidate with the highest probability among the recognition candidates is selected for the re-inquiry, the query output
Device: "Do you mean cleaning agent?"
is generated. If all the recognition candidates are listed side by side in the re-inquiry, the query output
Device: "Do you mean cleaning agent, garden, or war disaster?"
is generated.
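The two re-inquiry methods can be sketched as follows. The function name, the question wording, and the probabilities are assumptions made for illustration; the text only states that either the most probable candidate or all candidates side by side are used.

```python
def build_requery(candidates, side_by_side=False):
    """Build a re-inquiry sentence from paraphrased recognition candidates.

    candidates: list of (paraphrase, probability) pairs.
    """
    if not candidates:
        raise ValueError("no recognition candidates")
    if not side_by_side:
        # Method 1: ask only about the most probable candidate.
        best = max(candidates, key=lambda c: c[1])[0]
        return f"Do you mean {best}?"
    # Method 2: list every candidate in one question.
    names = [text for text, _ in candidates]
    if len(names) == 1:
        listing = names[0]
    elif len(names) == 2:
        listing = " or ".join(names)
    else:
        listing = ", ".join(names[:-1]) + ", or " + names[-1]
    return f"Do you mean {listing}?"


# Hypothetical probabilities for the paraphrased candidates
CANDS = [("cleaning agent", 0.5), ("garden", 0.3), ("war disaster", 0.2)]
```

Listing all candidates costs a longer question but avoids a second round when the most probable candidate is wrong.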
[0224]
[Effects of the invention]
According to the natural language processing device of the first invention, an interrupt from the processing device can be performed as necessary during processing of a natural language input from a user. This allows for a natural, smooth and flexible dialogue.
[0225]
According to the natural language processing apparatus of the second invention, a word that has a homonym, or whose pronunciation is easily misheard, is paraphrased into a form that is not misunderstood, so that natural voice output is obtained. The invention is thus highly effective in practical use.
[Brief description of the drawings]
FIG. 1 is a block diagram of a natural language processing apparatus according to a first embodiment.
FIG. 2 is a block diagram of a language analysis unit 1.
FIG. 3 is an example of analysis operation status information.
FIG. 4 is an example of contents of an area knowledge storage unit 2.
FIG. 5 is a block diagram of a language generation unit 4.
FIG. 6 is a block diagram of a discourse management unit 5.
FIG. 7 is an example of the contents of a discourse control rule storage unit 5b.
FIG. 8 is an example of an interaction process in the natural language processing device.
FIG. 9 is an example of an analysis result candidate having ambiguity regarding the dependency of an example sentence.
FIG. 10 is a block diagram of a rule-based speech synthesizer 100 according to a second embodiment.
FIG. 11 is an example of a similar sound list 109.
FIG. 12 is a notation example of a general accent.
FIG. 13 is an example of an accent dictionary 110.
FIG. 14 is an example of a co-occurrence dictionary 111.
FIG. 15 is an example of a thesaurus 112.
FIG. 16 shows another example of the thesaurus 112.
FIG. 17 is an example of a dictionary 113.
FIG. 18 shows an example of searching for and determining similar sounds in a similar sound search unit 102 and a similar sound determination unit 103.
FIG. 19 is a processing flow of the similar sound search unit 102.
FIG. 20 is a processing flow of the similar sound determination unit 103.
FIG. 21 is a processing flow of a paraphrase candidate generation unit 104.
FIG. 22 is a processing flow of a paraphrase candidate determination unit 105.
FIG. 23 is an example of a paraphrase target list 114.
FIG. 24 is an example of a paraphrase candidate list 115.
FIG. 25 shows an example of deletion from the paraphrase candidate list 115.
FIG. 26 is a block diagram of an example of a voice interaction device.
[Explanation of symbols]
1 Language analysis unit
2 Area knowledge storage unit
3 Response content determination unit
4 Language generation unit
5 Discourse management unit
101 Syntax analyzer
102 Similar sound search unit
103 Similar sound determination unit
104 Paraphrase candidate generation unit
105 Paraphrase candidate determination unit
106 Phoneme processing unit
107 Synthesizer
108 Tone change selector
109 Similar sound list
110 Accent dictionary
111 Co-occurrence dictionary
112 Thesaurus
113 Dictionary
114 Paraphrase target list
115 Paraphrase candidate list

Claims

1. A natural language processing apparatus comprising:
language analysis means for sequentially analyzing a natural language input from a user, outputting analysis result information consisting of syntactic information, prosodic information, or semantic information, and outputting analysis operation status information indicating the status of the analysis operation;
response content determination means for receiving the analysis result information from the language analysis means, determining response content information according to its content, and outputting response determination operation status information indicating the status of the determination operation from receipt of the analysis result information until the response content information is determined;
language generation means for receiving the response content information from the response content determination means, converting its content into a natural language output that is sequentially presented to the user, and outputting generation operation status information indicating the status of the operation of generating the natural language output by the conversion; and
discourse management means for controlling the language generation means, according to at least one of the analysis operation status information, the response determination operation status information, and the generation operation status information, so that a simultaneous utterance such as a response or an interruption is made with respect to the user's natural language input,
wherein the language analysis means
outputs analysis operation status information including information on ambiguity, according to the number of ambiguous candidates for the analysis result generated in the course of analyzing the natural language input, and
the discourse management means,
upon receiving the analysis operation status information, outputs through the language generation means a natural language output that realizes an interruption consisting of an interjection, thereby responding to or interrupting the user's utterance.

2. A natural language processing apparatus comprising:
language analysis means for sequentially analyzing a natural language input from a user, outputting analysis result information consisting of syntactic information, prosodic information, or semantic information, and outputting analysis operation status information indicating the status of the analysis operation;
response content determination means for receiving the analysis result information from the language analysis means, determining response content information according to its content, and outputting response determination operation status information indicating the status of the determination operation from receipt of the analysis result information until the response content information is determined;
language generation means for receiving the response content information from the response content determination means, converting its content into a natural language output that is sequentially presented to the user, and outputting generation operation status information indicating the status of the operation of generating the natural language output by the conversion;
discourse management means for controlling the language generation means, according to at least one of the analysis operation status information, the response determination operation status information, and the generation operation status information, so that a simultaneous utterance such as a response or an interruption is made with respect to the user's natural language input; and
domain knowledge storage means for holding domain knowledge information, which is knowledge related to the content of the dialogue,
wherein the language analysis means
sequentially compares the obtained analysis result information with the domain knowledge information recorded in the domain knowledge storage means and, when an inconsistency arises between the two, outputs analysis operation status information including the content of the inconsistency, and
the discourse management means,
upon receiving the analysis operation status information, outputs through the language generation means a natural language output that realizes an interruption consisting of an interjection, thereby interrupting the user's utterance.

3. The natural language processing apparatus according to claim 1, wherein
the language analysis means,
in addition to the above processing, outputs analysis operation status information when it determines by analysis that the user has accepted the interruption, as shown by the user breaking off the utterance or uttering an interjection indicating understanding, and
the discourse management means,
upon receiving that analysis operation status information from the language analysis means, conducts a dialogue for resolving the ambiguity in the analysis of the user's natural language input, or the inconsistency in the meaning of the user's natural language input, that caused the interruption.

4. The natural language processing apparatus according to claim 3, wherein
the language analysis means,
in addition to the above processing, when the user is interrupted by the interjection under the control of the discourse management means, detects whether the user has accepted the interruption from the dialogue device by breaking off the utterance or by uttering an interjection indicating understanding, and outputs analysis operation status information indicating the result, and
the discourse management means
performs control to start a dialogue for resolving the ambiguity or the inconsistency in accordance with that analysis operation status information.

5. The natural language processing apparatus according to claim 1, further comprising interruptible position feature storage means for storing in advance information on utterance end positions in natural language dialogue, or on the syntactic features or prosodic features of the partner's utterance immediately before an interruptible position,
wherein the discourse management means,
in addition to the above processing, when an interruption is required, compares the analysis result information obtained from the language analysis means with the contents of the interruptible position feature storage means and performs control so that the interruption by the interjection is made through the language generation means.

6. The natural language processing apparatus according to claim 5, further comprising:
dialogue history storage means for holding a history of the dialogue with the user; and
interruptible position extraction means for extracting, from the contents of the dialogue history storage means, a position where the speaker changes or a position of a silent section as an utterance end position or an interruptible position,
extracting the syntactic features or prosodic features immediately before that position,
and additionally recording those features in the interruptible position feature storage means.

7. The natural language processing apparatus according to claim 5, wherein
the interruptible position feature storage means holds information on the reliability of the interruptible position information,
the language analysis means,
when an interruption is made on the basis of the information recorded in the interruptible position feature storage means, determines whether the user has accepted the interruption by breaking off the utterance or by uttering an interjection in response to it, and outputs analysis operation status information including the result, and
the discourse management means
adjusts the reliability of the interruptible position information stored in the interruptible position feature storage means in accordance with that analysis operation status information.

8. A natural language processing method in which:
language analysis means sequentially analyzes a natural language input from a user, outputs analysis result information consisting of syntactic information, prosodic information, or semantic information, and outputs analysis operation status information indicating the status of the analysis operation;
response content determination means receives the analysis result information from the language analysis means, determines response content information according to its content, and outputs response determination operation status information indicating the status of the determination operation from receipt of the analysis result information until the response content information is determined;
language generation means receives the response content information from the response content determination means, converts its content into a natural language output that is sequentially presented to the user, and outputs generation operation status information indicating the status of the operation of generating the natural language output by the conversion; and
discourse management means controls the language generation means, according to at least one of the analysis operation status information, the response determination operation status information, and the generation operation status information, so that a simultaneous utterance such as a response or an interruption is made with respect to the user's natural language input,
wherein the language analysis means
outputs analysis operation status information including information on ambiguity, according to the number of ambiguous candidates for the analysis result generated in the course of analyzing the natural language input, and
the discourse management means,
upon receiving the analysis operation status information, outputs through the language generation means a natural language output that realizes an interruption consisting of an interjection, thereby responding to or interrupting the user's utterance.

9. A natural language processing method comprising:
a language analysis step of sequentially analyzing a natural language input from a user, outputting analysis result information consisting of syntactic information, prosodic information, or semantic information, and outputting analysis operation status information indicating the status of the analysis operation;
a response content determination step of receiving the analysis result information from the language analysis step, determining response content information according to its content, and outputting response determination operation status information indicating the status of the determination operation from receipt of the analysis result information until the response content information is determined;
a language generation step of receiving the response content information from the response content determination step, converting its content into a natural language output that is sequentially presented to the user, and outputting generation operation status information indicating the status of the operation of generating the natural language output by the conversion;
a discourse management step of controlling the language generation step, according to at least one of the analysis operation status information, the response determination operation status information, and the generation operation status information, so that a simultaneous utterance such as a response or an interruption is made with respect to the user's natural language input; and
a domain knowledge storage step of holding domain knowledge information, which is knowledge related to the content of the dialogue,
wherein the language analysis step
sequentially compares the obtained analysis result information with the domain knowledge information recorded in the domain knowledge storage step and, when an inconsistency arises between the two, outputs analysis operation status information including the content of the inconsistency, and
the discourse management step,
upon receiving the analysis operation status information, outputs through the language generation step a natural language output that realizes an interruption consisting of an interjection, thereby interrupting the user's utterance.

10. A natural language processing apparatus comprising:
input means for inputting natural language information;
similar sound detection means for detecting, in the natural language information, similar sound phrases, which are phrases whose pronunciation is phonetically the same or similar;
paraphrase candidate generation means for generating paraphrase candidates for the detected similar sound phrases; and
paraphrase candidate determination means for, when a plurality of similar sound phrases are detected, deleting from the paraphrase candidates of the respective similar sound phrases those that have a similar sound relation to one another, and replacing each similar sound phrase with a paraphrase candidate that was not deleted.

11. The natural language processing apparatus according to claim 10, wherein
the paraphrase candidate determination means,
when replacing a similar sound phrase with the target paraphrase candidate, adds control markers for changing the tone of the output voice before and after the target paraphrase candidate,
the apparatus further comprising:
phoneme processing means for changing the tone of the output voice in accordance with the control markers; and
tone change selection means for selecting whether or not the phoneme processing means actually changes the tone.

12. The natural language processing apparatus according to claim 10, wherein
the paraphrase candidate determination means,
in addition to the above processing, holds a history of replacement operations with paraphrase candidates and controls the replacement operation in accordance with predetermined replacement rules for the paraphrase candidates corresponding to the history.

13. The natural language processing apparatus according to claim 10, further comprising similar sound determination means for removing unnecessary similar sound phrases from among the similar sound phrases detected by the similar sound detection means, on the basis of word co-occurrence relations and appearance frequency information.

14. A natural language processing method comprising:
detecting, in input natural language information, similar sound phrases, which are phrases whose pronunciation is phonetically the same or similar;
generating paraphrase candidates for the detected similar sound phrases; and
when a plurality of similar sound phrases are detected, deleting from the paraphrase candidates of the respective similar sound phrases those that have a similar sound relation to one another, and replacing each similar sound phrase with a paraphrase candidate that was not deleted.