JP2004102764A

JP2004102764A - Conversation expression generating apparatus and conversation expression generating program

Info

Publication number: JP2004102764A
Application number: JP2002265209A
Authority: JP
Inventors: Hidekazu Kubota; 久保田　秀和; Toyoaki Nishida; 西田　豊明; Koji Yamashita; 山下　耕二; Tomohiro Fukuhara; 福原　知宏
Original assignee: Communications Research Laboratory
Current assignee: Communications Research Laboratory
Priority date: 2002-09-11
Filing date: 2002-09-11
Publication date: 2004-04-02
Anticipated expiration: 2022-09-11
Also published as: JP3787623B2

Abstract

PROBLEM TO BE SOLVED: To improve the understanding of readers and listeners by automatically generating conversational expressions from written monologue text.

SOLUTION: This conversation expression generating apparatus A comprises: monologue text acquisition means for acquiring a monologue text; preprocessing means for dividing the acquired monologue text into one or more single-sentence texts; sentence-end processing means for analyzing the sentence-end expression of each generated single-sentence text and associating it with one of a plurality of sentence-end expression patterns; comment selection means for selecting, from a plurality of comment texts set as responses to the respective sentence-end expression patterns, one comment text corresponding to the single-sentence text associated with one of those patterns; and means for generating and outputting a conversation text consisting of the single-sentence text and the comment text by inserting the selected comment text after the single-sentence text.

COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、モノローグ的な表現を元にして会話形式の表現を好適に生成し得る会話表現生成装置及びそのためのプログラムに関するものである。
【０００２】
【従来の技術】
現在のところ、インターネット等の通信手段を介した情報提供の手段は、一部に画像を伴ったものはあるが、Ｗｅｂページ中の文章、電子メールやチャットや電子掲示板等の文字コミュニケーションが中心的である。特に、Ｗｅｂページ中の文章の多くは、単独の書き手が一方的に叙述するモノローグ的な文章であるといえる。また、電子メール等の文字コミュニケーションでは、一人の書き手（話者）の記述（発言）中に、他者が質問を差し挟むような対話（会話）の調整が、現実に対面して話し合う対面対話の場合よりも困難であるため、個々の発言は一般的に長く、発言毎に断片化されたモノローグ的な性質を有しているものと考えられる。このように、インターネット上には、モノローグ的な文章として蓄えられた情報が大量に存在しているのが現状である。しかしながら、モノローグ的な文章は、万人向けの情報提示手段ではなく、専門的なモノローグ的文章よりも、重要な部分を質問応答形式で表した会話形式の文章の方が一般的に親しみやすく、理解の度合いも高い傾向にある。ここで、会話形式の文章とは、インタビュー記事やテレビ番組の台本に代表されるように、複数の話し手が会話を積み重ねる形式の文章である、会話形式の文章は、モノローグ的な文章と比較して、客観的で厳密に構造化された叙述を行うことは困難であるが、重要な部分に焦点を当てた簡潔な情報提示を行うことができるという利点を有しており、また、日常的なコミュニケーションに利用する最も一般的な情報交換手段であるため、モノローグ的な文章よりも親しみやすく、対話相手や聞き手の理解も進みやすいという特徴がある。このことを、テレビ放送のニュース番組を例にして説明すると、日常的な話題についてはアナウンサーが一人でニュース原稿を読み上げるよりも、二人以上のパーソナリティが会話形式で紹介する方が親しみやすく感じられる。
【０００３】
会話形式の表現を利用する方法としては、例えば、ユーザのＷｅｂブラウジングに対応して同一のキャラクタエージェント達に、当該ページに関連する会話や寸劇を行わせることによって、ユーザによるＷｅｂページの理解に一貫性を持たせようとするＡｇｎｅｔａ＆Ｆｒｉｄａという試みがなされている（非特許文献１）。また、サッカー中継の試合情報を元に、チームに対する各々の態度や性格に従って会話を生成するエージェント（Ｇｅｒｄ＆Ｍａｔｚｅ）も考えられている（非特許文献２）。さらに、展示会参加者の個人情報に対し簡単な規則を適用して生成されたエージェント同士の会話を行わせるエージェントサロンも考えられている（非特許文献３）。
【０００４】
【非特許文献１】クリスティナ・フック（Ｈ”ｏ”ｏｋ，Ｋ．）、他３名著「インタフェースにおける隠れルター派的な見方の取り扱い：アグネタ＆フリーダシステムの評価（Ｄｅａｌｉｎｇ　ｗｉｔｈ　ｔｈｅ　Ｌｕｒｋｉｎｇ　Ｌｕｔｈｅｒａｎ　ｖｉｅｗ　ｏｎ　Ｉｎｔｅｒｆａｃｅｓ　：　Ｅｖａｌｕａｔｉｏｎ　ｏｆ　ｔｈｅ　Ａｇｎｅｔａ　ａｎｄ　Ｆｒｉｄａ　ｓｙｓｔｅｍ）」，（スペイン・サイチェス（Ｓｉｔｇｅｓ，Ｓｐａｉｎ）），生命的合成キャラクターの行動プラン具に関するワークショップ（ｔｈｅ　ｗｏｒｋｓｈｏｐ　Ｂｅｈａｖｉｏｕｒ　Ｐｌａｎｎｉｎｇ　ｆｏｒ　Ｌｉｆｅ−Ｌｉｋｅ　Ｓｙｎｔｈｅｔｉｃ　Ｃｈａｒａｃｔｅｒｓ）」，１９９９年，ｐ１２５−１３６
【非特許文献２】エリザベス・アンドレ（Ａｎｄｒ’ｅ，Ｅ．）、他１名著，「パフォーマンスによる表現：知識ベースプレゼンテーションシステムにおける複数の生命的キャラクターの利用（Ｐｒｅｓｅｎｔｉｎｇ　Ｔｈｒｏｕｇｈ　Ｐｅｒｆｏｒｍｉｎ：　Ｏｎ　ｔｈｅ　Ｕｓｅ　ｏｆ　Ｍｕｌｔｉｐｌｅ　Ｌｉｆｅｌｉｋｅ　Ｃｈａｒａｃｔｅｒｓ　ｉｎ　Ｋｎｏｗｌｅｄｇｅ−Ｂａｓｅｄ　Ｐｒｅｓｅｎｔａｔｉｏｎ　Ｓｙｓｔｅｍｓ）」，（米国），第２回知的ユーザインタフェース国際会議論文集（ｔｈｅ　Ｓｅｃｏｎｄ　Ｉｎｔｅｒｎａｔｉｏｎａｌ　Ｃｏｎｆｅｒｅｎｃｅ　ｏｎ　Ｉｎｔｅｌｌｉｇｅｎｔ　Ｕｓｅｒ　Ｉｎｔｅｒｆａｃｅｓ（　ＩＵＩ２０００）），２０００年，ｐ．１−８
【非特許文献３】角，間瀬著，「エージェントサロン：パーソナルエージェント同士のおしゃべりを利用した出会いと対話の促進」電子情報通信学会論文誌，第８４巻Ｄ−Ｉ，第８号，２００１年，ｐ１２３１−１２４３
【０００５】
【発明が解決しようとする課題】
ところが、Ａｇｎｅｔａ＆Ｆｒｉｄａにおいて利用される発話内容は、予め定められており、新たに会話形式の文章を生成するものではない。また、Ｇｅｒｄ＆Ｍａｔｚｅでは、発話はパスやインターセプト等のサッカーにおけるボールの移動に関する離散的なイベントに呼応したものであり、分野の制限されていない事象に関して述べられたモノローグ的な文章から会話形式の表現を生成することはできない。さらに、エージェントサロンにおいて会話に利用される項目は、個人の見学履歴と展示に対する評価データという離散的なものであるため、Ｇｅｒｄ＆Ｍａｔｚｅと同様に、限定的な事象についてのみ対応するものである。
【０００６】
したがって、従来の何れの態様にしても、任意のモノローグ的文章を元にして理解が容易で親しみやすい会話表現を生成することはできないものであった。
【０００７】
そこで本発明は、以上のような問題に鑑みて、分野の制限されない任意のモノローグ的文章に基づいて、それをより親しみやすく理解しやすい会話表現に変換することができるようにすることを主たる目的としている。
【０００８】
【課題を解決するための手段】
すなわち、本発明において第１の態様に係る会話表現生成装置Ａ１は、図１に示すように、モノローグ的文章からなるモノローグテキストに基づいて、会話表現を生成するものであって、モノローグテキストを格納したモノローグテキスト格納部ＭＴＤから取得するモノローグテキスト取得手段１と、取得したモノローグテキストを単文形式に分割し一以上の単文テキストを生成する前処理手段２と、生成された単文テキストの文末表現を解析し当該文末表現を予め設定された複数の文末表現パターンの何れか一つに対応付ける文末処理手段３と、各文末表現パターンに対応付けてそれらに応答する表現として設定された複数のコメントテキストを格納するコメント格納部ＣＭＤから前記何れかの一の文末表現パターンに対応付けられた単文テキストに対応する一のコメントテキストを選択するコメント選択手段４と、前記単文テキストの後に選択されたコメントテキストを挿入し単文テキストとコメントテキストからなる会話テキストを生成する会話表現生成手段５と、生成した会話テキストを出力する会話テキスト出力手段６とを具備してなることを特徴とするものであり、コンピュータをこの会話表現生成装置Ａ１用のプログラムに基づいて動作させることによって上述の機能を奏する。以下に説明する各会話表現生成装置においてもそれ専用のプログラムに基づいて機能する点で同様である。
【０００９】
ここで、モノローグ的な文章とは、上述したように、単独の書き手が一方的に叙述した文章であり、モノローグテキストとは、このような文章からなるテキストデータを意味している。また、単文テキストとは、句点で終了する一文のみからなるテキストを意味している。また、モノローグテキスト格納部ＭＴＤは、モノローグテキストをデータベースとして格納してあるものであってもよいし、入力されたモノローグテキストを一時的なメモリに格納するものであってもよい。
【００１０】
このような会話表現生成装置Ａ１であれば、一般には理解しづらいモノローグテキストを、より理解の容易な会話形式の表現からなる会話テキストに自動変換することができるので、出力された会話テキストの利用者の情報理解に要する負担を軽減することができる。特に、従来の会話形式を利用した各システムとは異なり、モノローグテキストを単文化したうえでその文末表現に着目することによって、適切な会話テキストを生成するようにしているため、予め用意された会話文をそのまま利用したり定まった分野の会話表現のみを実現するのではなく、新規に作成された分野の制限のないモノローグ的な文章にも容易に対応できる点で、従来のものとは全く異なり且つ応用範囲の広いものである。
【００１１】
本発明の第２の態様に係る会話表現生成装置Ａ２は、図２に示すように、前記会話表現生成装置Ａ１の構成に加えて、ユーザにより入力されたキーワードを取得するキーワード取得手段７を更に具備するものであり、モノローグテキスト取得手段１が、モノローグテキスト格納部ＭＴＤから前記キーワードに対応する一以上のモノローグテキストを取得し、前処理手段２が、取得されたモノローグテキストのそれぞれについて単文テキストを生成するように構成したものである。
【００１２】
このような構成であれば、ユーザが興味のあるキーワードを入力すれば、それに対応するモノローグテキストから生成した会話テキストが出力されるので、興味のある話題についてより理解を深めることができる。
【００１３】
このような会話表現生成装置Ａ１、Ａ２において、モノローグテキスト格納部ＭＴＤが、ユーザの入力により電子掲示板に投稿された意見文テキストをモノローグテキストとして格納するものであり、モノローグテキスト取得手段１において、この意見文であるモノローグテキストを取得するようにしてあれば、多様な人の意見の理解が容易になる。
【００１４】
本発明の第３の態様に係る会話表現生成装置Ａ３は、図３に示すように、前記会話表現生成装置Ａ１又はＡ２の構成に加えて、二以上の予め設定された話者エージェントのそれぞれに対して、出力された会話テキストのうち単文テキストの読み手として一の話者エージェントを対応付けるとともに、コメントテキストの読み手として他の話者エージェントを対応付ける処理を行う話者決定手段８を更に具備するものである。なお、この会話表現生成装置Ａ３において、図３に破線で示したキーワード取得手段７は必須の機能ではなく、オプション的にこの機能を設けるか否かを選択できるものである。また、以下に説明する図４〜図７に示される各会話表現生成装置においても、破線で示される各手段は、当該装置又はプログラムにおいてはオプション的な機能であることを示している。
【００１５】
このような会話表現生成装置Ａ３であれば、会話形式である会話テキストは、単文テキストとコメントテキストとから構成されており、それらを別々の話者エージェントが読み手として区別されるため、例えばユーザが使用するディスプレイに単文テキストとコメントテキストを文字表示する場合に、単文テキスト又はコメントテキストに対応付けられた何れかの話者エージェントの画像を同時に表示するようにすれば、現実に二人以上の話者が対話しているかの如き状態を模擬的に表現することができるため、聞き手であるユーザにとって極めて親しみやすく、理解度も向上させることができる。
【００１６】
この会話表現生成装置Ａ３の効果をより向上する本発明の第４の態様に係る会話表現生成装置Ａ４は、図４に示すように、会話表現生成装置Ａ３の構成に加えて、話者決定手段８で決定した各話者エージェントごとに異なる音声で対応する単文テキスト又はコメントテキストを音声出力する音声出力手段９を更に具備している。すなわち、単文テキストとコメントテキストとを異なる話者エージェントが発話しているように、それぞれ異なる音声で出力することによって、聞き手の理解のし易さがさらに向上される。
【００１７】
そして、この会話表現生成装置Ａ４による効果をさらに向上する本発明の第５の態様に係る会話表現生成装置Ａ５は、図５に示すように、会話表現生成装置Ａ４の構成に加えて、音声出力手段９で出力される単文テキスト又はコメントテキストの音声に対応して、各話者エージェントに当該話者エージェントの画像のうち少なくとも口を動かせるアニメーション動作を付加し出力するアニメーション処理手段１０を更に備えたものである。すなわち、話者エージェントが現実に話しているかのような単文テキスト又はコメントテキストの音声データに加えて、話者エージェントの画像を動作させることによって、会話表現生成装置Ａ４の場合よりも更に現実味を帯びた態様で会話を進行させることが可能となる。
【００１８】
さらに、会話表現生成装置Ａ３、Ａ４、Ａ５の何れかに対する補助的な機能を有する本発明の第６の態様に係る会話表現生成装置Ａ６は、各話者エージェントと共に、対応する単文テキスト又はコメントテキストを画面表示可能な文字データとして出力する文字データ出力手段１１を更に具備するものである。このような構成とすれば、文字データとして出力された単文テキスト及びコメントテキストを話者エージェントごとに関連づけてディスプレイに表示することができるため、耳の不自由なユーザであっても内容の理解を深めることができ、また、音声データやアニメーション画像とともに出力することで、一般的なユーザも目と耳から情報を受け取ることで内容の把握が容易となる。
【００１９】
また、モノローグテキストが画像データを伴っている場合も考えられるが、この場合における本発明の第７の態様に係る会話表現生成装置Ａ７には、上述した各会話表現生成装置Ａ１、Ａ２、Ａ３、Ａ４、Ａ５、Ａ６の何れかの構成に加えて、モノローグテキスト格納部ＭＴＤから当該画像データを取得し出力する画像データ処理手段１２を更に設けるとよい。すなわち、出力する画像データを会話テキストの理解のための補助として役立てることができる。
【００２０】
特に、前記会話表現生成装置Ａ３、Ａ４、Ａ５、Ａ６の何れかにおいて、話者エージェントの一つとして、会話表現の進行役となるメインキャスタエージェントを設定しておき、話者決定手段８が、メインキャスタエージェントをコメントテキストの読み手として決定するものとしている場合には、会話表現の進行をスムーズに行うことができる。そして、モノローグテキストが、その内容の本質的部分である本文部と概要を示す表題部とから構成されるものである場合には、話者決定手段８において、メインキャスタエージェントを表題部の読み手として決定するように構成するとよい。また、コメント格納部ＣＭＤに、会話表現の開始を示すコメントテキストが格納されていれば、コメント選択手段４において当該単文テキストの前に他の単文テキストがない場合に前記開始を示すコメントテキストを選択し、話者決定手段８において、当該コメントテキストに表題部を合成したものをメインキャスタエージェントに対応付けるように構成することによっても、会話の流れをスムーズなものとすることができる。一方、前記メインキャスタエージェントとは異なる話者エージェントとして一以上のアナウンサーエージェントが設定されている場合には、話者決定手段８が、アナウンサーエージェントを本文部の読み手として決定するようにすれば、メインキャスタエージェントとの役割分担を明確なものとすることができる。
【００２１】
以上に説明した各会話表現生成装置Ａ１〜Ａ７の何れかにおいて、複数のモノローグテキストに基づく複数の会話表現の流れを円滑なものとするためには、コメント選択手段４において、一のモノローグテキストにおける最終の単文テキストを認識し、最終の単文テキストにおける文末表現パターンに対応付けて次のモノローグテキストへ接続する表現として設定されコメント格納部に格納されたコメントテキストから当該最終の単文テキストの文末表現に対応するコメントテキストを選択するように構成することが好ましい。
【００２２】
また、文末表現は、ある程度パターン化して分類しておくことができる。すなわち、文末表現パターンに、現象を述べ立てることを示す現象叙述形式と伝聞であることを示す伝聞形式とが少なくとも含ませるとともに、コメントテキストに、現象叙述形式に対応する質問文形式に該当するコメントテキストと伝聞形式に対応する予想文形式に該当するコメントテキストとが少なくとも含ませておき、文末処理手段３において、単文テキストの文末表現を現象叙述形式又は伝聞形式の何れかに対応付けるとともに、それに対応してコメント選択手段４において、質問文形式又は予想文形式の何れか一方のコメントテキストを選択するような態様が好ましいものとしてあげることができる。さらに加えて、質問文形式及び予想文形式のコメントテキストを、それぞれ複数ずつ設定しておき、コメント選択手段４において、それら複数のコメントテキストから何れか一のコメントテキストを選択するようにしておけば、会話が単調とならないようにバリエーションを持たせることができる。
【００２３】
ところで、最近では、パブリック・オピニオン・チャンネル（以下、「ＰＯＣ」と称する）と呼ばれるコミュニティのためのインタラクティブ放送システムが開発されつつある。このＰＯＣは、コミュニティメンバが他のメンバに向けて電子掲示板に投稿した意見文を処理の対象とし、この意見文に会話生成処理を加えたうえで、仮想的な話者であるメインキャスタエージェントとアナウンサーエージェントの会話による意見紹介番組の形でその意見文をコミュニティメンバに向けて放送するものである。したがって、本発明に係る会話表現生成装置Ａ１〜Ａ７を、複数のユーザが入力することにより利用可能な電子掲示板に入力されたモノローグテキストを格納する前記モノローグテキスト格納部と、前記コメント格納部とを有し、且つ、入力されたモノローグテキストに基づいて生成される会話表現テキストを放送可能な、ＰＯＣをはじめとするインタラクティブ放送システムにおいて適用し、入力されたモノローグテキストに基づいて会話テキストを生成し、当該会話テキストを放送可能に出力することで、ＰＯＣ等のコミュニティ向け意見紹介放送においても極めて重要な役割を果たすことができる。
【００２４】
【発明の実施の形態】
以下、本発明の一実施形態を、図面を参照して説明する。
【００２５】
この実施形態は、図８にシステム全体の概要を示すように、上述したパブリック・オピニオン・チャンネル（以下、「ＰＯＣ」と称する）に適用される会話表現生成装置であり、特に本発明における第７の態様の会話表現生成装置Ａ７を利用したものである。以下、この会話表現生成装置Ａ７は、ＰＯＣキャスタＡ７と呼ぶものとする。ＰＯＣは、コミュニティメンバである各ユーザＵが使用するパーソナルコンピュータやＰＤＡや携帯電話等のクライアントコンピュータＣＣ、クライアントコンピュータＣＣに対してユーザＵがクライアントコンピュータＣＣからアクセス可能な電子掲示板を提供するとともにクライアントコンピュータＣＣから投稿されたユーザＵの意見文を格納するＰＯＣサーバＰＳ、ＰＯＣキャスタＡ７とから基本的に構成されており、これらクライアントコンピュータＣＣ、ＰＯＣサーバＰＳ、ＰＯＣキャスタＡ７はインターネットＩＮを通じて双方向通信可能に接続されている。なお、ＰＯＣキャスタＡ７は、ユーザＵから投稿された意見文を元に生成した会話形式の放送をクライアントコンピュータＣＣへ放送するための放送用クライアントとしての機能も有しており、ユーザＵは自己のクライアントコンピュータＣＣの画面上で当該放送を視聴することができる。
【００２６】
まず、各機器の内部機器構成について説明する。ＰＯＣサーバＰＳは、汎用サーバコンピュータによって構成されるものであり、データベースサーバ機能やＷｅｂサーバ機能を有している。そのうち、Ｗｅｂサーバが、クライアントコンピュータＣＣから閲覧可能なホームページや電子掲示板を提供している。また、データベースサーバが、電子掲示板に入力された意見文を格納するモノローグテキスト格納部ＭＴＤとしての機能を有している。一方、ＰＯＣキャスタＡ７は、一般的なサーバコンピュータやパーソナルコンピュータによって構築されるものであり、図９に示すように、ＣＰＵ１０１、内部メモリ１０２、ハードディスク等の記憶装置１０３、キーボードやマウス等の入力デバイス１０４、ディスプレイやスピーカ等の出力デバイス１０５、各種通信インタフェース１０６等を内部機器として有している。なお、データベース装置１０７を更に内部機器として有していてもよいし、外部機器として有していてもよい。そして、記憶装置１０３に記録されたプログラムをＣＰＵ１０１の指示に従って内部メモリ１０２に読み込み、適宜データベース装置１０７から必要なデータ等を読み出し、また、通信インタフェース１０６を介してＰＯＣサーバＰＳやクライアントコンピュータＣＣと情報通信を行うことによって、このＰＯＣキャスタＡ７が動作する。なお、ＰＯＣキャスタＡ７において情報の入力や画面表示等の出力が必要な場合には、適宜入力デバイス１０４や出力デバイス１０５が利用される。また、この実施形態では、ＰＯＣサーバＰＳとＰＯＣキャスタＡ７とをインターネットＩＮを通じて双方向通信可能な別個のコンピュータとして示しているが、これらは専用通信回線で接続されていてもよいし、単一のコンピュータによって実現されるものであってもよい。さらにまた、クライアントコンピュータＣＣは、上述したように一般的なパーソナルコンピュータやＰＤＡ、携帯電話等からなるものであるが、ここでは少なくともインターネットＩＮへの接続機能、文字や画像の入出力機能、ディスプレイ等の画像表示機能、スピーカ等の音声出力機能を有しているものとする。
【００２７】
次に、ＰＯＣキャスタＡ７の機能について説明すると、このＰＯＣキャスタＡ７は、会話表現生成プログラムに基づくＣＰＵ１０１の指示に従って各内部機器及び外部機器が協動し、図７に示したように、モノローグテキスト取得手段１、前処理手段２、文末処理手段３、コメント選択手段４、会話表現生成手段５、会話テキスト出力手段６、キーワード取得手段７、話者決定手段８、音声出力手段９、アニメーション処理手段１０、文字データ出力手段１１、画像データ処理手段１２としての機能を有している。これらの各手段を動作させるためのプログラムをコンピュータにインストールすることによって、コンピュータがＰＯＣキャスタＡ７として機能することになる。なお、このプログラムは、例えばＣＤ−ＲＯＭ等の記録媒体に記録したものをコンピュータに読み込ませたり、インターネットＩＮ等を通じてコンピュータにダウンロードすることによって実装される。本実施形態では特に、意見紹介の会話を行う仮想的な話者であるエージェント（メインキャスタエージェントＭＡ及びアナウンサーエージェントＡＡ）の音声合成にはＴＳＳシステム（株式会社東芝製）を、それらエージェントの画像には、写真顔キャラクター作成システム（株式会社シャープ）を、ＰＯＣキャスタＡ７に組み込んで使用しているが、これらと同等機能を有する他の製品を利用することも可能である。
【００２８】
また、前記データベース１０７は、コメント格納部ＣＭＤとしての機能を有している。ここで、格納されるコメントの一例を図１０に示す。この例では、ユーザＵにより投稿された意見文の文末表現パターンを２種類に大別し、そのうち１種類を更に３種類に分類し、そのそれぞれに対応するコメントが複数ずつ用意されている。具体的に説明すると、文末表現は、同図左欄に示すように、「現象を述べ立てる」現象叙述形式と、伝聞であることを示す『伝聞形式』に大別されている。さらに現象叙述形式は、『現象を述べ立てる「がある」形式』、『現象を述べ立てるアスペクト辞「ている」形式（現在・現在進行形）』、『現象を述べ立てるアスペクト辞「ている」（過去・過去進行形）』の３種類に分類されている。これら文末表現の例としては、同図中欄に示すようなものが挙げられる。すなわち、まず、『現象を述べ立てる「がある」形式』の表現例には、「〜がある。」、「〜があります。」、「〜があった。」、「〜がありました。」等が挙げられる。『現象を述べ立てるアスペクト辞「ている」形式（現在・現在進行形）』の表現例には、「ている。」、「ています。」、「〜が人気を呼んでいる。」等が挙げられる。『現象を述べ立てるアスペクト辞「ている」（過去・過去進行形）』の表現例には、「ていた。」、「ていました。」が挙げられる。『伝聞形式』の表現例には、「〜だそうです。」、「〜だそうだ。」、「〜という。」、「〜といいます。」等が挙げられる。このような表現例は、物事を紹介する際の文章の文末に関する様相を分析した結果、上述のような合計４種類の文末表現パターンに分類されることが判明したことに基づく。そして、各文末表現パターンに対応して、それらの後に挿入すべきコメントテキストは、同図右欄に示すようなものである。すなわち、現象を述べ立てる「がある」形式』及び『現象を述べ立てるアスペクト辞「ている」形式（現在・現在進行形）』には、『詳細質問文（現在の内容）』として、「どういうものなの？」、「もっと教えてよ」、「それはなに？」、「どんなものなの？」等のコメントテキストを対応付けている。また、『現象を述べ立てるアスペクト辞「ている」（過去・過去進行形）』には、『詳細質問文（過去の内容）』として、「どうだったの。」、「それで、どうだったの。」等のコメントテキストを対応付けている。さらに、『伝聞形式』には、『詳細予想文』として、「どんなのだろう。」、「どんなかんじなんだろう。」、「どんなのかな。」等のコメントテキストを対応付けている。なお、以上に示した表現例やコメントテキスト例は、一例であって、これら以外のものを含む場合もある。そして、図示していないが、各コメントテキストのパターンごと及び個々のコメントテキストには、適宜の識別子が付与されていて他のコメントテキストと区別されるようにしている。
【００２９】
ただし、コメント格納部ＣＭＤとして機能するデータベース１０７には、上述したコメントテキストの他に、例えば、単に相づちを打つ表現である「はい。」や「そうですか。」等のコメントテキスト、エージェントが音声出力により仮想的に読み上げる元になる意見文と意見文との間に挿入され話題を接続したり他のユーザに呼びかけることを表す「みなさん、どう思われますか。」等のコメントテキスト、会話を開始することを表す「では、〜の話題です。」等のコメントテキスト等も単数又は複数ずつ格納されている。
【００３０】
また、ユーザＵが各自のクライアントコンピュータＣＣで入力し送信により投稿した意見文は、ＰＯＣサーバＰＳに格納されるが、これら意見文それぞれは、ユーザＵが各自で叙述したモノローグ的な文章からなるモノローグテキストである。図１１に、意見文の一例を示す。同図に示すように、意見文ＯＰＴは、「題目」欄に記述された表題部ＯＰＨと、「本文」欄に記述された本文部ＯＰＭとから構成されており、本文部には、一以上の文が記述されている。各意見文ＯＰＴは、他の意見文ＯＰＴと区別される固有の識別子により管理されている。なお、本実施形態では、意見文はＰＯＣサーバＰＳ内において、ＸＭＬ（ｅＸｔｅｎｓｉｂｌｅ　Ｍａｒｋｕｐ　Ｌａｎｇｕａｇｅ）形式に変換されているが、これ以外の形式であってもよいのは勿論である。さらに各意見文には、動画又は静止画からなる関連画像ＯＰＩが添付される場合もある。
【００３１】
また、話者エージェントは、メインキャスタエージェントとアナウンサーエージェントの２種類が予め設定されているものとする。すなわち、メインキャスタエージェントとアナウンサーエージェントのそれぞれに、写真顔キャラクター作成システムで作成された顔のキャラクター画像と、ＴＳＳシステムにより作成された合成音声とが関連づけて設定してあるものとする。ここで、アナウンサーエージェントは、意見文に基づいて作成される会話テキストのうち、元の意見文の本文部ＯＰＭを読み上げる話者として設定されている。一方、メインキャスタエージェントには、表題部ＯＰＨ及びコメントテキストを読み上げる役割が設定されているものとする。
【００３２】
以下、ＰＯＣキャスタＡ７の動作例について、図１１に示した意見文例、図１２に示す会話テキスト例、図１３に示すフローチャート例、及び図　に示す画面例等を利用して説明する。
【００３３】
まずはじめに、前提として、例えば図１１に示したようなユーザＵからの意見文がモノローグテキストとしてＰＯＣサーバＰＳに多数格納されているものとする。すなわち、ユーザＵは、ＰＯＣサーバＰＳにより提供された電子掲示板を利用して、各自の意見文の投稿を行っている。また、ＰＯＣサーバＰＳ又はＰＯＣキャスタＡ７は、意見文の紹介を視聴したいユーザＵのクライアントコンピュータＣＣに対して、キーワードの入力欄を表示した画面を送信して表示させており、ユーザＵはその画面に何らかのキーワードを入力したうえでそれをクライアントコンピュータＣＣからＰＯＣキャスタＡ７へ送信しているものとする。
【００３４】
ＰＯＣキャスタＡ７は、クライアントコンピュータＣＣから送信されたキーワードを取得すると（図１３；ステップＳ１）、ＰＯＣサーバＰＳを検索して取得したキーワードに関連する意見文（モノローグテキスト）ＯＰＴを検索する（ステップＳ２）。なお、この検索に際しては、例えば表題部ＯＰＨのみの検索、表題部ＯＰＨ及び本文部ＯＰＭの全文検索等、適宜の方法を採用することができる。意見文ＯＰＴがＰＯＣサーバＰＳから１以上取得できた場合（ステップＳ２ａ；Ｙｅｓ）は、次へ進む。ここで、取得した意見文ＯＰＴが複数あった場合は、例えば識別子の昇順又は降順、又は日付順などの適宜の順番に並べられる。一方、取得したキーワードに該当する意見文ＯＰＴがなかった場合には（ステップＳ２ａ；Ｎｏ）、その旨の情報をクライアントコンピュータＣＣへ送信する（ステップＳ２ｂ）。
【００３５】
次に、全ての意見文ＯＰＴを紹介したか否かを判断し、紹介し終えていない場合（ステップＳ３；Ｎｏ）には、１件の意見文ＯＰＴを句点「。」ごとに区切った単文テキストに分割する（ステップＳ４）。そして、各単文テキストの文末表現を解析し（ステップＳ５）、各単文テキストの文末表現に該当する文末表現パターンに対応するコメントテキストをデータベース１０７から抽出して（ステップＳ６）、抽出したコメントテキストを各単文テキストの後に挿入することによって、会話テキストＣＶＴを生成する（ステップＳ７）。ここで、一例として、取得した意見文ＯＰＴが図１１に示したようなものであれば、生成される会話テキストＣＶＴは、図１２に示すようなものとなる。すなわち、図１１に示す意見文ＯＰＴの本文部ＯＰＭは、４つの単文テキストに分割される。まず、表題部ＯＰＨは、メインキャスタエージェントＭＣＡに割り振られる。ここで、表題部ＯＰＨの前には当該意見文ＯＰＴにおいて先行する単文テキストが存在しないので、会話の開始を示すコメントテキストと表題部ＯＰＨの記載とを合成して、「まずはじめは、ウォーキングの話題です。」というテキストが生成されている。なお、このように、「まずはじめは…」とするか否かは、意見文の紹介順により適宜変更することができ、例えば「次は、…」や「最後は…」等というようなコメントテキストを利用することができる。次に、本文部ＯＰＭの第１文はアナウンサーエージェントＡＮＡに割り振られるが、その第１文の文末表現は、「…そうです。」という『伝聞形式』に該当するので、「どんなのだろう」というコメントテキストが対応付けられており、それをメインキャスタエージェントＭＣＡに割り振ることになる。同様に、第２文、第３文がアナウンサーエージェントＡＮＡに割り振られるとともに、それら第２文、第３文の文末表現に対応するコメントテキスト「もっと教えてよ。」や「はい。」等がメインキャスタエージェントＭＣＡに割り振られる。さらに、第４文もアナウンサーエージェントＡＮＡに割り振られるが、この第４文は、当該意見文ＯＰＴの末尾の単文テキストであることから、他のユーザＵに呼びかける表現であり、次の意見文ＯＰＴにつなげることにもなるコメントテキスト「みなさん、どうでしょう？」がメインキャスタエージェントＭＣＡに割り振られる。
【００３６】
図１３に示したフローチャートに戻って説明を続けると、生成された会話テキストＣＶＴのうち、表題部ＯＰＨについては（ステップＳ８；Ｙｅｓ）、メインキャスタエージェントＭＣＡにコメントテキストの発話動作を与える一方、アナウンサーエージェントＡＮＡには休憩動作を与える（ステップＳ８ａ）。また、表題部ＯＰＨではない、すなわち本文部ＯＰＭである場合（ステップＳ８；Ｎｏ）、文末表現が紹介を述べ立てる表現であるか否かを判断し、そうでなければ（ステップＳ８ｂ；Ｎｏ）、アナウンサーエージェントＡＮＡには単文テキストの読み上げ動作を与える一方、メインキャスタエージェントＭＣＡには休憩動作を与える（ステップＳ８ｃ）。一方、文末表現が紹介を述べ立てる表現である場合（ステップＳ８；Ｙｅｓ）、メインキャスタエージェントＭＣＡにはコメントテキストの読み上げ動作を与える一方、アナウンサーエージェントＡＮＡには休憩動作を与える（ステップＳ８ａ）。そして、メインキャスタエージェントＭＣＡ及びアナウンサーエージェントＡＮＡにそれぞれ動作を与えると、それに対応するアニメーションを生成すると音声を合成する。ここで、アニメーション動作には、少なくともメインキャスタエージェントＭＣＡ及びアナウンサーエージェントＡＮＡが口を動かせる動作が含まれるが、後述するように添付画像が意見文ＯＰＴに付帯されている場合には、何れかのエージェントに指差し動作をさせたり、頷く動作をさせるなどのバリエーションがある。また、上述したように、意見文ＯＰＴに添付画像ＯＰＩがあれば（ステップＳ１０；Ｙｅｓ）、その添付画像を、例えばクライアントコンピュータＣＣに表示させるための送信画像の中央に配置するなどして、送信画像に添付画像を合成する。その後、又は添付画像がない場合（ステップＳ１０；Ｎｏ）、送信画像の例えば下欄に会話テキストＣＶＴの文字データを合成し（ステップＳ１１）、全てのデータをクライアントコンピュータＣＣで視聴可能な形式として送信する（ステップＳ１２）。送信の結果、クライアントコンピュータＣＣのディスプレイに表示される画像は、例えば図１４〜図２２に示すようなものであり、クライアントコンピュータＣＣのスピーカからは、メインキャスタエージェントＭＣＡ及びアナウンサーエージェントＡＮＡそれぞれの音声が出力される。
【００３７】
クライアントコンピュータＣＣのディスプレイに表示される画面、及びスピーカから出力される音声について説明すると、図１４〜図２２では、画面中央に添付画像が表示されており、その左側にメインキャスタエージェントＭＣＡの画像、右側にアナウンサーエージェントＡＮＡの画像が配置された状態を示している。そして、図１２に示した会話テキストＣＶＴに従って、順次各エージェントの動作及びコメント又は意見文を読み上げる音声出力、並びにこの音声出力に伴った文字データの表示（画面下欄）が行われる。まず、図１４に示すように、メインキャスタエージェントＭＣＡが、当該意見文の表題部ＯＰＨ及び会話の開始を示すコメントテキストを合成したコメント「まずはじめは…」を読み上げる動作を行うとともに、その音声を出力し、画面下欄にこのコメントの文字データを表示する。次に、図１５に示すように、アナウンサーエージェントＡＮＡが、本文部ＯＰＭの第１文「伊勢志摩の…」を読み上げる動作を行うとともに、その音声を出力し、画面下欄にこのコメントの文字データを表示する。このとき、アナウンサーエージェントＡＮＡには、添付画像ＯＰＩを指し示すアニメーション動作を行わせるようにしている。次に、図１６に示すように、話者がメインキャスタエージェントＭＣＡに交代して、コメント「どんなのだろう。」を読み上げる動作を行うとともに、その音声を出力し、画面下欄にこのコメントの文字データを表示する。さらに、図１７に示すように、話者がアナウンサーエージェントＡＮＡに交代して、本文部ＯＰＭの第２文「大型の施設観光を…」を読み上げる動作を行うとともに、その音声を出力し、画面下欄にこのコメントの文字データを表示する。続いて図１８に示すように、メインキャスタエージェントＭＣＡが、コメント「もっと教えてよ。」を読み上げる動作を行うとともに、その音声を出力し、画面下欄にこのコメントの文字データを表示する。次に、図１９に示すように、アナウンサーエージェントＡＮＡに話者が交代し、本文部ＯＰＭの第３文「美しい景色や…」を読み上げる動作を行うとともに、その音声を出力し、画面下欄にこのコメントの文字データを表示する。これに対して、図２０に示すように、メインキャスタエージェントＭＣＡが、コメント「はい。」を読み上げる動作を行うとともに、その音声を出力し、画面下欄にこのコメントの文字データを表示する。そして、図２１に示すように、アナウンサーエージェントＡＮＡが、本文部ＯＰＭの最終文「どなたかご一緒しませんか？」を読み上げる動作を行うとともに、その音声を出力し、画面下欄にこのコメントの文字データを表示する。そして最後は、図２２に示すように、メインキャスタエージェントＭＣＡが、コメント「みなさん、どうでしょう？」を読み上げる動作を行うとともに、その音声を出力し、画面下欄にこのコメントの文字データを表示して、次の意見文ＯＰＴの紹介へとつなげる。
【００３８】
すなわち、以上の各ステップが終了すると、ステップＳ３に戻り、次の意見文ＯＰＴの処理を行う。そして、全ての意見文ＯＰＴの紹介が終了すると（ステップＳ３；Ｙｅｓ）、当初に取得したキーワードについての処理が終了となる。なお、ステップＳ９〜Ｓ１１は、必ずしもこの順番である必要はなく、適宜順番を入れ替えてもよい。
【００３９】
以上のようにして、ユーザＵが投稿した意見文は、その意見文の表層的な手がかりである文末表現から対象となる意見文の意図を推測して、コメントテキストの挿入や付加合成処理を行うことによって生成された会話テキストに変換されるので、他のユーザＵは、当該意見文を会話形式で視聴できることになる。したがって、元の情報提示がモノローグ的な文章である意見文であっても、それを視聴するユーザＵにはより親しみやすく理解へ負担を低減した態様で情報を提供することができる。
【００４０】
なお、本発明は上述した実施形態に限られるものではない。例えば分割された単文テキストの文末表現のパターンを増減することや、各パターンに該当する表現例、対応するコメントテキストの数も適宜増減することができる。また、電子掲示板に投稿される文章はフォーマルな文章ではないため、投稿された意見文において、表題部と本文部とが一つの文章として繋がっている場合があるが、この場合、本文部の文頭が「を」、「が」、「の」等の格助詞や「…」等の記号から始まっているような文章を正規化するなどの処理を行うようにすることもできる。その他、各部の具体的構成についても上記実施形態に限られるものではなく、本発明の趣旨を逸脱しない範囲で種々変形が可能である。また、生成される会話テキストは、二者に限らず三者以上の会話文とすることができる。さらに、本発明をＰＯＣ以外の分野又はシステムに適用することも可能である。
【００４１】
【発明の効果】
本発明によれば、以上に詳述したように、一人が叙述した文章であるモノローグテキストを、単文に分割するという処理を経て、聞き手又は読み手が要する理解への負担を軽減し得る会話形式に変換することができるものである。すなわち、モノローグテキストの表層的な手がかりである文末表現をパターン化することで、当該モノローグテキストの意図するところを推測して単文の末尾に適切なコメントを挿入することによって、話題の内容又は分野に制限なく、会話表現を生成することが可能となる。したがって、本発明を応用することで、ＰＯＣ等の不特定の話題が登場するコミュニティにおける意見紹介番組の運営や、会話に関する研究にも大いに役立つことになる。
【図面の簡単な説明】
【図１】本発明の請求項１に対応する会話表現生成装置の機能構成を示すブロック図。
【図２】本発明の請求項２に対応する会話表現生成装置の機能構成を示すブロック図。
【図３】本発明の請求項４に対応する会話表現生成装置の機能構成を示すブロック図。
【図４】本発明の請求項５に対応する会話表現生成装置の機能構成を示すブロック図。
【図５】本発明の請求項６に対応する会話表現生成装置の機能構成を示すブロック図。
【図６】本発明の請求項７に対応する会話表現生成装置の機能構成を示すブロック図。
【図７】本発明の請求項８に対応する会話表現生成装置の機能構成を示すブロック図。
【図８】本発明の一実施形態を適用したＰＯＣのシステムを示す概観図。
【図９】同実施形態のＰＯＣキャスタの概略的な内部機器構成図。
【図１０】同実施形態に適用されるコメント格納部の内部データの一例を示す図。
【図１１】同実施形態に適用される意見文の一例を示す図。
【図１２】同実施形態で生成された会話テキストの一例を示す図。
【図１３】同実施形態の動作を概略的に示すフローチャート。
【図１４】同実施形態においてクライアントコンピュータに表示される画面例を示す図。
【図１５】同実施形態においてクライアントコンピュータに表示される画面例を示す図。
【図１６】同実施形態においてクライアントコンピュータに表示される画面例を示す図。
【図１７】同実施形態においてクライアントコンピュータに表示される画面例を示す図。
【図１８】同実施形態においてクライアントコンピュータに表示される画面例を示す図。
【図１９】同実施形態においてクライアントコンピュータに表示される画面例を示す図。
【図２０】同実施形態においてクライアントコンピュータに表示される画面例を示す図。
【図２１】同実施形態においてクライアントコンピュータに表示される画面例を示す図。
【図２２】同実施形態においてクライアントコンピュータに表示される画面例を示す図。
【符号の説明】
１…モノローグテキスト取得手段
２…前処理手段
３…文末処理手段
４…コメント選択手段
５…会話表現生成手段
６…会話テキスト出力手段
７…キーワード取得手段
８…話者決定手段
９…音声出力手段
１０…アニメーション処理手段
１１…文字データ出力手段
１２…画像データ処理手段
Ａ１、Ａ２、Ａ３、Ａ４、Ａ５、Ａ６、Ａ７…会話表現生成装置
ＣＭＤ…コメント格納部
ＭＴＤ…モノローグテキスト格納部
[0001]
[Technical field of the invention]
The present invention relates to a conversation expression generation device capable of suitably generating a conversational expression based on a monologue expression, and a program therefor.
[0002]
[Prior art]
At present, although some of it is accompanied by images, information provided via communication means such as the Internet consists mainly of text: the sentences on Web pages and text communication such as e-mail, chat, and electronic bulletin boards. In particular, most of the text on Web pages is monologue-like writing in which a single writer narrates one-sidedly. In text communication such as e-mail, coordinating a dialogue so that another person can interject a question in the middle of one writer's (speaker's) description (utterance) is harder than in face-to-face dialogue, so individual utterances tend to be long and have a monologue-like character, fragmented utterance by utterance. As a result, a large amount of information is currently stored on the Internet as monologue-like text. However, monologue-like text is not a means of presenting information that suits everyone; compared with specialized monologue-like text, conversational text that presents the important parts in question-and-answer form is generally more approachable and tends to be better understood. Here, conversational text means text in which multiple speakers build up a conversation, as typified by interview articles and scripts of television programs. Compared with monologue-like text, conversational text makes objective, rigorously structured narration difficult, but it has the advantage of presenting information concisely with a focus on the important parts. Because conversation is also the most common means of exchanging information in everyday communication, conversational text is more approachable than monologue-like text and easier for a conversation partner or listener to understand. Taking a television news program as an example, for everyday topics it feels more approachable when two or more personalities introduce the news in conversational form than when a single announcer reads the news script alone.
[0003]
As a method of using conversational expressions, there is, for example, an attempt called Agneta & Frida, which tries to give consistency to the user's understanding of Web pages by having the same character agents perform conversations and skits related to the page the user is browsing (Non-Patent Document 1). An agent pair (Gerd & Matze) that generates conversation from the match information of a live soccer broadcast, according to each agent's attitude toward the teams and personality, has also been proposed (Non-Patent Document 2). Further, an Agent Salon, in which agents hold conversations generated by applying simple rules to the personal information of exhibition participants, has also been proposed (Non-Patent Document 3).
[0004]
[Non-Patent Document 1] Kristina Höök (Höök, K.) and three others, "Dealing with the Lurking Lutheran view on Interfaces: Evaluation of the Agneta and Frida system", (Sitges, Spain), the workshop Behaviour Planning for Life-Like Synthetic Characters, 1999, pp. 125-136
[Non-Patent Document 2] Elisabeth André (André, E.) and one other, "Presenting Through Performing: On the Use of Multiple Lifelike Characters in Knowledge-Based Presentation Systems", (USA), Proceedings of the Second International Conference on Intelligent User Interfaces (IUI 2000), 2000, pp. 1-8
[Non-Patent Document 3] Sumi and Mase, "Agent Salon: Promoting Encounters and Dialogue Through Chat Between Personal Agents", Transactions of the Institute of Electronics, Information and Communication Engineers, Vol. 84 D-I, No. 8, 2001, pp. 1231-1243
[0005]
[Problems to be solved by the invention]
However, the utterances used in Agneta & Frida are predetermined, so the system does not generate new conversational text. In Gerd & Matze, utterances respond to discrete events concerning the movement of the ball in soccer, such as passes and interceptions, so the system cannot generate conversational expressions from monologue-like text describing events in an unrestricted field. Furthermore, the items used for conversation in Agent Salon are discrete data, namely each individual's visit history and ratings of the exhibits, so like Gerd & Matze it handles only limited events.
[0006]
Therefore, in any of the conventional modes, it is impossible to generate an easily understandable and familiar conversational expression based on an arbitrary monologue-like sentence.
[0007]
Accordingly, in view of the above problems, the main object of the present invention is to make it possible to convert any monologue-like text, unrestricted in field, into a conversational expression that is more approachable and easier to understand.
[0008]
[Means for Solving the Problems]
That is, as shown in FIG. 1, the conversation expression generation device A1 according to the first aspect of the present invention generates a conversational expression based on a monologue text consisting of monologue-like sentences, and comprises: monologue text acquisition means 1 for acquiring the monologue text from a monologue text storage unit MTD in which it is stored; preprocessing means 2 for dividing the acquired monologue text into single sentences to generate one or more single-sentence texts; sentence-end processing means 3 for analyzing the sentence-end expression of each generated single-sentence text and associating it with one of a plurality of preset sentence-end expression patterns; comment selection means 4 for selecting, from a comment storage unit CMD that stores a plurality of comment texts set as responses to the respective sentence-end expression patterns, one comment text corresponding to the single-sentence text associated with one of those patterns; conversation expression generation means 5 for inserting the selected comment text after the single-sentence text to generate a conversation text consisting of the single-sentence text and the comment text; and conversation text output means 6 for outputting the generated conversation text. These functions are realized by operating a computer according to a program for the conversation expression generation device A1. Likewise, each of the conversation expression generation devices described below functions according to its own dedicated program.
[0009]
Here, as described above, monologue-like text is text narrated one-sidedly by a single writer, and a monologue text means text data consisting of such text. A single-sentence text means a text consisting of only one sentence ending in a period. The monologue text storage unit MTD may store monologue texts as a database, or may hold an input monologue text in temporary memory.
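The definition of a single-sentence text above can be illustrated with a small splitting step. The following is a minimal sketch in Python, not the implementation of the preprocessing means 2 itself; the function name `split_into_single_sentences` and the decision to keep trailing text that lacks a final period are assumptions made for illustration. The period is taken to be the Japanese full stop 「。」 used in the original text.

```python
def split_into_single_sentences(monologue_text: str, period: str = "。") -> list:
    """Split a monologue text into single-sentence texts, each ending at a period."""
    sentences = []
    current = ""
    for ch in monologue_text:
        current += ch
        if ch == period:
            stripped = current.strip()
            if stripped:
                sentences.append(stripped)
            current = ""
    # Assumption: trailing text without a final period is kept as its own sentence.
    if current.strip():
        sentences.append(current.strip())
    return sentences
```

For example, a body of two sentences yields a list of two single-sentence texts, each still carrying its final period so that the sentence-end expression remains available for later analysis.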
[0010]
With such a conversation expression generation device A1, a monologue text, which is generally hard to understand, can be automatically converted into a conversation text in an easier-to-understand conversational form, reducing the effort that users of the output conversation text need to understand the information. In particular, unlike the conventional systems that use a conversational form, the device splits the monologue text into single sentences and then focuses on their sentence-end expressions to generate an appropriate conversation text. Rather than reusing prepared conversational sentences as they are or supporting conversational expressions only in a fixed field, it therefore readily handles newly written monologue-like text in any field, which makes it entirely different from conventional approaches and widely applicable.
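The core idea, matching each sentence's ending against a set of patterns and inserting a corresponding comment after it, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the patented implementation. The ending strings and comment texts are the examples given in the embodiment's comment table in the Japanese description; the pattern names, the data layout, the suffix-matching approach, and the fallback to a back-channel comment when no pattern matches are assumptions.

```python
import random
from typing import Optional

# Example endings per sentence-end pattern (pattern names are hypothetical labels).
SENTENCE_END_PATTERNS = [
    ("hearsay", ("だそうです。", "だそうだ。", "という。", "といいます。")),
    ("past", ("ていた。", "ていました。")),
    ("present", ("ている。", "ています。", "が人気を呼んでいる。")),
    ("existence", ("がある。", "があります。", "があった。", "がありました。")),
]

DETAIL_QUESTIONS_PRESENT = ["どういうものなの？", "もっと教えてよ", "それはなに？", "どんなものなの？"]
COMMENTS = {
    "existence": DETAIL_QUESTIONS_PRESENT,
    "present": DETAIL_QUESTIONS_PRESENT,
    "past": ["どうだったの。", "それで、どうだったの。"],
    "hearsay": ["どんなのだろう。", "どんなかんじなんだろう。", "どんなのかな。"],
}
# Assumption: sentences matching no pattern receive a simple back-channel comment.
BACKCHANNEL = ["はい。", "そうですか。"]


def classify_sentence_end(sentence: str) -> Optional[str]:
    """Return the name of the first pattern whose ending matches, or None."""
    for name, endings in SENTENCE_END_PATTERNS:
        if sentence.endswith(endings):  # str.endswith accepts a tuple of suffixes
            return name
    return None


def generate_conversation(sentences, rng=None):
    """Insert one comment after each single-sentence text."""
    rng = rng or random.Random()
    conversation = []
    for sentence in sentences:
        pattern = classify_sentence_end(sentence)
        candidates = COMMENTS.get(pattern, BACKCHANNEL)
        conversation.append(sentence)
        conversation.append(rng.choice(candidates))  # random choice adds variation
    return conversation
```

Choosing at random among several comments per pattern follows the description's remark that multiple comment texts keep the conversation from becoming monotonous; a real system would likely need morphological analysis rather than plain suffix matching.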
[0011]
As shown in FIG. 2, the conversation expression generation device A2 according to the second aspect of the present invention further comprises, in addition to the configuration of the conversation expression generation device A1, keyword acquisition means 7 for acquiring a keyword input by a user. The monologue text acquisition means 1 acquires one or more monologue texts corresponding to the keyword from the monologue text storage unit MTD, and the preprocessing means 2 generates single-sentence texts for each of the acquired monologue texts.
[0012]
With such a configuration, if the user inputs a keyword of interest, the conversation text generated from the corresponding monologue text is output, so that the user can deepen the understanding of the topic of interest.
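The keyword-driven retrieval described above can be sketched as a simple filter over stored monologue texts. This is a hedged illustration only: the `MonologueText` record, its field names, and the substring match are assumptions. The options of searching the title only or the full text, and of ordering results by identifier, follow the embodiment's description of the search step.

```python
from dataclasses import dataclass


@dataclass
class MonologueText:
    identifier: int  # unique identifier of the stored text
    title: str       # title part
    body: str        # body part


def find_by_keyword(store, keyword, search_body=True):
    """Return stored texts whose title (and optionally body) contains the keyword."""
    hits = [
        text for text in store
        if keyword in text.title or (search_body and keyword in text.body)
    ]
    # Assumption: present the results in ascending identifier order.
    return sorted(hits, key=lambda text: text.identifier)
```

An empty result corresponds to the case in which the device reports to the client that no matching text was found.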
[0013]
In such conversation expression generation devices A1 and A2, if the monologue text storage unit MTD stores, as monologue texts, opinion texts that users have posted to an electronic bulletin board, and the monologue text acquisition means 1 acquires these opinion texts, the opinions of a wide variety of people become easier to understand.
[0014]
As shown in FIG. 3, the conversation expression generation device A3 according to the third aspect of the present invention further comprises, in addition to the configuration of the conversation expression generation device A1 or A2, speaker determination means 8 which, given two or more preset speaker agents, assigns one speaker agent as the reader of the single-sentence texts in the output conversation text and another speaker agent as the reader of the comment texts. In the conversation expression generation device A3, the keyword acquisition means 7 shown by a broken line in FIG. 3 is not essential; whether to provide it is optional. Likewise, in each of the conversation expression generation devices shown in FIGS. 4 to 7 described below, the means drawn with broken lines are optional functions of the device or program.
[0015]
With such a conversation expression generation device A3, the conversation text consists of single-sentence texts and comment texts, and these are read by different speaker agents. For example, when the single-sentence texts and comment texts are displayed as characters on the user's display, displaying the image of the speaker agent associated with each text at the same time simulates a situation in which two or more speakers are actually conversing. This makes the presentation very approachable for the user, who is the listener, and improves understanding.
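The speaker assignment described above amounts to mapping each element of the conversation text to a reader according to whether it is a sentence from the monologue or an inserted comment. A minimal Python sketch follows; the `Turn` type, the `"sentence"`/`"comment"` tags, and the placeholder agent names are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Turn(NamedTuple):
    speaker: str  # name of the speaker agent that reads the text
    text: str


def assign_speakers(conversation_text, sentence_reader="Agent 1", comment_reader="Agent 2"):
    """Map (kind, text) pairs to turns; kind is 'sentence' or 'comment'."""
    readers = {"sentence": sentence_reader, "comment": comment_reader}
    return [Turn(readers[kind], text) for kind, text in conversation_text]
```

Each resulting turn carries enough information for a display layer to show the text next to the matching agent image, or for a speech layer to pick a voice per agent.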
[0016]
The conversation expression generation device A4 according to the fourth aspect of the present invention, which further improves the effect of the conversation expression generation device A3, comprises, as shown in FIG. 4 and in addition to the configuration of the device A3, voice output means 9 for outputting the corresponding single-sentence text or comment text in a different voice for each speaker agent determined by the speaker determination means 8. That is, outputting the single-sentence texts and comment texts in different voices, as if different speaker agents were speaking, makes them still easier for the listener to understand.
[0017]
The conversation expression generation device A5 according to the fifth aspect of the present invention, which further improves the effect of the conversation expression generation device A4, comprises, as shown in FIG. 5 and in addition to the configuration of the device A4, animation processing means 10 which, in step with the voice of the single-sentence text or comment text output by the voice output means 9, adds to each speaker agent an animation that moves at least the mouth in the agent's image, and outputs the result. That is, by animating the image of the speaker agent in addition to outputting the voice data of the single-sentence text or comment text, as if the agent were actually speaking, the conversation can proceed in a more realistic manner than with the device A4.
[0018]
Further, the conversation expression generation device A6 according to the sixth aspect of the present invention, which adds an auxiliary function to any one of the conversation expression generation devices A3, A4, and A5, further includes character data output means 11 that outputs, together with each speaker agent, the corresponding simple sentence text or comment text as character data that can be displayed on a screen. With this configuration, the simple sentence text and comment text output as character data can be displayed in association with each speaker agent, so even a user who is deaf can understand the content. When the character data is output together with the voice data and the animation image, general users receive the information through both eyes and ears and can understand the content more easily.
[0019]
Further, the monologue text may be accompanied by image data. In this case, the conversation expression generation device A7 according to the seventh aspect of the present invention may further include, in addition to the configuration of any one of the conversation expression generation devices A1, A2, A3, A4, A5, and A6 described above, image data processing means 12 that acquires the image data from the monologue text storage unit MTD and outputs it. The output image data can then serve as an aid to understanding the conversation text.
[0020]
In particular, in any one of the conversation expression generation devices A3, A4, A5, and A6, a main caster agent serving as the facilitator of the conversation expression may be set as one of the speaker agents. When the main caster agent is determined as the reader of the comment text, the conversation proceeds smoothly. If the monologue text consists of a body part, which carries the substance of the content, and a title part, which indicates its outline, the speaker determination means 8 may be configured to determine the main caster agent as the reader of the title part. If a comment text indicating the start of the conversation expression is stored in the comment storage unit CMD, the flow of the conversation can be made smoother by configuring the comment selection means 4 to select that comment text when no other simple sentence text precedes the simple sentence text, and configuring the speaker determination means 8 to associate that comment text, together with the title part, with the main caster agent. On the other hand, when one or more announcer agents are set as speaker agents different from the main caster agent, having the speaker determination means 8 determine an announcer agent as the reader of the body part clarifies the division of roles with the main caster agent.
[0021]
In any one of the conversation expression generation devices A1 to A7 described above, in order to make the flow between a plurality of conversation expressions based on a plurality of monologue texts smooth, it is preferable that expressions leading into the next monologue text be set as comment texts in association with the end-of-sentence expression pattern of the final simple sentence text, and that the comment selection means 4 be configured to recognize the final simple sentence text of a monologue text and to select, from the comment texts stored in the comment storage unit, the comment text corresponding to the end-of-sentence expression of that final simple sentence text.
[0022]
End-of-sentence expressions can be classified into several patterns. Specifically, the end-of-sentence expression patterns include at least a phenomenon description format, indicating that a phenomenon is being described, and a hearsay format, indicating that the content is hearsay. The comment texts include at least comment texts in a question sentence format, corresponding to the phenomenon description format, and comment texts in an expected sentence format, corresponding to the hearsay format. It is preferable that the sentence end processing means 3 associate the end-of-sentence expression of the simple sentence text with either the phenomenon description format or the hearsay format, and that the comment selection means 4 accordingly select one comment text in the question sentence format or the expected sentence format. In addition, if a plurality of comment texts are set for each of the question sentence format and the expected sentence format and the comment selection means 4 selects one of them, the conversation gains variation and does not become monotonous.
[0023]
Recently, an interactive broadcasting system for communities called the Public Opinion Channel (hereinafter referred to as "POC") has been under development. The POC takes opinion sentences that community members post on an electronic bulletin board for the other members, applies processing such as conversation generation to them, and broadcasts them to the community members in the form of an opinion introduction program presented through a conversation between a main caster agent and an announcer agent, who are virtual speakers. The conversation expression generation devices A1 to A7 according to the present invention can therefore be applied to an interactive broadcasting system such as the POC, which includes a monologue text storage unit storing the monologue texts input to an electronic bulletin board usable by a plurality of users, together with the comment storage unit, and which can broadcast conversation texts generated from the input monologue texts. By generating a conversation text from the input monologue text and outputting it in a broadcastable form, the devices can play a very important role in opinion introduction broadcasting for communities such as the POC.
[0024]
[Best mode for carrying out the invention]
Hereinafter, an embodiment of the present invention will be described with reference to the drawings.
[0025]
This embodiment is a conversation expression generation device applied to the above-mentioned Public Opinion Channel (hereinafter referred to as "POC"), as shown in FIG. In this embodiment, the conversation expression generation device A7 is used, and it is hereinafter referred to as the POC caster A7. The POC basically consists of client computers CC, such as personal computers, PDAs, or mobile phones, used by the users U who are community members; a POC server PS, which provides the client computers CC with an electronic bulletin board that the users U can access from them and which stores the opinion sentences of the users U posted from the client computers CC; and the POC caster A7. The client computers CC, the POC server PS, and the POC caster A7 are connected through the Internet IN so that they can communicate bidirectionally. The POC caster A7 also functions as a broadcast client that broadcasts, to the client computers CC, a conversational program generated from the opinion sentences posted by the users U. The broadcast can be viewed on the screen of a client computer CC.
[0026]
First, the internal configuration of each device will be described. The POC server PS is a general-purpose server computer with a database server function and a Web server function. The Web server provides a home page and an electronic bulletin board that can be browsed from the client computer CC. The database server functions as the monologue text storage unit MTD, which stores the opinion sentences input to the electronic bulletin board. The POC caster A7, on the other hand, is built on a general server computer or personal computer and, as shown in FIG. 9, has as its internal devices a CPU 101, an internal memory 102, a storage device 103 such as a hard disk, an input device 104 such as a keyboard and a mouse, an output device 105 such as a display and a speaker, various communication interfaces 106, and the like. A database device 107 may be included either as an internal device or as an external device. The program recorded in the storage device 103 is read into the internal memory 102 in accordance with instructions from the CPU 101, necessary data is read from the database device 107 as appropriate, and communication is performed as needed, whereby the POC caster A7 operates. When the POC caster A7 needs to accept input or produce output such as a screen display, the input device 104 or the output device 105 is used as appropriate. In this embodiment the POC server PS and the POC caster A7 are shown as separate computers that communicate bidirectionally through the Internet IN, but they may instead be connected by a dedicated communication line or be realized on a single computer. The client computer CC is, as described above, a general personal computer, PDA, mobile phone, or the like; it needs at least a function of connecting to the Internet IN, input/output functions for characters and images such as a display, and an audio output function such as a speaker.
[0027]
Next, the functions of the POC caster A7 will be described. In the POC caster A7, the internal and external devices cooperate under the instructions of the CPU 101 based on the conversation expression generation program, so that, as shown in FIG., the POC caster A7 functions as monologue text acquisition means 1, preprocessing means 2, sentence end processing means 3, comment selection means 4, conversation expression generation means 5, conversation text output means 6, keyword acquisition means 7, speaker determination means 8, voice output means 9, animation processing means 10, character data output means 11, and image data processing means 12. Installing a program that operates each of these means causes a computer to function as the POC caster A7. The program is installed by having the computer read it from a recording medium such as a CD-ROM, or by downloading it to the computer via the Internet IN or the like. In the present embodiment, a TSS system (manufactured by Toshiba Corporation) is used for the speech synthesis of the agents (main caster agent MA and announcer agent AA), which are the virtual speakers who converse to introduce opinions, and a photo face character creation system (Sharp Corporation) incorporated in the POC caster A7 is used for the images of those agents; other products with equivalent functions can also be used.
[0028]
The database 107 functions as the comment storage unit CMD. An example of the stored comments is shown in FIG. In this example, the end-of-sentence expression patterns of the opinion sentences posted by the users U are roughly classified into two types, one of which is further divided into three, and a plurality of comments are prepared for each. More specifically, as shown in the left column of the figure, end-of-sentence expressions are roughly divided into a phenomenon description format, which states a phenomenon, and a hearsay format, which indicates that the content is hearsay. The phenomenon description format is further divided into a form that states the existence of a phenomenon ("there is"), an aspect form in the present or present progressive, and an aspect form in the past or past progressive. Examples of each end-of-sentence expression are given in the corresponding column of the figure. Examples of the first form include "is." and "was."; examples of the present and present progressive aspect form include "has." and "has become popular."; examples of the past and past progressive aspect form include "was."; and examples of the hearsay format include "It seems to be.", "It is called.", and "It is said that it is." These examples are based on an analysis of the sentences that appear when something is introduced, which showed that such sentences fall into the four end-of-sentence expression patterns described above. The comment texts to be inserted after each end-of-sentence expression pattern are shown in the right column of the figure. The first form and the present and present progressive aspect form are associated with comment texts that ask for details of present content, such as "Tell me more." and "What is it?". The past and past progressive aspect form is associated with comment texts such as "How was it?". The hearsay format is associated with comment texts such as "What is it?". The expression examples and comment texts above are merely examples, and others may be included. Although not shown, an appropriate identifier is assigned to each pattern and to each comment text so that it can be distinguished from the others.
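The lookup from end-of-sentence pattern to comment text can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the English regular expressions are hypothetical stand-ins for the Japanese endings in the comment table, and the identifiers (`Q1`, `P1`, `H1`) and function names are assumptions introduced only for this example.

```python
import random
import re

# Ordered (pattern_id, regex) pairs; earlier entries win.
# The English regexes are hypothetical stand-ins for the Japanese endings.
PATTERNS = [
    ("hearsay", re.compile(r"(it seems|it is said|i hear)\.$")),
    ("aspect_past", re.compile(r"\b(was|were) \w+ing\b[^.]*\.$")),
    ("aspect_present", re.compile(r"\b((is|are) \w+ing|has|have)\b[^.]*\.$")),
    ("existence", re.compile(r"\b(is|are|was|were)\b[^.]*\.$")),
]

# Each comment text carries its own identifier (the ids are made up here).
COMMENTS = {
    "existence": [("Q1", "Tell me more."), ("Q2", "What is it?")],
    "aspect_present": [("Q1", "Tell me more."), ("Q2", "What is it?")],
    "aspect_past": [("P1", "How was it?")],
    "hearsay": [("H1", "What is it?")],
}


def classify_ending(sentence):
    """Return the id of the first matching pattern, or None if none matches."""
    text = sentence.strip().lower()
    for pattern_id, regex in PATTERNS:
        if regex.search(text):
            return pattern_id
    return None


def select_comment(pattern_id, rng=random):
    """Pick one (comment_id, text) pair for the pattern, for variation."""
    return rng.choice(COMMENTS[pattern_id])
```

Checking the hearsay pattern first keeps a sentence such as "A new aquarium has opened, I hear." from being captured by the aspect pattern that its inner clause would otherwise match.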
[0029]
In addition to the comment texts described above, the database 107 functioning as the comment storage unit CMD also stores, singly or in plurality, other comment texts: for example, comment texts such as "Yes."; comment texts such as "What do you think?", which are inserted between the opinion sentences being read out in order to connect topics and call on other users; and comment texts such as "I'm talking about," which indicate the start of a conversation.
[0030]
Opinion sentences that a user U writes on his or her client computer CC and transmits are stored in the POC server PS. Each opinion sentence is a monologue text, composed of monologue-like sentences written by the user U alone. FIG. 11 shows an example. As shown in the figure, the opinion sentence OPT consists of a title part OPH, entered in the "title" field, and a body part OPM, entered in the "text" field, in which the user's statement is written. Each opinion sentence OPT is managed by a unique identifier that distinguishes it from the other opinion sentences OPT. In the present embodiment, the opinion sentence is converted into XML (Extensible Markup Language) format in the POC server PS, but any other format may be used. A related image OPI, which is a moving image or a still image, may also be attached to each opinion sentence.
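As a rough illustration of the opinion sentence structure, the sketch below parses one XML-encoded opinion sentence into its identifier, title part, body part, and optional related image. The specification does not disclose the XML schema, so the element and attribute names (`opinion`, `id`, `title`, `body`, `image`, `href`) and the sample content are purely hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class Opinion:
    opinion_id: str        # unique identifier of the opinion sentence OPT
    title: str             # title part OPH
    body: str              # body part OPM
    image: Optional[str]   # reference to a related image OPI, if any


# Hypothetical XML layout; the real POC schema is not given in the text.
SAMPLE = """<opinion id="17">
  <title>Sightseeing in Ise-Shima</title>
  <body>A large facility has opened, I hear. It is becoming popular.</body>
  <image href="ise.jpg"/>
</opinion>"""


def parse_opinion(xml_text: str) -> Opinion:
    """Build an Opinion from one XML document in the assumed layout."""
    root = ET.fromstring(xml_text)
    image = root.find("image")
    return Opinion(
        opinion_id=root.get("id", ""),
        title=(root.findtext("title") or "").strip(),
        body=(root.findtext("body") or "").strip(),
        image=image.get("href") if image is not None else None,
    )
```

Keeping the image reference optional mirrors the text, in which a related image may or may not be attached.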
[0031]
It is also assumed that two types of speaker agents, a main caster agent and an announcer agent, are set in advance. That is, a face character image created by the photo face character creation system and a synthetic voice created by the TSS system are associated with each of the two agents. The announcer agent is set as the speaker who reads out the body part OPM of the original opinion sentence in the conversation text generated from it, while the main caster agent is set to read out the title part OPH and the comment texts.
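The role assignment described here amounts to a small mapping from the kind of text to a speaker agent. The following sketch assumes placeholder file names and voice labels; only the rule itself (title part and comment text to the main caster, body part to the announcer) comes from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeakerAgent:
    name: str
    face_image: str   # placeholder path to the agent's face character image
    voice: str        # placeholder label for the agent's synthetic voice


MAIN_CASTER = SpeakerAgent("main_caster", "main_caster_face.png", "voice_a")
ANNOUNCER = SpeakerAgent("announcer", "announcer_face.png", "voice_b")


def determine_speaker(part: str) -> SpeakerAgent:
    """Map a text kind ('title', 'comment', or 'body') to its reader."""
    if part in ("title", "comment"):
        return MAIN_CASTER
    if part == "body":
        return ANNOUNCER
    raise ValueError(f"unknown part: {part!r}")
```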
[0032]
Hereinafter, an operation example of the POC caster A7 will be described using the opinion sentence example shown in FIG. 11, the conversation text example shown in FIG. 12, the flowchart example shown in FIG. 13, the screen example shown in FIG.
[0033]
First, it is assumed that a large number of opinion sentences from the users U, such as the one shown in FIG. 11, are stored in the POC server PS as monologue texts. That is, each user U posts his or her opinion using the electronic bulletin board provided by the POC server PS. The POC server PS or the POC caster A7 sends a screen containing a keyword input field to the client computer CC of a user U who wants to view an introduction of opinion sentences, and the keyword entered there is sent from the client computer CC to the POC caster A7.
[0034]
When the POC caster A7 acquires the keyword transmitted from the client computer CC (FIG. 13; step S1), it searches the POC server PS for opinion sentences (monologue texts) OPT related to the acquired keyword (step S2). For this search, an appropriate method can be adopted, such as searching only the title part OPH or a full-text search of the title part OPH and the body part OPM. When one or more opinion sentences OPT are obtained from the POC server PS (step S2a; Yes), the process proceeds to the next step. If a plurality of opinion sentences OPT are obtained, they are arranged in an appropriate order, for example ascending or descending order of identifier, or date order. On the other hand, when there is no opinion sentence OPT corresponding to the acquired keyword (step S2a; No), information to that effect is transmitted to the client computer CC (step S2b).
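Steps S2 and S2a can be sketched as a keyword filter followed by a sort. The record layout (`id`, `title`, `body`, `posted`), the sample data, and the simple case-insensitive substring match are assumptions made for illustration; the description leaves the concrete search method open.

```python
from datetime import date

# Hypothetical stored opinion sentences.
SAMPLE_OPINIONS = [
    {"id": 2, "title": "Ise-Shima sightseeing",
     "body": "A large facility has opened.", "posted": date(2002, 9, 1)},
    {"id": 1, "title": "Weekend plans",
     "body": "I visited Ise-Shima last month.", "posted": date(2002, 9, 5)},
    {"id": 3, "title": "Local festival",
     "body": "The festival was fun.", "posted": date(2002, 8, 20)},
]


def search_opinions(opinions, keyword, full_text=True, order="id_asc"):
    """Return opinions matching the keyword, sorted in the requested order."""
    needle = keyword.lower()

    def matches(opinion):
        haystack = opinion["title"]
        if full_text:  # otherwise only the title part is searched
            haystack += "\n" + opinion["body"]
        return needle in haystack.lower()

    hits = [op for op in opinions if matches(op)]
    if order == "id_asc":
        hits.sort(key=lambda op: op["id"])
    elif order == "id_desc":
        hits.sort(key=lambda op: op["id"], reverse=True)
    elif order == "date":
        hits.sort(key=lambda op: op["posted"])
    else:
        raise ValueError(f"unknown order: {order!r}")
    return hits
```

An empty result corresponds to the "No" branch of step S2a, where the caller would notify the client computer instead of generating a conversation.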
[0035]
Next, it is determined whether or not all the opinion sentences OPT have been introduced. If the introduction has not been completed (step S3; No), one opinion sentence OPT is divided at each period "." into simple sentence texts (step S4). The end-of-sentence expression of each simple sentence text is then analyzed (step S5), the comment text corresponding to the end-of-sentence expression pattern of each simple sentence text is extracted from the database 107 (step S6), and a conversation text CVT is generated by inserting it after each simple sentence text (step S7). As an example, if the obtained opinion sentence OPT is the one shown in FIG. 11, the generated conversation text CVT is as shown in FIG. 12. The body part OPM of the opinion sentence OPT shown in FIG. 11 is divided into four simple sentence texts. First, the title part OPH is assigned to the main caster agent MCA. Because no simple sentence text precedes the title part OPH in the opinion sentence OPT, the comment text indicating the start of the conversation and the content of the title part OPH are combined into an opening comment ending in "It is a topic." The opening phrase "First of all ..." can be changed as appropriate according to the order in which the opinion sentence is introduced; for example, comment texts such as "Next is ..." or "Last is ..." are available. Next, the first sentence of the body part OPM is assigned to the announcer agent ANA. The end-of-sentence expression of this first sentence, "... yes.", corresponds to the hearsay format, so a comment text corresponding to that format is assigned to the main caster agent MCA. Similarly, the second and third sentences are assigned to the announcer agent ANA, and the comment texts corresponding to their end-of-sentence expressions, "Tell me more." and "Yes.", are assigned to the main caster agent MCA. The fourth sentence is also assigned to the announcer agent ANA. Since this fourth sentence is the simple sentence text at the end of the opinion sentence OPT, the comment text "What about everyone,", an expression that calls on the other users U and leads into the next topic, is assigned to the main caster agent MCA.
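Steps S4 to S7 can be sketched as follows: split the body at periods, then interleave each simple sentence with a comment, using an opening comment for the title and a closing comment after the final sentence. The comment rule in `simple_comment_for`, the wording of the opening line, and the sample text are simplified assumptions for illustration only; a real table lookup would replace the rule.

```python
import re


def split_simple_sentences(body):
    """Divide the body at each period; a trailing fragment is kept as is."""
    pieces = re.findall(r"[^.]+(?:\.|$)", body)
    return [p.strip() for p in pieces if p.strip()]


def simple_comment_for(sentence):
    """Toy stand-in for the pattern table (hypothetical English rules)."""
    text = sentence.lower()
    if text.endswith(("i hear.", "it seems.", "it is said.")):
        return "What is it?"          # hearsay format
    if re.search(r"\b(is|are|has|have)\b[^.]*\.$", text):
        return "Tell me more."        # phenomenon description format
    return "Yes."                     # fallback comment


def generate_conversation(title, body, comment_for=simple_comment_for,
                          opener="First of all,",
                          closing="What do you think?"):
    """Return a list of (speaker, text) turns for one opinion sentence."""
    turns = [("main_caster", f"{opener} the topic is: {title}")]
    sentences = split_simple_sentences(body)
    for index, sentence in enumerate(sentences):
        turns.append(("announcer", sentence))
        is_last = index == len(sentences) - 1
        # The final sentence gets the comment that calls on other users.
        turns.append(("main_caster", closing if is_last else comment_for(sentence)))
    return turns
```

With a four-sentence body this yields nine turns, alternating between the two agents after the opening comment.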
[0036]
Returning to the flowchart shown in FIG. 13, for the title part OPH in the generated conversation text CVT (step S8; Yes), the main caster agent MCA is given an utterance operation for the comment text, while the announcer agent ANA is given a rest operation (step S8a). If the part is not the title part OPH, that is, if it is the body part OPM (step S8; No), it is determined whether or not the end-of-sentence expression is an expression that states an introduction. If it is not (step S8b; No), the announcer agent ANA is given a reading operation for the simple sentence text, while the main caster agent MCA is given a rest operation (step S8c). If it is (step S8b; Yes), the main caster agent MCA is given a reading operation for the comment text, while the announcer agent ANA is given a rest operation (step S8a). Once an operation has been given to each of the main caster agent MCA and the announcer agent ANA, an animation corresponding to the operation is generated and a voice is synthesized. The animation operation includes at least an operation in which the main caster agent MCA and the announcer agent ANA move their mouths, but there are variations; for example, when an attached image accompanies the opinion sentence OPT as described later, either agent may make a pointing motion or a nodding motion. If the opinion sentence OPT includes an attached image OPI (step S10; Yes), the attached image is combined into the image to be transmitted, for example by placing it at the center of the transmitted image displayed on the client computer CC. After that, or when there is no attached image (step S10; No), the character data of the conversation text CVT is combined into, for example, the lower column of the transmitted image (step S11), and all the data is transmitted in a format that can be viewed on the client computer CC (step S12). As a result of the transmission, the images displayed on the display of the client computer CC are, for example, as shown in FIGS. 14 to 22, and the voices of the main caster agent MCA and the announcer agent ANA are output from the speaker of the client computer CC.
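The assignment of speaking and resting operations to the two agents can be sketched as a per-turn schedule. The action labels (`speak`, `rest`, `point_at_image`) and the choice to have the announcer point at the attached image on its first turn are assumptions for illustration; the description only says that such gesture variations exist.

```python
AGENTS = ("main_caster", "announcer")


def schedule_actions(turns, has_image=False):
    """For each (speaker, text) turn, decide what every agent does."""
    frames = []
    pointed = False
    for speaker, text in turns:
        # The speaking agent moves its mouth; the other agent rests.
        actions = {agent: "rest" for agent in AGENTS}
        actions[speaker] = "speak"
        gesture = None
        if has_image and speaker == "announcer" and not pointed:
            gesture = "point_at_image"   # assumed variation, used once
            pointed = True
        frames.append({"text": text, "actions": actions, "gesture": gesture})
    return frames
```

Each frame could then be handed to speech synthesis and animation rendering, with the text also shown as character data in the lower column of the screen.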
[0037]
The screen displayed on the display of the client computer CC and the sound output from the speaker will now be described. In FIGS. 14 to 22, the attached image is displayed at the center of the screen, the image of the main caster agent MCA is placed on the left side, and the image of the announcer agent ANA is placed on the right side. In accordance with the conversation text CVT shown in FIG. 12, each agent is animated, the voice reading out the comment or the opinion sentence is output, and the corresponding character data is displayed in the lower column of the screen. First, as shown in FIG. 14, the main caster agent MCA performs the operation of reading out the comment "First step is ...", in which the title part OPH of the opinion sentence and the comment text indicating the start of the conversation are combined; its voice is output, and the character data of this comment is displayed in the lower column of the screen. Next, as shown in FIG. 15, the announcer agent ANA performs the operation of reading out the first sentence "Ise-Shima no..." of the body part OPM; its voice is output, and the character data of this sentence is displayed in the lower column of the screen. At this time, the announcer agent ANA is made to perform an animation operation of pointing at the attached image OPI. Next, as shown in FIG. 16, the speaker switches to the main caster agent MCA, which performs the operation of reading out the comment "What is it?"; its voice is output, and the character data of this comment is displayed in the lower column of the screen. Further, as shown in FIG. 17, the speaker switches to the announcer agent ANA, which performs the operation of reading out the second sentence "Large facility sightseeing ..." of the body part OPM; its voice is output, and the character data of this sentence is displayed in the lower column of the screen. Subsequently, as shown in FIG. 18, the main caster agent MCA reads out the comment "Tell me more."; its voice is output, and the character data of the comment is displayed in the lower column of the screen. Next, as shown in FIG. 19, the speaker switches to the announcer agent ANA, which performs the operation of reading out the third sentence "Beautiful scenery and so on" of the body part OPM; its voice is output, and the character data of this sentence is displayed in the lower column of the screen. In response, as shown in FIG. 20, the main caster agent MCA performs the operation of reading out the comment "Yes."; its voice is output, and the character data of this comment is displayed in the lower column of the screen. Then, as shown in FIG. 21, the announcer agent ANA performs the operation of reading out the final sentence "Would you like to join together?" of the body part OPM; its voice is output, and the character data of this sentence is displayed in the lower column of the screen. Finally, as shown in FIG. 22, the main caster agent MCA performs the operation of reading out the comment "What are you doing?"; its voice is output, the character data of the comment is displayed in the lower column of the screen, and the conversation leads into the introduction of the next opinion sentence OPT.
[0038]
That is, when each of the above steps is completed, the process returns to step S3 to process the next opinion sentence OPT. When the introduction of all the opinion sentences OPT is completed (step S3; Yes), the processing for the initially acquired keyword ends. Steps S9 to S11 do not necessarily have to be performed in this order, and the order may be changed as appropriate.
[0039]
As described above, the intention of an opinion sentence posted by a user U is inferred from its end-of-sentence expressions, which are surface clues of the opinion sentence, and comment texts are inserted and the additional synthesis processing is performed, so that other users U can view the opinion sentence in conversation format. Therefore, even if the information is originally presented as an opinion sentence written in monologue form, it can be provided to the users U who view it in a manner that is more approachable and that reduces the burden of understanding.
[0040]
Note that the present invention is not limited to the above embodiment. For example, the number of end-of-sentence expression patterns for the divided simple sentence texts can be increased or decreased, as can the number of expression examples and comment texts corresponding to each pattern. In addition, since texts posted on an electronic bulletin board are not necessarily formal writing, processing may be applied such as joining the title part and the body part of a posted opinion sentence into a single text, or normalizing a sentence that starts with a case particle or with a symbol such as "...". The specific configuration of each unit is also not limited to the above embodiment, and various modifications can be made without departing from the spirit of the present invention. Further, the generated conversation text is not limited to a conversation between two speakers and may be a conversation among three or more. The present invention can also be applied to fields or systems other than the POC.
[0041]
[Effects of the invention]
According to the present invention, as described in detail above, a monologue text, which is writing composed by one person, can be converted, through a process of dividing it into simple sentences, into a conversation format that reduces the burden of understanding placed on the listener or reader. That is, by classifying the end-of-sentence expressions, which are surface clues of the monologue text, into patterns, inferring the intention of the monologue text from them, and inserting an appropriate comment after each simple sentence, a conversational expression can be generated without any particular restrictions. Applying the present invention is therefore very useful for running opinion introduction programs in communities where unspecified topics appear, such as the POC, and for research on conversation.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a functional configuration of a conversation expression generation device according to claim 1 of the present invention.
FIG. 2 is a block diagram showing a functional configuration of a conversation expression generation device according to claim 2 of the present invention.
FIG. 3 is a block diagram showing a functional configuration of a conversation expression generation device according to claim 4 of the present invention.
FIG. 4 is a block diagram showing a functional configuration of a conversation expression generation device according to claim 5 of the present invention.
FIG. 5 is a block diagram showing a functional configuration of a conversation expression generation device according to claim 6 of the present invention.
FIG. 6 is a block diagram showing a functional configuration of a conversation expression generation device according to claim 7 of the present invention.
FIG. 7 is a block diagram showing a functional configuration of a conversation expression generation device according to claim 8 of the present invention.
FIG. 8 is a schematic view showing a POC system to which an embodiment of the present invention is applied.
FIG. 9 is a schematic internal device configuration diagram of the POC caster of the embodiment.
FIG. 10 is an exemplary view showing an example of internal data of a comment storage unit applied to the embodiment.
FIG. 11 is an exemplary view showing an example of an opinion sentence applied to the embodiment.
FIG. 12 is an exemplary view showing an example of a conversation text generated in the embodiment.
FIG. 13 is a flowchart schematically showing the operation of the embodiment.
FIG. 14 is an exemplary view showing an example of a screen displayed on a client computer in the embodiment.
FIG. 15 is an exemplary view showing an example of a screen displayed on the client computer in the embodiment.
FIG. 16 is an exemplary view showing an example of a screen displayed on the client computer in the embodiment.
FIG. 17 is an exemplary view showing an example of a screen displayed on a client computer in the embodiment.
FIG. 18 is an exemplary view showing an example of a screen displayed on a client computer in the embodiment.
FIG. 19 is an exemplary view showing an example of a screen displayed on the client computer in the embodiment.
FIG. 20 is an exemplary view showing an example of a screen displayed on a client computer in the embodiment.
FIG. 21 is an exemplary view showing an example of a screen displayed on the client computer in the embodiment.
FIG. 22 is an exemplary view showing an example of a screen displayed on the client computer in the embodiment.
[Explanation of symbols]
1 ... Monologue text acquisition means
2 ... Preprocessing means
3 ... Sentence end processing means
4 ... Comment selection means
5 ... Conversation expression generation means
6 ... Conversation text output means
7 ... Keyword acquisition means
8 ... Speaker determination means
9 ... Voice output means
10 ... Animation processing means
11 ... Character data output means
12 ... Image data processing means
A1, A2, A3, A4, A5, A6, A7 ... Conversation expression generation device
CMD ... Comment storage unit
MTD ... Monologue text storage unit

Claims

1. A conversation expression generation device for generating a conversational expression based on a monologue text composed of monologue-like sentences, the device comprising: monologue text acquisition means for acquiring the monologue text from a monologue text storage unit that stores it; preprocessing means for dividing the acquired monologue text into simple sentences to generate one or more simple sentence texts; sentence end processing means for analyzing the end-of-sentence expression of each generated simple sentence text and associating it with one of a plurality of preset end-of-sentence expression patterns; comment selection means for selecting, from a comment storage unit that stores a plurality of comment texts set in association with the end-of-sentence expression patterns as expressions responding to them, one comment text corresponding to the simple sentence text associated with one of the patterns; conversation expression generation means for generating a conversation text consisting of the simple sentence text and the comment text by inserting the selected comment text after the simple sentence text; and conversation text output means for outputting the generated conversation text.

2. The conversation expression generation device according to claim 1, further comprising keyword acquisition means for acquiring a keyword input by a user, wherein the monologue text acquisition means acquires one or more monologue texts corresponding to the keyword from the monologue text storage unit, and the preprocessing means generates simple sentence texts for each of the acquired monologue texts.

3. The conversation expression generation device according to claim 1 or 2, wherein the monologue text storage unit stores, as monologue texts, opinion texts posted to an electronic bulletin board by user input, and the monologue text acquisition means acquires the opinion texts as the monologue texts.

4. The conversation expression generation device according to claim 1, further comprising speaker determination means for associating, from among two or more preset speaker agents, one speaker agent as the reader of the simple sentence text in the output conversation text and another speaker agent as the reader of the comment text.

5. The conversation expression generation device according to claim 4, further comprising voice output means for outputting, in a different voice for each speaker agent determined by the speaker determination means, the simple sentence text or comment text corresponding to that speaker agent.

6. The conversation expression generation device according to claim 5, further comprising animation processing means for adding to the image of each speaker agent, and outputting, an animation operation that moves at least the mouth in the image in accordance with the voice of the simple sentence text or comment text output by the voice output means.

7. The conversation expression generation device according to claim 4, further comprising character data output means for outputting, together with each speaker agent, the corresponding simple sentence text or comment text as character data that can be displayed on a screen.

8. The conversation expression generation device according to claim 1, further comprising image data processing means for acquiring, when the monologue text is accompanied by image data, the image data from the monologue text storage unit and outputting it.

9. The conversation expression generation device according to claim 6, 7, or 8, wherein a main caster agent serving as the facilitator of the conversational expression is set as one of the speaker agents, and the speaker determination means determines the main caster agent as the reader of the comment text.

10. The conversation expression generation device according to claim 9, wherein the monologue text consists of a body part, which is the substantive part of the content, and a title part indicating its outline, and the speaker determination means determines the main caster agent as the reader of the title part.

11. The conversation expression generation device according to claim 10, wherein a comment text indicating the start of the conversational expression is stored in the comment storage unit, the comment selection means selects the comment text indicating the start when no other simple sentence text precedes the simple sentence text, and the speaker determination means associates that comment text, together with the title part, with the main caster agent.

12. The conversation expression generation device according to claim 9, wherein the monologue text consists of a body part, which is the substantive part of the content, and a title part indicating its outline, one or more announcer agents are set as speaker agents different from the main caster agent, and the speaker determination means determines the announcer agent as the reader of the body part.

13. The conversation expression generation device according to claim 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12, wherein the comment selection means recognizes the final simple sentence text in one monologue text and selects a comment text corresponding to the sentence end expression of the final simple sentence text from among comment texts that are stored in the comment storage unit, in association with the sentence end expression patterns of a final simple sentence text, as expressions leading into the next monologue text.

14. The conversation expression generation device according to claim 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or 13, wherein the sentence end expression patterns include at least a phenomenon description format, indicating that a phenomenon is being described, and a hearsay format, indicating that the content is reported information; the comment texts include at least a comment text in a question sentence format corresponding to the phenomenon description format and a comment text in a prediction sentence format corresponding to the hearsay format; the sentence end processing means associates the sentence end expression of the simple sentence text with either the phenomenon description format or the hearsay format; and the comment selection means correspondingly selects a comment text in either the question sentence format or the prediction sentence format.

15. The conversation expression generation device according to claim 14, wherein a plurality of comment texts are set for each of the question sentence format and the prediction sentence format, and the comment selection means selects any one of them.

16. The conversation expression generation device according to any one of claims 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, and 15, which is applied to an interactive broadcast system that comprises the monologue text storage unit, storing monologue texts input to an electronic bulletin board usable by a plurality of users, and the comment storage unit, and that is capable of broadcasting conversation texts generated from the input monologue texts, wherein the device generates a conversation text based on an input monologue text and outputs the conversation text in a broadcastable form.

17. A conversation expression generation program that, by operating a computer, causes the computer to function as a conversation expression generation device generating a conversational expression based on a monologue text composed of monologue-style sentences, the program causing the computer to function as: monologue text acquisition means for acquiring the monologue text from a monologue text storage unit that stores the monologue text; preprocessing means for dividing the acquired monologue text into simple sentences to generate one or more simple sentence texts; sentence end processing means for analyzing the sentence end expression of each generated simple sentence text and associating the sentence end expression with one of a plurality of preset sentence end expression patterns; comment selection means for selecting, from a comment storage unit that stores a plurality of comment texts set in association with the sentence end expression patterns as expressions responding to those patterns, one comment text corresponding to the simple sentence text associated with one of the sentence end expression patterns; conversation expression generation means for generating a conversation text consisting of the simple sentence text and the comment text by inserting the selected comment text after the simple sentence text; and conversation text output means for outputting the generated conversation text.

18. The conversation expression generation program according to claim 17, further causing the computer to function as keyword acquisition means for acquiring a keyword input by a user, wherein the monologue text acquisition means acquires one or more monologue texts corresponding to the keyword from the monologue text storage unit, and simple sentence texts are generated for each of the acquired monologue texts.

19. The conversation expression generation program according to claim 17 or 18, wherein the monologue text storage unit stores, as monologue texts, opinion texts posted to an electronic bulletin board by user input, and the monologue text acquisition means acquires the opinion texts as the monologue texts.

20. The conversation expression generation program according to claim 17, further causing the computer to function as speaker determination means for associating, from among two or more preset speaker agents, one speaker agent as the reader of the simple sentence text in the output conversation text and another speaker agent as the reader of the comment text.

21. The conversation expression generation program according to claim 20, further causing the computer to function as voice output means for outputting, in a different voice for each speaker agent determined by the speaker determination means, the simple sentence text or comment text corresponding to that speaker agent.

22. The conversation expression generation program according to claim 21, further causing the computer to function as animation processing means for adding to the image of each speaker agent, and outputting, an animation operation that moves at least the mouth in the image in accordance with the voice of the simple sentence text or comment text output by the voice output means.

23. The conversation expression generation program according to claim 19, further causing the computer to function as character data output means for outputting, together with each speaker agent, the corresponding simple sentence text or comment text as character data that can be displayed on a screen.

24. The conversation expression generation program according to claim 23, further causing the computer to function as image data processing means for acquiring, when the monologue text is accompanied by image data, the image data from the monologue text storage unit and outputting it.

25. The conversation expression generation program according to claim 22, 23, or 24, wherein a main caster agent serving as the facilitator of the conversational expression is set as one of the speaker agents, and the speaker determination means determines the main caster agent as the reader of the comment text.

26. The conversation expression generation program according to claim 25, wherein the monologue text consists of a body part, which is the substantive part of the content, and a title part indicating its outline, and the speaker determination means determines the main caster agent as the reader of the title part.

27. The conversation expression generation program according to claim 26, wherein a comment text indicating the start of the conversational expression is stored in the comment storage unit, the comment selection means selects the comment text indicating the start when no other simple sentence text precedes the simple sentence text, and the speaker determination means associates that comment text, together with the title part, with the main caster agent.

28. The conversation expression generation program according to claim 25, wherein the monologue text consists of a body part, which is the substantive part of the content, and a title part indicating its outline, one or more announcer agents are set as speaker agents different from the main caster agent, and the speaker determination means determines the announcer agent as the reader of the body part.

29. The conversation expression generation program according to claim 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, or 28, wherein the comment selection means recognizes the final simple sentence text in one monologue text and selects a comment text corresponding to the sentence end expression of the final simple sentence text from among comment texts that are stored in the comment storage unit, in association with the sentence end expression patterns of a final simple sentence text, as expressions leading into the next monologue text.

30. The conversation expression generation program according to claim 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, or 29, wherein the sentence end expression patterns include at least a phenomenon description format, indicating that a phenomenon is being described, and a hearsay format, indicating that the content is reported information; the comment texts include at least a comment text in a question sentence format corresponding to the phenomenon description format and a comment text in a prediction sentence format corresponding to the hearsay format; the sentence end processing means associates the sentence end expression of the simple sentence text with either the phenomenon description format or the hearsay format; and the comment selection means selects a comment text in either the question sentence format or the prediction sentence format.

31. The conversation expression generation program according to claim 30, wherein a plurality of comment texts are set for each of the question sentence format and the prediction sentence format, and the comment selection means selects any one of them.

32. The conversation expression generation program according to any one of claims 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, and 31, which is applied to an interactive broadcast system that comprises the monologue text storage unit, storing monologue texts input to an electronic bulletin board usable by a plurality of users, and the comment storage unit, and that is capable of broadcasting conversation texts generated from the input monologue texts, wherein the program generates a conversation text based on an input monologue text and outputs the conversation text in a broadcastable form.
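
The speaker assignment recited in the claims above (a main caster agent reading the title part and the comment texts, an announcer agent reading the body part) can be sketched as follows. This is a minimal Python sketch under stated assumptions: the names `Utterance` and `assign_speakers`, the agent labels, and the sample strings in the usage note are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str  # label of the speaker agent that reads the text
    text: str     # title part, simple sentence text, or comment text


def assign_speakers(title, start_comment, pairs,
                    main_caster="main_caster", announcer="announcer"):
    """Build a script from one monologue text.

    `pairs` is a list of (simple sentence text, comment text) tuples,
    i.e. body sentences that already have a comment selected for them.
    """
    # The opening comment and the title part both go to the main caster.
    script = [
        Utterance(main_caster, start_comment),
        Utterance(main_caster, title),
    ]
    for sentence, comment in pairs:
        # Body sentences are read by the announcer agent,
        # the comment that follows each one by the main caster agent.
        script.append(Utterance(announcer, sentence))
        script.append(Utterance(main_caster, comment))
    return script
```

For example, `assign_speakers("Bridge opens", "Here is the next topic.", [("The new bridge opened today.", "And then what happened?")])` returns four utterances: two by the main caster, one by the announcer, and a closing comment by the main caster. Handling several announcer agents would need an extra rule for choosing among them, which this sketch leaves out.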