JP2010244498A

JP2010244498A - Automatic answer sentence generation system

Info

Publication number: JP2010244498A
Application number: JP2009111558A
Authority: JP
Inventors: Yutaka Inada; 裕稲田; Hideo Nakano; 英雄中野; Shinkaku Kashiji; 真確樫地
Original assignee: GENGO RIKAI KENKYUSHO KK
Current assignee: GENGO RIKAI KENKYUSHO KK
Priority date: 2009-04-07
Filing date: 2009-04-07
Publication date: 2010-10-28

Abstract

PROBLEM TO BE SOLVED: On Internet bulletin boards, blogs, and chat, it is unclear what subjects users will discuss, and the topic often changes as the dialogue proceeds. Conventional automatic answer sentence generation therefore cannot prepare answer sentences, or knowledge about the items to be answered, in advance, and can generate answers that do not match the conversation.

SOLUTION: Keywords are extracted from language information transmitted by a person, information on the Web is collected for those keywords to complement them, and the resulting keywords are checked against a field association word dictionary 5 to determine the topic field of the language information. An answer sentence is then generated from the extracted keywords, their field information, and a sentence template that serves as a model answer, so that an answer appropriate to the topic is produced even when the topic is unspecified.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a system that understands the topic of conversational text on unspecified topics, such as posts on Internet bulletin boards, blogs, and chat, and generates a response sentence appropriate to that topic.

Conventional automatic response sentence generation includes operation guides that, on a specific device or Internet screen, tell the user which operation to perform next in a specific situation. There are also means that, on the same kind of device or screen, search a manual or Q&A collection prepared in advance for information similar to a question and display the search result as the response sentence.

Automatic response sentence generation has also been the subject of a wide range of academic research, and many researchers are working on it. The documents listed as [Non-patent document 1], [Non-patent document 2], and [Non-patent document 3] report results of this research.

Yoji Kiyota (清田陽司), Sadao Kurohashi (黒橋禎夫), Fuyuko Kido (木戸冬子): "Automatic Question Answering Based on a Large-Scale Text Knowledge Base: Dialog Navi", Journal of Natural Language Processing, Vol. 10, No. 4, pp. 145-175, 2003.
Masashi Matsumoto (松本匡史), Kiyoaki Shirai (白井清昭): "A Question Answering System That Detects Ambiguity in Questions and Presents Multiple Answers", Proceedings of the 12th Annual Meeting of the Association for Natural Language Processing, pp. 935-938, 2006.
Atsushi Sakamoto (坂本篤史): "A Study on Generating Clarifying Questions in an Interactive Question Answering System", Master's thesis, Japan Advanced Institute of Science and Technology, 2007.
Hiroshi Nakagawa (中川裕志), Tatsunori Mori (森辰則), Hiroaki Yumoto (湯本紘彰): "Technical Term Extraction Based on Occurrence Frequency and Connection Frequency", Journal of Natural Language Processing, Vol. 10, No. 1, pp. 27-45, 2003.
Kyoji Umemura (梅村恭司): "Development of a Keyword Extraction System for Unexplored Text Information", Exploratory Software Project (未踏ソフトウェア創造事業), 2000.

As described in [0002] above, conventional automatic response sentence generation includes techniques that generate response sentences for a specific situation on a specific device or Internet screen, but no system exists that generates response sentences for unspecified situations.

When the target is a specific situation on a specific device or Internet screen, the situation can be bounded: what the user will do next and what the user is likely to be thinking in that situation can be narrowed down, so the items that need a response can be limited in advance. Response sentences, or knowledge about the items to be answered, can therefore be prepared beforehand. On Internet bulletin boards, blogs, and chat, however, it is unclear what subjects users will discuss, and the topic often changes as the dialogue proceeds, so preparing response sentences or the relevant knowledge in advance is impossible.

Knowledge resources such as existing Japanese-language dictionaries, synonym dictionaries, and thesauri that represent the semantic hierarchy among words already exist, as do keyword extraction techniques that extract from a document the words needed to identify its topic, such as nouns and proper nouns. One approach combines them: the keywords extracted to identify the topic are checked against a thesaurus or similar resource to identify the field of the topic. On Internet bulletin boards, blogs, and chat, however, text is frequently written with abbreviations that merge several words into one, acronyms formed from the initial letters of English expressions, and slang names. Words cut out of such text by keyword extraction often cannot be matched against existing dictionaries, synonym dictionaries, or thesauri, so the field sometimes cannot be identified and the accuracy of the identified field is low. As a result, a response sentence generated for the identified field may be completely inconsistent with the conversation.

Words that identify a field also include many homonyms. Even when keyword extraction correctly extracts words sufficient to identify the field, and those words are matched correctly against existing dictionaries, synonym dictionaries, or thesauri, the identified field can still be wrong, and a response sentence generated for that field is then completely inconsistent with the conversation.

To solve these problems, the present invention adopts an automatic response sentence generation system for environments in which a machine or system responds to language information that a person transmits by writing or speaking. Such environments include systems that receive conversational text on unspecified topics (bulletin boards, blogs, and chat on the Web), guidance systems for operating or using various devices, and guidance systems at exhibition venues or within companies. The system comprises: language information input means for capturing the language information transmitted by the person; keyword extraction means for extracting, from the language information, keywords that serve as information for determining the field; Web keyword complementing means for collecting information on the Web for the transmitted language information and complementing the keywords; a field association word dictionary storing the correspondence between keywords and field names; field information acquisition means for obtaining the field information of each keyword by checking the keywords extracted by the keyword extraction means and the keywords complemented by the Web keyword complementing means against the field association word dictionary; a field hierarchical structure dictionary storing the hierarchical structure among fields; field inference means for inferring and outputting the field by checking the field information of those keywords against the field hierarchical structure dictionary; a response sentence dictionary storing sentence templates that serve as models for response sentences; and response sentence construction means for combining the extracted keywords and the topic inferred by the field inference means with the sentence templates stored in the response sentence dictionary. By using these means, the system generates a response sentence appropriate to the topic of the transmitted language information, even when the topic is unspecified.

Furthermore, the invention adopts Web keyword complementing means that handles words in common use but not registered in existing dictionaries, such as abbreviations that merge several words into one, acronyms formed from the initial letters of English expressions, and slang names, and that increases the number of words available as related information. It comprises Web search means for searching Web pages on the Internet using the keywords extracted by the keyword extraction means from the transmitted language information, and Web keyword aggregation means for extracting keywords, again with the keyword extraction means, from the language information in the search results and aggregating them. Using these means, it complements the keyword information for the language information transmitted by the person.

Furthermore, the invention adopts field inference means that infers, from a set of keywords and the field information to which they belong, the field to which the set belongs. It narrows the field down as far as possible by restricting the inferred field to lower levels of the field hierarchical structure dictionary, and it removes the ambiguity of homonyms that are used in different fields. It comprises: field frequency extraction means for counting, within the set of field information, the number of occurrences of each field; field parent-child relation extraction means for extracting parent-child relations when aggregated fields stand in a parent-child relation in the field hierarchy of the dictionary; field sibling relation extraction means for extracting sibling relations when aggregated fields stand in a sibling relation in that hierarchy; and field information analysis means that weights each field in the output of the field frequency extraction means, using the information from the field frequency extraction means and the field sibling relation extraction means, and performs statistical processing. With these means it infers what field the input set of keywords and field information is about.

Finally, the invention adopts response sentence construction means for generating a response sentence appropriate to the topic of the transmitted language information. Taking as parameters the keywords extracted by the keyword extraction means and their number, the topics inferred by the field inference means and their number, and the field parent-child and sibling relations used in the inference and their number, it combines or transforms the sentence templates stored in the response sentence dictionary to generate response sentences that feel more natural to the user.

As described above, by searching the Web for conversational text on unspecified topics such as Internet bulletin boards, blogs, and chat, the present invention can flexibly handle abbreviations that merge several words into one, acronyms formed from English initials, slang names, and newly coined words without maintenance of the field association word dictionary or the field hierarchical structure dictionary, understand the topic, and generate a response sentence appropriate to it.

Furthermore, extracting the parent-child and sibling relations among the fields obtained from the transmitted language information improves the accuracy of the identified topic and makes it possible to generate response sentences that take those relations into account.

This gives users a sense of reassurance that the language information they sent has been understood accurately, which encourages further dialogue.

The automatic response sentence generation system of the present invention takes as input language information that a person transmits by writing or speaking, extracts keywords from it, searches Web pages on the Internet with those keywords, and extracts keywords from the search results as well. These keywords are checked against a field classification dictionary, which stores field names and the words belonging to each field, to attach field information to each keyword, and the topic field of the transmitted language information is identified from the keywords and their field information. Based on the identified topic field, a response sentence template judged appropriate is selected from the response sentence dictionary, and the keywords and field information are embedded in it to generate a response sentence appropriate to the language information the user sent.

The present invention is described below with reference to the drawings showing an embodiment. FIG. 1 shows an overview of the processing flow of the automatic response sentence generation system of the present invention.

First, language information that a person transmits by writing or speaking, such as conversational text on unspecified topics on Internet bulletin boards, blogs, and chat, is converted by the language information input means 1 into text information that a computer can process.

From the text information produced by the language information input means 1, the keyword extraction means 2 extracts the words considered important for identifying the topic of the text. Specifically, morphological analysis software first splits each sentence into words and attaches part-of-speech information to each word, and mainly nouns and proper nouns are then extracted as keywords. This processing has been studied extensively; for example, the academic papers listed as [Non-patent document 2] and [Non-patent document 3] have been published. The keyword extraction means 2 of the present invention uses such commonly practiced keyword extraction processing.
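The part-of-speech filtering step described above can be sketched as follows. This is a minimal illustration, not the method of the patent: it assumes the morphological analyzer has already produced (word, part-of-speech) pairs, and the label names `noun` and `proper_noun`, the function `extract_keywords`, and the sample data are all hypothetical.

```python
from typing import List, Tuple

# Part-of-speech labels treated as keyword candidates (hypothetical label names).
KEYWORD_POS = {"noun", "proper_noun"}


def extract_keywords(tagged_words: List[Tuple[str, str]]) -> List[str]:
    """Keep nouns and proper nouns, dropping repeats but preserving order."""
    keywords: List[str] = []
    for surface, pos in tagged_words:
        if pos in KEYWORD_POS and surface not in keywords:
            keywords.append(surface)
    return keywords


# Stand-in for the output of a morphological analyzer (made-up sample).
tagged = [
    ("RV car", "noun"),
    ("is", "verb"),
    ("comfortable", "adjective"),
    ("Tokyo", "proper_noun"),
    ("RV car", "noun"),
]
keywords = extract_keywords(tagged)
print(keywords)
```

A real system would obtain the tagged words from an actual morphological analyzer; only the filtering logic is shown here.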

The keywords extracted from the text information by the keyword extraction means 2 are sent to the Web keyword complementing means 3, which searches the Web with them and collects the pages that are hit. Keyword extraction is then applied to the text of the collected pages to obtain further keywords.

FIG. 2 shows this processing in detail. For the keywords extracted from the text information by the keyword extraction means 2, the Web search means 31 searches Web pages on the Internet and collects the text of the pages found. The Web keyword aggregation means 32 then applies keyword extraction to the text of all collected Web pages, checks every extracted keyword against the field association word dictionary 5 to attach field information, and aggregates the results.

This processing addresses the following problem: the language information that users enter on Internet bulletin boards, blogs, and chat is frequently written with abbreviations that merge several words into one, acronyms formed from English initials, and slang names, so words cut out of the text by keyword extraction often cannot be matched against existing Japanese-language dictionaries, synonym dictionaries, or thesauri. The approach works because the words users write are usually in common use at that moment; they may not be registered in dictionaries, but they are often used frequently on the Web.

For example, "giza" (ギザ), as in "giza kawaii" (ギザかわいい, "very cute"), is a word that did not exist in 2000 and, naturally, is not registered in dictionaries as of 2009. It is nevertheless used heavily on the Web, and as of 2009 most people know that "giza" means "very". Likewise, "about 40 years old" is expressed as "around forty" (アラウンド・フォーティー), shortened to "arafō" (アラフォー). "Arafō" is widely known, and more recently the coinage "arakan" (アラカン), a blend with "kanreki" (還暦, the 60th birthday), has come into frequent use for people close to 60. By adopting the Web keyword complementing means 3, the present invention can handle such passing vogue words and other words not listed in dictionaries.
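The search-then-extract loop of the Web keyword complementing means 3 might look roughly like the sketch below. It is an illustration under stated assumptions: the search and extraction steps are passed in as functions, and `fake_search`, `fake_extract`, `FAKE_WEB`, and the sample page texts are invented stand-ins rather than a real search API or real data.

```python
from collections import Counter
from typing import Callable, Iterable, List


def complement_keywords(
    keywords: Iterable[str],
    search: Callable[[str], List[str]],
    extract: Callable[[str], List[str]],
) -> Counter:
    """Search for each keyword, extract keywords from every page found,
    and aggregate the extracted keywords with their occurrence counts."""
    aggregated: Counter = Counter()
    for keyword in keywords:
        for page_text in search(keyword):
            aggregated.update(extract(page_text))
    return aggregated


# Invented stand-in for the Web: keyword -> texts of the pages that "hit".
FAKE_WEB = {
    "arafo": ["arafo means around forty", "age forty arafo generation"],
}
STOPWORDS = {"means"}


def fake_search(keyword: str) -> List[str]:
    return FAKE_WEB.get(keyword, [])


def fake_extract(text: str) -> List[str]:
    # Crude whitespace split in place of real keyword extraction.
    return [word for word in text.split() if word not in STOPWORDS]


result = complement_keywords(["arafo"], fake_search, fake_extract)
print(result.most_common())
```

Because a slang word such as "arafo" co-occurs on Web pages with dictionary words such as "forty" and "age", the complemented keyword set can be matched against the field association word dictionary even when the original word cannot.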

The keywords extracted from the text information by the keyword extraction means 2 and the keywords complemented by the Web keyword complementing means 3 are checked by the field information acquisition means 4 against the information stored in the field association word dictionary 5, and each keyword is given information on the field to which it belongs.

As shown in FIG. 4, the field association word dictionary 5 stores field names in association with the words that belong to each field.

The output of the field information acquisition means 4, namely the keywords extracted from the text information by the keyword extraction means 2, the keywords complemented by the Web keyword complementing means 3, and the field information attached to each keyword, is sent as a single set to the field inference means 6. As shown in FIG. 3, the field inference means 6 first uses the field frequency extraction means 61 to aggregate, by field, the keywords and field information received from the field information acquisition means 4. The aggregated information takes a form such as "furniture (29)", as shown in FIG. 5, which means that 29 keywords belong to the field "furniture".
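The dictionary lookup and per-field counting described above can be sketched as follows. The dictionary contents, the word-to-field layout of `FIELD_ASSOCIATION`, and the function names are assumptions made for illustration; the actual format of the field association word dictionary 5 is shown only in FIG. 4.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical field association word dictionary: word -> field name.
FIELD_ASSOCIATION: Dict[str, str] = {
    "sofa": "furniture",
    "table": "furniture",
    "bus": "transportation",
    "wardrobe": "chests and cupboards",
}


def attach_fields(keywords: List[str]) -> List[Tuple[str, str]]:
    """Return (keyword, field) pairs for keywords found in the dictionary.
    Keywords with no entry (for example, unregistered slang) are skipped."""
    return [(kw, FIELD_ASSOCIATION[kw]) for kw in keywords if kw in FIELD_ASSOCIATION]


def field_frequencies(pairs: List[Tuple[str, str]]) -> Counter:
    """Count how many keywords belong to each field."""
    return Counter(field for _, field in pairs)


pairs = attach_fields(["sofa", "table", "bus", "giza", "wardrobe"])
freq = field_frequencies(pairs)
print(freq)
```

In this toy run, "furniture" receives a count of 2, which corresponds to the "furniture (29)" style of summary in the text.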

Next, the field information aggregated by the field frequency extraction means 61 is checked against the field hierarchical structure dictionary 7. As shown in FIG. 6, this dictionary holds field names arranged in a hierarchy that expresses how they are related. For example, under the field <video> the lower level contains <SF>, <Japanese film>, <foreign film>, <movie production>, <animation>, <TV>, and so on.

Based on the field hierarchical structure dictionary 7, the field parent-child relation extraction means 62 determines whether any of the fields aggregated by the field frequency extraction means 61 stand in a parent-child relation, and the field sibling relation extraction means 63 determines whether any of them stand in a sibling relation. A parent-child relation holds between a field name at an upper level of the field hierarchical structure dictionary 7 and a field name at the level below it; in the example of FIG. 6, <video> and <animation> are parent and child. A sibling relation holds between fields at the same lower level that belong to the same upper-level field; in the example of FIG. 6, <Japanese film> and <foreign film> are siblings.
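The two relation checks can be sketched with a child-to-parent map. The field names below come from the FIG. 6 example in the text, but storing the hierarchy as a `PARENT` dictionary, the function names, and the restriction to direct parents are assumptions of this sketch.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical encoding of the field hierarchy: child field -> parent field.
PARENT: Dict[str, str] = {
    "SF": "video",
    "Japanese film": "video",
    "foreign film": "video",
    "movie production": "video",
    "animation": "video",
    "TV": "video",
}


def parent_child_pairs(fields: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (parent, child) pairs where both fields are present."""
    present = set(fields)
    return sorted((PARENT[f], f) for f in present if PARENT.get(f) in present)


def sibling_pairs(fields: Iterable[str]) -> List[Tuple[str, str]]:
    """Return pairs of present fields that share the same parent."""
    present = sorted(set(fields))
    return [
        (a, b)
        for i, a in enumerate(present)
        for b in present[i + 1:]
        if a in PARENT and PARENT.get(a) == PARENT.get(b)
    ]


fields = ["video", "animation", "Japanese film", "foreign film"]
print(parent_child_pairs(fields))
print(sibling_pairs(fields))
```

Note that the sibling check does not require the shared parent to be present among the aggregated fields, only that the dictionary lists the same parent for both.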

The field information analysis means 64 analyzes the field information using the field information aggregated by the field frequency extraction means 61, the parent-child relations extracted by the field parent-child relation extraction means 62, and the sibling relations extracted by the field sibling relation extraction means 63. The present invention adopts a procedure that determines the field more accurately by fusing those aggregated fields that stand in a parent-child or sibling relation.

FIG. 5 illustrates the fusion of fields in a parent-child relation. The field information aggregated by the field frequency extraction means 61 is "furniture (29), transportation (25), chests and cupboards (11), automobile (10), RV car (9)". Here "furniture (29)" and "chests and cupboards (11)" are parent and child, as are "automobile (10)" and "RV car (9)". The parent field is therefore fused into the child field: the number of keywords belonging to the parent field is added to the number belonging to the child field, and the sum becomes the keyword count of the child field. This is called forward fusion. After forward fusion the field information is "chests and cupboards (40), transportation (25), RV car (19)". A parent-child relation also exists between "transportation (25)" and "RV car (19)", so these are fused as well. This step illustrates reverse fusion, in which the number of keywords belonging to the child field is added to the number belonging to the parent field, and the sum becomes the keyword count of the parent field. After reverse fusion the result is "chests and cupboards (40), transportation (44)".
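The arithmetic of this example can be reproduced with a short sketch. The counts are the ones given in the text; the function names `forward_fuse` and `reverse_fuse` are illustrative, and the choice of which pairs to fuse in which direction is hard-coded here because the text does not state the rule that selects between forward and reverse fusion.

```python
from typing import Dict


def forward_fuse(counts: Dict[str, int], parent: str, child: str) -> Dict[str, int]:
    """Forward fusion: add the parent's count to the child and drop the parent."""
    merged = dict(counts)
    merged[child] += merged.pop(parent)
    return merged


def reverse_fuse(counts: Dict[str, int], parent: str, child: str) -> Dict[str, int]:
    """Reverse fusion: add the child's count to the parent and drop the child."""
    merged = dict(counts)
    merged[parent] += merged.pop(child)
    return merged


counts = {
    "furniture": 29,
    "transportation": 25,
    "chests and cupboards": 11,
    "automobile": 10,
    "RV car": 9,
}

# Forward fusion of the two parent-child pairs named in the text.
step1 = forward_fuse(counts, "furniture", "chests and cupboards")
step1 = forward_fuse(step1, "automobile", "RV car")

# Reverse fusion of "RV car" into "transportation".
step2 = reverse_fuse(step1, "transportation", "RV car")

top_field = max(step2, key=step2.get)
print(step1)
print(step2)
print(top_field)
```

Before fusion the largest single count is "furniture" (29); after fusion the largest is "transportation" (44), matching the outcome described in the text.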

The initial field information was "furniture (29), transportation (25), chests and cupboards (11), automobile (10), RV car (9)", in which "furniture (29)" had the highest count. After field fusion the result is "chests and cupboards (40), transportation (44)", and "transportation (44)" has the highest count. When keywords from similar fields, related as parent and child or as siblings, are spread over many fields, judging each field in isolation would misidentify the single field with the most keywords as the topic of the language information the user entered. Fusion makes it possible to avoid this misrecognition.

The response sentence construction means 8 generates a response sentence by embedding, into a response sentence template stored in the response sentence dictionary 9, the fields and field information obtained by the field information acquisition means 4 and the keywords obtained by the keyword extraction means 2 and the Web keyword complementing means 3. In doing so, it selects the template from the response sentence dictionary 9, and selects the keywords to embed, in light of the topic of the user's input inferred by the field inference means 6.

FIG. 7 shows an example of a response sentence template stored in the response sentence dictionary 9. Templates are written as tagged text in XML format. Some of the tags are explained below.

<pat> is a tag that classifies response sentence template patterns by the number of fields and by the number of major information items determined from the topic of the language information the person entered. This accommodates differences in the number and position of the response information items to be fitted into the template, which depend on the number of fields and on the topic. FIG. 9 shows seven patterns as examples of response sentence template patterns. Generating a response sentence that corresponds to the number of fields obtained and the number of topics obtained from the input language information yields responses that feel more natural to the user. In the figure, <Key> is a tag for embedding an input keyword and <Field> is a tag for embedding a field.

A <response> tag contains a single response sentence. Its attributes allow cases to be distinguished by three conditions: the ratio of the top-ranked field, the presence or absence of sibling fields, and the presence or absence of parent-child fields. For example, <response ratio_up="70"> defines a template that applies when the ratio of the top-ranked field is 70% or higher, and <response near="1"> defines a template that applies when the top-ranked field has one or more sibling fields. The three conditions can also be combined, so a wide variety of cases can be distinguished.
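The condition check described for <response> can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class ResponseRule, the functions matches and applicable, and the rule texts are hypothetical, and only the two attributes named in the text (ratio_up and near) are modeled, because the attribute name for the parent-child condition is not given.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResponseRule:
    """One <response> entry (hypothetical model of the tag)."""
    text: str
    ratio_up: Optional[int] = None  # minimum share (%) of the top-ranked field
    near: Optional[int] = None      # minimum number of sibling fields


def matches(rule: ResponseRule, top_ratio: float, sibling_count: int) -> bool:
    """A rule applies only if every condition it sets is satisfied."""
    if rule.ratio_up is not None and top_ratio < rule.ratio_up:
        return False
    if rule.near is not None and sibling_count < rule.near:
        return False
    return True


def applicable(rules: List[ResponseRule], top_ratio: float,
               sibling_count: int) -> List[str]:
    """Return the texts of all rules whose conditions hold."""
    return [r.text for r in rules if matches(r, top_ratio, sibling_count)]


# Hypothetical rule set; the texts are labels, not templates from FIG. 7.
rules = [
    ResponseRule("dominant", ratio_up=70),
    ResponseRule("siblings", near=1),
    ResponseRule("both", ratio_up=70, near=1),
    ResponseRule("default"),
]
```

A rule with no attributes acts as a fallback here; whether the actual system treats unconditioned <response> entries this way is an assumption.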

The <key> tag embeds an extracted keyword in the response sentence template. For example, given the sentence template "<key>ですか？" ("Is it <key>?") and the keyword "風邪" ("a cold"), the response sentence "風邪ですか？" ("Is it a cold?") is generated.

The <field> tag embeds an extracted field in the response sentence template. For example, <field num="0"> embeds the top-ranked field. If the num attribute is not set, a field is chosen at random from the extracted fields. For example, given the sentence template "<key>なら<field>ですね．" ("If it's <key>, that's <field>, isn't it.") and the keyword "RV車" ("RV"), the field "自動車" ("automobiles") has been extracted from that keyword, so the response sentence "RV車なら自動車ですね" ("If it's an RV, that's automobiles, isn't it") is generated.
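The <key> and <field> substitutions can be sketched as follows. This is a minimal illustration that assumes the template is handled as a flat string and that fields are passed as a list ordered by rank; the function name fill_template and the regular expression are hypothetical and are not taken from the patent.

```python
import random
import re


def fill_template(template, keyword, fields, rng=random):
    """Replace <key> and <field ...> placeholders in a template string.

    fields is assumed to be ordered by rank, so index 0 is the top field.
    """
    def field_repl(match):
        num = match.group(1)
        if num is not None:
            return fields[int(num)]   # num="0" selects the top-ranked field
        return rng.choice(fields)     # no num: pick any extracted field

    out = template.replace("<key>", keyword)
    return re.sub(r'<field(?:\s+num="(\d+)")?\s*>', field_repl, out)


print(fill_template("<key>ですか？", "風邪", []))
print(fill_template('<key>なら<field num="0">ですね．', "RV車", ["自動車"]))
```

With a single extracted field, the random branch and the num="0" branch give the same result; they differ only when several fields have been extracted.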

The <select> tag uses a random number to choose one of the alternatives specified by the <set> tags it contains. This avoids the situation in which the same response sentence is always generated and the user becomes bored.

The flow of response sentence generation is now explained for the case where the keyword "レッドクリフ" ("Red Cliff") is input against the response sentence template shown in FIG. 7. First, the field reasoning means 6 obtains the field "洋画" ("foreign film") for the keyword "レッドクリフ". The response then begins with "レッドクリフ" as the <key>, followed by either "といえば" ("speaking of") or "は" (the topic particle) chosen by a <select>, followed by "洋画" as the <field>, and finally by either "でしょう！！" ("it must be!!") or "しかないよね！！" ("it can only be, right!!") chosen by a second <select>. One of the following four response sentences is therefore generated: "レッドクリフといえば洋画でしょう！！", "レッドクリフは洋画でしょう！！", "レッドクリフといえば洋画しかないよね！！", or "レッドクリフは洋画しかないよね！！".
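The four variants in this example can be reproduced with a short sketch. It assumes that a template has already been reduced to a sequence of fixed strings and option lists, where each option list stands for one <select> with its <set> alternatives; the function name expand and this list representation are hypothetical.

```python
import itertools
import random


def expand(parts):
    """Enumerate every sentence a template can produce.

    Each element of parts is either a fixed string or a list of
    alternatives (one <select> block with its <set> options).
    """
    choices = [p if isinstance(p, list) else [p] for p in parts]
    return ["".join(combo) for combo in itertools.product(*choices)]


parts = [
    "レッドクリフ",                      # <key>
    ["といえば", "は"],                  # first <select>
    "洋画",                              # <field>
    ["でしょう！！", "しかないよね！！"],  # second <select>
]

variants = expand(parts)            # the four sentences listed above
response = random.choice(variants)  # one of them, chosen at random
print(response)
```

Choosing uniformly among the enumerated variants is equivalent to making an independent uniform choice at each <select>; the enumeration is used here only to make the set of possible outputs visible.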

As described above, the system generates, for the language information entered by the user, a response sentence that takes the field of the topic into account and that is not always the same, so the user does not become bored.

The automatic response sentence generation system of the present invention runs on a computer in an environment where the Web on the Internet can be browsed and where language information that a human has written or spoken can be accepted. It is therefore applicable to bulletin boards, blogs, and chats on the Web, as well as to a variety of other dialogue systems. It can also be used to offer users a way to pass the time while a process with a long waiting time is running, and it is applicable to various guidance systems.

FIG. 1 is a diagram outlining the processing flow of the automatic response sentence generation system of the present invention.
FIG. 2 is a diagram outlining the processing flow of the Web keyword complementing means of the present invention.
FIG. 3 is a diagram outlining the processing flow of the field reasoning means of the present invention.
FIG. 4 is a diagram showing an example of the correspondence between fields and words registered in the field association word dictionary of the present invention.
FIG. 5 is a diagram showing an example of the field fusion processing of the present invention.
FIG. 6 is a diagram showing an example of the field hierarchy registered in the field hierarchical structure dictionary of the present invention.
FIG. 7 is a diagram showing an example of a response sentence registered in the response sentence dictionary of the present invention.
FIG. 8 is a diagram showing an example of the processing of the response sentence construction means of the present invention.
FIG. 9 is a diagram showing examples of variations of the response sentences registered in the response sentence dictionary of the present invention.

DESCRIPTION OF SYMBOLS
1 Language information input means
2 Keyword extraction means
3 Web keyword complementing means
31 Web search means
32 Web keyword aggregation means
4 Field information acquisition means
5 Field association word dictionary
6 Field reasoning means
61 Field frequency extraction means
62 Field parent-child relation extraction means
63 Field sibling relation extraction means
64 Field information analysis means
7 Field hierarchical structure dictionary
8 Response sentence construction means
9 Response sentence dictionary

Claims

An automatic response sentence generation system for use in an environment in which a machine or system responds to language information that a human has uttered or written, such as a system that receives conversational text on unspecified topics (for example, a bulletin board, blog, or chat on the Internet), a guidance system for the operation and use of various devices, or a guidance system for various exhibition venues or companies, the system comprising: language information input means for taking in the language information transmitted by the human; keyword extraction means for extracting, from the language information, keywords that serve as information for determining a field; Web keyword complementing means for collecting information from the Web on the Internet with respect to the language information transmitted by the human and complementing the keywords; a field association word dictionary storing correspondence information between keywords and field names; field information acquisition means for acquiring the field information to which each keyword belongs by collating the keywords extracted by the keyword extraction means and the keywords complemented by the Web keyword complementing means against the field association word dictionary; a field hierarchical structure dictionary storing the hierarchical structure among the fields; field reasoning means for inferring and outputting a field by collating, against the field hierarchical structure dictionary, the field information to which the extracted and complemented keywords belong; a response sentence dictionary storing sentence templates that serve as models for response sentences; and response sentence construction means for joining together the keywords extracted by the keyword extraction means, the topic inferred by the field reasoning means, and the sentence templates stored in the response sentence dictionary, whereby the system generates a response sentence appropriate to the topic of the transmitted language information even when the topic is unspecified.

Web keyword complementing means, provided in order to handle words that are in common use but are not registered in existing dictionaries such as Japanese-language dictionaries (for example, abbreviations that merge several words into one, acronyms formed from English initials, and popular names) and to increase the number of words available as search information, characterized by comprising: Web search means for searching Web pages on the Internet using the keywords extracted by the keyword extraction means from the language information transmitted by a human; and Web keyword aggregation means for extracting keywords, by the keyword extraction means, from the language information obtained by the Web search means and aggregating the extracted keywords, thereby complementing the keyword information for the language information transmitted by the human.

Field reasoning means for inferring, from a set of keywords and the field information to which those keywords belong, the field to which the set belongs, in which the fields to be inferred are restricted to the lower levels of the field hierarchical structure dictionary so that the field is narrowed down as far as possible and the ambiguity of homonyms used in different fields is eliminated, characterized by comprising: field frequency extraction means for counting identical items of field information within the set of field information to which the keywords belong; field parent-child relation extraction means for extracting a parent-child relation when the counted fields have one in the field hierarchy of the field hierarchical structure dictionary; field sibling relation extraction means for extracting a sibling relation when the counted fields have one in the field hierarchy of the field hierarchical structure dictionary; and field information analysis means for statistically processing the information output by these extraction means and weighting each field, thereby inferring the topic of the input set of keywords and the field information to which the keywords belong.

Response sentence construction means for generating a response sentence appropriate to the topic of language information transmitted by a human, characterized in that it takes as parameters the keywords extracted from the language information by the keyword extraction means and their number, the topics inferred by the field reasoning means and their number, and the field parent-child relations and field sibling relations used during inference by the field reasoning means and their number, and combines and transforms the sentence templates stored in the response sentence dictionary accordingly, thereby generating a response sentence that feels less unnatural to the user.