JP3082889B2

JP3082889B2 - Topic structure recognition method and apparatus for monolog data

Info

Publication number: JP3082889B2
Application number: JP05306288A
Authority: JP
Inventors: 敦竹下; 透中川
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1993-12-07
Filing date: 1993-12-07
Publication date: 2000-08-28
Anticipated expiration: 2015-08-28
Also published as: JPH07160710A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a method for recognizing topic structure in natural language analysis.

[0002]

[Prior Art] Models of topics and their structure have been proposed in the past; see, for example, B. J. Grosz and C. L. Sidner, "Attention, Intentions, and the Structure of Discourse," Computational Linguistics, vol. 12, no. 3, pp. 175-204 (1986). Because topics have a nested structure, that model represents topic development with a stack. Changes in the nesting of topics, that is, operations that push a topic onto the stack or pop one off it, are determined by transitions in the speaker's intentions. Which topics develop also depends on common-sense knowledge called domain knowledge.
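The stack view of nested topics can be pictured with a small sketch. The Python fragment below is only an illustration under assumptions: the class name `TopicStack`, its methods, and the sample topics are hypothetical, and the decision of when to push or pop, which the cited model ties to the speaker's intentions, is left entirely to the caller.

```python
class TopicStack:
    """Toy stack of nested topics (hypothetical helper, not taken from the cited model)."""

    def __init__(self):
        self._stack = []

    def push(self, topic):
        # A new subtopic opens inside the current topic.
        self._stack.append(topic)

    def pop(self):
        # The current topic closes; its parent becomes the current topic again.
        return self._stack.pop()

    def current(self):
        # Innermost open topic, or None when no topic is open.
        return self._stack[-1] if self._stack else None

    def depth(self):
        # Nesting depth of the current topic.
        return len(self._stack)


stack = TopicStack()
stack.push("telephone companies")
stack.push("Company A")
nested_depth = stack.depth()
closed = stack.pop()
```

After the pop, the outer topic is current again, which is the behaviour the nesting model relies on.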

[0003] Here, domain knowledge includes, for example, superordinate-subordinate relationships between concepts, such as "'Company A' is a kind of 'telephone company'," and relationships between actions, such as "'Company A' provides a service called 'Service A' and advertises for that purpose."

[0004]

[Problem to Be Solved by the Invention] However, the above model of topics and their structure provides no method for recognizing intentions, so it cannot actually be used to recognize topic structure. As for topic development, the model does not specify what domain knowledge is needed or how it should be used; moreover, even if it did, it would be impossible to prepare in advance the domain knowledge needed for topic structure recognition.

[0005] The present invention has been made in view of the above points, and its object is to recognize topics in monologue data by using topic development patterns and linguistic knowledge rather than domain knowledge.

[0006]

[Means for Solving the Problem] The topic structure recognition method for monologue data of the present invention is a method of recognizing the topic structure of monologue data using a topic structure recognition apparatus having dictionary storage means for topic structure recognition preprocessing, topic structure recognition preprocessing means, base development processing rule storage means, base development processing means, semantic development processing rule storage means, semantic development processing means, integration processing rule storage means, and integration processing means. First, the topic structure recognition preprocessing means, using the preprocessing dictionary stored in the dictionary storage means, performs on the input monologue data topic structure recognition preprocessing consisting of morphological analysis, simple-sentence segmentation, salient noun phrase extraction, and block recognition. Next, the base development processing means, using the base development rules stored in the base development processing rule storage means and working from the result of the preprocessing, performs base development processing: for the base development, in which the development of topics in the monologue data is explicitly indicated by cue phrases and the like, it sequentially identifies the topic establishment sections in which topics are presented and established, identifies the topic word in each topic establishment section, and identifies the nesting level and continuation section of each topic word. Next, the semantic development processing means, using the semantic development processing rules stored in the semantic development processing rule storage means and working from the result of the preprocessing and the result of each step of the base development processing, performs semantic development processing: for the semantic development, in which topics develop within the base development, it sequentially identifies the topic establishment sections in which topics are presented and established, identifies the topic word in each topic establishment section, and identifies the nesting level and continuation section of each topic word. Thereafter, the integration processing means, using the integration processing rules stored in the integration processing rule storage means, integrates the result of the base development processing with the result of the semantic development processing, thereby recognizing the topic structure of the entire monologue data. As the salient noun phrase extraction in the preprocessing, each simple sentence in the monologue data resulting from simple-sentence segmentation is matched against salient noun phrase marker priority rules, in which the linguistic expressions that indicate salient noun phrase candidates are classified into explicit markers, whose only function is to present a salient noun phrase, and all other, non-explicit markers, and in which their types and priorities are registered; salient noun phrase candidates are thereby extracted and prioritized, and the candidate with the highest priority is selected as the salient noun phrase. As the block recognition in the preprocessing, topic continuation phrases, which are linguistic expressions indicating that a topic is continuing, are registered in advance, and blocks, which are semantically coherent sets of sentences, are recognized using information on the topic continuation phrases contained in the monologue data and information on the unit of one sentence delimited by a period or the like.

[0007] In this case, the base development processing of topic structure recognition for monologue data may be performed as follows. The topic establishment sections in which base-development topics are presented and established are identified using the cue phrases extracted from the monologue data and their types, together with the block information, all of which result from the topic structure recognition preprocessing. In each identified topic establishment section, the topic candidates with the highest priority are selected from the salient noun phrases according to the base-development topic candidate priorities; if only one candidate is selected, it becomes the topic, and if several are selected, the one that appears earliest is chosen, whereby the topic word in the base development is identified. The first of the identified topic words is given topic level 1, and the levels of the other topics are determined according to the base-development topic leveling rules. The start of the topic establishment section to which each topic belongs is taken as the start point of that topic's continuation section, and the earlier of (i) the point immediately before the start of a topic whose level is at or below that topic's level and (ii) the end of the monologue data is taken as the end point of the topic establishment section, whereby the levels and continuation sections of the topic words in the base development are identified.
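The start and end rule for base-development topics can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: positions are hypothetical simple-sentence indices, the function name `base_continuation_sections` and the sample topics are invented, "at or below that topic's level" is read as a numerically equal or smaller level, and the end point is read as the end of the topic's continuation section.

```python
def base_continuation_sections(topics, data_end):
    """Assign a continuation section to each base-development topic.

    topics: list of (word, start, level) ordered by start position.
    data_end: index of the last unit of the monologue data.
    Returns a list of (word, level, start, end).
    """
    sections = []
    for i, (word, start, level) in enumerate(topics):
        end = data_end  # default: the topic lasts until the data ends
        for _, later_start, later_level in topics[i + 1:]:
            if later_level <= level:
                # A later topic at the same or a smaller level number closes this one.
                end = later_start - 1
                break
        sections.append((word, level, start, end))
    return sections


# Hypothetical topics with levels already assigned by some leveling rule.
topics = [("budget", 0, 1), ("tax", 3, 2), ("defense", 6, 2), ("weather", 10, 1)]
sections = base_continuation_sections(topics, data_end=14)
```

In this example the level-1 topic "budget" stays open across both level-2 topics and closes only when the next level-1 topic begins, which gives the nesting the text describes.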

[0008] The semantic development processing of topic structure recognition for monologue data may be performed as follows. The topic establishment sections in which semantic-development topics are presented and established are identified using the title of the text as a whole; the block information and the salient noun phrases, which result from the topic structure recognition preprocessing; the semantic-development topic candidate priorities; and the topic establishment sections of the base development, which result from the base development processing. In each identified topic establishment section of the semantic development, the topic candidates with the highest priority are selected from the salient noun phrases according to the semantic-development topic candidate priorities; if only one candidate is selected, it becomes the topic, and if several are selected, the one that appears earliest is chosen, whereby the topic word in the semantic development is identified. Every topic word identified in the semantic development is given a provisional topic level of 1. The start of the topic establishment section to which each topic belongs is taken as the start point of that topic's continuation section, and the earliest of (i) the point immediately before the start of the next topic in the semantic development, (ii) the point immediately before the start of a topic establishment section of the base development, and (iii) the end of the language data is taken as the end point of the topic establishment section, whereby the provisional levels and continuation sections of the topic words in the semantic development are determined.
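The three-way end-point rule for semantic-development topics can be sketched as follows. This is an illustration under assumptions only: positions are hypothetical simple-sentence indices, the function name `semantic_continuation_sections` and the sample data are invented, only base-development sections that start after the topic's own start are considered, and the end point is read as the end of the topic's continuation section.

```python
def semantic_continuation_sections(semantic_topics, base_section_starts, data_end):
    """Assign provisional level 1 and a continuation section to each topic.

    semantic_topics: list of (word, start) ordered by start position.
    base_section_starts: start positions of base-development establishment sections.
    data_end: index of the last unit of the language data.
    Returns a list of (word, provisional_level, start, end).
    """
    sections = []
    for i, (word, start) in enumerate(semantic_topics):
        candidates = [data_end]  # (iii) end of the language data
        if i + 1 < len(semantic_topics):
            # (i) just before the next semantic-development topic starts
            candidates.append(semantic_topics[i + 1][1] - 1)
        later_base = [s for s in base_section_starts if s > start]
        if later_base:
            # (ii) just before the next base-development establishment section
            candidates.append(min(later_base) - 1)
        sections.append((word, 1, start, min(candidates)))
    return sections


semantic_topics = [("rates", 1), ("discounts", 4), ("coverage", 8)]
provisional = semantic_continuation_sections(
    semantic_topics, base_section_starts=[0, 6], data_end=11
)
```

Here "discounts" is cut short by the base-development section that starts at position 6, while "coverage" runs to the end of the data because nothing follows it.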

[0009] The topic structure recognition apparatus of the present invention is characterized by comprising: an input unit for inputting monologue data; means for retrieving the contents of a preprocessing dictionary for topic structure recognition; means for performing preprocessing using those dictionary contents and storing the result in a preprocessing storage unit; a base development processing storage unit and a semantic development processing storage unit which, with topic development separated into base development, explicitly indicated by cue phrases and the like, and semantic development, which unfolds within the base development, store the processing results for the base development and the semantic development respectively; means for retrieving each of the base development processing rules; means for performing topic establishment section determination, topic word determination, and topic level section determination using the base development processing rules and storing the results in the base development processing storage unit; means for retrieving each of the semantic development processing rules; means for performing topic establishment section determination, topic word determination, and topic level section determination using the semantic development processing rules and storing the results in the semantic development processing storage unit; means for retrieving each of the integration processing rules; means for performing integration processing using the integration processing rules and storing the result in an integration processing storage unit; and a display unit for displaying the contents of the integration processing storage unit.

[0010]

[0011]

[0012]

[0013]

[0014]

[0015]

[0016]

[Operation] The present invention recognizes semantically coherent blocks by means of topic continuation phrases and the like; divides topic development into base development, which is explicitly indicated by cue phrases and the like, and semantic development, which unfolds within it; determines for each, using block information and the like, the topic establishment sections in which topics are presented and established; selects the topic word in each topic establishment section from the candidates indicated by topic markers; and integrates the topics of the base development and the semantic development, thereby recognizing the topic structure of the monologue data. In this way, topics are recognized using only topic development patterns and linguistic knowledge, without requiring domain knowledge about the monologue data.

[0017]

[Embodiments] Next, an embodiment of the present invention will be described with reference to the drawings.

[0018] FIG. 1 is a diagram outlining the topic structure recognition processing of one embodiment of the present invention, and FIG. 2 is a block diagram of the topic structure recognition apparatus of one embodiment of the present invention. The processing of the present invention and the topic structure recognition apparatus are outlined below with reference to these figures.

[0019]

[0020] Topic structure recognition processing is performed according to the following procedure, shown in FIG. 1.

[0021] The input monologue data 110 is subjected to topic structure recognition preprocessing 120 and divided into blocks. The blocks are then subjected to base development processing 130 and semantic development processing 140, each of which independently determines topic level sections. Next, on the basis of the topic levels determined by base development processing 130 and semantic development processing 140, integration processing 150 of the base development and the semantic development is performed, and the topic structure 160 is determined.

[0022] The topic structure recognition preprocessing 120 performs morphological analysis 121, simple-sentence segmentation 122, salient noun phrase extraction 123, and block recognition 124 in that order. Base development processing 130 performs topic establishment section determination 131, topic word determination 132, and topic level section determination 133 in that order, and semantic development processing 140 performs topic establishment section determination 141, topic word determination 142, and topic level section determination 143 in that order.

[0023] In the block diagram of FIG. 2, the morphological analysis 121, simple-sentence segmentation 122, salient noun phrase extraction 123, and block recognition 124 of the topic structure recognition preprocessing 120 are performed by the topic structure recognition preprocessing unit 203 on the data input from the data input unit 201, following the processing procedure stored in the preprocessing storage unit 202 and using the dictionary management unit 216 and the preprocessing dictionary 204. First, morphological analysis is performed on the input monologue data 110. Morphological analysis divides the character string of the input monologue data 110 into words to form a word sequence, and then identifies the part of speech of each word and, for inflected words, the inflected form.

[0024] After the morphological analysis 121, simple-sentence segmentation 122 is performed on its result. Simple-sentence segmentation 122 divides a sentence that contains multiple predicates, such as a sentence with an embedded clause or a compound sentence, into simple sentences that each contain only one predicate. In the block diagram of FIG. 2, it is performed by the topic structure recognition preprocessing unit 203 using the simple-sentence rule management and the word segmentation rules stored in the preprocessing storage unit 202.

[0025] Next, salient noun phrase extraction 123 extracts the most strongly emphasized noun phrase in each simple sentence of the simple-sentence segmentation result.

[0026] Next, blocks, which are semantically coherent units, are recognized. A block corresponds to a paragraph in the monologue data. In the block diagram of FIG. 2, this processing is performed by the topic structure recognition preprocessing unit 203 using the topic management rules and the topic structure recognition rules stored in the preprocessing storage unit 202.

[0027] Next, base development processing 130 and semantic development processing 140 are each performed on the recognized blocks. In each, three steps are carried out in sequence: topic establishment section determination 131, 141; topic word determination 132, 142; and topic level section determination 133, 143. Here, a topic establishment section is a section in which a topic is presented and established. These three steps yield the topic structure for base development processing 130 and for semantic development processing 140, respectively.

[0028] In base development processing 130, each step needs as input only the result of the immediately preceding step of the base development. In semantic development processing 140, by contrast, determining a topic word requires both the result of the immediately preceding step of semantic development processing 140 and the result of the step of the same kind in base development processing 130. That is, topic establishment section determination 141 in semantic development processing 140 needs as input both the result of block recognition 124 and the result of topic establishment section determination 131 in base development processing 130.

[0029] Similarly, topic word determination 142 in semantic development processing 140 needs as input the result of topic establishment section determination 141 in semantic development processing 140 and the result of topic word determination 132 in base development processing 130. Topic level section determination 143 in semantic development processing 140 needs as input the result of topic word determination 142 in semantic development processing 140 and the result of topic level section determination 133 in base development processing 130.

[0030] Finally, the topic structures obtained by base development processing 130 and semantic development processing 140 are taken as input to integration processing 150 of the base development and the semantic development, which outputs the topic structure 160 of the entire monologue data.
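The input dependencies among the steps described above can be written down as a small graph and checked for a valid execution order. The sketch below is only an illustration: the step names are hypothetical labels built from the reference numerals in the text, the chain among the four preprocessing steps follows the stated order of execution, and `graphlib.TopologicalSorter` (Python 3.9 or later) is used merely as a convenient way to order the steps, not as anything the patent prescribes.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose results it needs as input.
DEPENDENCIES = {
    "morphological_analysis_121": set(),
    "simple_sentence_segmentation_122": {"morphological_analysis_121"},
    "salient_noun_phrase_extraction_123": {"simple_sentence_segmentation_122"},
    "block_recognition_124": {"salient_noun_phrase_extraction_123"},
    # Base development: each step needs only the preceding step.
    "base_section_131": {"block_recognition_124"},
    "base_topic_word_132": {"base_section_131"},
    "base_level_section_133": {"base_topic_word_132"},
    # Semantic development: preceding step plus the base step of the same kind.
    "semantic_section_141": {"block_recognition_124", "base_section_131"},
    "semantic_topic_word_142": {"semantic_section_141", "base_topic_word_132"},
    "semantic_level_section_143": {"semantic_topic_word_142", "base_level_section_133"},
    # Integration consumes both resulting topic structures.
    "integration_150": {"base_level_section_133", "semantic_level_section_143"},
}

# One valid order in which the steps can be executed.
order = list(TopologicalSorter(DEPENDENCIES).static_order())
```

Any order produced this way runs each base-development step before the semantic-development step of the same kind and places integration last.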

[0031] In the block diagram of FIG. 2, the base development processing 130 described above is performed by the base development processing unit 230, which consists of the topic establishment section determination unit 231, the topic word determination unit 232, and the topic level section determination unit 233. It follows the contents of the base development processing storage unit 207, which stores the procedure of the base development processing, and refers to the base development processing rule management unit 205 and the base development processing rules 206. Semantic development processing 140 is performed by the semantic development processing unit 240, which consists of the topic establishment section determination unit 241, the topic word determination unit 242, and the topic level section determination unit 243. It follows the contents of the semantic development processing storage unit 210, which stores the procedure of the semantic development processing, and refers to the semantic development processing rule management unit 208 and the semantic development processing rules 209.

[0032] Next, the specific content of each process used for topic structure recognition in the present invention is described.

[0033] Topic structure recognition preprocessing 120
Morphological analysis 121
Morphological analysis 121 takes a Japanese character string as input and outputs that string divided into words, together with information such as the part of speech of each word. For example, given the Japanese character string 「特許を書く」 ("write a patent") as input, morphological analysis outputs the string divided into the three words 「特許」, 「を」, and 「書く」, along with part-of-speech information for each word: 「特許」 is a noun, 「を」 is a case particle, and 「書く」 is a verb in its terminal form. Because verbs are inflected words, information on the inflected form, such as "terminal form," is also attached.

[0034] Morphological analysis 121 requires a word dictionary that records the part of speech of each word and a connection dictionary that describes how readily parts of speech follow one another in Japanese text. The connection dictionary records, for example, that a case particle readily follows a noun, as in 「特許」「を」, but rarely follows a verb, as in 「書く」「を」.

[0035] When a Japanese character string is divided into words, ambiguity arises: for example, the string 「特許」 could consist of the single noun 「特許」 ("patent") or of the two words 「特」 and 「許」. Morphological analysis selects the most appropriate analysis by using the word dictionary and the connection dictionary. A detailed method of morphological analysis is described in Yoshimura, Hidaka, and Yoshida, "Morphological analysis of unsegmented Japanese text using the minimum-bunsetsu-number method," Transactions of the Information Processing Society of Japan, Vol. 24, No. 1, pp. 40-46 (1983).
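How a word dictionary and a connection dictionary together resolve the segmentation ambiguity can be shown with a toy dynamic-programming sketch. Everything here is an assumption made for illustration: the dictionary entries, the cost values, and the scoring scheme (one point per word plus a connection penalty) are invented, and this is not the method of the cited paper.

```python
# Hypothetical word dictionary: surface form -> part of speech.
WORDS = {"特許": "noun", "特": "noun", "許": "noun",
         "を": "case_particle", "書く": "verb"}

# Hypothetical connection dictionary: lower cost means the pair connects more readily.
CONNECTION_COST = {
    ("noun", "case_particle"): 0,
    ("case_particle", "verb"): 0,
    ("noun", "noun"): 1,
    ("verb", "case_particle"): 5,
}
DEFAULT_COST = 2  # assumed cost for pairs not listed above


def segment(text):
    """Return the cheapest list of (word, part_of_speech), or None if none exists."""
    # best[i] maps the part of speech of the last word ending at i to (cost, words).
    best = [dict() for _ in range(len(text) + 1)]
    best[0]["BOS"] = (0, [])
    for i in range(len(text)):
        for prev_pos, (cost, words) in best[i].items():
            for j in range(i + 1, len(text) + 1):
                pos = WORDS.get(text[i:j])
                if pos is None:
                    continue  # not a dictionary word
                link = 0 if prev_pos == "BOS" else CONNECTION_COST.get(
                    (prev_pos, pos), DEFAULT_COST)
                total = cost + 1 + link  # one point per word plus the connection cost
                if pos not in best[j] or total < best[j][pos][0]:
                    best[j][pos] = (total, words + [(text[i:j], pos)])
    if not best[-1]:
        return None
    return min(best[-1].values(), key=lambda entry: entry[0])[1]


analysis = segment("特許を書く")
```

With these toy costs, the reading that treats 「特許」 as one noun is cheaper than the one that splits it into 「特」 and 「許」, so the single-noun segmentation is selected.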

[0036] Simple-sentence segmentation 122
Simple-sentence segmentation 122 divides a sentence containing multiple predicates, such as a sentence with an embedded clause or a compound sentence, into simple sentences each containing only one predicate, using simple-sentence segmentation rules prepared in advance, such as those shown in FIG. 3. For example, the sentence 「私は特許を書く」 ("I write a patent") is a simple sentence, because its only predicate is the verb 「書く」 ("write"). In contrast, the sentence 「発明したら、特許を書く」 ("once you invent something, write a patent") contains two predicates, the verbs 「発明し」 ("invent") and 「書く」 ("write"), so it is divided into the two simple sentences 「発明したら、」 and 「特許を書く」.

[0037] The simple-sentence segmentation rules shown in FIG. 3 are as follows.
(1) Split at a period.
(2) As a rule, split immediately after a relation (関係), except in the following cases:
(2-1) the relation is the attributive form of an adjective or adjectival verb;
(2-2) the relation is the continuative form of an adjective or adjectival verb.
(3) Do not split at a comma. However, if the simple sentence before the comma contains a relation, split after the comma.
(4) If a sentence-final particle is followed by the case particle 「と」, split before the case particle 「と」.
A Japanese sentence containing multiple predicates is divided into simple sentences according to the word types, parts of speech, and inflected forms obtained by morphological analysis. Each rule is checked against the given Japanese sentence, and every rule that can be applied is applied, which accomplishes the simple-sentence segmentation.

[0038] Salient noun phrase extraction 123
The salient noun phrase, that is, the most strongly emphasized noun phrase, is extracted from each simple sentence. In Japanese, salient noun phrases are indicated by markers such as particles. Markers are of two kinds: explicit markers, such as 「について」 ("about"), 「に関して」 ("regarding"), and 「は」 (the topic particle), whose only function is to present a phrase; and non-explicit markers, such as 「が」 and 「を」, which mark a grammatical role such as subject or object and are also used to present a phrase. These markers, together with their priorities, are supplied in advance by a person as rules.

[0039] FIG. 4 shows an example of the marker priorities for salient noun phrases.

[0040] The highest priority goes to (1) explicit markers other than 「は」, followed by (2) the explicit marker 「は」 and then (3) non-explicit markers.

[0041] Candidate salient noun phrases are extracted by matching the monologue data against these markers. However, when the phrase indicated by a marker is a deictic expression that has no concrete meaning on its own, such as a pronoun or 「こと」 ("matter") or 「もの」 ("thing"), it is not treated as a salient noun phrase candidate.

[0042] When multiple candidates are extracted from one simple sentence, the one with the highest priority according to FIG. 4 is selected as the salient noun phrase. When several candidates share the highest priority, the one that appears earliest is selected as the salient noun phrase.
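The selection rule above (marker priority first, earliest appearance as the tie-breaker, deictic expressions excluded) can be sketched as follows. This is a minimal illustration under assumptions: the input is taken to be already analyzed into (noun phrase, marker) pairs, the function names are hypothetical, and the deictic list includes a few pronouns (これ, それ, あれ) added here purely as examples alongside the 「こと」 and 「もの」 named in the text.

```python
EXPLICIT_MARKERS = {"について", "に関して", "は"}
NON_EXPLICIT_MARKERS = {"が", "を"}
# Deictic expressions are never candidates; the pronouns are illustrative additions.
DEICTIC = {"こと", "もの", "これ", "それ", "あれ"}


def marker_priority(marker):
    """Return 1 (highest) to 3 for a known marker, or None for anything else."""
    if marker in EXPLICIT_MARKERS and marker != "は":
        return 1  # explicit markers other than は
    if marker == "は":
        return 2  # the explicit marker は
    if marker in NON_EXPLICIT_MARKERS:
        return 3  # non-explicit markers
    return None


def salient_noun_phrase(pairs):
    """pairs: (noun_phrase, marker) tuples in order of appearance in one simple sentence."""
    candidates = []
    for position, (phrase, marker) in enumerate(pairs):
        priority = marker_priority(marker)
        if priority is None or phrase in DEICTIC:
            continue
        candidates.append((priority, position, phrase))
    if not candidates:
        return None
    # The smallest priority number wins; the earliest position breaks ties.
    return min(candidates)[2]


chosen = salient_noun_phrase([("会社A", "が"), ("サービスA", "について"), ("宣伝", "を")])
```

In the example, 「サービスA」 wins because 「について」 is an explicit marker other than 「は」, even though 「会社A」 appears earlier.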

【００４３】Block recognition 124
Blocks, which are semantically coherent units in the monologue data, are recognized. A block corresponds to a paragraph of the monologue data.

【００４４】The block recognition rules are as follows; FIG. 5 shows an example.
(a) A topic continues within one sentence, so everything within one sentence belongs to the same block.
(b) When a topic continuation phrase indicates that the topic continues, the current sentence belongs to the same block as the immediately preceding sentence.
(c) When none of the above rules assigns the current sentence to an existing block, a new block starts at the current sentence.

【００４５】The unit of one sentence, delimited by a period or similar mark, also signals that the topic continues, so everything within one sentence is placed in the same block. When a topic continuation phrase such as 「によりますと」 ("according to ...") or 「これに対して」 ("in contrast to this") explicitly indicates that the topic continues from the immediately preceding sentence, the current sentence is included in the block to which the preceding sentence belongs. In all other cases, a new block is assumed to start at the current sentence.
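The block recognition rules can be sketched as below. The list of continuation phrases contains only the two examples from the text, simple substring matching stands in for whatever matching the actual system uses, and `recognize_blocks` is a hypothetical name. Rule (a) is implicit because whole sentences are the unit of grouping.

```python
from typing import List

# Assumed phrase list: only the examples given in the description.
CONTINUATION_PHRASES = ("によりますと", "これに対して")


def recognize_blocks(sentences: List[str]) -> List[List[int]]:
    """Group sentence indices into blocks according to rules (b) and (c)."""
    blocks: List[List[int]] = []
    for index, sentence in enumerate(sentences):
        continues = any(phrase in sentence for phrase in CONTINUATION_PHRASES)
        if blocks and continues:
            blocks[-1].append(index)  # rule (b): same block as the previous sentence
        else:
            blocks.append([index])    # rule (c): a new block starts here
    return blocks
```

For three invented sentences in which only the second contains 「これに対して」, the function returns `[[0, 1], [2]]`.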

【００４６】Base development processing 130
Determination of topic establishment sections 131
In base development, where topics are developed explicitly by clue phrases and the like, the topic establishment sections in which topics are presented and established are identified. FIG. 6 shows the flow of the topic establishment section determination process in base development.

【００４７】First, the start of the monologue data and the head of each block containing a clue phrase are taken as start points of topic establishment sections (step S601). As shown in FIG. 6, clue phrases are classified into the nesting-start type, such as 「まず」 ("first") and 「第１に」 ("firstly"); the topic-shift type, such as 「次に」 ("next") and 「第２に」 ("secondly"); and the nesting-end type, such as 「最後に」 ("finally") and 「終わりに」 ("in closing").

【００４８】Once the start points of the topic establishment sections have been collected, the topic presentation type of each topic establishment section is identified (step S602). The identification process is described later. Next, it is checked whether the topic presentation type is the gradual type (step S603). If the type is gradual, that is, the topic is presented little by little, the end point of the topic establishment section has already been recognized during the identification process, so that point is used as the end point of the section (step S604), and the part of the original block outside this topic establishment section is recognized as a "pseudo block" (step S605). If the type is not gradual, that is, it is the batch type in which the topic is presented all at once, the range consisting of the first sentence from the start point is recognized as the topic establishment section (step S606).

【００４９】Next, the topic establishment sections obtained in steps S605 and S606 are recognized as summary sections (step S607). This series of steps is applied to the entire text, after which the process ends (step S608).

【００５０】Next, the topic presentation type identification process performed in step S602 is described.

【００５１】As shown in FIG. 7, this process identifies whether the topic presentation type is the gradual type, which presents a topic little by little, or the batch type, which presents a topic all at once.

【００５２】First, from the first sentence after the start point of the topic establishment section in base development, the top two salient noun phrases are extracted according to the topic candidate priority for base development shown in FIG. 8 (step S701). Of these two, the salient noun phrase that appears earlier in time is called p, and the one that appears later is called q (step S702).

【００５３】The criteria for extracting the top two salient noun phrases are as follows: (a) salient noun phrases included in the title, (b) salient noun phrases containing a proper noun, and (c) salient noun phrases indicated by an explicit marker are all treated as having the same priority, and salient noun phrases presented by an implicit marker are ranked below them.

【００５４】Next, it is checked whether the topic candidate priority for base development is 1 for both p and q (step S703). If it is 1 for both, that is, each of them is included in the title, contains a proper noun, or is indicated by an explicit marker, the topic presentation type is judged to be gradual (step S704), and the end of the simple sentence immediately preceding the simple sentence containing q is taken as the end point of the topic establishment section (step S705). If the priority is not 1 for both p and q, that is, at least one of them is indicated by an implicit marker, or only one phrase satisfies the topic candidate priority for base development, the topic presentation type is judged to be batch (step S706).
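Steps S701 to S706 can be sketched as follows, assuming each salient noun phrase in the first sentence has already been reduced to a pair of its simple-sentence index and its priority (1 or 2) under FIG. 8. The pair representation and the name `identify_presentation_type` are illustrative assumptions.

```python
from typing import List, Optional, Tuple

# (simple_sentence_index, priority): priority 1 means "in the title, contains
# a proper noun, or explicitly marked"; priority 2 means "implicitly marked".
Candidate = Tuple[int, int]


def identify_presentation_type(
    candidates: List[Candidate],
) -> Tuple[str, Optional[int]]:
    """Return ("gradual", end_index) or ("batch", None)."""
    # S701: take the two best candidates, earlier ones first on ties.
    top_two = sorted(candidates, key=lambda c: (c[1], c[0]))[:2]
    if len(top_two) == 2 and all(priority == 1 for _, priority in top_two):
        # S702: p is the earlier phrase, q the later one.
        p, q = sorted(top_two)
        # S704, S705: the section ends with the simple sentence just before q.
        return "gradual", q[0] - 1
    # S706: fewer than two candidates, or at least one has priority 2.
    return "batch", None
```

The end index is returned only for the gradual type because, for the batch type, the section is simply the first sentence from the start point.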

【００５５】Determination of topic words 132
The topic presented in each topic establishment section of base development, where topics are developed explicitly by clue phrases and the like, is recognized. FIG. 9 shows an example of the flow of the topic word determination process in base development.

【００５６】For each topic establishment section in base development, the candidates with the highest priority are extracted according to the "topic candidate priority for base development" described above and illustrated in FIG. 8. If only one candidate is extracted, it is recognized as the topic (step S901). After this, it is checked whether exactly one candidate was extracted (step S902); if multiple candidates were extracted, the one that appears earliest in time is selected (step S903) and recognized as the topic (step S904).

【００５７】Determination of topic level sections 133
For each topic in base development, its topic level and the section over which the topic continues are determined. The outermost topic has topic level 1, and the topic level increases by 1 for each additional level of nesting.

【００５８】FIG. 10 shows an example of the level assignment rules for base development. First, the topic level of the first topic in the monologue data is set to 1. Second, when a clue phrase changes the topic level, the level is increased or decreased according to the transition pattern from the immediately preceding clue phrase to the current clue phrase.

【００５９】If the current clue phrase is of the nesting-start type, the topic level is increased by 1 regardless of the type of the immediately preceding clue phrase. If the current clue phrase is of the topic-shift type or the nesting-end type and the immediately preceding clue phrase is of the nesting-start type or the topic-shift type, the topic level is unchanged. If the current clue phrase is of the topic-shift type or the nesting-end type and the immediately preceding clue phrase is of the nesting-end type, the topic level is decreased by 1.
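The transition rules can be read as a small state machine. The sketch below assumes the clue phrases have already been classified into the three types. The constant names and `assign_topic_levels` are hypothetical, and the case of a topic-shift or nesting-end phrase with no preceding clue phrase, which the text does not cover, is treated here as "unchanged".

```python
from typing import List, Optional

NEST_START, SHIFT, NEST_END = "nest_start", "shift", "nest_end"


def assign_topic_levels(clue_types: List[str]) -> List[int]:
    """Return topic levels: one for the initial topic, then one per clue phrase."""
    level = 1
    levels = [level]                 # the first topic of the monologue has level 1
    previous: Optional[str] = None
    for current in clue_types:
        if current == NEST_START:
            level += 1               # nesting deepens regardless of the previous type
        elif previous == NEST_END:
            level -= 1               # shift or end right after an end: one level up
        # otherwise (previous is nest_start, shift, or absent): unchanged
        levels.append(level)
        previous = current
    return levels
```

For the sequence 「まず」 (nesting-start) followed by 「次に」 (topic-shift), the result is `[1, 2, 2]`.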

【００６０】FIG. 11 shows an example of the rules for determining topic continuation sections in base development. Let A be the topic currently being processed and m its topic level. The start point of the topic continuation section is the head of the topic establishment section to which A belongs. The end point is whichever occurs earlier in time: the point immediately before the start of a subsequent topic whose topic level is m or less, or the end of the monologue data.
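A minimal sketch of this rule, assuming topics are given in order of appearance as (start position, level) pairs and positions are block indices; `continuation_sections` is a hypothetical name and the inclusive end position is a representation choice, not something the text prescribes.

```python
from typing import List, Tuple


def continuation_sections(
    topics: List[Tuple[int, int]], data_end: int
) -> List[Tuple[int, int]]:
    """Map each (start, level) topic to its inclusive (start, end) section."""
    sections = []
    for i, (start, level) in enumerate(topics):
        end = data_end                       # default: runs to the end of the data
        for later_start, later_level in topics[i + 1:]:
            if level >= later_level:         # a later topic at level m or less
                end = later_start - 1        # stop just before it starts
                break
        sections.append((start, end))
    return sections
```

With blocks a to f numbered 0 to 5 and topics at (0, level 1), (1, level 2), and (4, level 2), the sections come out as `(0, 5)`, `(1, 3)`, and `(4, 5)`.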

【００６１】Semantic development processing 140
Determination of topic establishment sections 141
In semantic development, where topics are developed within base development, the topic establishment sections in which topics are presented and established are identified. FIG. 12 shows an example of the flow of the topic establishment section determination process in semantic development.

【００６２】First, for each block that does not contain a topic establishment section of base development, and for each pseudo block within a block that does contain one, it is checked whether the block or pseudo block contains four or more simple sentences (step S1201) and whether its first sentence contains two or more simple sentences (step S1202). If the number of simple sentences is four or more and the first sentence contains two or more simple sentences, a topic establishment section candidate is extracted from the block or pseudo block (step S1203).

【００６３】This topic establishment section candidate starts at the head of the block or pseudo block and ends at whichever comes earlier in time: the end of the first sentence in the block or pseudo block, or the end of the fourth simple sentence from the end of the monologue.

【００６４】If the block or pseudo block contains fewer than four simple sentences, or its first sentence contains only one simple sentence, the block or pseudo block is regarded as having no topic (step S1206). The values "four simple sentences" and "one simple sentence" are to be set in advance by a human to suit the nature of the monologue data.
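The candidate extraction in steps S1201 to S1203 and the rejection in S1206 can be sketched as below. Simple sentences are assumed to be numbered from 0 across the whole monologue, and the thresholds are exposed as parameters because the text says they are tuned by hand. Reusing the block-size threshold for the "fourth from the end" bound is an assumption of this sketch, and `topic_establishment_candidate` is a hypothetical name.

```python
from typing import Optional, Tuple


def topic_establishment_candidate(
    block_start: int,
    block_len: int,
    first_sentence_len: int,
    total_len: int,
    min_block: int = 4,
    min_first: int = 2,
) -> Optional[Tuple[int, int]]:
    """Return the inclusive (start, end) candidate section, or None (S1206)."""
    # S1201, S1202: the block needs enough simple sentences overall and
    # enough simple sentences in its first sentence.
    if min_block > block_len or min_first > first_sentence_len:
        return None
    first_sentence_end = block_start + first_sentence_len - 1
    nth_from_end = total_len - min_block  # the fourth simple sentence from the end
    # S1203: the candidate ends at whichever of the two comes first.
    return block_start, min(first_sentence_end, nth_from_end)
```

A block that starts at simple sentence 10, has five simple sentences, and opens with a two-part sentence in a 30-sentence monologue yields the candidate `(10, 11)`.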

【００６５】When a topic establishment section candidate exists, it is checked whether the candidate contains a salient noun phrase that is covered by the topic candidate priority for semantic development (step S1204).

【００６６】FIG. 13 shows an example of the topic candidate priority for semantic development. In this example, there are two priority levels.
(1) The higher-priority candidates are salient noun phrases accompanied in the same sentence by an interrogative expression such as 「尋ねる」 ("ask") or 「問う」 ("question").
(2) The lower-priority candidates are (a) salient noun phrases included in the title or in the immediately preceding summary section, (b) salient noun phrases containing a proper noun, and (c) salient noun phrases indicated by an explicit marker; these three have the same priority.

【００６７】If the topic establishment section candidate contains at least one salient noun phrase covered by the topic candidate priority for semantic development, the candidate is recognized as a topic establishment section (step S1205). If it contains none, the candidate is rejected and the block is regarded as containing no topic establishment section (step S1206).

【００６８】The above processing is applied to the entire text, after which the process ends (step S1207).

【００６９】Determination of topic words 142
The topic presented in each topic establishment section of semantic development, where topics are developed within base development, is identified. FIG. 14 shows an example of the flow of the topic word determination process in semantic development.

【００７０】First, for each topic establishment section in semantic development, the candidates with the highest priority are extracted according to the "topic candidate priority for semantic development" illustrated in FIG. 13 (step S1401). Next, it is checked whether exactly one candidate was extracted (step S1402). If so, it is recognized as the topic (step S1404). If multiple candidates were extracted, the one that appears earliest in time is selected (step S1403) and recognized as the topic (step S1404).

【００７１】Determination of topic level sections 143
For each topic in semantic development, its topic level and the section over which the topic continues are determined. FIG. 15 shows an example of the flow of the topic level section determination process in semantic development. Let A be the topic currently being processed. First, the topic level of topic A is set to 1 (step S1501). Next, the head of the topic establishment section to which A belongs is taken as the start point of the continuation section of A. The end point of the continuation section of A is whichever occurs earliest in time: the point immediately before the next topic after A in semantic development, the point immediately before the start point of a topic establishment section in base development, or the end of the monologue data (step S1502).

【００７２】Integration of base development and semantic development 150
The topic structure of base development and the topic structure of semantic development obtained so far are integrated. FIG. 16 shows an example of the flow of the integration process.

【００７３】First, it is checked whether each topic is a topic of base development (step S1601). Topics of base development are integrated into the topic structure as they are (steps S1603 and S1604); topics of semantic development first have their topic level corrected (step S1602) and are then integrated.

【００７４】The topic level correction for semantic development in step S1602 is performed by adding, to the original topic level, the maximum topic level of base development at that point. The topic structure obtained after integration is the final topic structure.
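The correction and merge can be sketched as follows. The sketch reads "the maximum topic level of base development at that point" as the highest level among base-development topics whose continuation section contains the start of the semantic topic; that reading, the tuple layout, and the name `integrate_topic_structures` are assumptions.

```python
from typing import List, Tuple

# (topic word, level, start position, end position), positions inclusive.
Topic = Tuple[str, int, int, int]


def integrate_topic_structures(base: List[Topic], semantic: List[Topic]) -> List[Topic]:
    """Merge both structures, lifting semantic topics below the base topics."""
    merged = list(base)                      # S1603: base topics are used as they are
    for word, level, start, end in semantic:
        enclosing = [
            b_level
            for _, b_level, b_start, b_end in base
            if b_end >= start >= b_start     # base topics active where this one starts
        ]
        corrected = level + max(enclosing, default=0)   # S1602: level correction
        merged.append((word, corrected, start, end))
    # Order by start position, outer topics before inner ones.
    return sorted(merged, key=lambda t: (t[2], t[1]))
```

Under this reading, a semantic topic with provisional level 1 that starts inside a base topic of level 2 ends up at level 3.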

【００７５】Description using a monologue data example
Next, the topic structure recognition method to which the present invention is applied is described in detail using a concrete example of monologue data.

【００７６】Topic structure recognition preprocessing 120
Specific example of simple-sentence segmentation 122
FIG. 17 shows an example of monologue data according to one embodiment of the present invention. FIG. 18 shows the result of dividing this monologue data into simple sentences using, among others, the simple-sentence segmentation rule examples shown in FIG. 3.

【００７７】Specific example of salient noun phrase extraction 123
A salient noun phrase is extracted from each of the simple sentences. FIG. 19 shows the result of extracting salient noun phrases from the segmentation result of FIG. 18 using salient noun phrase markers such as those shown in FIG. 4. In FIG. 19, salient noun phrases are underlined, and simple sentences are numbered (1-1), (1-2), ... for reference.

【００７８】In simple sentence (1-1), the salient noun phrase "communication services of Company A" is extracted by the explicit marker 「は」 (wa). In simple sentence (1-2), the implicit marker 「を」 (wo) is present, but the marked word is the deictic expression 「それら」 ("them"), which has no concrete meaning, so no salient noun phrase is extracted from this simple sentence. Simple sentence (3-1) contains "Service A", indicated by the explicit marker 「は」 (wa), and "competitors", indicated by the implicit marker 「に」 (ni); according to the priorities in FIG. 4, the explicit marker 「は」 has the higher priority, so "Service A" is selected as the salient noun phrase. Salient noun phrases are extracted from the other simple sentences in the same way.

【００７９】Specific example of block recognition 124
Blocks in the monologue data are recognized. FIG. 20 shows an example of block recognition for the monologue data shown in FIG. 17. By block recognition rule (a) of FIG. 5, everything within one sentence belongs to the same block, so the contents of each of sentences (1) to (7) in FIG. 20 belong to the same block. Furthermore, by block recognition rule (b) of FIG. 5, the topic continuation phrase 「これは」 ("this is") in sentence (6) places sentences (5) and (6) in the same block.

【００８０】Base development processing 130
Specific example of topic establishment section determination 131 in base development processing 130
The topic establishment sections in base development are identified. FIG. 21 shows the identification result for the monologue data example.

【００８１】First, following the topic establishment section determination process shown in FIG. 6, the start of the monologue and the head of each block containing a clue phrase are taken as start points of topic establishment sections. In FIG. 21, the heads of blocks a, b, and c are the start points of topic establishment sections.

【００８２】Second, the topic presentation type is identified for each topic establishment section. As shown in the topic presentation type identification example of FIG. 7, the top two candidates under the topic candidate priority for base development are extracted from the first sentence after the start point of each topic establishment section. In the section starting at block a, only "communication services of Company A" is detected, so this section is of the batch type. In the section starting at block b, "various new services" and "Service A, Service B, Service C, and so on" are detected, but "various new services" has priority (2), so this section is also of the batch type. In the section starting at block e, only "conventional services" is detected, so this section is also of the batch type.

【００８３】Third, since the topic presentation type of every topic establishment section in the monologue example is the batch type, the range consisting of the first sentence from each start point is taken as the topic establishment section, following the determination process of FIG. 6. The result is shown in FIG. 21. The topic establishment sections recognized in this way are also recognized as summary sections.

【００８４】Specific example of topic word determination 132 in base development processing 130
Topic words are determined for the topic establishment sections of base development. As shown in FIG. 21, three topic establishment sections were recognized in base development. "Communication services of Company A" in block a satisfies the topic candidate priority for base development shown in FIG. 8, so it is extracted as a topic word candidate according to the topic word determination process in base development of FIG. 11. Since this is the only candidate extracted from the topic establishment section in block a, "communication services of Company A" is determined to be the topic word of that section. Similarly, "various new services" is extracted as the topic word of the topic establishment section in block b, and "conventional services" as the topic word of the topic establishment section in block e.

【００８５】Specific example of topic level section determination 133 in base development
The topic level sections of the topic words in base development are determined. According to the rules of FIG. 10, the topic level of the first topic of the monologue, "communication services of Company A", is set to 1. The block presenting the next topic, "various new services", contains the nesting-start clue phrase 「まず」 ("first"), so the topic level is increased by 1 to 2 according to the rules of FIG. 10.

【００８６】The block presenting the next topic, "conventional services", contains the topic-shift clue phrase 「次に」 ("next"), and the clue phrase presented before it is the nesting-start phrase 「まず」 ("first"), so according to the rules shown in FIG. 10 the topic level remains 2, the same as before.

【００８７】The continuation section of each topic is determined using the rules shown in FIG. 11. The continuation section of "communication services of Company A" runs from the start to the end of the monologue data, that of "various new services" covers blocks b, c, and d of FIG. 21, and that of "conventional services" covers blocks e and f of FIG. 22. FIG. 22 shows the recognition results for these topic level sections.

【００８８】Semantic development processing 140
Specific example of topic establishment section determination 141 in semantic development processing 140
The topic establishment sections in semantic development are identified. As shown in the topic establishment section determination example of FIG. 12, only blocks that do not contain a topic establishment section of base development, and pseudo blocks within blocks that do contain one, are considered here. Therefore, as shown in FIG. 22, blocks c, d, and f are processed.

【００８９】Block d does not satisfy the condition "the block contains four or more simple sentences" in the topic establishment section determination process of FIG. 12. Block d is therefore regarded as containing no topic establishment section.

【００９０】Blocks c and f satisfy both conditions in the topic establishment section determination process of FIG. 12: "the block contains four or more simple sentences" and "the first sentence in the block contains two or more simple sentences". Therefore, in each of these blocks, the section ending at whichever comes first in time, the end of the first sentence in the block or the end of the fourth simple sentence from the end of the monologue, is taken as the topic establishment section candidate.

【００９１】Each of these topic establishment section candidates contains at least one candidate covered by the "topic candidate priority for semantic development" shown in FIG. 13. These candidates are therefore recognized as topic establishment sections.

【００９２】Specific example of topic word determination 142 in semantic development processing 140
Topic words are determined for the topic establishment sections of semantic development. As shown in FIG. 22, two topic establishment sections were recognized in semantic development. "Service A" in block c satisfies the topic candidate priority for semantic development shown in FIG. 13, so it is extracted as a topic word candidate according to the topic word determination process in semantic development of FIG. 14. Assume that the topic establishment section of block c contains nothing with a higher priority than "Service A" under FIG. 13, that is, no "salient noun phrase accompanied by an interrogative expression". Then, even if candidates other than "Service A" are detected in this section, the one that appears earliest in time is selected, so "Service A", which appears first, is recognized as the topic. Similarly, in block f, "charging for directory assistance" is recognized as the topic.

【００９３】Specific example of topic level section determination 143 in semantic development processing 140
The topic level sections of the topic words in semantic development are determined. According to the rules of FIG. 15, the topic level of every topic word is provisionally set to 1. The continuation section of each topic is also determined according to the rules of FIG. 16. The continuation section of "Service A" is blocks c and d of FIG. 22, and that of "charging for directory assistance" is block f of FIG. 22. FIG. 24 shows the recognition results for these topic level sections.

[0094] Specific example of integration processing 150 of base development and semantic development

The topic structures of the base development and the semantic development are integrated. The topic levels of the topics in the semantic development are corrected according to the integration rule for base development and semantic development shown in FIG. 16. FIG. 25 shows the integration result. This is the topic structure obtained by applying the topic structure recognition method of the present invention to the monolog data example of FIG. 17.
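The level correction in the integration step might look like the sketch below. The actual integration rule is defined in the referenced figure and is not reproduced in this passage, so the rule used here is purely an assumption: a semantic topic is nested one level below the deepest base-development topic whose continuation section contains its start, and keeps its provisional level if no base topic encloses it. The base topic and all block indices are hypothetical.

```python
from typing import List, NamedTuple, Sequence


class Section(NamedTuple):
    word: str
    level: int
    start: int  # first block index
    end: int    # last block index (inclusive)


def integrate(base: Sequence[Section], semantic: Sequence[Section]) -> List[Section]:
    """Merge both structures, correcting semantic levels (assumed rule, see lead-in)."""
    merged = list(base)
    for s in semantic:
        enclosing = [b.level for b in base if b.start <= s.start <= b.end]
        # Assumed rule: one level below the deepest enclosing base topic.
        level = max(enclosing) + 1 if enclosing else s.level
        merged.append(s._replace(level=level))
    return sorted(merged, key=lambda t: (t.start, t.level))


# Hypothetical input: one base topic covering all six blocks.
base = [Section("hypothetical base topic", 1, 0, 5)]
semantic = [
    Section("Service A", 1, 2, 3),
    Section("charging for directory assistance", 1, 5, 5),
]
structure = integrate(base, semantic)
```

With this input both semantic topics move from provisional level 1 to level 2, directly under the enclosing base topic.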

[0095] Experimental data

This section presents the results of an evaluation experiment in which the topic structure recognition method of the present invention was applied to actual monolog data. The evaluation compared the topic structure recognized by humans with the topic structure recognized by a computer and computed recall and precision. Recall measures how much of the topic structure recognized by humans is also recognized by the computer. Precision measures how much of the topic structure recognized by the computer is also recognized by humans. If the topic structures recognized by the humans and the computer match, recall and precision are both 100%.
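The two measures defined above can be computed as in the following sketch. The text does not say how individual elements of the two topic structures are matched, so representing each structure as a set of (topic word, level, start block, end block) tuples and counting exact matches is an assumption; the sample data are invented for illustration and are not the experimental data.

```python
from typing import Hashable, Set, Tuple


def recall_precision(human: Set[Hashable], machine: Set[Hashable]) -> Tuple[float, float]:
    """Return (recall, precision) of the machine structure against the human one."""
    matched = len(human & machine)
    recall = matched / len(human) if human else 0.0
    precision = matched / len(machine) if machine else 0.0
    return recall, precision


# Invented example: tuples are (topic word, level, start block, end block).
human = {("A", 1, 0, 5), ("B", 2, 2, 3), ("C", 2, 5, 5), ("D", 3, 3, 3)}
machine = {("A", 1, 0, 5), ("B", 2, 2, 3), ("C", 2, 5, 5), ("E", 2, 4, 4), ("F", 3, 1, 1)}
recall, precision = recall_precision(human, machine)
```

Here three of the four human-recognized elements are found (recall 0.75) and three of the five machine-recognized elements are correct (precision 0.6). Identical structures give 1.0 for both.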

[0096] The monolog data used in the experiment consisted of 10 television news scripts in total, amounting to 300 simple sentences. The evaluation yielded a recall of 74.0% and a precision of 64.0%.

[0097]

[Effects of the Invention] As described above, the present invention makes it possible to recognize topics and their structure without using knowledge that depends on a specific domain.

[0098] Presenting the recognized topics and topic structure to a user helps the user grasp the general content of the monolog data. The topic structure can also be used as a table of contents.

[Brief description of the drawings]

FIG. 1 is a flowchart showing the processing for recognizing the topic structure of monolog data according to one embodiment of the present invention.

FIG. 2 is a block diagram of a monolog data topic structure recognition apparatus according to one embodiment of the present invention.

FIG. 3 is a diagram showing an example of the simple sentence segmentation rules used in one embodiment of the present invention.

FIG. 4 is a diagram showing an example of the marker priority rules for salient noun phrases used in one embodiment of the present invention.

FIG. 5 is a diagram showing an example of the block recognition rules used in one embodiment of the present invention.

FIG. 6 is a flowchart showing the process of determining topic establishment sections in the base development according to one embodiment of the present invention.

FIG. 7 is a flowchart showing the topic presentation type identification process according to one embodiment of the present invention.

FIG. 8 is a diagram showing the topic candidate priority for base development according to one embodiment of the present invention.

FIG. 9 is a flowchart showing the topic word determination process in the base development according to one embodiment of the present invention.

FIG. 10 is a diagram showing an example of the leveling rules for base development used in one embodiment of the present invention.

FIG. 11 is a flowchart showing the process of determining topic establishment sections in the semantic development according to one embodiment of the present invention.

FIG. 12 is a diagram showing the topic candidate priority for semantic development according to one embodiment of the present invention.

FIG. 13 is a flowchart showing the topic word determination process in the semantic development according to one embodiment of the present invention.

FIG. 14 is a flowchart showing the topic word continuation section determination process in the base development according to one embodiment of the present invention.

FIG. 15 is a flowchart showing the provisional topic level section determination process in the semantic development according to one embodiment of the present invention.

FIG. 16 is a flowchart showing the integration processing of the base development and the semantic development according to one embodiment of the present invention.

FIG. 17 is a diagram showing an example of monolog data according to one embodiment of the present invention.

FIG. 18 is a diagram showing an example of the simple sentence segmentation result for the monolog data example according to one embodiment of the present invention.

FIG. 19 is a diagram showing an example of the salient noun phrases in the monolog data example according to one embodiment of the present invention.

FIG. 20 is a diagram showing the blocks in the monolog data example according to one embodiment of the present invention.

FIG. 21 is a diagram showing the topic establishment sections in the base development for the monolog data example according to one embodiment of the present invention.

FIG. 22 is a diagram showing the topic structure in the base development for the monolog data example according to one embodiment of the present invention.

FIG. 23 is a diagram showing the topic establishment sections in the semantic development for the monolog data example according to one embodiment of the present invention.

FIG. 24 is a diagram showing the topic structure in the semantic development for the monolog data example according to one embodiment of the present invention.

FIG. 25 is a diagram showing the topic structure in the monolog data example according to one embodiment of the present invention.

[Explanation of symbols]

110 monolog data
120 topic structure recognition preprocessing
121 morphological analysis
122 simple sentence segmentation processing
123 salient noun phrase extraction
124 block recognition
130 base development processing
131, 141 determination of topic establishment section
132, 142 determination of topic word
133, 143 determination of topic level section
140 semantic development processing
150 integration processing of base development and semantic development
160 topic structure
201 data input unit
202 preprocessing storage unit
203 topic structure recognition preprocessing unit
204 preprocessing dictionary
205 base development processing rule management unit
206 base development processing rules
207 base development processing storage unit
208 semantic development processing rule management unit
209 semantic development processing rules
210 semantic development processing storage unit
211 integration processing storage unit
212 integration processing unit
213 integration processing rule management unit
214 integration processing rules
215 display unit
230 base development processing unit
231, 241 topic establishment section determination processing unit
232, 242 topic word determination processing unit
233, 243 topic level section determination processing unit
240 semantic development processing unit

Continuation of front page

(56) References
JP-A-4-306768 (JP, A)
JP-A-4-332084 (JP, A)
JP-A-5-266072 (JP, A)
Atsushi Takeshita, "4H-3 Identification of Topics Using Dialogue Structure", Proceedings of the 43rd National Convention of the Information Processing Society of Japan (second half of 1991) (3), September 24, 1991, pp. 3-229 to 3-230

(58) Fields searched (Int. Cl.⁷, DB name)
G06F 17/27
G06F 17/30

Claims

(57) [Claims]

1. A topic structure recognition method for monolog data using a topic structure recognition device having topic structure recognition preprocessing dictionary storage means, topic structure recognition preprocessing means, base development processing rule storage means, base development processing means, semantic development processing rule storage means, semantic development processing means, integration processing rule storage means, and integration processing means, wherein:
the topic structure recognition preprocessing means performs, on input monolog data and using the topic structure recognition preprocessing dictionary stored in the preprocessing dictionary storage means, topic structure recognition preprocessing consisting of morphological analysis processing, simple sentence segmentation processing, salient noun phrase extraction processing, and block recognition processing;
the base development processing means then performs, using the base development processing rules stored in the base development processing rule storage means and the results of the topic structure recognition preprocessing, base development processing in which, for the base development in which the development of topics in the monolog data is explicitly indicated by clue phrases or the like, identification of the topic establishment sections in which topics are presented and established, identification of the topic words in the topic establishment sections, and identification of the nesting levels and continuation sections of the topic words are performed in sequence;
the semantic development processing means then performs, using the semantic development processing rules stored in the semantic development processing rule storage means, the results of the topic structure recognition preprocessing, and the results of each step of the base development processing, semantic development processing in which, for the semantic development in which topics are developed within the base development, identification of the topic establishment sections in which topics are presented and established, identification of the topic words in the topic establishment sections, and identification of the nesting levels and continuation sections of the topic words are performed in sequence;
the integration processing means then performs integration processing on the results of the base development processing and the results of the semantic development processing, using the integration processing rules stored in the integration processing rule storage means, thereby recognizing the topic structure of the entire monolog data;
as the salient noun phrase extraction processing in the topic structure recognition preprocessing for the monolog data, in order to extract a salient noun phrase for each simple sentence in the monolog data resulting from the simple sentence segmentation processing, linguistic expressions are classified into explicit markers, whose only function is to present a salient noun phrase, and other, non-explicit markers, and their types and priorities are registered; salient noun phrase candidates are extracted and prioritized by matching against the salient noun phrase marker priority rules; and the candidate with the highest priority is selected as the salient noun phrase; and
as the block recognition processing in the topic structure recognition preprocessing for the monolog data, topic continuation phrases, which are linguistic expressions indicating that a topic continues, are registered, and blocks, each being a set of sentences forming a semantic unit, are recognized using information on the topic continuation phrases contained in the monolog data and information on sentence units indicated by punctuation marks or the like.

2. The topic structure recognition method for monolog data according to claim 1, wherein, in the base development processing of the topic structure recognition for the monolog data:
the topic establishment sections in which topics of the base development are presented and established are identified using the clue phrases extracted from the monolog data resulting from the topic structure recognition preprocessing, information on their types, and the block information;
in each identified topic establishment section of the base development, the topic candidates with the highest priority are selected from the salient noun phrases according to the topic candidate priority for base development, and the topic word in the base development is identified by taking the selected candidate as the topic if there is only one, and by selecting the candidate that appears earliest in time if there are several; and
for the identified topic words in the base development, the topic level of the first topic word is set to 1 and the topic levels of the other topics are determined according to the leveling rules for base development, the head of the topic establishment section to which each topic belongs is set as the start point of the continuation section of that topic, and the earlier of the point immediately before the start of a topic whose topic level is less than or equal to that of the topic and the end point of the monolog data is set as the end point of the continuation section, whereby the levels and continuation sections of the topic words in the base development are identified.

3. The topic structure recognition method for monolog data according to claim 1, wherein, in the semantic development processing of the topic structure recognition for the monolog data:
the topic establishment sections in which topics of the semantic development are presented and established are identified using the title of the whole text, the block information and salient noun phrases resulting from the topic structure recognition preprocessing, the topic candidate priority for semantic development, and the topic establishment sections in the base development resulting from the base development processing;
in each identified topic establishment section of the semantic development, the topic candidates with the highest priority are selected from the salient noun phrases according to the topic candidate priority for semantic development, and the topic word in the semantic development is identified by taking the selected candidate as the topic if there is only one, and by selecting the candidate that appears earliest in time if there are several; and
for all the identified topic words in the semantic development, the provisional topic level is set to 1, the head of the topic establishment section to which each topic belongs is set as the start point of the continuation section of that topic, and the earliest of the point immediately before the start of the next topic identified in the semantic development, the point immediately before the start point of a topic establishment section in the base development, and the end point of the monolog data is set as the end point of the continuation section, whereby the provisional levels and continuation sections of the topic words in the semantic development are determined.

4. A topic structure recognition apparatus for monolog data, comprising:
an input unit for inputting monolog data;
means for retrieving the dictionary contents of a preprocessing dictionary for topic structure recognition;
means for performing preprocessing using the dictionary contents and storing the result in a preprocessing storage unit;
a base development processing storage unit and a semantic development processing storage unit for storing processing results separately for the base development, in which the development of topics is explicitly indicated by clue phrases or the like, and the semantic development, which is developed within the base development;
means for retrieving each rule of the base development processing rules;
means for performing topic establishment section determination processing, topic word determination processing, and topic level section determination processing using the base development processing rules and storing the results in the base development processing storage unit;
means for retrieving each rule of the semantic development processing rules;
means for performing topic establishment section determination processing, topic word determination processing, and topic level section determination processing using the semantic development processing rules and storing the results in the semantic development processing storage unit;
means for retrieving each rule of the integration processing rules;
means for performing integration processing using the integration processing rules and storing the result in an integration processing storage unit; and
a display unit for displaying the contents of the integration processing storage unit.