JPH07160710A - Method for recognizing topic structure of monologue data and device therefor

Info

Publication number: JPH07160710A
Application number: JP5306288A
Authority: JP
Inventors: Atsushi Takeshita (竹下 敦); Toru Nakagawa (中川 透)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1993-12-07
Filing date: 1993-12-07
Publication date: 1995-06-23
Anticipated expiration: 2015-08-28
Also published as: JP3082889B2

Abstract

PURPOSE: To recognize topics in monologue data by using topic development patterns and linguistic knowledge rather than domain knowledge, integrating the topics found in base development and semantic development. CONSTITUTION: Input monologue data 110 first undergoes morphological analysis, which segments the character string of the monologue data 110 into individual words to obtain a word string and identifies each word's part of speech, the inflected form of inflecting words, and so on. More specifically, the input monologue data 110 undergoes topic structure recognition preprocessing 120 to obtain blocks, and the blocks undergo base development processing 130 and semantic development processing 140, each of which determines topic level sections independently. The topics in base development and semantic development are then integrated on the basis of the topic levels determined by base development processing 130 and semantic development processing 140, which determines the topic structure.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a method for recognizing topic structure in natural language analysis.

[0002]

[Prior Art] Models of topics and their structure have been proposed. See, for example, B. J. Grosz and C. L. Sidner: "Attention, intentions, and the structure of discourse", Computational Linguistics, volume 12, number 3, pp. 175-204 (1986). Because topics have a nested structure, topic development is modeled with a stack. Changes in the nesting structure of topics, that is, pushing topics onto the stack and popping them off, are determined by transitions in the speaker's intentions. Which topics develop depends on common-sense knowledge called domain knowledge.

[0003] Here, domain knowledge includes, for example, superordinate-subordinate relations between concepts, such as "'Company A' is a kind of 'telephone company'", and relations between actions, such as "'Company A' provides a service called 'Service A' and advertises for that purpose".

[0004]

[Problems to Be Solved by the Invention] However, the above model of topics and their structure provides no method for recognizing intentions, so the structure of topics cannot actually be recognized. As for topic development, the model gives no method for deciding what domain knowledge is needed or how to use it. Moreover, even if such a method were given, it is impossible to prepare in advance the domain knowledge needed for topic structure recognition.

[0005] The present invention has been made in view of the above points, and its object is to recognize topics in monologue data by using topic development patterns and linguistic knowledge rather than domain knowledge.

[0006]

[Means for Solving the Problems] In recognizing the structure of the topics written in monologue data, the topic structure recognition method of the present invention divides topic development into base development, which is explicitly indicated by clue phrases and the like, and semantic development, which takes place within the base development. It recognizes the topics in each, and then integrates the topics of base development and semantic development, thereby recognizing the topics of the entire monologue data.

[0007] The topic structure recognition method of the present invention also matches each simple sentence in the monologue data against salient noun phrase marker priority rules, in which markers indicating salient noun phrase candidates are registered together with their types and priorities. This extracts and prioritizes the salient noun phrase candidates, and the candidate with the highest priority is selected as the salient noun phrase.

[0008] The topic structure recognition method of the present invention also registers topic continuation phrases, which are linguistic expressions indicating that a topic continues. Using information on the topic continuation phrases contained in the monologue data and information on sentence units, which are delimited by periods and the like, it recognizes blocks, which are semantically coherent units in the monologue data.

[0009] The topic structure recognition method of the present invention also registers clue phrases, which are linguistic expressions that explicitly indicate topic development, together with their types. Using the clue phrase information in the monologue data and the above block information, it identifies, in the base development, the topic establishment sections in which topics are presented and established.

[0010] The topic structure recognition method of the present invention also identifies, in the semantic development, the topic establishment sections in which topics are presented and established. For this it uses the block information in the monologue data, the title of the whole text, the above salient noun phrases, the topic candidate priorities for semantic development, and the above base development topic establishment sections.

[0011] In each topic establishment section of base development, the topic structure recognition method of the present invention also selects, from the salient noun phrases, the topic candidates with the highest priority according to the topic candidate priorities for base development. If only one candidate is selected, that candidate becomes the topic. If several candidates are selected, the one that appears earliest is chosen. This determines the topic word of the topic establishment section.

[0012] In each topic establishment section of semantic development, the topic structure recognition method of the present invention likewise selects, from the salient noun phrases, the topic candidates with the highest priority according to the topic candidate priorities for semantic development. If only one candidate is selected, that candidate becomes the topic. If several candidates are selected, the one that appears earliest is chosen. This determines the topic word of the topic establishment section.
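The selection rule shared by base development and semantic development (highest priority first, earliest appearance as the tie-breaker) can be sketched as follows. This is only an illustrative sketch: the `Candidate` record, the integer encoding of priority (1 is assumed to be the highest), and the example phrases are assumptions, not part of the patent text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    phrase: str    # salient noun phrase (hypothetical example data)
    priority: int  # assumed encoding: 1 is the highest priority
    position: int  # order of appearance in the monologue data


def choose_topic_word(candidates):
    """Pick the highest-priority candidate; break ties by earliest appearance."""
    if not candidates:
        return None
    best = min(c.priority for c in candidates)
    top = [c for c in candidates if c.priority == best]
    return min(top, key=lambda c: c.position)


section = [
    Candidate("新サービス", priority=2, position=0),
    Candidate("会社A", priority=1, position=3),
    Candidate("電話会社", priority=1, position=7),
]
chosen = choose_topic_word(section)
```

Two candidates share priority 1 here, so the one that appears first in the section is chosen.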

[0013] The topic structure recognition method of the present invention also sets the topic level of the first topic word in base development to 1 and determines the topic level of every other topic according to the topic leveling rules for base development. The beginning of the topic establishment section to which each topic belongs is taken as the start point of that topic's continuation section. The earlier of two points, namely the point just before the start of a topic whose level is equal to or lower than that topic's level and the end of the monologue data, is taken as the end point of the topic establishment section. This determines the level and continuation section of each topic word in base development.
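A minimal sketch of the continuation-section computation for base development follows. Several things are assumptions made for illustration: positions are sentence indices, topic levels are supplied as input (the leveling rules themselves are not given in this passage), the computed end point is treated as the end of the topic's continuation section, and the `Topic` record and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    word: str
    level: int     # 1 is the outermost level
    start: int     # first sentence of the topic establishment section
    end: int = -1  # filled in by assign_base_sections


def assign_base_sections(topics, n_sentences):
    """Each topic runs until just before the next topic whose level number
    is equal to or lower than its own, or until the end of the data."""
    ordered = sorted(topics, key=lambda t: t.start)
    for i, topic in enumerate(ordered):
        topic.end = n_sentences - 1
        for later in ordered[i + 1:]:
            if later.level <= topic.level:
                topic.end = later.start - 1
                break
    return ordered


topics = assign_base_sections(
    [Topic("A", 1, 0), Topic("B", 2, 2), Topic("C", 2, 5), Topic("D", 1, 8)],
    n_sentences=10,
)
ends = [t.end for t in topics]
```

In this example, topic A stays open across the nested topics B and C and closes only when the next level-1 topic D begins.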

[0014] The topic structure recognition method of the present invention also assigns a provisional topic level of 1 to every topic word in semantic development. The beginning of the topic establishment section to which each topic belongs is taken as the start point of that topic's continuation section. The earliest of three points, namely the point just before the start of the next topic in semantic development, the point just before the start of a topic establishment section in base development, and the end of the monologue data, is taken as the end point of the topic establishment section. This determines the provisional level and continuation section of each topic word in semantic development.
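The three-way end-point rule for semantic development can be sketched as below. Positions are assumed to be sentence indices, the computed end point is treated as the end of the topic's continuation section, and the function name and tuple layout `(start, end, provisional_level)` are hypothetical choices for the sketch.

```python
def semantic_sections(semantic_starts, base_starts, n_sentences):
    """Return (start, end, provisional_level) for each semantic topic.

    The end is the earliest of: just before the next semantic topic,
    just before the next base development topic establishment section,
    and the end of the monologue data.
    """
    starts = sorted(semantic_starts)
    sections = []
    for i, start in enumerate(starts):
        limits = [n_sentences - 1]
        if i + 1 < len(starts):
            limits.append(starts[i + 1] - 1)
        later_base = [b for b in base_starts if b > start]
        if later_base:
            limits.append(min(later_base) - 1)
        sections.append((start, min(limits), 1))  # provisional level is always 1
    return sections


result = semantic_sections(semantic_starts=[1, 3, 6], base_starts=[0, 5], n_sentences=9)
```

The second semantic topic starts at sentence 3 and is cut off at sentence 4 because a base development section begins at sentence 5.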

[0015] When integrating the topic structures of base development and semantic development, the topic structure recognition method of the present invention also adds the topic level of base development at that point in time to the provisional topic level of each topic in semantic development, thereby correcting the topic levels of semantic development. This recognizes the topic structure of the entire monologue data.
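The level correction used in the integration step can be sketched as follows. The tuple layout `(start, end, level, word)` is hypothetical, and one interpretive assumption is made: "the topic level of base development at that point" is taken to be the deepest base topic whose continuation section covers the start of the semantic topic.

```python
def base_level_at(position, base_topics):
    """Deepest base development level active at the given sentence (0 if none)."""
    active = [level for (start, end, level, _word) in base_topics
              if start <= position <= end]
    return max(active) if active else 0


def integrate(base_topics, semantic_topics):
    """Add the current base level to each provisional semantic level."""
    merged = list(base_topics)
    for start, end, provisional, word in semantic_topics:
        merged.append((start, end, provisional + base_level_at(start, base_topics), word))
    return sorted(merged)


base = [(0, 7, 1, "A"), (2, 4, 2, "B")]
semantic = [(3, 4, 1, "x"), (6, 7, 1, "y")]
structure = integrate(base, semantic)
```

Topic "x" starts inside the level-2 base topic "B" and therefore ends up at level 3, while "y" starts after "B" has closed and ends up at level 2.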

[0016]

[Operation] The present invention recognizes semantically coherent blocks by means of topic continuation phrases and the like. It divides topic development into base development, which is explicitly indicated by clue phrases and the like, and semantic development, which unfolds within the base development. For each, it obtains the topic establishment sections in which topics are presented and established, using block information and the like, and selects the topic word of each topic establishment section from the candidates indicated by topic markers. By integrating the topics of base development and semantic development, it recognizes the topic structure of the monologue data. In this way, topics are recognized using only topic development patterns and linguistic knowledge, without requiring domain knowledge about the monologue data.

[0017]

[Embodiments] Embodiments of the present invention will now be described with reference to the drawings.

[0018] FIG. 1 is a diagram outlining the topic structure recognition processing of one embodiment of the present invention, and FIG. 2 is a block diagram of the topic structure recognition device of one embodiment of the present invention. The processing of the present invention and the topic structure recognition device are outlined below with reference to these figures.

[0019] First, morphological analysis is performed on the input monologue data 110. Morphological analysis segments the character string of the input monologue data 110 into words to obtain a word string, and further identifies each word's part of speech, the inflected form of inflecting words, and so on.

[0020] Morphological analysis is performed within the following procedure shown in FIG. 1.

[0021] Topic structure recognition preprocessing 120 is applied to the input monologue data 110 to produce blocks. Base development processing 130 and semantic development processing 140 are then applied to the blocks, each determining topic level sections independently. Next, based on the topic levels determined by base development processing 130 and semantic development processing 140, integration processing 150 of base development and semantic development is performed, and the topic structure 160 is determined.

[0022] The topic structure recognition preprocessing 120 is configured to perform morphological analysis 121, simple sentence segmentation processing 122, salient noun phrase extraction 123, and block recognition 124 in that order. Base development processing 130 performs topic establishment section determination 131, topic word determination 132, and topic level section determination 133 in that order. Semantic development processing 140 performs topic establishment section determination 141, topic word determination 142, and topic level section determination 143 in that order.

[0023] In the block diagram of FIG. 2, the morphological analysis 121, simple sentence segmentation processing 122, salient noun phrase extraction 123, and block recognition 124 of topic structure recognition preprocessing 120 are performed by the topic structure recognition preprocessing unit 203. It processes the monologue data input from the data input unit 201 according to the processing procedure stored in the preprocessing storage unit 202, using the dictionary management unit 216 and the preprocessing dictionary 204.

[0024] Once morphological analysis 121 is complete, simple sentence segmentation processing 122 is applied to its result. Simple sentence segmentation processing 122 divides a sentence containing multiple predicates, such as an embedded sentence or a compound sentence, into simple sentences that each contain only one predicate. In the block diagram of FIG. 2, the topic structure recognition preprocessing unit 203 performs it using the simple sentence rule management and the word segmentation rules stored in the preprocessing storage unit 202.

[0025] Next, salient noun phrase extraction 123 extracts the most emphasized noun phrase in each simple sentence of the input simple sentence segmentation result.

[0026] Next, blocks, which are semantically coherent units, are recognized. A block corresponds to a paragraph in the monologue data. In the block diagram of FIG. 2, this processing is performed by the topic structure recognition preprocessing unit 203 using the topic management rules and the topic structure recognition rules stored in the preprocessing storage unit 202.

[0027] Next, base development processing 130 and semantic development processing 140 are each applied to the recognized blocks. Each runs three processes in sequence: topic establishment section determination 131, 141; topic word determination 132, 142; and topic level section determination 133, 143. Here, a topic establishment section is a section in which a topic is presented and established. These three processes yield the topic structure in base development processing 130 and in semantic development processing 140, respectively.

[0028] In base development processing 130, each process needs as input only the result of the immediately preceding process of base development. In semantic development processing 140, by contrast, determining topic words requires both the result of the immediately preceding process of semantic development processing 140 and the result of the same kind of process in base development processing 130. That is, topic establishment section determination 141 in semantic development processing 140 needs as input both the result of block recognition 124 and the result of topic establishment section determination 131 in base development processing 130.

[0029] Similarly, topic word determination 142 in semantic development processing 140 needs as input the result of topic establishment section determination 141 in semantic development processing 140 and the result of topic word determination 132 in base development processing 130. Topic level section determination 143 in semantic development processing 140 needs as input the result of topic word determination 142 in semantic development processing 140 and the result of topic level section determination 133 in base development processing 130.

[0030] Finally, taking as input the topic structures obtained by base development processing 130 and semantic development processing 140, integration processing 150 of the two is performed, and the topic structure 160 of the entire monologue data is output as the result.

[0031] In the block diagram of FIG. 2, the base development processing 130 described above is performed by the base development processing unit 230, which consists of the topic establishment section determination processing unit 231, the topic word determination processing unit 232, and the topic level section determination processing unit 233. It follows the contents of the base development processing storage unit 207, which stores the procedure of base development processing, and refers to the base development processing rule management unit 205 and the base development processing rules 206. Semantic development processing 140 is performed by the semantic development processing unit 240, which consists of the topic establishment section determination processing unit 241, the topic word determination processing unit 242, and the topic level section determination processing unit 243. It follows the contents of the semantic development processing storage unit 210, which stores the procedure of semantic development processing, and refers to the semantic development processing rule management unit 208 and the semantic development processing rules 209.

[0032] Next, the specific content of each process used for topic structure recognition in the present invention is described.

[0033] Topic structure recognition preprocessing 120

Morphological analysis 121

Morphological analysis 121 takes a Japanese character string as input and outputs the string segmented into words together with information such as each word's part of speech. For example, when the Japanese string 「特許を書く」 ("write a patent") is given as input, the output is the string divided into three words, 「特許」「を」「書く」, plus part-of-speech information for each word: 「特許」 = noun, 「を」 = case particle, 「書く」 = verb in the terminal form. Because verbs are inflecting words, inflection information such as "terminal form" is also attached.

[0034] Morphological analysis 121 requires a word dictionary that records the part of speech of each word, and a connection dictionary that describes how readily parts of speech follow one another in a Japanese character string. The connection dictionary records, for example, that a case particle readily follows a noun, as in 「特許」「を」, but does not readily follow a verb, as in 「書く」「を」.

[0035] When a Japanese character string is segmented into words, ambiguity arises. For example, the string 「特許」 could consist of the single noun 「特許」 or of the two words 「特」 and 「許」. Morphological analysis selects the most appropriate analysis by using the word dictionary and the connection dictionary. A detailed method of morphological analysis is described in Yoshimura, Hidaka, Yoshida: "Morphological analysis of non-segmented Japanese sentences using the least-bunsetsu-number method", Transactions of the Information Processing Society of Japan, Vol. 24, No. 1, pp. 40-46 (1983).

[0036] Simple sentence segmentation processing 122

Simple sentence segmentation processing 122 divides a sentence containing multiple predicates, such as an embedded sentence or a compound sentence, into simple sentences that each contain only one predicate, using simple sentence segmentation rules prepared in advance as shown in FIG. 3. For example, the only predicate in the sentence 「私は特許を書く」 ("I write a patent") is the verb 「書く」 ("write"), so it is a simple sentence. In contrast, the sentence 「発明したら、特許を書く」 ("after inventing, write a patent") contains two predicates, the verb 「発明し」 ("invent") and the verb 「書く」 ("write"), so it is divided into the two simple sentences 「発明したら、」 and 「特許を書く」.

[0037] The simple sentence segmentation rules shown in FIG. 3 are as follows.

(1) Split at a period.
(2) As a rule, split immediately after a predicate, except in the following cases:
(2-1) the predicate is the attributive form of an adjective or adjectival verb;
(2-2)
(3) Do not split at a comma. However, if the simple sentence before the comma contains a predicate, split after the comma.
(4) If a sentence-final particle is followed by the case particle 「と」, split before the case particle 「と」.

A Japanese sentence containing multiple predicates is divided into simple sentences according to the word types, parts of speech, and inflected forms obtained by morphological analysis. For a given Japanese sentence, each rule is checked for applicability, and every rule that can be applied is applied; this is how simple sentence segmentation is carried out.

[0038] Salient noun phrase extraction 123

The salient noun phrase, the most emphasized noun phrase in each simple sentence, is extracted. In Japanese, salient noun phrases are indicated by markers such as particles. There are two kinds of markers: explicit markers, such as 「について」 ("about"), 「に関して」 ("regarding"), and 「は」, whose only function is to present a phrase, and implicit markers, such as 「が」 and 「を」, which mark a grammatical role such as subject or object but are also used to present a phrase. These are supplied in advance by a person as rules, together with their priorities.

[0039] FIG. 4 shows an example of the marker priorities for salient noun phrases.

[0040] The highest priority goes to (1) explicit markers other than 「は」, followed by (2) the explicit marker 「は」, and then (3) implicit markers.

[0041] Salient noun phrase candidates are extracted by matching the monologue data against these markers. However, when the phrase indicated by a marker is a deictic expression that has no concrete meaning on its own, such as a pronoun or 「こと」 or 「もの」 (both roughly "thing"), it is not treated as a salient noun phrase candidate.

[0042] When several candidates are extracted from one simple sentence, the one with the highest priority according to the priorities shown in FIG. 4 is selected as the salient noun phrase. When several candidates share the highest priority, the one that appears earliest is selected as the salient noun phrase.
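The marker-based extraction can be sketched as follows, under simplifying assumptions. A simple sentence is assumed to arrive already split into `(phrase, marker)` pairs, and the marker sets and the list of deictic expressions contain only the examples named in the text plus a few pronouns added for illustration. The real rules of FIG. 4 may contain more entries.

```python
EXPLICIT_MARKERS = {"について", "に関して"}       # explicit markers other than は
IMPLICIT_MARKERS = {"が", "を"}                    # grammatical-role markers
DEICTIC = {"こと", "もの", "これ", "それ", "あれ"}  # assumed list; excluded as candidates


def marker_priority(marker):
    """1 = explicit marker other than は, 2 = は, 3 = implicit marker."""
    if marker in EXPLICIT_MARKERS:
        return 1
    if marker == "は":
        return 2
    if marker in IMPLICIT_MARKERS:
        return 3
    return None


def salient_noun_phrase(pairs):
    """pairs: [(phrase, marker), ...] in order of appearance in one simple sentence."""
    candidates = []
    for index, (phrase, marker) in enumerate(pairs):
        priority = marker_priority(marker)
        if priority is None or phrase in DEICTIC:
            continue
        candidates.append((priority, index, phrase))
    # The smallest tuple has the best priority, then the earliest position.
    return min(candidates)[2] if candidates else None


first = salient_noun_phrase([("会社A", "が"), ("新サービス", "について")])
second = salient_noun_phrase([("これ", "は"), ("特許", "を")])
```

In the second call the phrase marked by 「は」 is a pronoun, so it is skipped and the phrase marked by the implicit marker 「を」 is chosen instead.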

[0043] Block recognition 124

Blocks, which are semantically coherent units in the monologue data, are recognized. A block corresponds to a paragraph in the monologue data.

[0044] The block recognition rules are listed below, and FIG. 5 shows an example.

(a) A topic continues within one sentence, so everything within one sentence belongs to the same block.
(b) When a topic continuation phrase indicates that the topic continues, the current sentence belongs to the same block as the immediately preceding sentence.
(c) If none of the above rules establishes that the current sentence belongs to an existing block, a new block starts from the current sentence.

[0045] The unit of one sentence, delimited by a period or the like, also indicates that the topic continues, so everything within one sentence belongs to the same block. When a topic continuation phrase such as 「によりますと」 ("according to") or 「これに対して」 ("in response to this") explicitly indicates that the topic continues from the immediately preceding sentence, the current sentence belongs to the block containing the immediately preceding sentence. In all other cases, a new block starts from the current sentence.

[0046] Base development processing 130

Topic establishment section determination 131

In base development, where topics are developed explicitly by clue phrases and the like, the topic establishment sections in which topics are presented and established are identified. FIG. 6 shows the flow of the process that determines the topic establishment sections in base development.

[0047] First, the start of the monologue data and the beginning of each block that contains a clue phrase are taken as start points of topic establishment sections (step S601). As shown in FIG. 6, clue phrases are classified into a nesting start type, such as 「まず」 ("first of all") and 「第１に」 ("firstly"), a topic change type, such as 「次に」 ("next") and 「第２に」 ("secondly"), and a nesting end type, such as 「最後に」 ("finally") and 「終わりに」 ("in closing").

[0048] Once the start points of the topic establishment sections have been collected, the topic presentation type of each topic establishment section is identified (step S602). This identification process is described later. Next, it is checked whether the topic presentation type is the gradual type (step S603). If it is the gradual type, which presents a topic little by little, the end point of the topic establishment section has already been recognized during identification of the topic presentation type, so that point is taken as the end point of the topic establishment section (step S604). In addition, the part of the original block outside this topic establishment section is recognized as a "pseudo block" (step S605). If the topic presentation type is not the gradual type, that is, if it is the batch type, which presents a topic all at once, the range of the first sentence from the start point of the topic establishment section is recognized as the topic establishment section (step S606).

【００４９】Next, the topic establishment sections obtained in steps S605 and S606 above are recognized as summary sections (step S607). This series of processes is applied to the entire text, after which the processing ends (step S608).

【００５０】The topic presentation type identification process performed in step S602 is described next.

【００５１】As shown in FIG. 7, this process identifies whether the topic presentation type is the gradual type, which presents a topic little by little, or the batch type, which presents a topic all at once.

【００５２】First, from the first sentence after the start point of a topic establishment section in basic expansion, the two highest-ranked salient noun phrases are extracted according to the topic candidate priority for basic expansion shown in FIG. 8 (step S701). Of these two, the salient noun phrase that appears earlier in time is called p and the one that appears later is called q (step S702).

【００５３】When the two highest-ranked salient noun phrases are extracted, (a) a salient noun phrase contained in the title, (b) a salient noun phrase containing a proper noun, and (c) a salient noun phrase indicated by an explicit marker are all treated as having the same priority, and a salient noun phrase presented by an implicit marker is ranked below them.

【００５４】Next, it is checked whether the topic candidate priority for basic expansion is 1 for both p and q (step S703). If it is 1 for both, that is, if each of them is contained in the title, contains a proper noun, or is indicated by an explicit marker, the topic presentation type is judged to be the gradual type (step S704), and the end of the simple sentence immediately preceding the simple sentence that contains q is taken as the end point of the topic establishment section (step S705). If the priority is not 1 for both p and q, that is, if at least one of them is indicated by an implicit marker or only one phrase satisfies the topic candidate priority for basic expansion, the topic presentation type is judged to be the batch type (step S706).
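Steps S701 to S706 can be sketched as below. This is an illustrative reading of the description, not the patent's own code: `Candidate` and `identify_presentation_type` are hypothetical names, priority 1 stands for the title, proper-noun, and explicit-marker cases and priority 2 for the implicit-marker case, and each candidate is assumed to come from a different simple sentence of the first sentence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    phrase: str
    priority: int      # 1 = title / proper noun / explicit marker, 2 = implicit marker
    clause_index: int  # index of the simple sentence containing the phrase


def identify_presentation_type(cands: list[Candidate]) -> tuple[str, Optional[int]]:
    """Return ("gradual", index of the section's last simple sentence) or ("batch", None)."""
    # Step S701: keep the two best-ranked candidates (ties broken by order of appearance).
    top_two = sorted(cands, key=lambda c: (c.priority, c.clause_index))[:2]
    if len(top_two) < 2:
        return ("batch", None)                   # only one candidate: step S706
    # Step S702: p is the phrase appearing earlier, q the one appearing later.
    p, q = sorted(top_two, key=lambda c: c.clause_index)
    if p.priority == 1 and q.priority == 1:      # step S703
        # Steps S704-S705: the section ends just before the simple sentence holding q.
        return ("gradual", q.clause_index - 1)
    return ("batch", None)                       # step S706
```

For example, a first sentence that yields one priority-2 phrase followed by one priority-1 phrase is classified as the batch type, because the two phrases do not both have priority 1.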

【００５５】Determination of topic word 132
This step recognizes what topic is presented in each topic establishment section of basic expansion, in which topics are explicitly developed by clue phrases and the like. FIG. 9 shows an example of the flow of the topic word determination process in basic expansion.

【００５６】For each topic establishment section in basic expansion, the candidates with the highest priority are extracted based on the "topic candidate priority for basic expansion" described above and exemplified in FIG. 8. If exactly one candidate is extracted, it is recognized as the topic (step S901). The number of extracted candidates is then checked (step S902); if several candidates were extracted, the one that appears earliest in time is selected (step S903) and recognized as the topic (step S904).
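The selection rule (highest priority first, earliest appearance as tie-breaker) can be sketched as follows. `TopicCandidate` and `determine_topic_word` are hypothetical names, and the numeric encoding of priority (smaller number means higher priority) is an assumption made for the illustration.

```python
from typing import NamedTuple, Optional


class TopicCandidate(NamedTuple):
    phrase: str
    priority: int  # smaller number = higher priority (assumed encoding)
    position: int  # order of appearance within the topic establishment section


def determine_topic_word(cands: list[TopicCandidate]) -> Optional[str]:
    """Pick the topic word of one topic establishment section, or None if it has no candidate."""
    if not cands:
        return None
    # Keep only the candidates that share the highest priority.
    best = min(c.priority for c in cands)
    top = [c for c in cands if c.priority == best]
    # A single candidate is returned directly; otherwise the earliest one wins.
    return min(top, key=lambda c: c.position).phrase
```

Handling the single-candidate and multiple-candidate cases with the same `min` call keeps the sketch short; the flowchart's explicit branch on the number of candidates gives the same result.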

【００５７】Determination of topic level section 133
For each topic in basic expansion, its topic level and the section over which the topic continues are determined. The outermost topic has topic level 1, and the topic level increases by 1 for each additional level of nesting.

【００５８】FIG. 10 shows an example of the leveling rules for basic expansion. First, the topic level of the first topic in the monologue data is set to 1. Second, a clue phrase changes the topic level according to the transition pattern from the immediately preceding clue phrase to the current one.

【００５９】If the current clue phrase is of the nesting-start type, the topic level is increased by 1 regardless of the type of the immediately preceding clue phrase. If the current clue phrase is of the topic-shift type or the nesting-end type and the immediately preceding clue phrase is of the nesting-start type or the topic-shift type, the topic level does not change. If the current clue phrase is of the topic-shift type or the nesting-end type and the immediately preceding clue phrase is of the nesting-end type, the topic level is decreased by 1.
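The transition rules above amount to a small state machine, sketched here under stated assumptions. The string constants and the functions `next_level` and `assign_levels` are hypothetical names. The description does not say what happens when a topic-shift or nesting-end phrase appears with no preceding clue phrase; the sketch leaves the level unchanged in that case, which is an assumption.

```python
from typing import Optional

NEST_START, TOPIC_SHIFT, NEST_END = "nest_start", "topic_shift", "nest_end"


def next_level(level: int, prev: Optional[str], cur: str) -> int:
    """Apply one leveling rule: prev and cur are the previous and current clue types."""
    if cur == NEST_START:
        return level + 1          # nesting always deepens, whatever came before
    if prev == NEST_END:
        return level - 1          # shift/end right after a nesting end: go up one level
    # prev is nest_start or topic_shift: unchanged.
    # prev is None: unchanged as well (assumption, not stated in the description).
    return level


def assign_levels(clue_types: list[str]) -> list[int]:
    """Levels of the first topic (always 1) and of each topic opened by a clue phrase."""
    levels = [1]
    prev: Optional[str] = None
    for cur in clue_types:
        levels.append(next_level(levels[-1], prev, cur))
        prev = cur
    return levels
```

With the clue sequence nesting-start then topic-shift, `assign_levels` yields levels 1, 2, 2: the opening topic at level 1 and two sibling topics nested under it.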

【００６０】FIG. 11 shows an example of the rules for determining topic continuation sections in basic expansion. Let A be the topic currently being processed and m its topic level. The beginning of the topic establishment section to which A belongs is the start point of the topic continuation section. Its end point is whichever comes first in time: the point immediately before the start of a topic whose topic level is m or less, or the end of the monologue data.
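The continuation rule can be sketched as a scan over later topics. The function name `continuation_sections` is hypothetical, and positions are expressed as block indices for simplicity, which is an assumption; the description measures positions in the monologue data itself.

```python
def continuation_sections(
    topics: list[tuple[int, int]], n_units: int
) -> list[tuple[int, int]]:
    """Compute inclusive (start, end) continuation sections.

    topics: (start_index, level) pairs sorted by start_index.
    n_units: total number of positions (here, blocks) in the monologue data.
    """
    sections = []
    for i, (start, level) in enumerate(topics):
        end = n_units - 1  # default: the topic runs to the end of the data
        for later_start, later_level in topics[i + 1:]:
            if later_level <= level:
                end = later_start - 1  # stop just before a topic at level m or less
                break
        sections.append((start, end))
    return sections
```

With six blocks (indices 0 to 5) and topics at `(0, 1)`, `(1, 2)`, and `(4, 2)`, the function returns `[(0, 5), (1, 3), (4, 5)]`: the level-1 topic spans everything, and each level-2 topic ends where the next level-2 topic begins or where the data ends.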

【００６１】Semantic expansion processing 140: determination of topic establishment section 141
In semantic expansion, in which topics develop within the basic expansion, the topic establishment section in which a topic is presented and established is identified. FIG. 12 shows an example of the flow of the process for determining topic establishment sections in semantic expansion.

【００６２】First, for each block or pseudo block, it is checked whether the block or pseudo block contains four or more simple sentences (step S1201) and whether its first sentence contains two or more simple sentences (step S1202). If both conditions hold, a topic establishment section candidate is extracted from that block or pseudo block (step S1203).

【００６３】The topic establishment section candidate starts at the beginning of the block or pseudo block. Its end point is whichever comes first in time: the end of the first sentence in the block or pseudo block, or the end of the fourth simple sentence from the end of the monologue.

【００６４】If the block or pseudo block contains fewer than four simple sentences, or if its first sentence contains only one simple sentence, the block or pseudo block is considered to have no topic (step S1206). The values "four simple sentences" and "one simple sentence" are to be set appropriately in advance by a person according to the nature of the monologue data.

【００６５】If a topic establishment section candidate exists, it is checked whether the candidate contains a salient noun phrase that is listed in the topic candidate priority for semantic expansion (step S1204).

【００６６】FIG. 13 shows an example of the topic candidate priority for semantic expansion. In this example there are two priority levels. (1) The higher-priority candidates are salient noun phrases accompanied in the same sentence by an interrogative expression such as 「尋ねる」 ("ask") or 「問う」 ("inquire"). (2) The lower-priority candidates are (a) salient noun phrases contained in the title or in the immediately preceding summary section, (b) salient noun phrases containing a proper noun, and (c) salient noun phrases indicated by an explicit marker; these three have the same priority.

【００６７】If the topic establishment section candidate contains at least one phrase listed in the topic candidate priority for semantic expansion, the candidate is recognized as a topic establishment section (step S1205). If it contains none, the candidate is rejected and the block is considered to contain no topic establishment section (step S1206).
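Steps S1201 to S1206 can be sketched as a single filter over one block or pseudo block. This is a simplified illustration: `semantic_section_candidate` is a hypothetical name, a block is modeled as a list of sentences with each sentence a list of simple sentences, and `is_listed` is a caller-supplied predicate standing in for the priority check of FIG. 13. The additional bound at the fourth simple sentence from the end of the monologue is deliberately omitted from this sketch.

```python
from typing import Callable, Optional


def semantic_section_candidate(
    block: list[list[str]],
    is_listed: Callable[[str], bool],
    min_total: int = 4,
    min_first: int = 2,
) -> Optional[list[str]]:
    """Return the simple sentences of the topic establishment section, or None.

    min_total and min_first are the thresholds that the description says are
    set by hand in advance according to the nature of the monologue data.
    """
    total = sum(len(sentence) for sentence in block)
    # Steps S1201-S1202: size conditions on the block and on its first sentence.
    if not block or total < min_total or len(block[0]) < min_first:
        return None                                  # step S1206: no topic
    candidate = block[0]                             # step S1203: first sentence
    # Step S1204: the candidate must contain at least one listed salient noun phrase.
    if any(is_listed(clause) for clause in candidate):
        return candidate                             # step S1205: accepted
    return None                                      # step S1206: rejected
```

Passing the thresholds as parameters mirrors the remark that they are tuned per data set rather than fixed.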

【００６８】The above processing is applied to the entire text, after which the processing ends (step S1207).

【００６９】Determination of topic word 142
This step identifies what topic is presented in each topic establishment section of semantic expansion, in which topics develop within the basic expansion. FIG. 14 shows an example of the flow of the topic word determination process in semantic expansion.

【００７０】First, for each topic establishment section in semantic expansion, the candidates with the highest priority are extracted based on the "topic candidate priority for semantic expansion" exemplified in FIG. 13 (step S1401). Next, it is checked whether exactly one candidate was extracted (step S1402). If so, that candidate is recognized as the topic (step S1404). If several candidates were extracted, the one that appears earliest in time is selected (step S1403) and recognized as the topic (step S1404).

【００７１】Determination of topic level section 143
For each topic in semantic expansion, its topic level and the section over which the topic continues are determined. FIG. 15 shows an example of the flow of the topic level section determination process in semantic expansion. Let A be the topic currently being processed. First, the topic level of topic A is set to 1 (step S1501). Next, the beginning of the topic establishment section to which A belongs is taken as the start point of A's continuation section. The end point of A's continuation section is whichever occurs earliest in time: the point immediately before the next topic that appears after A in semantic expansion, the point immediately before the start point of a topic establishment section in basic expansion, or the end of the monologue data (step S1502).
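Step S1502 picks the earliest of three possible end points, which can be sketched as follows. The function name `semantic_continuation` is hypothetical and positions are block indices, an assumption made to keep the example small.

```python
def semantic_continuation(
    start: int,
    semantic_starts: list[int],
    basic_starts: list[int],
    n_units: int,
) -> tuple[int, int]:
    """Inclusive (start, end) continuation section of a semantic expansion topic.

    semantic_starts: start positions of all semantic expansion topics.
    basic_starts: start positions of all basic expansion topic establishment sections.
    n_units: total number of positions (here, blocks) in the monologue data.
    """
    # Any later topic start, from either kind of expansion, closes the section.
    later = [s for s in semantic_starts + basic_starts if s > start]
    end = min(later) - 1 if later else n_units - 1
    return (start, end)
```

For a semantic topic starting at index 2, with semantic topics at `[2, 5]`, basic sections at `[0, 1, 4]`, and six blocks in total, the section is `(2, 3)`: it is cut off by the basic expansion section that starts at index 4.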

【００７２】Integration processing 150 of basic expansion and semantic expansion
The topic structure obtained so far for basic expansion and the topic structure obtained for semantic expansion are integrated. FIG. 16 shows an example of the flow of this integration process.

【００７３】First, it is checked whether a topic is a basic expansion topic (step S1601). A topic from basic expansion is integrated into the topic structure as it is (steps S1603 and S1604). For a topic from semantic expansion, the topic level is first corrected (step S1602) and the topic is then integrated.

【００７４】The correction of the semantic expansion topic level in step S1602 adds the maximum basic expansion topic level at that point to the original topic level. The topic structure obtained after integration is the final topic structure.
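The level correction and merge can be sketched as below. The function name `integrate` and the tuple layout are hypothetical. In particular, "the maximum basic expansion topic level at that point" is interpreted here as the deepest level among basic expansion topics whose continuation section covers the start of the semantic topic; the description does not spell this out, so that reading is an assumption.

```python
Topic = tuple[str, int, int, int]  # (topic word, level, start, end), end inclusive


def integrate(basic: list[Topic], semantic: list[Topic]) -> list[Topic]:
    """Merge both topic structures, correcting semantic expansion levels (step S1602)."""
    merged = list(basic)  # basic expansion topics are taken over unchanged
    for word, level, start, end in semantic:
        # Levels of basic topics still continuing where this semantic topic starts.
        active = [lv for _, lv, s, e in basic if s <= start <= e]
        merged.append((word, level + max(active, default=0), start, end))
    # Order by position, outer topics before inner ones at the same position.
    return sorted(merged, key=lambda t: (t[2], t[1]))
```

Under this reading, a semantic topic with provisional level 1 that starts inside a level-2 basic topic ends up at level 3, so it nests below the basic topic that contains it.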

【００７５】Description using a monologue data example
Next, the topic structure recognition method of the present invention is described in detail using a specific example of monologue data.

【００７６】Topic structure recognition preprocessing 120: specific example of simple sentence segmentation 122
FIG. 17 shows an example of monologue data in one embodiment of the present invention. FIG. 18 shows the result of dividing this monologue data into simple sentences using, among others, the example simple sentence segmentation rules shown in FIG. 3.

【００７７】Specific example of salient noun phrase extraction 123
A salient noun phrase is extracted from each simple sentence obtained by the segmentation. FIG. 19 shows the result of extracting salient noun phrases from the segmentation result of FIG. 18 using salient noun phrase markers such as those shown in FIG. 4. In FIG. 19, salient noun phrases are underlined, and simple sentence numbers are given as (1-1), (1-2), and so on for the purpose of explanation.

【００７８】In simple sentence (1-1), the explicit marker 「は」 causes the salient noun phrase "communication service of Company A" to be extracted. Simple sentence (1-2) contains the implicit marker 「を」, but the marked word is the deictic expression 「それら」 ("those"), which has no concrete meaning of its own, so no salient noun phrase is extracted from this simple sentence. Simple sentence (3-1) contains "Service A", indicated by the explicit marker 「は」, and "competitors", indicated by the implicit marker 「に」. According to the priorities in FIG. 4, the explicit marker 「は」 has the higher priority, so "Service A" is selected as the salient noun phrase. Salient noun phrases are extracted from the other simple sentences in the same way.

【００７９】Specific example of block recognition 124
Blocks in the monologue data are recognized. FIG. 20 shows an example of block recognition for the monologue data of FIG. 17. By block recognition rule (a) in FIG. 5, everything within one sentence belongs to the same block, so the content of each of sentences (1) to (7) in FIG. 20 belongs to a single block. In addition, by block recognition rule (b) in FIG. 5, the topic continuation phrase 「これは」 ("this is") in sentence (6) places sentences (5) and (6) in the same block.

【００８０】Basic expansion processing 130: specific example of topic establishment section determination 131
The topic establishment sections in basic expansion are identified. FIG. 21 shows the identification result for the monologue data example.

【００８１】First, following the topic establishment section determination process shown in FIG. 6, the start of the monologue and the beginning of each block containing a clue phrase are taken as start points of topic establishment sections. In FIG. 21, the beginnings of blocks a, b, and c are the start points of topic establishment sections.

【００８２】Second, the topic presentation type identification process is applied to each topic establishment section. As in the example process of FIG. 7, the two highest-ranked topic candidates for basic expansion are extracted from the first sentence after the start point of each topic establishment section. From the section starting at block a, only "communication service of Company A" is detected, so this section is of the batch type. From the section starting at block b, "various new services" and "Service A, Service B, Service C, and so on" are detected, but "various new services" has priority (2), so this section is also of the batch type. From the section starting at block e, only "conventional services" is detected, so this section is also of the batch type.

【００８３】Third, because the topic presentation type of every topic establishment section in the monologue example is the batch type, the first sentence from the start point of each section is taken as the topic establishment section, following the example process of FIG. 6. The result is shown in FIG. 21. The topic establishment sections recognized by this process are also recognized as summary sections.

【００８４】Specific example of topic word determination 132 in basic expansion processing 130
The topic words in the topic establishment sections of basic expansion are determined. As shown in FIG. 21, three topic establishment sections were recognized in basic expansion. "Communication service of Company A" in block a satisfies the topic candidate priority for basic expansion shown in FIG. 8, so it is extracted as a topic word candidate following the topic word determination process for basic expansion in FIG. 11. Because it is the only candidate extracted from the topic establishment section in block a, "communication service of Company A" is determined as the topic word of that section. In the same way, "various new services" is extracted as the topic word of the topic establishment section in block b, and "conventional services" as the topic word of the section in block e.

【００８５】Specific example of topic level section determination 133 in basic expansion
The topic level sections of the topic words in basic expansion are determined. Following the rules of FIG. 10, the first topic of the monologue, "communication service of Company A", is given topic level 1. The block that presents the next topic, "various new services", contains the nesting-start clue phrase 「まず」 ("first of all"), so the topic level is increased by 1 to 2 following the rules of FIG. 10.

【００８６】The block that presents the next topic, "conventional services", contains the topic-shift clue phrase 「次に」 ("next"), and the clue phrase presented before it is the nesting-start phrase 「まず」, so by the rules of FIG. 10 the topic level stays at 2, the same as before.

【００８７】The continuation section of each topic is determined using the rules shown in FIG. 11. The continuation section of "communication service of Company A" runs from the start to the end of the monologue data, that of "various new services" covers blocks b, c, and d in FIG. 21, and that of "conventional services" covers blocks e and f in FIG. 22. FIG. 22 shows the recognition results for these topic level sections.

【００８８】Semantic expansion processing 140: specific example of topic establishment section determination 141
The topic establishment sections in semantic expansion are identified. As in the example topic establishment section determination process of FIG. 12, only blocks that do not contain a basic expansion topic establishment section, and pseudo blocks inside blocks that do contain one, are processed here. As shown in FIG. 22, blocks c, d, and f are therefore the targets of processing.

【００８９】Block d does not satisfy the condition "the block contains four or more simple sentences" in the topic establishment section determination process of FIG. 12. Block d is therefore considered to contain no topic establishment section.

【００９０】Blocks c and f satisfy both conditions of the process of FIG. 12: "the block contains four or more simple sentences" and "the first sentence in the block contains two or more simple sentences". For both blocks, the topic establishment section candidate therefore extends to whichever comes first in time: the end of the first sentence in the block, or the end of the fourth simple sentence from the end of the monologue.

【００９１】Each of these topic establishment section candidates contains at least one candidate listed in the "topic candidate priority for semantic expansion" shown in FIG. 13. These candidates are therefore recognized as topic establishment sections.

【００９２】Specific example of topic word determination 142 in semantic expansion processing 140
The topic words in the topic establishment sections of semantic expansion are determined. As shown in FIG. 22, two topic establishment sections were recognized in semantic expansion. "Service A" in block c satisfies the topic candidate priority for semantic expansion shown in FIG. 13, so it is extracted as a topic word candidate following the topic word determination process for semantic expansion in FIG. 14. Assume that the topic establishment section of block c contains nothing with a higher priority in FIG. 13 than "Service A", that is, no "salient noun phrase accompanied by an interrogative expression". Then, even if candidates other than "Service A" are detected in this section, the one that appears earliest in time is selected, so "Service A", which appears first, is recognized as the topic. In the same way, "charging for directory assistance" is recognized as the topic in block f.

【００９３】Specific example of topic level section determination 143 in semantic expansion processing 140
The topic level sections of the topic words in semantic expansion are determined. Following the rules of FIG. 15, the topic level of every topic word is provisionally set to 1. The continuation section of each topic is determined following the rules of FIG. 16. The continuation section of "Service A" is blocks c and d in FIG. 22, and that of "charging for directory assistance" is block f in FIG. 22. FIG. 24 shows the recognition results for these topic level sections.

【００９４】Specific example of integration processing 150 of basic expansion and semantic expansion
The topic structures of basic expansion and semantic expansion are integrated. The topic levels of the semantic expansion topics are corrected following the integration rules shown in FIG. 16. FIG. 25 shows the integration result. This is the topic structure obtained by applying the topic structure recognition method of the present invention to the monologue data example of FIG. 17.

【００９５】Experimental data
The results of an evaluation experiment in which the topic structure recognition method of the present invention was applied to actual monologue data are as follows. The evaluation compared the topic structure recognized by humans with the topic structure recognized by the computer and computed recall and precision. Recall measures how much of the topic structure recognized by humans is also recognized by the computer, and precision measures how much of the topic structure recognized by the computer is also recognized by humans. If the topic structures recognized by the humans and by the computer coincide, recall and precision are both 100%.

【００９６】The monologue data used in the experiment consisted of 10 television news scripts, amounting to 300 simple sentences. The evaluation gave a recall of 74.0% and a precision of 64.0%.
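The two measures can be computed as in the following sketch. The description does not state what unit of topic structure was counted as a match, so representing each recognized item as a hashable tuple and requiring exact equality is an assumption, and `recall_precision` is a hypothetical name. The sketch does not reproduce the reported figures, which depend on data not given here.

```python
def recall_precision(human: set, machine: set) -> tuple[float, float]:
    """Return (recall, precision) of the machine result against the human result.

    Each element is one recognized item of topic structure, for example a
    (topic word, level) tuple; items match only if they are exactly equal.
    """
    matched = len(human & machine)
    # Recall: share of human-recognized items that the machine also found.
    recall = matched / len(human) if human else 0.0
    # Precision: share of machine-recognized items that humans also recognized.
    precision = matched / len(machine) if machine else 0.0
    return recall, precision
```

When the two sets are identical and non-empty, both values are 1.0, which corresponds to the 100% case mentioned in the description.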

【００９７】[0097]

[Effects of the Invention] As described above, the present invention makes it possible to recognize topics and their structure without using knowledge that depends on a specific domain.

【００９８】Presenting the recognized topics and topic structure to the user helps the user grasp the rough content of the monologue data. The topic structure can also be used as a table of contents.

[Brief description of drawings]

【図１】本発明の一実施例のモノローグ・データ話題構
造認識のための処理を示すフローチャートである。FIG. 1 is a flowchart showing a process for monologue data topic structure recognition according to an embodiment of the present invention.

【図２】本発明の一実施例のモノローグ・データ話題構
造認識装置のブロック図である。FIG. 2 is a block diagram of a monologue data topic structure recognition device according to an embodiment of the present invention.

【図３】本発明の一実施例に用いられる単文区切り規則
の例を示す図である。FIG. 3 is a diagram showing an example of a simple sentence break rule used in an embodiment of the present invention.

【図４】本発明の一実施例に用いられる顕著名詞句のマ
ーカ優先順位規則の例を示す図である。FIG. 4 is a diagram showing an example of marker priority rules for salient noun phrases used in an embodiment of the present invention.

【図５】本発明の一実施例に用いられるブロック認識規
則の例を示す図である。FIG. 5 is a diagram showing an example of a block recognition rule used in an embodiment of the present invention.

【図６】本発明の一実施例の基盤展開における話題確立
区間の決定処理を示すフローチャートである。FIG. 6 is a flowchart showing a process of determining a topic establishment section in the infrastructure development of the embodiment of the present invention.

【図７】本発明の一実施例の話題提示型の同定処理を示
すフローチャートである。FIG. 7 is a flowchart showing a topic presentation type identification process according to an embodiment of the present invention.

【図８】本発明の一実施例の基盤展開用話題候補優先順
位を示す図である。FIG. 8 is a diagram showing the topic candidate priority order for basic expansion according to an embodiment of the present invention.

【図９】本発明の一実施例の基盤展開における話題語決
定処理を示すフローチャートである。FIG. 9 is a flowchart showing the topic word determination process in the basic expansion according to an embodiment of the present invention.

【図１０】本発明の一実施例に用いられる基盤展開用レ
ベル付け規則の例を示す図である。FIG. 10 is a diagram showing an example of the level assignment rules for basic expansion used in an embodiment of the present invention.

【図１１】本発明の一実施例の意味的展開における話題
確立区間の決定処理を示すフローチャートである。FIG. 11 is a flowchart showing the process of determining topic establishment sections in the semantic expansion according to an embodiment of the present invention.

【図１２】本発明の一実施例の意味的展開用話題候補優
先順位を示す図である。FIG. 12 is a diagram showing the topic candidate priority order for semantic expansion according to an embodiment of the present invention.

【図１３】本発明の一実施例の意味展開における話題語
決定処理を示すフローチャートである。FIG. 13 is a flowchart showing a topic word determination process in the semantic expansion according to the embodiment of the present invention.

【図１４】本発明の一実施例の基盤展開における話題語
継続区間決定処理を示すフローチャートである。FIG. 14 is a flowchart showing the topic word continuation section determination process in the basic expansion according to an embodiment of the present invention.

【図１５】本発明の一実施例の意味的展開での仮の話題
レベル区間決定処理を示すフローチャートである。FIG. 15 is a flowchart showing a temporary topic level section determination process in semantic expansion according to an example of the present invention.

【図１６】本発明の一実施例の基盤展開と意味的展開の
統合処理を示すフローチャートである。FIG. 16 is a flowchart showing the process of integrating the basic expansion and the semantic expansion according to an embodiment of the present invention.

【図１７】本発明の一実施例のモノローグ・データ例を
示す図である。FIG. 17 is a diagram showing an example of monologue data according to the embodiment of the present invention.

【図１８】本発明の一実施例のモノローグ・データ例に
対する単文区切り結果の例を示す図である。FIG. 18 is a diagram showing an example of a simple sentence segmentation result for an example of monologue data according to an embodiment of the present invention.

【図１９】本発明の一実施例のモノローグ・データ例に
おける顕著名詞句の例を示す図である。FIG. 19 is a diagram showing an example of a salient noun phrase in an example of monologue data according to an embodiment of the present invention.

【図２０】本発明の一実施例のモノローグ・データ例に
おけるブロックを示す図である。FIG. 20 is a diagram showing blocks in an example of monologue data according to an embodiment of the present invention.

【図２１】本発明の一実施例のモノローグ・データ例で
の基盤展開における話題確立区間を示す図である。FIG. 21 is a diagram showing the topic establishment sections in the basic expansion for the monologue data example of an embodiment of the present invention.

【図２２】本発明の一実施例のモノローグ・データ例で
の基盤展開における話題構造を示す図である。FIG. 22 is a diagram showing the topic structure in the basic expansion for the monologue data example of an embodiment of the present invention.

【図２３】本発明の一実施例のモノローグ・データ例で
の意味的展開における話題確立区間を示す図である。FIG. 23 is a diagram showing topic establishment sections in the semantic expansion in the monologue data example of the embodiment of the present invention.

【図２４】本発明の一実施例のモノローグ・データ例で
の意味的展開における話題構造を示す図である。FIG. 24 is a diagram showing the topic structure in the semantic expansion for the monologue data example of an embodiment of the present invention.

【図２５】本発明の一実施例のモノローグ・データ例に
おける話題構造を示す図である。FIG. 25 is a diagram showing a topic structure in an example of monologue data according to an embodiment of the present invention.

[Explanation of symbols]

１１０モノローグ・データ１２０話題構造認識前処理１２１形態素解析１２２単文区切り処理１２３顕著名詞句抽出１２４ブロック認識１３０基盤展開処理１３１，１４１話題確立区間の決定１３２，１４２話題語の決定１３３，１４３話題レベル区間の決定１４０意味的展開処理１５０基盤展開と意味的展開の統合処理１６０話題構造２０１データ入力部２０２前処理記憶部２０３話題構造認識前処理部２０４前処理用辞書２０５基盤展開処理規則管理部２０６基盤展開処理規則２０７基盤展開処理記憶部２０８意味的展開処理規則管理部２０９意味的展開処理規則２１０意味的展開処理記憶部２１１統合処理記憶部２１２統合処理部２１３統合処理規則管理部２１４統合処理規則２１５表示部２３０基盤展開処理部２３１，２４１話題区間確立決定処理部２３２，２４２話題語決定処理部２３３，２４３話題レベル区間決定処理部２４０意味的展開処理部
110 Monologue data
120 Topic structure recognition preprocessing
121 Morphological analysis
122 Simple sentence segmentation processing
123 Salient noun phrase extraction
124 Block recognition
130 Basic expansion processing
131, 141 Determination of topic establishment sections
132, 142 Determination of topic words
133, 143 Determination of topic level sections
140 Semantic expansion processing
150 Integration processing of basic expansion and semantic expansion
160 Topic structure
201 Data input unit
202 Preprocessing storage unit
203 Topic structure recognition preprocessing unit
204 Preprocessing dictionary
205 Basic expansion processing rule management unit
206 Basic expansion processing rules
207 Basic expansion processing storage unit
208 Semantic expansion processing rule management unit
209 Semantic expansion processing rules
210 Semantic expansion processing storage unit
211 Integration processing storage unit
212 Integration processing unit
213 Integration processing rule management unit
214 Integration processing rules
215 Display unit
230 Basic expansion processing unit
231, 241 Topic establishment section determination processing unit
232, 242 Topic word determination processing unit
233, 243 Topic level section determination processing unit
240 Semantic expansion processing unit

Claims

1. A topic structure recognition method for monologue data, in which monologue data is first subjected to topic structure recognition preprocessing using a topic structure recognition preprocessing dictionary; the development of topics is then separated into basic expansion, which is explicitly indicated by clue phrases or the like, and semantic expansion, which unfolds within the basic expansion; for each of the basic expansion and the semantic expansion, using the basic expansion processing rules and the semantic expansion processing rules respectively, identification of the topic establishment sections in which topics are presented and established, identification of the topic in each topic establishment section, and identification of the nesting level and continuation section of each topic are performed in sequence; and the processing results for the basic expansion and the semantic expansion are then integrated using integration processing rules, the method being characterized in that, as the preprocessing: markers of salient noun phrase candidates are classified into explicit markers, which have only the function of presenting a salient noun phrase, and other, non-explicit markers, and are registered in salient noun phrase marker priority rules together with their types and priorities; each simple sentence in the language data is matched against the salient noun phrase marker priority rules to extract and prioritize salient noun phrase candidates, and the candidate with the highest priority is selected as the salient noun phrase; and topic continuation phrases, which are linguistic expressions indicating that a topic continues, are registered, and blocks, each being a set of semantically coherent sentences, are recognized using information on the topic continuation phrases contained in the monologue data and information on sentence units indicated by punctuation marks or the like.

2. The topic structure recognition method for monologue data according to claim 1, characterized in that: topic establishment sections in which a topic is presented and established in the basic expansion are identified using the clue phrases contained in the language data, their type information, and the block information; in each topic establishment section of the basic expansion, the topic candidate with the highest priority is selected from among the salient noun phrases according to the topic candidate priority order for basic expansion, and the topic word in the basic expansion is determined by taking that candidate as the topic if only one candidate is selected and by taking the candidate that appears earliest in time if a plurality of candidates are selected; and the level and continuation section of each topic word in the basic expansion are determined by setting the topic level of the first topic word in the basic expansion to 1, assigning levels to the other topics according to the topic level assignment rules for basic expansion, setting the beginning of the topic establishment section to which each topic belongs as the start point of the continuation section of that topic, and setting, as the end point of the topic establishment section, the earlier of the start of a topic at or below that topic's level and the end point of the monologue data.

3. The topic structure recognition method for monologue data according to claim 1, characterized in that: topic establishment sections in which a topic is presented and established in the semantic expansion are identified using the block information, the title of the entire text, the salient noun phrases, the topic candidate priority order for semantic expansion, and the topic establishment sections of the basic expansion; in each topic establishment section of the semantic expansion, the topic candidate with the highest priority is selected from among the salient noun phrases according to the topic candidate priority order for semantic expansion, and the topic word in the semantic expansion is determined by taking that candidate as the topic if only one candidate is selected and by taking the candidate that appears earliest in time if a plurality of candidates are selected; and the provisional level and continuation section of each topic word in the semantic expansion are determined by setting a provisional topic level of 1 for all topic words in the semantic expansion, setting the beginning of the topic establishment section to which each topic belongs as the start point of the continuation section of that topic, and setting, as the end point of the topic establishment section, the earliest of three points: immediately before the start of the next topic in the semantic expansion, immediately before the start point of a topic establishment section of the basic expansion, and the end point of the language data.

4. A topic structure recognition device characterized by comprising: an input unit for inputting monologue data; means for retrieving the dictionary contents of a topic structure recognition preprocessing dictionary; means for performing preprocessing using the dictionary contents and storing the result in a preprocessing storage unit; a basic expansion processing storage unit and a semantic expansion processing storage unit for storing the respective processing results when the development of topics is separated into basic expansion, which is explicitly indicated by clue phrases, and semantic expansion, which unfolds within the basic expansion; means for retrieving each rule of the basic expansion processing rules; means for performing topic establishment section determination processing, topic word determination processing, and topic level section determination processing using the basic expansion processing rules and storing the results in the basic expansion processing storage unit; means for retrieving each rule of the semantic expansion processing rules; means for performing topic establishment section determination processing, topic word determination processing, and topic level section determination processing using the semantic expansion processing rules and storing the results in the semantic expansion processing storage unit; means for retrieving each rule of the integration processing rules; means for performing integration processing using the integration processing rules and storing the result in an integration processing storage unit; and a display unit for displaying the contents of the integration processing storage unit.
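
The topic word selection recited in claims 2 and 3 (highest priority first, earliest appearance as the tie-breaker) and the "earliest of three points" rule of claim 3 can be illustrated with a minimal Python sketch. The `Candidate` type, its field names, the convention that a smaller `priority` number means a higher priority, and the use of simple-sentence indices as positions are all assumptions made for illustration. The actual priority orders are those of FIG. 8 and FIG. 12, which are not reproduced here, and which establishment section supplies the second point is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    phrase: str    # salient noun phrase text
    priority: int  # assumed convention: 1 is the highest priority
    position: int  # index of the simple sentence in which it appears


def select_topic_word(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Pick the highest-priority candidate; ties go to the earliest one."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: (c.priority, c.position))


def semantic_end_point(next_semantic_start: Optional[int],
                       establishment_section_start: Optional[int],
                       data_end: int) -> int:
    """Return the earliest of the three end-point candidates.

    A start index s contributes the point immediately before it (s - 1).
    None means that no such following topic or section exists.
    """
    points = [data_end]
    for start in (next_semantic_start, establishment_section_start):
        if start is not None:
            points.append(start - 1)
    return min(points)


# Hypothetical candidates within one topic establishment section.
candidates = [
    Candidate("economy", priority=2, position=0),
    Candidate("election", priority=1, position=5),
    Candidate("budget", priority=1, position=3),
]
print(select_topic_word(candidates).phrase)
print(semantic_end_point(10, 7, 20))
```

Sorting on the pair (priority, position) expresses both rules in one comparison: priority decides first, and position only matters when several candidates share the top priority.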