JP3333952B2 - Topic structure recognition method and apparatus

Info

Publication number: JP3333952B2
Application number: JP25679294A
Authority: JP
Inventors: 敦竹下; 一男田中; 孝史井上
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc
Current assignee: Nippon Telegraph and Telephone Corp; NTT Inc
Priority date: 1994-10-21
Filing date: 1994-10-21
Publication date: 2002-10-15
Anticipated expiration: 2017-10-15
Also published as: JPH08123812A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a method and an apparatus for topic structure recognition in natural language analysis.

[0002]

[Prior Art] When people are presented with text or dialogue data and given the task "Find the blocks in this text or dialogue data in which the same thing is written, and state what that 'same thing' is", it has been confirmed experimentally that they answer with the same structure regardless of individual differences. The experiment is described in, for example, Takeshita et al., "Research on Human Communication from the Viewpoint of Topic Structure Recognition," IEICE 1993 Fall Conference, D-62 (p. 6-64). A structure grasped by humans in this way is called a "topic structure". Because topic structures are nested, each topic can be expressed by a "topic word" that indicates the topic, a "topic level" that indicates the nesting depth, and a "topic scope" that indicates from which sentence to which sentence the topic continues in the text or dialogue data. In the following, the text or dialogue data whose topic structure is to be analyzed is called language data.

[0003] FIG. 1 shows an example of a topic structure for language data whose content concerns telecommunications policy. The language data starts at sentence 0 and continues to at least sentence 770. The topic with the topic word "communication services" has topic level 1, and its topic scope runs from sentence 0 to sentence 770. To simplify the explanation, a topic is referred to below by its topic word, as in "the topic 'communication services'".

[0004] Within the topic "communication services" there are two topics with topic level 2, "new services" and "conventional services". The topic "new services" has a topic scope from sentence 125 to sentence 431, and the topic "conventional services" has a topic scope from sentence 432 to sentence 770. In addition, the topic "new services" contains a child topic "service A", and the topic "conventional services" contains a child topic "service B"; their topic scopes run from sentence 301 to sentence 431 and from sentence 521 to sentence 770, respectively.
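The nested structure of FIG. 1 can be written down as a small tree, which may make the roles of topic word, topic level, and topic scope easier to see. The following is only an illustrative sketch: the class name `Topic`, its field names, and the helper `build_fig1_example` are hypothetical, and level 3 for the child topics "service A" and "service B" is an assumption (the text calls them child topics without stating their level).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Topic:
    """One node of a topic structure (all names here are illustrative)."""
    word: str    # topic word
    level: int   # topic level: nesting depth, 1 = outermost
    start: int   # first sentence number of the topic scope
    end: int     # last sentence number of the topic scope (inclusive)
    children: List["Topic"] = field(default_factory=list)

    def add(self, child: "Topic") -> "Topic":
        # A child is one level deeper and its scope lies inside the parent's scope.
        assert child.level == self.level + 1
        assert self.start <= child.start <= child.end <= self.end
        self.children.append(child)
        return child

    def outline(self, indent: int = 0) -> List[str]:
        # Render the tree as indented "word [start-end]" lines.
        lines = ["  " * indent + f"{self.word} [{self.start}-{self.end}]"]
        for child in self.children:
            lines.extend(child.outline(indent + 1))
        return lines


def build_fig1_example() -> Topic:
    root = Topic("communication services", 1, 0, 770)
    new = root.add(Topic("new services", 2, 125, 431))
    old = root.add(Topic("conventional services", 2, 432, 770))
    new.add(Topic("service A", 3, 301, 431))  # level 3 is an assumption
    old.add(Topic("service B", 3, 521, 770))  # level 3 is an assumption
    return root
```

Calling `"\n".join(build_fig1_example().outline())` yields an indented outline of the five topics, which is the kind of table of contents the later paragraphs refer to.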

[0005] Recognizing such a topic structure by computer is called topic structure recognition. Several methods for recognizing topic structure have been proposed. Here, the topic structure recognition method described in Takeshita, "Video Retrieval System Using Topic Structure Recognition," Information Processing Society of Japan, SIG on Information Media, 94-IM-15-1, is briefly explained. FIG. 2 is a block diagram showing the configuration of an example of a topic structure recognition apparatus used in this recognition method, FIG. 3 is a flowchart showing the topic structure recognition processing in this recognition method, and FIG. 4 is a diagram showing an example of the flow of processing that follows the topic structure recognition preprocessing.

[0006] The conventional topic structure recognition apparatus shown in FIG. 2 comprises a data input unit 701 to which language data is input, a processing unit 702 that executes the various processes, a display unit 703 that displays results, a storage unit 704 that holds processing results and data needed during processing, and a dictionary/rule unit 705 that stores the dictionaries and rules used in topic structure recognition processing. The storage unit 704 contains a language data storage unit 710 that stores the preprocessed language data and a topic structure storage unit 711 that holds intermediate and final processing results. The topic structure storage unit 711 in turn contains a base development storage unit 712, a semantic development storage unit 713, and an integrated topic storage unit 714. The dictionary/rule unit 705 contains a preprocessing dictionary 721, semantic development processing rules 722, base development processing rules 723, and integration processing rules 724.

[0007] When topic structure recognition is performed with this apparatus, topic structure recognition preprocessing 740 is first applied to the input language data 730, as shown in FIG. 3. The first step of the preprocessing 740 is morphological analysis 741 of the input language data 730. Morphological analysis 741 segments the character string of the input language data 730 into a sequence of words and identifies the part of speech of each word, the conjugated form of each inflected word, and so on. The second step of the preprocessing 740 is simple-sentence segmentation 742, which takes the result of the morphological analysis as input. Simple-sentence segmentation 742 splits a sentence that contains multiple predicates, such as a sentence with an embedded clause or a compound sentence, into simple sentences that each contain only one predicate. The third step is salient noun phrase extraction 743, which takes the result of simple-sentence segmentation 742 as input and extracts the most emphasized noun phrase in each simple sentence. The fourth step is block recognition 744, which recognizes blocks corresponding to paragraphs in text. Each process belonging to the preprocessing 740 is executed by the processing unit 702 using the preprocessing dictionary 721 in the dictionary/rule unit 705, and the results are stored in the language data storage unit 710 in the storage unit 704.

[0008] Once the topic structure recognition preprocessing 740 is complete, topic development is processed in two separate parts: base development processing 750 and semantic development processing 760. Here, base development means topic development that is indicated explicitly by cue phrases such as "first" and "next", by chapter structure, by itemized lists, and the like. Semantic development means topic development that is presented and progresses in a non-explicit form within each topic of the base development.

[0009] First, as shown in FIG. 3, the base development processing 750 performs three processes in sequence: determination 751 of topic establishment sections, determination 752 of topic words, and determination 753 of topic scopes and topic levels. Here, a topic establishment section is a section in which a topic is presented and established. In topic word determination 752, the salient noun phrases in each topic establishment section are taken as topic word candidates, and the candidate with the highest priority is selected as the topic word. In determination 753 of topic scopes and topic levels, processing is performed on the basis of structures such as itemized lists. The base development processing 750 is executed by the processing unit 702 using the base development processing rules 723 in the dictionary/rule unit 705, and the results are stored in the base development storage unit 712 included in the topic structure storage unit 711 of the storage unit 704.

[0010] FIG. 4 shows a concrete example of the base development processing 750. First, the start of the language data and the vicinity of cue phrases such as "first" and "next" are determined to be the topic establishment sections of the base development. Then, in topic word determination 752, "communication services" is selected as the topic word from the first topic establishment section, "new services" from the second, and "conventional services" from the third.

[0011] After the base development processing 750 has been executed, the semantic development processing 760 is executed. Like the base development processing 750, the semantic development processing 760 consists of three processes: determination 761 of topic establishment sections, determination 762 of topic words, and determination 763 of topic scopes and topic levels. The semantic development processing 760 is executed by the processing unit 702 using the semantic development processing rules 722 in the dictionary/rule unit 705 together with the results of the base development processing 750, and its results are stored in the semantic development storage unit 713 included in the topic structure storage unit 711 of the storage unit 704.

[0012] In the example shown in FIG. 4, paragraphs of at least a certain length are selected as topic establishment sections, and "service A" and "service B" are selected as their topic words. Each topic scope is taken to run from the start of the topic establishment section described above to the start of the next topic establishment section in the base development. In the semantic development of text, the topic levels are all set to the same level, namely level 1.

[0013] Finally, integration processing 770 of the base development and the semantic development is performed, and the topic structure 780 of the entire language data is output as the result. The integration processing 770 takes as input the topic structures produced by the base development processing 750 and the semantic development processing 760, and is executed by the processing unit 702 using the integration processing rules 724 in the dictionary/rule unit 705. In the example shown in FIG. 4, a topic structure 780 similar to that shown in FIG. 1 is obtained as the result of the integration processing.

[0014] The rules for determining topic establishment sections, topic words, topic scopes, and topic levels in the base development and in the semantic development (the semantic development processing rules 722 and the base development processing rules 723) differ depending on the mode of communication of the language data, such as dialogue, monologue, or written text. The differences in topic development style and in topic structure recognition rules across modes of communication, and the results of topic structure recognition experiments, are described in Takeshita et al., "Research on Human Communication from the Viewpoint of Topic Structure Recognition," IEICE 1993 Fall Conference, D-62 (p. 6-64).

[0015]

[Problems to Be Solved by the Invention] However, the conventional topic structure recognition method described above has the problem that the recognition accuracy of the topic structure is low for explanatory language data such as lectures and commentaries. In particular, it tends to miss topics that should be recognized, so that fewer topics are recognized. As a result, when the recognized topic structure is used as a table of contents, it is difficult for a person to grasp the outline of the talk from the table of contents, and, because the table of contents has few entries, the range in which the needed information is described cannot be narrowed down.

[0016] For example, applying the conventional topic structure recognition method to the monologue example shown in FIG. 5 automatically generates a table of contents like that of FIG. 6. However, because the table of contents in FIG. 6 has only three entries, it is difficult to grasp the content of the original monologue example from it.

[0017] An object of the present invention is to provide a method and an apparatus capable of performing topic structure recognition with higher accuracy on explanatory language data such as lectures and commentaries.

[0018]

[Means for Solving the Problems] The topic structure recognition method of the present invention is a method that uses predetermined rules to perform, on language data, topic structure recognition including determination of topic establishment sections, in which topics are presented and established, and determination of topic words in those topic establishment sections. In this method, the topic structure recognition apparatus is provided with a preprocessing dictionary and an example expression dictionary that stores information including the words and parts of speech of example expressions. As the topic structure recognition preprocessing step of the apparatus, the preprocessing dictionary is used to perform simple-sentence segmentation, which cuts the language data into simple sentences each containing exactly one predicate, and block recognition, which divides the language data into blocks corresponding to paragraphs in text; the results of these processes are stored in the language data storage unit. As the step following the preprocessing step, for each block, different processing is performed depending on whether an example expression listed in the example expression dictionary is present in a simple sentence, whereby topic establishment sections are determined, topic words are determined, and topic scopes and topic levels are determined; these are stored in the topic structure storage unit and a topic word structure is created.

[0019] In the present invention, if information on whether each simple sentence contains an example expression has already been stored, that information is used to decide whether to recognize a block as a topic establishment section. Otherwise, each simple sentence is matched against the example expression dictionary prepared in advance to check whether it contains an example expression.
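The lookup described here (reuse a stored result if one exists, otherwise match the simple sentence against the example expression dictionary) can be sketched as follows. This is a minimal sketch under stated assumptions: a simple sentence is represented as a list of (surface form, part of speech) pairs, the dictionary as a set of such pairs, and the two dictionary entries shown are hypothetical, since the patent text in this section does not list the actual entries.

```python
from typing import Dict, List, Set, Tuple

Word = Tuple[str, str]  # (surface form, part of speech)

# Hypothetical dictionary entries; the real example expression dictionary
# is not reproduced in this section.
EXAMPLE_EXPRESSIONS: Set[Word] = {("例えば", "adverb"), ("たとえば", "adverb")}


def has_example_expression(
    sentence_id: int,
    words: List[Word],
    dictionary: Set[Word],
    cache: Dict[int, bool],
) -> bool:
    """Return True if the simple sentence contains an example expression."""
    # Reuse the stored result when the sentence has already been checked.
    if sentence_id in cache:
        return cache[sentence_id]
    # Otherwise match every word (surface form and part of speech) against
    # the dictionary, and store the result for later use.
    found = any(word in dictionary for word in words)
    cache[sentence_id] = found
    return found
```

Matching on both surface form and part of speech follows the statement that the dictionary stores "words and parts of speech"; multi-word example expressions would need a different matching scheme, which this sketch does not attempt.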

[0020] When deciding whether to recognize a block as a topic establishment section, if the language data is dialogue data, the block may be recognized as a topic establishment section of the semantic development when either of the following two conditions is satisfied. The first condition is: "the first simple sentence of the block currently of interest contains an example expression registered in the example expression dictionary prepared in advance or a salient noun phrase of the explicit marker type; and, within a range of simple sentences no larger than a predetermined minimum topic establishment section size counted from the first simple sentence of the block, there is no preceding block that has been recognized as a topic establishment section in the semantic development; and either the number of simple sentences from the start position of the block to the start position of the next base development topic establishment section is larger than the minimum topic establishment section size, or the topic level increases by 1 at that next base development topic establishment section". The second condition is: "the first simple sentence of the block currently of interest contains a salient noun phrase of the implicit marker type, and the number of simple sentences in the block is at least the minimum topic establishment section size".
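Written as a boolean expression, the two dialogue conditions above have the following shape. This is a sketch only: the record type `DialogueBlockFacts` and its field names are hypothetical, and it assumes that the individual facts (for example, whether an earlier semantic topic establishment section lies within the minimum size) have already been computed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class DialogueBlockFacts:
    """Precomputed facts about the block of interest (names are illustrative)."""
    head_has_example_expression: bool   # first simple sentence has an example expression
    head_has_explicit_marker_np: bool   # ... or an explicit-marker salient noun phrase
    head_has_implicit_marker_np: bool   # ... or an implicit-marker salient noun phrase
    prior_semantic_section_nearby: bool  # earlier semantic section within the minimum size
    sentences_to_next_base_section: int  # simple sentences up to the next base section
    next_base_section_raises_level: bool  # topic level increases by 1 there
    block_size: int                      # number of simple sentences in the block


def is_semantic_section_dialogue(f: DialogueBlockFacts, min_size: int) -> bool:
    # First condition: cue in the head sentence, no recent semantic section,
    # and enough room (or a level increase) before the next base section.
    first = (
        (f.head_has_example_expression or f.head_has_explicit_marker_np)
        and not f.prior_semantic_section_nearby
        and (
            f.sentences_to_next_base_section > min_size
            or f.next_base_section_raises_level
        )
    )
    # Second condition: implicit-marker noun phrase and a large enough block.
    second = f.head_has_implicit_marker_np and f.block_size >= min_size
    return first or second
```

Note the asymmetry carried over from the text: the distance to the next base development section must be strictly larger than the minimum size, whereas the block size only has to be at least the minimum size.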

[0021] Similarly, if the language data is monologue data, the block may be recognized as a topic establishment section of the semantic development when the following condition holds: "the block currently of interest contains at least one of an example expression, an interrogative expression, a salient noun phrase of the explicit marker type, a salient noun phrase contained in the title of the language data, or a salient noun phrase contained in the base development topic establishment section immediately preceding the block; and the number of simple sentences in the block is at least a predetermined minimum topic establishment section size; and the number of simple sentences in the first sentence of the block is at least a predetermined minimum first-sentence size for topic establishment sections".
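The monologue condition is a conjunction of three tests, which the following sketch makes explicit. The cue labels in `CUE_KINDS`, the function name, and the parameter names are all hypothetical; the sketch assumes the set of cue kinds found in the block has been determined beforehand.

```python
from typing import Set

# Labels for the five kinds of cue named in the condition (illustrative names).
CUE_KINDS = frozenset({
    "example_expression",
    "interrogative_expression",
    "explicit_marker_np",
    "title_np",                  # salient noun phrase that also appears in the title
    "previous_base_section_np",  # salient noun phrase from the preceding base section
})


def is_semantic_section_monologue(
    cues_in_block: Set[str],
    block_size: int,             # simple sentences in the block
    first_sentence_size: int,    # simple sentences in the block's first sentence
    min_section_size: int,
    min_first_sentence_size: int,
) -> bool:
    has_cue = bool(cues_in_block & CUE_KINDS)
    return (
        has_cue
        and block_size >= min_section_size
        and first_sentence_size >= min_first_sentence_size
    )
```

Unlike the dialogue case, the cue may appear anywhere in the block rather than only in its first simple sentence, and there is a separate size threshold for the first sentence.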

[0022] The topic structure recognition apparatus of the present invention has an input unit for inputting language data, a dictionary/rule unit that stores the rules for topic structure recognition, a processing unit that performs processing using the rules of the dictionary/rule unit and performs recalculation when recalculation becomes necessary, a storage unit that stores the results produced by the processing unit, and a display unit that displays the processing results of the processing unit. The dictionary/rule unit includes an example expression dictionary that stores information including the words and parts of speech of example expressions, and topic establishment section recognition rules that include information on whether to recognize a block as a topic establishment section, that is, a range in which a topic is presented and established, depending on whether a simple sentence in the block contains an example expression. The storage unit includes a language data storage unit that stores information on the input language data and a topic structure storage unit that stores information on the topic structure. The language data storage unit includes a word information table that stores information including the character string and part of speech of each word in the language data; a simple sentence information table that stores information including, for each simple sentence of the language data, the words it contains, its salient noun phrase (the most emphasized noun phrase in the simple sentence), the type of the salient noun phrase, and any example expression; and a block information table that stores information including the starting and ending simple sentence numbers of each block corresponding to a paragraph in text. The topic structure storage unit includes a table that describes information including topic establishment sections, topic words, topic levels, and topic scopes.
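The three tables of the language data storage unit can be pictured as simple record types. The sketch below is an assumption about one possible layout, not the layout used by the apparatus: the class and field names are hypothetical, simple sentences refer to words by index into the word table, and block boundaries are treated as inclusive simple sentence numbers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WordEntry:
    """One row of the word information table."""
    surface: str  # character string of the word
    pos: str      # part of speech


@dataclass
class SimpleSentenceEntry:
    """One row of the simple sentence information table."""
    word_ids: List[int]                # indices into the word information table
    salient_np: Optional[str]          # most emphasized noun phrase, if any
    salient_np_type: Optional[str]     # e.g. "explicit" or "implicit" marker type
    example_expression: Optional[str]  # example expression found, if any


@dataclass
class BlockEntry:
    """One row of the block information table."""
    start: int  # number of the first simple sentence in the block
    end: int    # number of the last simple sentence in the block (inclusive)


def sentences_in_block(
    block: BlockEntry, sentences: List[SimpleSentenceEntry]
) -> List[SimpleSentenceEntry]:
    # Simple sentence numbers are used directly as list indices here.
    return sentences[block.start : block.end + 1]
```

Storing the example expression per simple sentence is what allows the section-determination step to reuse an earlier match instead of consulting the dictionary again.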

[0023]

[Operation] By checking whether the language data subject to topic structure recognition contains example expressions, the present invention makes it possible to correctly recognize topic establishment sections, in which topics are presented and established, even for language data consisting of explanation or commentary. As a result, the final accuracy of topic structure recognition also improves.

[0024]

[Embodiments] Embodiments of the present invention are described next with reference to the drawings. FIG. 7 is a block diagram showing the configuration of a topic structure recognition apparatus according to one embodiment of the present invention. This apparatus differs from the conventional topic structure recognition apparatus shown in FIG. 2 chiefly in that the dictionary/rule unit 105 is provided with an example expression dictionary 125 and topic establishment section recognition rules 126, the presence or absence of example expressions is checked using the example expression dictionary 125, and, according to the result, topic establishment sections in the semantic development are recognized on the basis of the topic establishment section recognition rules 126.

[0025] [Configuration of the topic structure recognition apparatus of this embodiment] The topic structure recognition apparatus of this embodiment comprises a data input unit 101 to which language data is input, a processing unit 102 that executes the various processes, a display unit 103 that displays results, a storage unit 104 that holds processing results and data needed during processing, and a dictionary/rule unit 105 that stores the dictionaries and rules used in topic structure recognition processing. The storage unit 104 contains a language data storage unit 110 that stores the preprocessed language data and a topic structure storage unit 111 that holds intermediate and final processing results. The language data storage unit 110 contains a simple sentence information table 115, a word information table 116, and a block information table 117, and the topic structure storage unit 111 contains a base development storage unit 112, a semantic development storage unit 113, and an integrated topic storage unit 114. The dictionary/rule unit 105 contains a preprocessing dictionary 121 used for topic structure recognition preprocessing, semantic development processing rules 122 used for semantic development processing, base development processing rules 123 used for base development processing, integration processing rules 124 used for integrating the base development and the semantic development, an example expression dictionary 125 that stores example expressions, and topic establishment section recognition rules 126 used for determining topic establishment sections of the semantic development.

[0026] When the topic structure of language data is analyzed using this apparatus, processing proceeds in the same way as the conventional flow shown in FIG. 3, except that, in determining topic establishment sections in the semantic development processing, the decision is made according to information that includes the presence or absence of example expressions. The determination of topic establishment sections in the semantic development is described in detail below.

[0027] [Determination of topic establishment sections in semantic development processing] In the topic establishment section determination of the semantic development, every block recognized in the earlier processing, other than blocks that are unsuitable for reasons such as already having been recognized as a topic establishment section of the base development, is checked to see whether it satisfies the condition for a topic establishment section, and the blocks that satisfy the condition are recognized as topic establishment sections. The condition is described in the topic establishment section recognition rules 126, and a different condition is adopted depending on the type of language data. Specifically, different procedures are used for dialogue data and for monologue data. The determination of topic establishment sections for dialogue data is described first.

[0028] (1) Dialogue data: Let b1 be the block currently of interest and ss the first simple sentence of block b1. Block b1 is recognized as a topic establishment section in the semantic development when either of the following conditions is satisfied. The first condition is: "simple sentence ss contains an example expression or a salient noun phrase of the explicit marker type; no block b2 recognized as a topic establishment section of the semantic development exists before block b1 within the range of the predetermined minimum topic establishment section size gs_min; and either the number of simple sentences between the start of b1 and the start position of the next base development topic establishment section is greater than gs_min, or the topic level increases by 1 at that base development topic establishment section". The second condition is: "simple sentence ss contains a salient noun phrase of the implicit marker type, and the number of simple sentences in block b1 is at least gs_min". An example of a concrete procedure for this processing of dialogue data is described below with reference to the flowchart of FIG. 8.

[0029] Let b1 be the block currently of interest and ss the first simple sentence of block b1 (step 200). Whether simple sentence ss contains an example expression is checked (step 201). If it does, processing moves to step 203. If it does not, whether simple sentence ss contains a salient noun phrase of the explicit marker type is checked (step 202). If it contains such a noun phrase, processing moves to step 203; otherwise processing moves to step 208.

[0030] In step 203, whether a block b2 already identified as a topic establishment section of the semantic development exists before block b1 is checked. If no such block exists, the process moves to step 205. If one exists, whether the number of simple sentences between block b1 and block b2 is greater than the predetermined topic establishment section minimum size gs_min is checked (step 204). If the number is greater than gs_min, the process moves to step 205; otherwise it moves to step 208.

[0031] In step 205, whether the number of simple sentences between the head of block b1 and the start of the next topic establishment section of the base development is greater than gs_min is checked. If it is, block b1 is identified as a topic establishment section of the semantic development (step 207). If it is not, whether the topic level increases by 1 in the base development topic establishment section that follows block b1 is checked (step 206). If the topic level increases by 1, the process moves to step 207 and block b1 is identified as a topic establishment section of the semantic development; if it does not, the process moves to step 208.

[0032] In step 208, whether ss contains an implicit-marker-type salient noun phrase is checked. If it does not, block b1 is judged not to be a topic establishment section of the semantic development (step 210). If it does, whether the number of simple sentences in block b1 is at least gs_min is checked (step 209). If the number is at least gs_min, the process moves to step 207 and block b1 is identified as a topic establishment section; if it is less than gs_min, the process moves to step 210 and block b1 is judged not to be a topic establishment section.

[0033] Once step 207 has identified the block currently in focus as a topic establishment section of the semantic development, or step 210 has judged that it is not one, the topic establishment section determination for that block ends and the same processing is repeated for the next block.
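
The decision of steps 200 to 210 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the embodiment: the class and field names are hypothetical, the caller is assumed to have already looked up each fact in the tables, and the text does not say what happens when no base development topic establishment section follows b1, so that case is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DialogueBlock:
    """Facts about block b1 consulted by the FIG. 8 flowchart (hypothetical field names)."""
    example_in_first: bool               # step 201: ss contains an example expression
    explicit_np_in_first: bool           # step 202: ss contains an explicit-marker-type salient noun phrase
    implicit_np_in_first: bool           # step 208: ss contains an implicit-marker-type salient noun phrase
    size: int                            # step 209: number of simple sentences in b1
    gap_to_prev_semantic: Optional[int]  # steps 203-204: simple sentences between b2 and b1, None if no b2
    gap_to_next_base: int                # step 205: simple sentences up to the next base development section
    next_base_level_up: bool             # step 206: topic level rises by 1 in that section


def is_establishment_section_dialogue(b: DialogueBlock, gs_min: int) -> bool:
    # Steps 201-202: an example expression or an explicit marker opens the first route.
    if b.example_in_first or b.explicit_np_in_first:
        # Steps 203-204: no earlier semantic section, or one that is far enough away.
        far_from_prev = b.gap_to_prev_semantic is None or b.gap_to_prev_semantic > gs_min
        # Steps 205-206: far enough from the next base section, or the level rises there.
        if far_from_prev and (b.gap_to_next_base > gs_min or b.next_base_level_up):
            return True  # step 207
    # Steps 208-209: fallback route through an implicit marker and the block size.
    return b.implicit_np_in_first and b.size >= gs_min
```

Note that a block that fails the first route is not rejected outright; as in the flowchart, it still falls through to the implicit-marker test of step 208.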

[0034] (2) Monologue data: Next, the processing that determines topic establishment sections of the semantic development for monologue data is described. In this processing, the block currently in focus is identified as a topic establishment section of the semantic development when it contains an example expression, a question expression, an explicit-marker-type salient noun phrase, a salient noun phrase that appears in the title, or a salient noun phrase that appears in the immediately preceding base development topic establishment section, and, in addition, the number of simple sentences in the block is at least the topic establishment section minimum size gs_min and the number of simple sentences in the first sentence of the block is at least the predetermined first sentence minimum size 1st_sent_min. An example of a specific procedure for this processing is described below with reference to the flowchart of FIG. 9.

[0035] First, let b1 be the block currently in focus (step 300). Whether block b1 contains an example expression is checked (step 301). If it does, the process moves to step 305; if it does not, whether b1 contains a question expression or an explicit-marker-type salient noun phrase is checked (step 302). If either is found in step 302, the process moves to step 305; otherwise, whether b1 contains a salient noun phrase that appears in the title is checked (step 303). If it does, the process moves to step 305; if not, whether b1 contains a salient noun phrase that appears in the immediately preceding base development topic establishment section is checked (step 304). If such a noun phrase exists, the process moves to step 305; if not, block b1 is judged not to be a topic establishment section of the semantic development (step 308) and the processing ends.

[0036] In step 305, whether the number of simple sentences in block b1 is at least the topic establishment section minimum size gs_min is determined. If the number is less than gs_min, the process moves to step 308 and b1 is judged not to be a topic establishment section of the semantic development. If it is at least gs_min, whether the number of simple sentences in the first sentence of b1 is at least the predetermined first sentence minimum size 1st_sent_min is determined (step 306). If it is at least 1st_sent_min, block b1 is identified as a topic establishment section of the semantic development (step 307). If it is less than 1st_sent_min, the process moves to step 308 and b1 is judged not to be a topic establishment section of the semantic development.

[0037] Once it has been determined in this way whether block b1 is a topic establishment section, the same processing is repeated for the next block.
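
The monologue decision of steps 300 to 308 reduces to one cue test followed by two size tests, which might be sketched as below. The class and field names are hypothetical, and the parameter 1st_sent_min is renamed first_sent_min only because a Python identifier cannot begin with a digit.

```python
from dataclasses import dataclass


@dataclass
class MonologueBlock:
    """Facts about block b1 consulted by the FIG. 9 flowchart (hypothetical field names)."""
    has_example: bool               # step 301: an example expression
    has_question_or_explicit: bool  # step 302: a question expression or explicit-marker-type salient noun phrase
    has_title_np: bool              # step 303: a salient noun phrase that appears in the title
    has_prev_base_np: bool          # step 304: a salient noun phrase from the preceding base development section
    size: int                       # step 305: number of simple sentences in b1
    first_sentence_size: int        # step 306: number of simple sentences in the first sentence of b1


def is_establishment_section_monologue(b: MonologueBlock, gs_min: int, first_sent_min: int) -> bool:
    # Steps 301-304: any one of the four cues is enough to reach step 305.
    has_cue = (b.has_example or b.has_question_or_explicit
               or b.has_title_np or b.has_prev_base_np)
    # Steps 305-306: both size thresholds must hold for step 307; otherwise step 308.
    return has_cue and b.size >= gs_min and b.first_sentence_size >= first_sent_min
```

Unlike the dialogue procedure, every cue here leads to the same pair of size checks, so the order in which steps 301 to 304 are tested does not affect the outcome.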

[0038] (3) Tables and example expression dictionary used in the topic establishment section determination: The structure of the tables and the example expression dictionary used to determine topic establishment sections in the semantic development is described here with reference to FIGS. 10(a) to 10(d).

[0039] Information on the language data subject to topic structure recognition is stored in the simple sentence information table 115, the word information table 116, and the block information table 117 in the language data storage unit 110. The word information table 116 stores the result of morphological analysis of the language data and contains the character string of each word and its part-of-speech information. In the example shown in FIG. 10(c), the first word of the language data (word number 0) is the verbal noun (sahen noun) "民営化" ("privatization"), and the next word is the suffix "後" ("after"). A word number is a sequential number, starting from 0, assigned to each word in order of appearance in the language data.

[0040] The simple sentence information table 115 stores information on simple sentences, a simple sentence being a unit that has exactly one predicate. It contains the word range of each simple sentence, that is, its first and last words, together with the example expression, the salient noun phrase, and the salient noun phrase type found in each simple sentence. For monologues, information on which simple sentences belong to which sentence is also needed, so the simple sentence information table 115 requires a field for the sentence number. In the example of FIG. 10(c), as the arrows in the figure indicate, the first simple sentence has a word range from word number 0 to word number 15 in the word information table 116, and the phrase consisting of the words with word numbers 2, 3, 4, and 5 in the word information table, namely "会社Ａの通信サービス" ("Company A's communication service"), has been detected and stored as a salient noun phrase of the "explicit marker" type. This simple sentence belongs to sentence number 0. Example expressions are described later.

[0041] The block information table 117 records the starting and ending simple sentence numbers of each block. In the example of FIG. 10(c), the first block (block number 0) spans simple sentence numbers 0 to 1. Block numbers and simple sentence numbers are sequential numbers, starting from 0, assigned to blocks and simple sentences respectively in order of appearance in the language data. The contents of the base development storage unit 112, the semantic development storage unit 113, and the tables 115 to 117 shown in FIGS. 10(a) to 10(c) correspond to the monologue example shown in FIG. 5.
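
As an aid to reading FIG. 10(c), the three tables can be modeled as lists of records indexed by word number, simple sentence number, and block number. The sketch below is an assumption about one possible in-memory layout; the class names, field names, and part-of-speech labels are hypothetical, and only the values stated in the text are filled in.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:                      # one row of the word information table 116
    surface: str                 # character string of the word
    pos: str                     # part-of-speech label (label spelling is an assumption)


@dataclass
class SimpleSentence:            # one row of the simple sentence information table 115
    word_range: Tuple[int, int]  # first and last word numbers, inclusive
    salient_np: List[int]        # word numbers that make up the salient noun phrase
    salient_np_type: str         # for example "explicit marker"
    example_expression: int      # example expression number, -1 when none was found
    sentence_number: int         # sentence that contains this simple sentence


@dataclass
class Block:                     # one row of the block information table 117
    start: int                   # first simple sentence number
    end: int                     # last simple sentence number, inclusive


def word_count(ss: SimpleSentence) -> int:
    first, last = ss.word_range
    return last - first + 1


def block_size(block: Block) -> int:
    return block.end - block.start + 1


# Only the rows described in the text; the remaining rows of FIG. 10(c) are not reproduced.
words = [Word("民営化", "sahen noun"), Word("後", "suffix")]
simple_sentences = [SimpleSentence((0, 15), [2, 3, 4, 5], "explicit marker", -1, 0)]
blocks = [Block(0, 1)]
```

Because all ranges are inclusive and numbering starts from 0, sizes are computed as the last number minus the first number plus 1; block number 0 therefore holds 2 simple sentences.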

[0042] The example expression dictionary 125, which is prepared in advance, contains the word sequence of each example expression and the part of speech of each word, as shown in FIG. 10(d). To check whether each simple sentence contains an example expression, the simple sentence information table 115 is scanned from the top, one simple sentence at a time, and the word data in the word information table 116 within the word range specified for each simple sentence are matched against the example expressions listed in the example expression dictionary. When the presence of an example expression is to be checked only for a particular block, or only for the first few simple sentences of a block, the block information table 117 is needed in addition to the word information table 116 and the simple sentence information table 115. When an example expression is found, that fact is recorded in the simple sentence information table 115. In the simple sentence information table 115, a value of -1 in the example expression column indicates that no example expression was found; when one is found, its example expression number in the example expression dictionary 125 is written in the example expression column. In the example of FIG. 10(c), no example expression has been detected in the simple sentence with simple sentence number 0, whereas the example expression with example expression number 1, namely "例えば" ("for example"), has been detected in the simple sentence with simple sentence number 4.
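
The matching step described above might look like the following sketch. Several simplifications are assumptions rather than part of the embodiment: only surface strings are compared (the dictionary also holds parts of speech, which are ignored here), the lowest-numbered matching expression is returned, and the function name and the toy word list in the usage lines are hypothetical. Only entry 1 of the dictionary is taken from the text.

```python
from typing import Dict, List, Tuple

# Example expression number -> word sequence. Entry 1 is the one named in the text;
# the other entries of FIG. 10(d) are not reproduced here.
EXAMPLE_DICT: Dict[int, List[str]] = {1: ["例えば"]}


def find_example_expression(words: List[str],
                            word_range: Tuple[int, int],
                            dictionary: Dict[int, List[str]]) -> int:
    """Return the number of the first dictionary entry found inside the inclusive
    word range of a simple sentence, or -1 when none is found."""
    start, end = word_range
    for number, pattern in sorted(dictionary.items()):
        n = len(pattern)
        # Try every position at which the whole pattern still fits inside the range.
        for i in range(start, end - n + 2):
            if words[i:i + n] == pattern:
                return number
    return -1


# Hypothetical toy word list standing in for the word information table.
toy_words = ["例えば", "料金", "は", "下がる"]
hit = find_example_expression(toy_words, (0, 3), EXAMPLE_DICT)
miss = find_example_expression(toy_words, (1, 3), EXAMPLE_DICT)
```

The returned value is what would be written to the example expression column of the simple sentence information table, so -1 doubles as the "not found" marker exactly as in the table.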

[0043] The base development storage unit 112 and the semantic development storage unit 113 each store, for the base development and the semantic development respectively, the starting and ending simple sentence numbers of each topic establishment section, the topic word, the topic level, and the starting and ending simple sentence numbers of the topic continuation range. In the example of FIGS. 10(a) and 10(b), the topic establishment section of the topic with topic number 0 in the base development storage unit 112 spans simple sentence numbers 0 to 1. The section is given here directly by simple sentence numbers, but it could instead be recorded as, for example, a pointer to a block number, in which case the value of the topic establishment section for topic number 0 would be a pointer to block number 0. The topic word column is given by word numbers, like the salient noun phrase column of the simple sentence information table 115, but it could instead be given by simple sentence number, or a character string such as "会社Ａの通信サービス" ("Company A's communication service") could be recorded directly. The topic level of the topic with topic number 0 is 1, and its topic continuation range spans simple sentence numbers 0 to 22.

[0044] [Explanation using example language data] Next, the method of determining topic establishment sections in this embodiment is described in detail, taking as an example a case in which actual language data is processed. The monologue example shown in FIG. 5 is used as the language data. It is assumed that no topic establishment section of the semantic development has been identified yet and that the block with block number 2 is currently in focus. The topic establishment section minimum size gs_min used in the processing is 4 simple sentences, and the first sentence minimum size 1st_sent_min is 2 simple sentences. The contents of the base development storage unit 112 and the tables 115 to 117 at this point are assumed to be those shown in FIG. 10.

[0045] Because the language data is monologue data, the procedure of the flowchart shown in FIG. 9 applies, and the description below follows that flowchart. According to the block information table 117, the block with block number 2 consists of the simple sentences with simple sentence numbers 4 to 9. The simple sentence information table 115 records that the example expression with example expression number 1 appears in the simple sentence with simple sentence number 4, so the condition of step 301, "an example expression is contained," is satisfied.

[0046] The process then moves to step 305, where whether the number of simple sentences in the block with block number 2 is at least gs_min is checked. According to the block information table 117, block number 2 consists of the simple sentences with simple sentence numbers 4 to 9 and therefore contains 6 simple sentences. The condition of step 305 is thus satisfied, and the process moves to step 306. In step 306, whether the number of simple sentences in the first sentence of the current block is at least 1st_sent_min is checked. According to the block information table 117, the block with block number 2 starts at the simple sentence with simple sentence number 4. The simple sentence information table 115 shows that the simple sentence with simple sentence number 4 belongs to sentence number 2 and that the next simple sentence (simple sentence number 5) also belongs to sentence number 2, so the first sentence contains at least 2 simple sentences and the condition that its number of simple sentences is at least 1st_sent_min is satisfied. The process therefore moves to step 307, and the block with block number 2 is identified as a topic establishment section of the semantic development.
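
The two size checks in this walkthrough can be reproduced with a few lines of Python. The sketch assumes that the sentence number of every simple sentence is available as a mapping; the text states the sentence number only for simple sentences 4 and 5, so the entries for 6 to 9 below are placeholders, and the function and variable names are hypothetical.

```python
from typing import Dict


def block_size(start: int, end: int) -> int:
    # Inclusive range of simple sentence numbers.
    return end - start + 1


def first_sentence_size(start: int, end: int, sentence_of: Dict[int, int]) -> int:
    """Count the leading simple sentences of a block that share the sentence
    number of the block's first simple sentence."""
    first = sentence_of[start]
    count = 0
    for n in range(start, end + 1):
        if sentence_of[n] != first:
            break
        count += 1
    return count


GS_MIN = 4          # topic establishment section minimum size from the example
FIRST_SENT_MIN = 2  # first sentence minimum size (1st_sent_min) from the example

# Simple sentence number -> sentence number. Entries 4 and 5 are from the text;
# entries 6 to 9 are placeholders chosen only so the sketch runs.
sentence_of = {4: 2, 5: 2, 6: 3, 7: 3, 8: 4, 9: 4}

size = block_size(4, 9)                           # step 305: 9 - 4 + 1 = 6
first = first_sentence_size(4, 9, sentence_of)    # step 306
identified = size >= GS_MIN and first >= FIRST_SENT_MIN
```

With the stated thresholds the block passes both checks whatever the placeholder values are, because simple sentences 4 and 5 alone already give the first sentence 2 simple sentences.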

[0047] FIG. 11 shows the state of the semantic development storage unit 113 at the point when the block with block number 2 has been identified as a topic establishment section of the semantic development. The starting and ending simple sentence numbers of the block with block number 2 are copied into the columns for the starting and ending simple sentence numbers of the topic establishment section. The topic word and the topic level are determined by subsequent processing. The continuation range is also determined together with the topic level, but because its start position is the same as the start position of the topic establishment section, that same value is recorded.
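
The partially filled record described here might be represented as in the sketch below, where fields that are decided by later processing are left as None. The record layout, the field names, and the register_section helper are assumptions for illustration and are not taken from FIG. 11.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TopicRecord:
    section_start: int                       # first simple sentence of the topic establishment section
    section_end: int                         # last simple sentence of the topic establishment section
    topic_word: Optional[List[int]] = None   # word numbers of the topic word, decided later
    topic_level: Optional[int] = None        # decided later
    scope_start: Optional[int] = None        # start of the topic continuation range
    scope_end: Optional[int] = None          # decided later, together with the topic level


def register_section(store: List[TopicRecord], block_start: int, block_end: int) -> TopicRecord:
    # Copy the block boundaries into the section columns; the continuation range
    # starts where the section starts, so that value can be filled in immediately.
    record = TopicRecord(block_start, block_end, scope_start=block_start)
    store.append(record)
    return record


semantic_store: List[TopicRecord] = []
record = register_section(semantic_store, 4, 9)   # block number 2 spans simple sentences 4 to 9
```

Keeping the undecided fields explicit makes it easy for the later steps to tell a value that has not been computed yet from a computed one.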

[0048] FIG. 12 shows the result of performing topic structure recognition on the monologue example of FIG. 5 using the method of this embodiment. Compared with FIG. 6, which shows the result of the conventional method, this embodiment yields more table-of-contents items, making the flow of the discourse easier for a person to follow.

【００４９】[0049]

[Effects of the Invention] As described above, the present invention determines the topic establishment sections of the semantic development by using information on whether a simple sentence contains an example expression, and thereby has the effect of enabling more accurate topic structure recognition for explanatory language data such as lectures and commentaries. Explanatory language data such as lectures and commentaries are well worth accumulating as knowledge and reusing; applying topic structure recognition to such language data and giving it a chapter and table-of-contents structure makes it easier to reuse and, as a result, supports human intellectual activities such as research and information gathering.

[Brief description of the drawings]

FIG. 1 is a diagram showing an example of topic structure recognition by a human.

FIG. 2 is a block diagram showing the structure of an example of a conventional topic structure recognition apparatus.

FIG. 3 is a flowchart showing conventional processing for topic structure recognition.

FIG. 4 is a diagram showing an example of the processing from preprocessing onward in conventional topic structure recognition.

FIG. 5 is a diagram showing an example of a monologue.

FIG. 6 is a diagram showing the result of applying the conventional topic structure recognition method to the monologue of FIG. 5.

FIG. 7 is a block diagram showing the configuration of a topic structure recognition apparatus according to an embodiment of the present invention.

FIG. 8 is a flowchart showing the method of identifying topic establishment sections for dialogue data in the semantic development processing.

FIG. 9 is a flowchart showing the method of identifying topic establishment sections for monologue data in the semantic development processing.

FIG. 10(a) is a diagram showing the structure of the base development storage unit, FIG. 10(b) is a diagram showing the structure of the semantic development storage unit, FIG. 10(c) is a diagram showing the block information table, the simple sentence information table, and the word information table and the relationships among them, and FIG. 10(d) is a diagram showing the structure of the example expression dictionary.

FIG. 11 is a diagram showing the state of the semantic development storage unit after the first topic establishment section has been identified in the semantic development processing.

FIG. 12 is a diagram showing an example of the result of applying the method of the present invention to the monologue example shown in FIG. 5 using the apparatus of FIG. 7.

[Explanation of symbols]

101 data input unit; 102 processing unit; 103 display unit; 104 storage unit; 105 dictionary/rule unit; 110 language data storage unit; 111 topic structure storage unit; 112 base development storage unit; 113 semantic development storage unit; 114 integrated topic storage unit; 115 simple sentence information table; 116 word information table; 117 block information table; 121 preprocessing dictionary; 122 semantic development rules; 123 base development rules; 124 integration processing rules; 125 example expression dictionary; 126 topic establishment section recognition rules; 200 to 210, 300 to 308 steps

Continuation of front page
(56) References: JP-A-7-160710 (JP, A); JP-A-7-160711 (JP, A); JP-A-7-160712 (JP, A); JP-A-6-236410 (JP, A); JP-A-6-139276 (JP, A); JP-A-4-332084 (JP, A); Atsushi Takeshita, "Video information retrieval system using topic structure recognition," IPSJ SIG Technical Report 94-IM-15, Japan, March 11, 1994, Vol. 94, No. 24, pp. 1-8; Atsushi Takeshita et al., "D-67 Detection of topic introduction parts in monologues," 1994 IEICE Autumn Conference, Japan, September 5, 1994, p. 70
(58) Field searched (Int. Cl.7, DB name): G06F 17/21 - 17/30

Claims

(57) [Claims]

1. A topic structure recognition method for executing, on language data and using predetermined rules, topic structure recognition that includes determination of a topic establishment section, in which a topic is presented and established, and determination of a topic word in the topic establishment section, wherein a preprocessing dictionary and an example expression dictionary storing information including the words and parts of speech of example expressions are provided; as a topic structure recognition preprocessing step of a topic structure recognition apparatus, the preprocessing dictionary is used to perform simple sentence segmentation, which cuts the language data into simple sentences each containing only one predicate, and block recognition, which divides the language data into blocks corresponding to paragraphs in text, and the results are stored in a language data storage unit; and as a topic structure recognition processing step of the topic structure recognition apparatus, the topic establishment section is determined block by block by performing different processing depending on whether an example expression described in the example expression dictionary is present, the topic word is determined, the topic scope and the topic level are determined, these are stored in a topic structure storage unit, and a topic structure is created.

2. A topic structure recognition method for executing, on language data and using predetermined rules, topic structure recognition that includes determination of a topic establishment section, in which a topic is presented and established, and determination of a topic word in the topic establishment section, wherein a preprocessing dictionary, base development rules, semantic development rules, integration processing rules, and an example expression dictionary storing information including the words and parts of speech of example expressions are provided; as a topic structure recognition preprocessing step of a topic structure recognition apparatus, the preprocessing dictionary is used to perform morphological analysis of the language data, simple sentence segmentation, which cuts the language data into simple sentences each containing only one predicate, salient noun phrase extraction, which finds clue phrases in each simple sentence and extracts the most emphasized noun phrase, and block recognition, which divides the language data into blocks corresponding to paragraphs in text, and the word information, the simple sentence information, the extracted salient noun phrases, and the block information are stored in a language data storage unit; as a base development processing step of the topic structure recognition apparatus, the base development rules are used to develop topics that are explicitly indicated by the clue phrases and the sentence structure, and the topic establishment section, topic word, topic scope, and topic level resulting from this processing are stored in a topic structure storage unit; as a semantic development processing step of the topic structure recognition apparatus, the semantic development rules and the example expression dictionary are used to develop, block by block, topics that are not explicitly indicated, and the topic establishment section, topic word, topic scope, and topic level resulting from this processing are stored in the topic structure storage unit; when the topic establishment section is determined in the semantic development processing step, it is determined by performing different processing depending on whether an example expression described in the example expression dictionary is present in a simple sentence; and as an integration processing step of the topic structure recognition apparatus, the integration processing rules are used to integrate the topic establishment section, topic word, topic scope, and topic level obtained in the base development processing step with the topic establishment section, topic word, topic scope, and topic level obtained in the semantic development processing step, thereby creating a topic structure.

3. The topic structure recognition method according to claim 2, wherein, in deciding whether to identify a block as a topic establishment section when the language data is dialogue data, the block is identified as a topic establishment section of the semantic development when either of the following conditions is satisfied: condition A, that the first simple sentence of the block currently in focus contains either an example expression registered in the example expression dictionary prepared in advance or an explicit-marker-type salient noun phrase, that no block already identified as a topic establishment section of the semantic development exists before the block within a range, counted in simple sentences from the starting simple sentence of the block, of a predetermined topic establishment section minimum size, and that either the number of simple sentences from the start position of the block to the start position of the next base development topic establishment section is greater than the topic establishment section minimum size or the topic level increases by 1 in the next base development topic establishment section; or condition B, that the first simple sentence of the block currently in focus contains an implicit-marker-type salient noun phrase and the number of simple sentences in the block is at least the topic establishment section minimum size.

4. The topic structure recognition method according to claim 2, wherein, in deciding whether to identify a block as a topic establishment section when the language data is monologue data, the block is identified as a topic establishment section of the semantic development when the block contains at least one of an example expression, a question expression, an explicit-marker-type salient noun phrase, a salient noun phrase contained in the title of the language data, and a salient noun phrase contained in the base development topic establishment section immediately preceding the block, the number of simple sentences in the block is at least a predetermined topic establishment section minimum size, and the number of simple sentences in the first sentence of the block is at least a predetermined topic establishment section first sentence minimum size.

5. A topic structure recognition device comprising:
an input unit for inputting language data;
a dictionary/rule unit that stores rules for topic structure recognition;
a processing unit that performs processing using the rules of the dictionary/rule unit and performs recalculation when recalculation is required;
a storage unit that stores results of the processing unit; and
an output unit that outputs the processing result of the processing unit,
wherein the dictionary/rule unit includes an example expression dictionary storing information including the words and parts of speech of example expressions, and a topic establishment section recognition rule including information for recognizing a block as a topic establishment section, which is a range in which a topic is presented and established, depending on whether an example expression is included in a simple sentence of the block,
the storage unit includes a language data storage unit and a topic structure storage unit that stores information on the topic structure,
the language data storage unit includes a word information table storing information including the character string and part of speech of each word included in the language data, a simple sentence information table storing information including the words included in each simple sentence of the language data, the salient noun phrase that is the most emphasized noun phrase in the simple sentence, the type of the salient noun phrase, and example expressions, and a block information table storing information including the simple sentence numbers of each block, a block corresponding to a paragraph in a text, and
the topic structure storage unit includes a table describing information including the topic establishment section, the topic word, the topic level, and the topic scope.
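
The following is an illustrative, non-limiting sketch of how the tables named in claim 5 might be laid out in memory. All class and field names (`WordInfo`, `SimpleSentenceInfo`, `BlockInfo`, `TopicRecord`, `sentences_in_block`) are hypothetical, and representing a block or a scope as a first and last simple sentence number is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class WordInfo:
    """One row of the word information table."""
    surface: str          # character string of the word
    part_of_speech: str


@dataclass
class SimpleSentenceInfo:
    """One row of the simple sentence information table."""
    number: int                         # simple sentence number
    words: List[WordInfo]
    salient_np: Optional[str]           # most emphasized noun phrase, if any
    salient_np_type: Optional[str]      # e.g. "explicit" or "implicit" marker type
    example_expression: Optional[str]   # example expression found, if any


@dataclass
class BlockInfo:
    """One row of the block information table."""
    first_sentence: int   # simple sentence number where the block starts
    last_sentence: int    # simple sentence number where the block ends


@dataclass
class TopicRecord:
    """One row of the topic structure table."""
    establishment_section: Tuple[int, int]  # first and last simple sentence numbers
    topic_word: str
    topic_level: int                        # nesting depth of the topic
    scope: Tuple[int, int]                  # simple sentences over which the topic continues


def sentences_in_block(
    block: BlockInfo, table: List[SimpleSentenceInfo]
) -> List[SimpleSentenceInfo]:
    """Select the simple sentences whose numbers fall inside the block."""
    return [s for s in table
            if block.first_sentence <= s.number <= block.last_sentence]
```

With such a layout, a recognition rule would look up the rows returned by `sentences_in_block` to check for example expressions and salient noun phrase types.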

6. The topic structure recognition device according to claim 5, wherein the dictionary/rule unit includes: a preprocessing dictionary for topic structure recognition preprocessing, the preprocessing including dividing the language data into simple sentences, which are units having only one predicate, extracting clue phrases from each simple sentence, extracting the salient noun phrases, identifying the type of each salient noun phrase, and recognizing the blocks; a base expansion processing rule for performing processing for the base expansion; a semantic expansion processing rule for performing processing for the semantic expansion; and a rule for integrating the base expansion and the semantic expansion,
the topic structure storage unit includes a base development storage unit for storing information relating to the base expansion, a semantic development storage unit for storing information relating to the semantic expansion, and an integrated topic storage unit for storing information after integration of the base expansion and the semantic expansion, and
each of the base development storage unit and the semantic development storage unit includes a table for storing information including the topic establishment section, the topic word, the topic level, and the topic scope.

7. The topic structure recognition device according to claim 6, wherein the semantic expansion processing rule includes, as a rule for language data that is conversation data, a rule of recognizing the block currently in focus as a topic establishment section and recording it in the semantic development storage unit if either of the following conditions is satisfied:
a condition that the simple sentence information table describes the head simple sentence of the block currently in focus, as described in the block information table, as containing an example expression or an explicit-marker-type salient noun phrase; that, if a topic establishment section is already recorded in the semantic development storage unit, the number of simple sentences between the head simple sentence of the most recently recorded topic establishment section and the head simple sentence of the block currently in focus is larger than a predetermined topic establishment section minimum size; and that, among the start simple sentences of the topic establishment sections recorded in the base development storage unit, either the distance between the first one following the head simple sentence of the block currently in focus and that head simple sentence is larger than the topic establishment section minimum size, or the topic level increases by 1 in that base development topic establishment section; or
a condition that the head simple sentence of the block currently in focus contains an implicit-marker-type salient noun phrase and the number of simple sentences included in the block is equal to or greater than the topic establishment section minimum size.

8. The topic structure recognition device according to claim 6, wherein the simple sentence information table includes information on the sentence number of the sentence to which each simple sentence belongs and on whether a question expression is included in the simple sentence, and the topic establishment section recognition rule includes, as a rule for language data that is monolog data, a rule of recognizing the block currently in focus, as described in the block information table, as a topic establishment section and recording it in the semantic development storage unit if all of the following hold: within the range of simple sentences of the block, the simple sentence information table contains at least one of an example expression, a question expression, an explicit-marker-type salient noun phrase, a salient noun phrase included in the title of the language data, and a salient noun phrase included in the base development topic establishment section immediately preceding the block; the range of simple sentences of the block recorded in the block information table is equal to or greater than a predetermined topic establishment section minimum size; and the number of simple sentences that have, in the simple sentence information table, the same sentence number as the first sentence of the block is equal to or greater than a predetermined topic establishment section first-sentence minimum size.