JP3329353B2

JP3329353B2 - Topic Word Selection Method and Topic Structure Recognition Device in Text Topic Structure Recognition

Info

Publication number: JP3329353B2
Application number: JP22315294A
Authority: JP
Inventors: 敦竹下; 孝史井上; 珠喜斎藤
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1994-09-19
Filing date: 1994-09-19
Publication date: 2002-09-30
Anticipated expiration: 2017-09-30
Also published as: JPH0887502A

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Field of Industrial Application] The present invention relates to a method and apparatus for topic structure recognition in natural language analysis, and in particular to a topic word selection method for selecting topic words and to a topic structure recognition apparatus to which this topic word selection method is applied.

【０００２】[0002]

[Prior Art] It has been confirmed experimentally that when people are presented with text or dialogue data and given the task "find the blocks in this text or dialogue data in which the same thing is written, and identify what that 'same thing' is," they answer with the same structure, without individual differences. The experiment is described in, for example, Takeshita et al., "Research on Human Communication from the Viewpoint of Topic Structure Recognition," IEICE 1993 Autumn Conference, D-62 (p. 6-64). A structure grasped by humans in this way is called a "topic structure." Because a topic structure is nested, each topic can be represented by a "topic word" that indicates the topic, a "topic level" that represents the nesting depth, and a "topic scope" that indicates from which sentence to which sentence the topic continues in the text or dialogue data. In the following, the text or dialogue data subject to topic structure analysis is referred to as language data.

[0003] FIG. 1 shows an example of a topic structure for language data whose content relates to telecommunications policy. The language data starts at sentence 0 and continues to at least sentence 770. The topic having the topic word "communication service" has topic level 1, and its topic scope extends from sentence 0 to sentence 770. To simplify the explanation, a topic is referred to below by its topic word, as in "the 'communication service' topic."

[0004] Within the "communication service" topic there are two topics with topic level 2, "new service" and "conventional service." The "new service" topic has a topic scope from sentence 125 to sentence 431, and the "conventional service" topic has a topic scope from sentence 432 to sentence 770. In addition, the "new service" topic contains a child topic "service A," and the "conventional service" topic contains a child topic "service B"; their topic scopes are sentences 301 to 431 and sentences 521 to 770, respectively.
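The topic structure of FIG. 1 can be written down as nested records, each holding a topic word, a topic level, and a topic scope. The following is a minimal sketch in Python; the names `Topic` and `is_well_nested` are illustrative and do not come from the specification, and topic level 3 for the two child topics is an assumption, since the text states only that they are child topics.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Topic:
    """One topic: topic word, topic level, and inclusive sentence scope."""
    word: str
    level: int
    scope: Tuple[int, int]  # (first sentence, last sentence), inclusive
    children: List["Topic"] = field(default_factory=list)


def is_well_nested(topic: Topic) -> bool:
    """Check that each child is one level deeper and lies inside its parent."""
    for child in topic.children:
        inside = (topic.scope[0] <= child.scope[0]
                  and child.scope[1] <= topic.scope[1])
        if child.level != topic.level + 1 or not inside:
            return False
        if not is_well_nested(child):
            return False
    return True


# Topic structure of FIG. 1 (level 3 for the child topics is an assumption).
fig1 = Topic("communication service", 1, (0, 770), [
    Topic("new service", 2, (125, 431),
          [Topic("service A", 3, (301, 431))]),
    Topic("conventional service", 2, (432, 770),
          [Topic("service B", 3, (521, 770))]),
])
```

Calling `is_well_nested(fig1)` confirms that each scope in the example lies within the scope of its parent topic.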

[0005] Recognizing such a topic structure by computer is called topic structure recognition. Several methods for recognizing topic structures have been proposed. Here, the topic structure recognition method described in Takeshita, "Video Retrieval System Using Topic Structure Recognition," Information Processing Society of Japan, Information Media Study Group, 94-IM-15-1, is briefly explained. FIG. 2 is a block diagram showing an example configuration of the topic structure recognition apparatus used in this method, FIG. 3 is a flowchart showing the topic structure recognition processing of this method, and FIG. 4 is a diagram showing an example of the processing flow from the topic structure recognition preprocessing onward.

[0006] The conventional topic structure recognition apparatus shown in FIG. 2 comprises a data input unit 701 through which language data is input, a processing unit 702 that executes the various processes, a display unit 703 that displays results, a storage unit 704 that holds processing results and data needed during processing, and a dictionary/rule unit 705 that stores the dictionaries and rules used in topic structure recognition processing. The storage unit 704 contains a language data storage unit 710, which stores the preprocessed language data, and a topic structure storage unit 711, which holds intermediate and final processing results. The topic structure storage unit 711 in turn contains a base development storage unit 712, a semantic development storage unit 713, and an integrated topic storage unit 714. The dictionary/rule unit 705 contains a preprocessing dictionary 721, semantic development processing rules 722, base development processing rules 723, and integration processing rules 724.

[0007] When topic structure recognition is performed with this apparatus, topic structure recognition preprocessing 740 is first applied to the input language data 730, as shown in FIG. 3. The first step of the preprocessing 740 is morphological analysis 741 of the input language data 730. Morphological analysis 741 segments the character string of the input language data 730 into words to produce a word sequence, and identifies the part of speech of each word and the conjugated form of each inflected word. The second step of the preprocessing 740 is simple sentence segmentation 742, which takes the result of morphological analysis as input. Simple sentence segmentation 742 divides a sentence containing multiple predicates, such as a sentence with an embedded clause or a compound sentence, into simple sentences that each contain only one predicate. The third step of the preprocessing 740 is salient noun phrase extraction 743, which takes the result of simple sentence segmentation 742 as input and extracts the most emphasized noun phrase in each simple sentence. Each process belonging to the preprocessing 740 is executed by the processing unit 702 using the preprocessing dictionary 721 in the dictionary/rule unit 705, and the results are stored in the language data storage unit 710 in the storage unit 704.

[0008] When the topic structure recognition preprocessing 740 is complete, topic development is processed in two separate parts: base development processing 750 and semantic development processing 760. Here, base development refers to topic development that is explicitly indicated by cue phrases such as "first" and "next," by chapter structure, by itemized lists, and the like. Semantic development refers to topic development that is presented and advanced in a non-explicit form within each topic of the base development.

[0009] First, as shown in FIG. 3, base development processing 750 performs three processes in sequence: determination of topic establishment sections 751, determination of topic words 752, and determination of topic scopes and topic levels 753. A topic establishment section is a section in which a topic is presented and established. In topic word determination 752, the salient noun phrases in each topic establishment section are taken as topic word candidates, and the candidate with the highest priority is selected as the topic word. Topic scope and topic level determination 753 is performed on the basis of structures such as itemized lists. Base development processing 750 is executed by the processing unit 702 using the base development processing rules 723 in the dictionary/rule unit 705, and the result is stored in the base development storage unit 712 within the topic structure storage unit 711 of the storage unit 704.

[0010] FIG. 4 shows a concrete example of base development processing 750. First, the start of the language data (text) and each item of the itemized list numbered (1), (2) are determined as topic establishment sections of the base development. Then, in topic word determination 752, "communication service" is selected as the topic word from the first topic establishment section, "new service" from the second, and "conventional service" from the third.

[0011] After base development processing 750, semantic development processing 760 is executed. Like base development processing 750, semantic development processing 760 consists of three processes: determination of topic establishment sections 761, determination of topic words 762, and determination of topic scopes and topic levels 763. Semantic development processing 760 is executed by the processing unit 702 using the semantic development processing rules 722 in the dictionary/rule unit 705 together with the result of base development processing 750, and the result is stored in the semantic development storage unit 713 within the topic structure storage unit 711 of the storage unit 704.

[0012] In the example shown in FIG. 4, paragraphs longer than a certain length are selected as topic establishment sections, and "service A" and "service B" are selected as their topic words. Each topic scope is determined as the range from the start of the topic establishment section to the start of the next topic establishment section in the base development. For the semantic development of text, the topic levels are all set to the same level, namely level 1.
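The scope rule for semantic development can be illustrated with a short calculation. The sketch below is only one possible reading: the function name `semantic_scope` is hypothetical, the scope is treated as ending one sentence before the next base development section begins, and running to the end of the data when no later base section exists is an assumption not stated in the text. The base section start positions 0, 125, and 432 are likewise inferred from the scopes given for FIG. 1.

```python
from bisect import bisect_right
from typing import List, Tuple


def semantic_scope(section_start: int,
                   base_section_starts: List[int],
                   last_sentence: int) -> Tuple[int, int]:
    """Return the inclusive scope of a semantic-development topic.

    The scope begins at the start of the topic's own establishment section
    and ends just before the next base-development establishment section.
    If no later base section exists, it runs to the last sentence
    (an assumption made for this sketch).
    """
    starts = sorted(base_section_starts)
    # Index of the first base section start strictly after section_start.
    i = bisect_right(starts, section_start)
    end = starts[i] - 1 if i < len(starts) else last_sentence
    return (section_start, end)


# Assumed base section starts, inferred from the FIG. 1 scopes.
base_starts = [0, 125, 432]
scope_a = semantic_scope(301, base_starts, 770)  # (301, 431)
scope_b = semantic_scope(521, base_starts, 770)  # (521, 770)
```

Under these assumptions the results agree with the scopes of "service A" and "service B" given for FIG. 1.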

[0013] Finally, integration processing 770 of the base development and the semantic development is performed, and the topic structure 780 of the entire language data is output as the result. Integration processing 770 takes the topic structures produced by base development processing 750 and semantic development processing 760 as input and is executed by the processing unit 702 using the integration processing rules 724 in the dictionary/rule unit 705. In the example shown in FIG. 4, the integration yields a topic structure 780 similar to that shown in FIG. 1.

[0014] In both base development and semantic development, the rules for determining topic establishment sections, topic words, topic scopes, and topic levels (the semantic development processing rules 722 and the base development processing rules 723) differ depending on the communication form of the language data, such as dialogue, monologue, or written text. The differences in topic development style and topic structure recognition rules by communication form, and the results of topic structure recognition experiments, are described in Takeshita et al., "Research on Human Communication from the Viewpoint of Topic Structure Recognition," IEICE 1993 Autumn Conference, D-62 (p. 6-64).

【００１５】[0015]

[Problems to Be Solved by the Invention] With the conventional topic structure recognition method described above, however, it is difficult to recognize topic words correctly in written text that contains complex adnominal modification. Moreover, because complex adnominal modification often appears around the topic word of a topic with a long topic scope, the topic words of such topics have been difficult to recognize. As a result, when the recognized topic structure is used, for example, as a chapter or table-of-contents structure for human readers, the major items of the chapter structure are wrong, which makes it hard to grasp the overall outline.

[0016] News reports are one example of text containing complex adnominal modification. FIG. 5 shows an example of such a text. In FIG. 5, each sentence is given a sentence number such as [s0] or [s1]. The subject of this text is clearly "the general election in Cambodia." FIG. 6(a) shows an example of the topic structure recognized by a human for this text, and FIG. 6(b) shows the topic structure obtained by the conventional topic structure recognition method. If the human's topic structure is treated as a table of contents, the items discussed in the original text of FIG. 5 can be inferred. With the topic structure from the conventional method, by contrast, the top-level topic "UNTAC cooperation" in particular is inappropriate, so it is hard to tell what the text is about in the first place, and difficult to infer the content of the original text.

[0017] An object of the present invention is to provide a method and apparatus that can correctly recognize topic words even in text containing complex adnominal modification and can provide an accurate topic structure.

【００１８】[0018]

[Means for Solving the Problems] The topic word selection method of the present invention is a method of determining topic words in a topic structure recognition apparatus that recognizes the topic structure of language data using rules prepared in advance. The topic structure recognition apparatus is provided with a dictionary/rule unit containing topic word candidate priority rules for assigning priorities to topic word candidates and a topic word candidate priority correction table describing pairs of a correction condition and a correction content for correcting those priorities. Priorities are assigned to the topic word candidates extracted from the language data in accordance with the topic word candidate priority rules. On the basis of the topic word candidate priority correction table, for each predefined pair of correction condition and correction content, when the correction condition holds, the priority of the topic word candidate is corrected in accordance with the corresponding correction content. The topic word is then determined by comparing the priorities assigned to the plurality of topic word candidates.

[0019] The topic structure recognition apparatus of the present invention comprises an input unit for inputting language data, a dictionary/rule unit that stores rules for topic structure recognition, a processing unit that performs processing using the rules in the dictionary/rule unit, a storage unit that stores the results produced by the processing unit, and a display unit that displays the processing results of the processing unit. The dictionary/rule unit includes topic word candidate priority rules for assigning priorities to topic word candidates and a topic word candidate priority correction table describing pairs of a correction condition and a correction content for correcting those priorities. The storage unit includes a language data storage unit that stores information on the language data input from the input unit and a topic structure storage unit that stores information on the topic structure. The language data storage unit includes a word information table that stores information on the character string and part of speech of each word in the language data, and a simple sentence information table that stores information on the words, the salient noun phrase, and the type of the salient noun phrase contained in each simple sentence of the language data. The topic structure storage unit includes a table that stores information including the topic establishment section, which is the range in which a topic is presented and established, the topic word, the topic level, and the topic scope.

【００２０】[0020]

[Operation] After topic word candidates are obtained and a priority is assigned to each, the priorities are corrected when predetermined conditions hold, and the topic word is determined according to the priorities, so topic words can be selected more accurately. Specifically, adnominal modification relations are determined, such as whether a topic word candidate is adnominally modified by another simple sentence, and the priorities of the topic word candidates are corrected according to those relations. This makes it possible to correctly recognize topic words in text containing complex adnominal modification. In particular, it becomes possible to correctly recognize the topic words of topics with long topic sections.

【００２１】[0021]

[Embodiment] An embodiment of the present invention is described next with reference to the drawings. FIG. 7 is a block diagram showing the configuration of a topic structure recognition apparatus according to one embodiment of the present invention. This apparatus differs from the conventional topic structure recognition apparatus shown in FIG. 2 particularly in the configuration of the dictionaries and tables contained in the dictionary/rule unit 105. Specifically, the topic structure recognition apparatus of this embodiment comprises a data input unit 101 through which language data is input, a processing unit 102 that executes the various processes, a display unit 103 that displays results, a storage unit 104 that holds processing results and data needed during processing, and a dictionary/rule unit 105 that stores the dictionaries and rules used in topic structure recognition processing. The storage unit 104 contains a language data storage unit 110, which stores the preprocessed language data, and a topic structure storage unit 111, which holds intermediate and final processing results. The language data storage unit 110 contains a simple sentence information table 115 and a word information table 116, and the topic structure storage unit 111 contains a base development storage unit 112, a semantic development storage unit 113, and an integrated topic storage unit 114. The dictionary/rule unit 105 contains a preprocessing dictionary 121, semantic development processing rules 122, base development processing rules 123, integration processing rules 124, topic word candidate priority rules 125, a question expression dictionary 126, and a topic word candidate priority correction table 127. The topic word candidate priority correction table 127 describes in advance pairs of an adnominal modification relation, namely whether a topic word candidate is adnominally modified by another simple sentence, and a change to the topic word candidate priority.

[0022] When the topic structure of language data is analyzed with this apparatus, processing proceeds in the same way as the conventional flow shown in FIG. 3, except for the method of determining topic words. With this apparatus, a priority as a topic word candidate is first assigned to each salient noun phrase by referring to the topic word candidate priority rules 125. Adnominal modification relations, such as whether the topic word candidate is adnominally modified by another simple sentence, are then determined, and the priority is corrected according to the result, whereby the topic word is determined. The topic structure analysis procedure of this embodiment is described below, focusing on the topic word determination method.

[0023] [Language data storage unit, base development storage unit, and semantic development storage unit] Topic structure analysis by the apparatus of this embodiment proceeds in the same way as the procedure shown in FIG. 3 up to the determination of topic words. The results of topic structure recognition preprocessing, base development processing, and semantic development processing are stored in the language data storage unit 110, the base development storage unit 112, and the semantic development storage unit 113, respectively. FIG. 8 shows the state of these storage units 110, 112, and 113 at the point where all of base development processing and the determination of topic establishment sections in semantic development processing have been completed.

[0024] As described above, the language data storage unit 110 contains the simple sentence information table 115 and the word information table 116. The word information table 116 has a word number field indicating the order in which the word appears in the text, a field describing the character string of the word as obtained by morphological analysis, and a field describing information such as the part of speech of that string. In the example of FIG. 8, the first word of the text to be recognized is "通信" (communication), a sa-hen noun (a noun that can combine with "suru" to form a sa-row irregular verb).

[0025] The simple sentence information table 115, meanwhile, contains a field describing the order of appearance of the simple sentence in the text as a simple sentence number, a word range field describing the start and end word numbers of the simple sentence, a field describing the salient noun phrase contained in the simple sentence, a field describing the type of that salient noun phrase, a field describing any question expression contained in the simple sentence, and a field describing the priority of the salient noun phrase when it becomes a topic word candidate in the base development or the semantic development. In the example of FIG. 8, the first simple sentence spans word numbers 0 to 8, and its salient noun phrase is word number 0, that is, "通信" (communication), which is a salient noun phrase of the explicit type. When a salient noun phrase consists of multiple words, multiple word numbers are specified, as in 0,1. The question expression field of the first simple sentence has the value "-1", which means that there is no question expression. If a question expression listed in the question expression dictionary 126 of the dictionary/rule unit 105 is detected in a simple sentence, its question expression number is written in the question expression field of the simple sentence information table 115. For example, if the question expression "問いかける" (to pose a question) is detected, its question expression number 0 is written in the question expression field of the record for that simple sentence. The topic candidate priority of the first simple sentence is 2; which topic establishment section this candidacy refers to is described later.
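The two tables of the language data storage unit 110 map naturally onto simple record types. The following Python sketch is illustrative only: the class and field names are hypothetical, the part-of-speech and type labels are free-form strings chosen for this example, and numbering the first simple sentence as 0 is an assumption. The values are those given for the first word and first simple sentence of FIG. 8.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

NO_QUESTION = -1  # question expression field value when none is present


@dataclass
class WordRecord:
    number: int   # order of appearance in the text
    surface: str  # character string from morphological analysis
    pos: str      # part-of-speech information


@dataclass
class SimpleSentenceRecord:
    number: int                  # simple sentence number
    word_range: Tuple[int, int]  # start and end word numbers, inclusive
    salient_np: List[int]        # word numbers forming the salient noun phrase
    np_type: str                 # type of the salient noun phrase
    question_expr: int           # question expression number, or NO_QUESTION
    candidate_priority: int      # priority as a topic word candidate


def salient_np_text(sentence: SimpleSentenceRecord,
                    words: Dict[int, WordRecord]) -> str:
    """Join the surface strings of the words in the salient noun phrase."""
    return "".join(words[n].surface for n in sentence.salient_np)


# First word and first simple sentence of the FIG. 8 example.
words = {0: WordRecord(0, "通信", "sa-hen noun")}
first = SimpleSentenceRecord(0, (0, 8), [0], "explicit", NO_QUESTION, 2)
```

Storing the salient noun phrase as a list of word numbers, rather than as text, mirrors the description above, in which a multi-word phrase is written as several word numbers such as 0,1.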

[0026] For each topic number, the base development storage unit 112 contains a field (the topic establishment section field) describing the simple sentence numbers that mark the start and end of the topic establishment section in which the topic is presented and established, a field describing the topic word, a field describing the topic level, and a field describing the topic scope by its start and end word numbers. The semantic development storage unit 113 contains the same fields as the base development storage unit 112. In the example of FIG. 8, in the base development storage unit 112, the topic establishment section of the first topic spans simple sentence numbers 0 to 10, and its topic word is word number 0, that is, "通信" (communication). When a topic word consists of multiple words, multiple word numbers are specified, as in 0,1. The topic level of this topic is 1, and its topic scope spans simple sentence numbers 0 to 3029.

[0027] [Overall flow of topic word determination] The salient noun phrases contained in each topic establishment section of the base development and the semantic development are taken as topic word candidates. The flowchart of FIG. 9 shows the flow of selecting a topic word from the topic word candidates in each topic establishment section. The rules actually used to select topic words differ between base development and semantic development, but the processing flow is as shown in FIG. 9 in both cases. First, each topic word candidate is assigned a priority as a topic word candidate on the basis of the topic word candidate priority rules 125 (step 201). Next, the adnominal modification relations concerning the topic word candidate are detected (step 202), and the priority of each topic word candidate is corrected according to the topic word candidate priority correction table 127 (step 203). The candidate with the highest priority is then determined to be the topic word, and the processing ends.

【００２８】When determining the topic word in the base expansion, the salient noun phrases contained in each topic establishment section of the base expansion are taken as the topic word candidates of that section, and priorities are assigned according to the topic candidate priorities for the base expansion. The priority of each candidate is then corrected, using the topic word candidate priority correction table, according to the adnominal modification relations of the candidates, and the candidates with the highest priority are selected from the topic word candidates of each topic establishment section. If only one candidate is selected, it becomes the topic word. If several candidates are selected, the choice depends on the type of the topic establishment section: for the whole-itemized-list type, which establishes the topic of an entire itemized list, the candidate that appears latest in time is chosen as the topic word of the base expansion; for the other type, the chapter-structure type, the candidate that appears earliest in time is chosen.

【００２９】When determining the topic word in the semantic expansion, the salient noun phrases contained in each topic establishment section of the semantic expansion are taken as the topic word candidates of that section, and a priority is assigned to each candidate according to the topic candidate priorities for the semantic expansion. As above, the priority of each topic word candidate is corrected using the topic word candidate priority correction table, and the candidates with the highest priority are selected from the topic word candidates of each topic establishment section. If only one candidate is selected, it becomes the topic word; if several candidates are selected, the one that appears earliest in time becomes the topic word of the semantic expansion.
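The selection and tie-breaking rules of the two preceding paragraphs can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `Candidate` class, the function name, and the string labels for the expansion and section types are hypothetical, and the sketch assumes that a smaller numeric value denotes a higher priority, as the worked example for FIG. 8 later in this description suggests.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    text: str        # surface form of the salient noun phrase
    sentence: int    # simple-sentence number; a smaller number appears earlier
    priority: float  # assumption: a smaller value means a higher priority


def select_topic_word(candidates: List[Candidate],
                      expansion: str = "semantic",
                      section_type: str = "chapter") -> Optional[Candidate]:
    """Select the topic word for one topic establishment section."""
    if not candidates:
        return None
    best = min(c.priority for c in candidates)
    tied = [c for c in candidates if c.priority == best]
    # Base expansion, whole-itemized-list type: the latest candidate wins a tie.
    if expansion == "base" and section_type == "itemized_list":
        return max(tied, key=lambda c: c.sentence)
    # Base expansion of the chapter type, and semantic expansion: the earliest wins.
    return min(tied, key=lambda c: c.sentence)


# Priorities taken from the FIG. 8 example, before and after correction.
before = [Candidate("Company P", 80, 2), Candidate("Product Q", 81, 2)]
after = [Candidate("Company P", 80, 2), Candidate("Product Q", 81, 0.5)]
print(select_topic_word(before).text)  # Company P
print(select_topic_word(after).text)   # Product Q
```

Keeping the tie-break separate from the priority comparison mirrors the text, where the section type matters only when several candidates share the highest priority.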

【００３０】[Assignment of priority by the topic word candidate priority rule] The assignment of priority by the topic word candidate priority rule 125 is now described. FIG. 10 shows the topic word candidate priority rule used in the base expansion, and FIG. 11 shows the one used in the semantic expansion. Consider applying these rules to the example shown in FIG. 8. The topic establishment section of the first topic (topic number 0) recorded in the semantic expansion storage unit 113 spans simple-sentence numbers 80 to 83, so the salient noun phrases in the simple sentences of that range in the simple sentence information table 115 become topic word candidates. According to the simple sentence information table 115, the salient noun phrases contained in simple sentences 80 and 81 are of the non-explicit type, but according to the word information table 116 both contain a proper noun, so the priority rule of FIG. 11 gives each a priority of 2. Simple sentences 82 and 83, on the other hand, have no salient noun phrase and therefore no topic word candidate priority. FIG. 12 shows the state of the simple sentence information table 115 after topic word candidate priorities have been assigned in this way for the first topic establishment section of the semantic expansion. Because simple sentences 82 and 83 have no priority, the value -1 is recorded in their topic word candidate priority fields.
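As a rough sketch of the assignment step, the rule can be modelled as a lookup keyed on the type of the salient noun phrase and on whether it contains a proper noun. The key layout, the names, and the type label `"non_explicit"` are assumptions made for illustration. The table below holds only the single entry that the text states for FIG. 11, because the remaining entries of FIGS. 10 and 11 are not reproduced here.

```python
from typing import Dict, Optional, Tuple

NO_PRIORITY = -1  # value recorded when a simple sentence has no salient noun phrase

# Only this entry is stated in the text for FIG. 11; the key layout is hypothetical.
SEMANTIC_RULES: Dict[Tuple[str, bool], float] = {("non_explicit", True): 2}


def assign_priority(np_type: Optional[str], has_proper_noun: bool,
                    rules: Dict[Tuple[str, bool], float]) -> float:
    """Return the topic word candidate priority for one simple sentence."""
    if np_type is None:  # the sentence has no salient noun phrase
        return NO_PRIORITY
    return rules[(np_type, has_proper_noun)]


# First topic establishment section of the semantic expansion (sentences 80 to 83).
section = {
    80: ("non_explicit", True),
    81: ("non_explicit", True),
    82: (None, False),
    83: (None, False),
}
priorities = {n: assign_priority(t, p, SEMANTIC_RULES) for n, (t, p) in section.items()}
print(priorities)  # {80: 2, 81: 2, 82: -1, 83: -1}
```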

【００３１】[Detection of adnominal modification relations] As shown in the flowchart of FIG. 9, after topic word candidate priorities have been assigned, adnominal modification relations involving the topic word candidates are detected. Adnominal modification here means modification of a noun. For example, in the noun phrase "彼が持ってきたカメラ" ("the camera he brought"), the simple sentence "彼が持ってきた" ("he brought") modifies the noun "カメラ" ("camera"). An example of processing for detecting adnominal modification relations is described with reference to the flowchart of FIG. 13. This detection processing is performed for each topic word candidate.

【００３２】First, the last word in the topic word candidate is taken as A, and the word immediately preceding A in the original text is taken as B (step 211). It is determined whether B is contained in the current topic word candidate (step 212). If it is, processing moves to step 215; if it is not, it is determined whether B is the adnominal form of a conjugating word (step 213). If B is not in the adnominal form at step 213, processing proceeds to step 215; if it is, the simple sentence containing B is regarded as adnominally modifying the topic word candidate (step 214), and processing proceeds to step 215. In other words, for the last word A in the topic word candidate, if the immediately preceding word B is not contained in the candidate and B is the adnominal form of a conjugating word, that is, a word whose ending changes, such as a verb or an adjective, then the simple sentence containing B is regarded as adnominally modifying the topic word candidate.

【００３３】In step 215, it is checked whether word A is at the foremost position in the topic word candidate. If it is, the processing ends. Otherwise, the word preceding A within the topic word candidate is newly taken as A, the word immediately preceding this updated A in the original text is taken as B (step 216), and processing returns to step 212, repeating from the determination of whether B is contained in the topic word candidate.
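The loop of steps 211 to 216 can be sketched as below. The sketch assumes that word numbers are consecutive in the original text, so that the word immediately preceding word `a` is `a - 1`, and that each word record already carries a flag saying whether it is a conjugating word in its adnominal form. The `Word` class and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Word:
    sentence: int                 # number of the simple sentence containing the word
    adnominal_form: bool = False  # True for a conjugating word in its adnominal form


def detect_adnominal_modifiers(words: Dict[int, Word], candidate: List[int]) -> List[int]:
    """Return the simple-sentence numbers that adnominally modify the candidate."""
    members = set(candidate)
    modifiers: List[int] = []
    # Steps 211, 215, 216: walk from the last word of the candidate to the first.
    for a in sorted(candidate, reverse=True):
        b = a - 1  # word immediately before A in the original text
        if b in members:  # step 212: B belongs to the candidate itself
            continue
        word_b = words.get(b)
        if word_b is not None and word_b.adnominal_form:  # step 213
            modifiers.append(word_b.sentence)             # step 214
    return modifiers


# Word numbers follow the FIG. 8 example; the sentence of word 1058 is assumed.
words = {
    1058: Word(sentence=80),
    1059: Word(sentence=80),
    1062: Word(sentence=80, adnominal_form=True),
    1063: Word(sentence=81),
    1064: Word(sentence=81),
}
print(detect_adnominal_modifiers(words, [1059]))        # []
print(detect_adnominal_modifiers(words, [1063, 1064]))  # [80]
```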

【００３４】Consider the example shown in FIG. 8. The topic word candidate given by the salient noun phrase in simple sentence 80 consists only of the word with word number 1059, so word 1059 is taken as A and word 1058 as B. Word B is not contained in the topic word candidate, but it is not a conjugating word, so no adnominal modification relation is detected. Since A is the foremost word in the topic word candidate, the detection processing for this candidate ends here.

【００３５】Next, the topic word candidate given by the salient noun phrase in simple sentence 81 is examined. This candidate consists of two words, word numbers 1063 and 1064, so word 1064 is first taken as A and word 1063 as B. Word B is contained in the topic word candidate, so no adnominal modification relation is detected at this point. Since A is not the foremost word in the candidate, word 1063 is next taken as A and word 1062 as B. Now B is not contained in the topic word candidate and is the adnominal form of a conjugating word, so simple sentence 80, which contains B, is judged to adnominally modify the current topic word candidate. At this point A is the foremost word in the candidate, so the processing ends.

【００３６】[Correction of priority and selection of topic word] As shown in the flowchart of FIG. 9, after adnominal modification relations have been detected, the priority of each topic word candidate is corrected according to the topic word candidate priority correction table 127. FIG. 14 shows a configuration example of this table. In this example, the priority of a topic word candidate that is adnominally modified by another simple sentence is corrected to 0.5, which raises its priority because a smaller value denotes a higher priority. Other correction methods using the table are also conceivable, depending on the nature of the text data being analyzed, for example manipulating the priority of a topic word candidate contained in a simple sentence that adnominally modifies another topic word candidate. In some cases it is better to give an increment or decrement to the existing value, such as -0.5, rather than an absolute value such as 0.5 as the corrected priority, so the correction method is chosen as appropriate for the nature of the text.
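The two correction styles mentioned above, an absolute replacement value and a relative adjustment, can be contrasted in a short sketch. The function name and the `mode` parameter are hypothetical. The value 0.5 and the adjustment -0.5 are the figures quoted in the text, and -1 is the value the text records for sentences without a candidate.

```python
NO_PRIORITY = -1  # sentences without a salient noun phrase carry no priority


def correct_priority(priority: float, is_modified: bool,
                     mode: str = "absolute", value: float = 0.5) -> float:
    """Correct one candidate's priority if it is adnominally modified."""
    if priority == NO_PRIORITY or not is_modified:
        return priority  # the correction condition is not satisfied
    if mode == "absolute":
        return value             # replace the priority, e.g. with 0.5
    if mode == "relative":
        return priority + value  # adjust the priority, e.g. by -0.5
    raise ValueError(f"unknown correction mode: {mode}")


print(correct_priority(2, True))                    # 0.5
print(correct_priority(2, True, "relative", -0.5))  # 1.5
print(correct_priority(2, False))                   # 2
```

With the relative mode the original ranking among modified candidates is preserved, whereas the absolute mode puts all modified candidates on the same level.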

【００３７】In the example shown in FIG. 8, the topic word candidate given by the salient noun phrase in simple sentence 81 is adnominally modified by another simple sentence, simple sentence 80, so its priority is corrected to 0.5 according to the topic word candidate priority correction table shown in FIG. 14. FIG. 15 shows the state of the simple sentence information table 115 after the priorities have been corrected.

【００３８】If the topic word is selected on the basis of the simple sentence information table 115 before the priorities are corrected for adnominal modification relations (see FIG. 12), the topic word candidates of simple sentences 80 and 81 both have priority 2, so the candidate contained in simple sentence 80, which appears earlier in time, is selected as the topic word. That is, the word with word number 1059, "Company P" (Ｐ社), is selected. By contrast, if the topic word is selected on the basis of the table after the priorities have been corrected according to adnominal modification relations as described above (see FIG. 15), the candidate of simple sentence 81 has the highest priority, 0.5, so "Product Q" (製品Ｑ), consisting of the words with word numbers 1063 and 1064, is selected as the topic word.

【００３９】FIG. 16 shows an example of the result of performing topic structure recognition on the text example of FIG. 5 using the method of this embodiment. Comparison with FIG. 6(b), which shows the result of the conventional method, shows that the result obtained by the method of this embodiment makes it easy to infer the content of the original text.

【００４０】[Results of the topic structure recognition evaluation experiment] Next, the results of recognizing the topic structure of actual text data are described, using a topic structure recognition system that incorporates the topic word selection method of this embodiment and a system that does not. The experiment used a total of 63 newspaper articles as text data. For this text data, the topic structure recognized by humans was compared with the topic structure recognized by the computer to calculate recall and precision, and both the system according to this embodiment and the system not according to it (the conventional system) were evaluated. Here, recall is a measure of how much of the topic structure recognized by humans is also recognized by the computer, and precision is a measure of how much of the topic structure recognized by the computer is also recognized by humans. If the topic structures recognized by the human and the computer coincide, recall and precision are both 100%. Precision and recall were obtained separately for topic establishment sections, topic words, and topic scopes. Precision and recall for the topic scope are evaluated, for topics whose topic word is correct, with weighting by the length of the topic scope. For example, if a large topic with a long topic scope is recognized correctly, precision and recall improve, whereas misrecognizing a small topic with a short topic scope does not degrade them much. The results are shown in Table 1.
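The text does not give the exact formula for the scope-weighted measures, so the following is only one plausible reading, offered as a sketch. It assumes that each topic is a tuple of topic word, scope start, and scope end, that a topic counts as correct only when the whole tuple matches, and that its weight is the scope length. The function name and the sample data are hypothetical.

```python
from typing import Set, Tuple

Topic = Tuple[str, int, int]  # (topic word, scope start, scope end)


def scope_weighted_precision_recall(human: Set[Topic],
                                    system: Set[Topic]) -> Tuple[float, float]:
    """Precision and recall with each topic weighted by its scope length."""
    def weight(topic: Topic) -> int:
        return topic[2] - topic[1] + 1

    matched = sum(weight(t) for t in human & system)
    system_total = sum(weight(t) for t in system)
    human_total = sum(weight(t) for t in human)
    precision = matched / system_total if system_total else 0.0
    recall = matched / human_total if human_total else 0.0
    return precision, recall


# One long topic recognized correctly, one short topic misrecognized.
human = {("A", 0, 99), ("B", 0, 9)}
system = {("A", 0, 99), ("C", 0, 9)}
precision, recall = scope_weighted_precision_recall(human, system)
print(round(precision, 3), round(recall, 3))  # 0.909 0.909
```

In this illustration the misrecognized short topic lowers both measures only to about 91%, which matches the qualitative behaviour the paragraph describes.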

【００４１】[0041]

【表１】As is clear from Table 1, taking adnominal modification relations into account according to the present invention improves both precision and recall for the topic scope compared with the conventional method, which does not take them into account. This confirms that the present invention makes it possible to correctly recognize topic words with larger topic scopes.

【００４２】[0042]

【発明の効果】[Effects of the Invention] As described above, the present invention assigns priorities to topic word candidates and then corrects them with the topic word candidate priority correction table, which makes it possible to determine topic words correctly. In particular, correcting the priorities on the basis of adnominal modification relations has the effect that topic words can be recognized correctly in written-language text containing complex adnominal modification. This is especially effective for topics with long topic scopes. Because large topic structures are recognized correctly, the result is easy for users to understand when it is used as a chapter or table-of-contents structure.

[Brief description of the drawings]

【図１】FIG. 1 shows an example of topic structure recognition by a human.

【図２】FIG. 2 is a block diagram showing the structure of an example of a conventional topic structure recognition apparatus.

【図３】FIG. 3 is a flowchart showing conventional processing for topic structure recognition.

【図４】FIG. 4 shows an example of the processing from preprocessing onward in conventional topic structure recognition.

【図５】FIG. 5 is a diagram showing an example of a text.

【図６】FIG. 6(a) is a diagram showing an example of the topic structure recognized by a human for the text shown in FIG. 5, and FIG. 6(b) is a diagram showing the result of extracting the topic structure by applying the conventional topic structure recognition method to the text shown in FIG. 5.

【図７】FIG. 7 is a block diagram showing the configuration of a topic structure recognition apparatus according to one embodiment of the present invention.

【図８】FIG. 8 is a diagram showing the relationships among the tables and the storage units.

【図９】FIG. 9 is a flowchart showing the processing for topic word determination performed using the apparatus of FIG. 7.

【図１０】FIG. 10 is a diagram showing an example of the topic candidate priority rule for the base expansion.

【図１１】FIG. 11 is a diagram showing an example of the topic candidate priority rule for the semantic expansion.

【図１２】FIG. 12 is a diagram showing the state of the simple sentence information table at the point where priorities have been assigned to the topic candidates according to the semantic-expansion topic candidate priority rule of FIG. 11.

【図１３】FIG. 13 is a flowchart showing the processing for detecting adnominal modification.

【図１４】FIG. 14 is a diagram showing a configuration example of the topic candidate priority correction table.

【図１５】FIG. 15 is a diagram showing the state of the simple sentence information table after the topic candidate priorities have been corrected.

【図１６】FIG. 16 is a diagram showing an example of the result of applying the method of the present invention to the text example of FIG. 5.

[Explanation of symbols]

101 data input unit
102 processing unit
103 display unit
104 storage unit
105 dictionary/rule unit
110 language data storage unit
111 topic structure storage unit
112 base expansion storage unit
113 semantic expansion storage unit
114 integrated topic storage unit
115 simple sentence information table
116 word information table
121 preprocessing dictionary
122 semantic expansion processing rule
123 base expansion processing rule
124 integration processing rule
125 topic word candidate priority rule
126 interrogative expression dictionary
127 topic word candidate priority correction table
201-203, 211-216 steps

Continuation of front page
(56) References: JP-A-7-160710 (JP, A); JP-A-7-160711 (JP, A); JP-A-7-160712 (JP, A); JP-A-6-236410 (JP, A); JP-A-6-139276 (JP, A); JP-A-4-332084 (JP, A); Atsushi Takeshita, "Video Retrieval System Using Topic Structure Recognition", IPSJ SIG Technical Report 94-IM-15, Japan, March 11, 1994, Vol. 94, No. 24, pp. 1-8; Atsushi Takeshita et al., "D-67 Detection of Topic Introductions in Monologue", 1994 IEICE Autumn Conference, Japan, September 5, 1994, p. 70
(58) Field searched (Int. Cl.7, DB name): G06F 17/21 - 17/30

Claims

(57) [Claims]

1. A topic word selection method for determining a topic word in a topic structure recognition apparatus that recognizes the topic structure of language data using rules prepared in advance, wherein the topic structure recognition apparatus has a dictionary/rule unit including a topic word candidate priority rule for assigning priorities to topic word candidates and a topic word candidate priority correction table describing sets of a correction condition and correction content for correcting the priorities, the method comprising: assigning priorities to topic word candidates extracted from the language data in accordance with the topic word candidate priority rule; correcting, on the basis of the topic word candidate priority correction table and in accordance with the sets of correction condition and correction content given in advance, the priority of a topic word candidate according to the corresponding correction content when a correction condition is satisfied; and determining the topic word by comparing the priorities assigned to the plurality of topic word candidates.

2. The topic word selection method according to claim 1, wherein the topic structure recognition apparatus: performs topic structure recognition preprocessing that includes dividing the language data into simple sentences, each being a unit having only one predicate, extracting from each simple sentence the salient noun phrase, which is the noun phrase most emphasized in that simple sentence, and identifying the type of the salient noun phrase; then, dividing the development of topics into a base expansion, which is indicated explicitly, and a semantic expansion, which unfolds within the base expansion, and using base expansion processing rules and semantic expansion processing rules respectively, determines for the base expansion a topic establishment section in which a topic is presented and established, determines the topic word in the topic establishment section, and determines in turn the topic level, which represents the nesting of topics, and the topic scope; next, for the semantic expansion, determines the topic establishment section, determines the topic word, and determines in turn the topic scope and the topic level; and thereafter, using integration processing rules, performs processing that integrates the respective processing results of the base expansion and the semantic expansion.

3. The topic word selection method according to claim 1, wherein the sets of correction condition and correction content include correction of the priority according to an adnominal modification relation involving the topic word candidate.

4. The topic word selection method according to claim 3, wherein the sets of correction condition and correction content include raising the priority of a topic word candidate that is adnominally modified.

5. A topic structure recognition apparatus comprising: an input unit for inputting language data; a dictionary/rule unit that stores rules for topic structure recognition; a processing unit that performs processing using the rules contained in the dictionary/rule unit; a storage unit that stores the processing results of the processing unit; and a display unit that displays the processing results of the processing unit, wherein the dictionary/rule unit includes a topic word candidate priority rule for assigning priorities to topic word candidates and a topic word candidate priority correction table describing sets of a correction condition and correction content for correcting the priorities; the storage unit includes a language data storage unit that stores information about the language data input from the input unit and a topic structure storage unit that stores information about the topic structure; the language data storage unit includes a word information table that stores information on the character string and part of speech of each word contained in the language data, and a simple sentence information table that stores information on the words, the salient noun phrase, and the type of the salient noun phrase contained in each simple sentence of the language data; and the topic structure storage unit includes a table that stores information including a topic establishment section, which is the range in which a topic is presented and established, a topic word, a topic level, and a topic scope.

6. The topic structure recognition apparatus according to claim 5, wherein the dictionary/rule unit includes: a preprocessing dictionary for performing topic structure recognition preprocessing that includes dividing the language data into simple sentences, extracting the salient noun phrase from each simple sentence, and identifying the type of the salient noun phrase; base expansion processing rules for performing processing on the base expansion; semantic expansion processing rules for performing processing on the semantic expansion; and integration processing rules for integrating the base expansion and the semantic expansion; the topic structure storage unit includes a base expansion storage unit that stores information about the base expansion, a semantic expansion storage unit that stores information about the semantic expansion, and an integrated topic storage unit that stores information after the base expansion and the semantic expansion have been integrated; and each of the semantic expansion storage unit and the base expansion storage unit has a table that stores information including a topic establishment section, which is the range in which a topic is presented and established, a topic word, a topic level, and a topic scope.

7. The topic structure recognition apparatus according to claim 5, wherein the correction condition of the topic word candidate priority correction table includes an adnominal modification relation involving the topic word candidate.

8. The topic structure recognition apparatus according to claim 7, wherein the sets of correction condition and correction content in the topic word candidate priority correction table include a set consisting of the correction condition that the topic word candidate is adnominally modified by another simple sentence and the correction content of raising its priority.