JP3329352B2

JP3329352B2 - Topic level control method and topic structure recognition device in topic structure recognition

Info

Publication number: JP3329352B2
Application number: JP22315194A
Authority: JP
Inventors: 敦竹下; 孝史井上; 珠喜斎藤
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1994-09-19
Filing date: 1994-09-19
Publication date: 2002-09-30
Anticipated expiration: 2017-09-30
Also published as: JPH0887501A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a method and apparatus for topic structure recognition in natural language analysis. More particularly, it relates to a topic level control method for determining topic levels, and to a topic structure recognition apparatus to which this topic level control method is applied.

[0002]

[Description of the Related Art] Suppose humans are shown text or dialogue data and asked to "find the blocks in this text or dialogue data that are about the same thing, and identify that 'same thing'." Experiments have confirmed that they answer with the same structure, with no individual differences. These experiments are described, for example, in Takeshita et al., "Research on Human Communication from the Viewpoint of Topic Structure Recognition," Proceedings of the 1993 IEICE Autumn Conference, D-62 (p. 6-64). A structure grasped by humans in this way is called a "topic structure." A topic structure is nested, so each topic can be expressed by three attributes:
- a "topic word" that names the topic;
- a "topic level" that gives its nesting depth;
- a "topic scope" that gives the first and last sentences of the topic in the text or dialogue data.
Hereinafter, text or dialogue data whose topic structure is to be analyzed is called language data.

[0003] FIG. 1 shows an example topic structure for language data about telecommunications policy. The language data begins at sentence 0 and continues to at least sentence 770. The topic whose topic word is "communication services" has topic level 1, and its topic scope runs from sentence 0 to sentence 770. For simplicity, a topic is hereinafter referred to by its topic word, as in "the 'communication services' topic."

[0004] The "communication services" topic contains two topics at topic level 2: "new services" and "conventional services." The "new services" topic has a topic scope from sentence 125 to sentence 431, and the "conventional services" topic has a topic scope from sentence 432 to sentence 770. The "new services" topic also contains a child topic, "service A," and the "conventional services" topic contains a child topic, "service B." Their topic scopes run from sentence 301 to sentence 431 and from sentence 521 to sentence 770, respectively.
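To make the three attributes concrete, the FIG. 1 structure can be written down as a small tree. This is a minimal sketch, not the patent's data format. The `Topic` class and helper names are hypothetical. Level 3 for "service A" and "service B" is an assumption based on their being child topics of level-2 topics. The check confirms that every child is one level deeper than its parent and that its scope lies inside the parent's scope.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Topic:
    """One node of a topic structure (hypothetical representation)."""
    word: str    # topic word
    level: int   # topic level: nesting depth, 1 = outermost
    start: int   # first sentence of the topic scope
    end: int     # last sentence of the topic scope (inclusive)
    children: list[Topic] = field(default_factory=list)

    def contains(self, other: Topic) -> bool:
        """True if the other topic's scope lies inside this topic's scope."""
        return self.start <= other.start and other.end <= self.end


def is_well_nested(topic: Topic) -> bool:
    """Check recursively that every child is one level deeper and inside its parent."""
    return all(
        child.level == topic.level + 1
        and topic.contains(child)
        and is_well_nested(child)
        for child in topic.children
    )


def outline(topic: Topic, indent: int = 0) -> list[str]:
    """Render the structure as an indented table of contents."""
    lines = ["  " * indent + f"{topic.word} ({topic.start}-{topic.end})"]
    for child in topic.children:
        lines.extend(outline(child, indent + 1))
    return lines


# FIG. 1 example; levels of "service A" and "service B" are assumed to be 3.
service_a = Topic("service A", 3, 301, 431)
service_b = Topic("service B", 3, 521, 770)
new = Topic("new services", 2, 125, 431, [service_a])
conventional = Topic("conventional services", 2, 432, 770, [service_b])
root = Topic("communication services", 1, 0, 770, [new, conventional])

print(is_well_nested(root))  # True
print("\n".join(outline(root)))
```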

[0005] Recognizing such a topic structure by computer is called topic structure recognition, and several methods for it have been proposed. This section briefly describes the method in Takeshita, "Video Retrieval System Using Topic Structure Recognition," IPSJ SIG Information Media, 94-IM-15-1. FIG. 2 is a block diagram of an example topic structure recognition apparatus used by this method. FIG. 3 is a flowchart of the method's topic structure recognition process. FIG. 4 shows an example of the processing flow from topic structure recognition preprocessing onward.

[0006] The conventional topic structure recognition apparatus shown in FIG. 2 has five parts:
- a data input unit 701 that receives language data;
- a processing unit 702 that executes the various processes;
- a display unit 703 that displays results;
- a storage unit 704 that holds processing results and data needed during processing;
- a dictionary/rule unit 705 that stores the dictionaries and rules used in topic structure recognition.

The storage unit 704 contains a language data storage unit 710, which stores the preprocessed language data, and a topic structure storage unit 711, which holds intermediate and final processing results. The topic structure storage unit 711 contains a base development storage unit 712, a semantic development storage unit 713, and an integrated topic storage unit 714. The dictionary/rule unit 705 contains a preprocessing dictionary 721, semantic development processing rules 722, base development processing rules 723, and integration processing rules 724.

[0007] When topic structure recognition is performed with this apparatus, topic structure recognition preprocessing 740 is first applied to the input language data 730, as shown in FIG. 3. Preprocessing 740 has four steps:
1. Morphological analysis 741 segments the character string of the language data 730 into words to form a word sequence. It also identifies each word's part of speech and the conjugated form of each inflected word.
2. Simple-sentence segmentation 742 takes the morphological analysis result as input. It divides each sentence that contains several predicates, such as a sentence with an embedded clause or a compound sentence, into simple sentences that each contain exactly one predicate.
3. Salient noun phrase extraction 743 takes the result of simple-sentence segmentation 742 as input and extracts the most emphasized noun phrase in each simple sentence.
4. Block recognition 744 recognizes blocks that correspond to paragraphs in a text.

The processing unit 702 executes all of these steps using the preprocessing dictionary 721 in the dictionary/rule unit 705. The results are stored in the language data storage unit 710 of the storage unit 704.

[0008] After preprocessing 740 is complete, topic development is processed in two separate stages: base development processing 750 and semantic development processing 760. Base development is topic development that is signaled explicitly, for example by clue phrases such as "first" (まず) and "next" (次に), by chapter structure, or by bulleted lists. Semantic development is topic development that is introduced and carried forward implicitly within each topic of the base development.

[0009] As shown in FIG. 3, base development processing 750 performs three processes in order:
- topic establishment section determination 751;
- topic word determination 752;
- topic scope and topic level determination 753.

A topic establishment section is a section in which a topic is introduced and established. Topic word determination 752 treats the salient noun phrases in each topic establishment section as topic word candidates and selects the candidate with the highest priority as the topic word. Topic scope and topic level determination 753 relies on structures such as bulleted lists. The processing unit 702 executes base development processing 750 using the base development processing rules 723 in the dictionary/rule unit 705. The results are stored in the base development storage unit 712 within the topic structure storage unit 711 of the storage unit 704.

[0010] FIG. 4 shows a concrete example of base development processing 750. First, the beginning of the language data and the neighborhoods of clue phrases such as "first" and "next" are chosen as the topic establishment sections of the base development. Topic word determination 752 then selects "communication services" from the first topic establishment section, "new services" from the second, and "conventional services" from the third.
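Topic word determination 752 reduces to taking the highest-priority candidate in each topic establishment section. The sketch below assumes that each salient noun phrase already carries a numeric priority. The patent text here does not say how priority is computed, so the `choose_topic_word` function and all scores are hypothetical. Only the winning topic words match FIG. 4.

```python
def choose_topic_word(candidates: dict[str, float]) -> str:
    """Pick the salient noun phrase with the highest priority as the topic word.

    `candidates` maps each salient noun phrase found in one topic establishment
    section to a priority score (how the score is computed is not specified here).
    Ties go to the candidate listed first.
    """
    if not candidates:
        raise ValueError("topic establishment section has no candidates")
    return max(candidates, key=candidates.__getitem__)


# Hypothetical candidates and priorities for the three sections of FIG. 4.
sections = [
    {"communication services": 0.9, "policy": 0.5},
    {"new services": 0.8, "users": 0.3},
    {"conventional services": 0.7, "telephone": 0.6},
]
print([choose_topic_word(s) for s in sections])
```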

[0011] After base development processing 750, semantic development processing 760 is executed. Like base development processing 750, it has three processes: topic establishment section determination 761, topic word determination 762, and topic scope and topic level determination 763. The processing unit 702 executes semantic development processing 760 using the semantic development processing rules 722 in the dictionary/rule unit 705 together with the results of base development processing 750. The results are stored in the semantic development storage unit 713 within the topic structure storage unit 711 of the storage unit 704.

[0012] In the example of FIG. 4, paragraphs above a certain length are chosen as topic establishment sections, and "service A" and "service B" are selected as their topic words. Each topic scope runs from the start of the topic's establishment section up to the start of the next topic establishment section in the base development. In the semantic development of text, every topic is assigned the same topic level, namely level 1.
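The scope rule above is a "next boundary" search over the sorted start positions of the base-development topic establishment sections. This is a minimal sketch under two assumptions. First, the base sections start at sentences 0, 125, and 432, the scope starts in FIG. 1. Second, a scope ends on the sentence just before the next base section, or at the last sentence of the data if none follows. With these assumptions it reproduces the FIG. 1 scopes of "service A" (301 to 431) and "service B" (521 to 770). The function name is hypothetical.

```python
import bisect


def semantic_topic_scope(start: int, base_starts: list[int], last_sentence: int) -> tuple[int, int]:
    """Scope of a semantic-development topic established at sentence `start`.

    The scope runs from `start` to the sentence before the next base-development
    topic establishment section, or to the end of the language data if none follows.
    `base_starts` must be sorted in ascending order.
    """
    i = bisect.bisect_right(base_starts, start)  # index of the first base start > start
    end = base_starts[i] - 1 if i < len(base_starts) else last_sentence
    return start, end


base_starts = [0, 125, 432]  # assumed starts of base-development sections (FIG. 1)
print(semantic_topic_scope(301, base_starts, 770))  # (301, 431)
print(semantic_topic_scope(521, base_starts, 770))  # (521, 770)
```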

[0013] Finally, integration processing 770 merges the base development and the semantic development, and outputs the topic structure 780 of the entire language data. Integration processing 770 takes the topic structures produced by base development processing 750 and semantic development processing 760 as input. The processing unit 702 executes it using the integration processing rules 724 in the dictionary/rule unit 705. In the example of FIG. 4, integration produces a topic structure 780 like the one shown in FIG. 1.

[0014] Both base development and semantic development use rules to determine topic establishment sections, topic words, topic scopes, and topic levels (the semantic development processing rules 722 and the base development processing rules 723). These rules depend on the communication mode of the language data, such as dialogue, monologue, or written text. How topic development styles and topic structure recognition rules differ by communication mode, and the results of topic structure recognition experiments, are described in Takeshita et al., "Research on Human Communication from the Viewpoint of Topic Structure Recognition," Proceedings of the 1993 IEICE Autumn Conference, D-62 (p. 6-64).

[0015]

[Problems to Be Solved by the Invention] The conventional topic structure recognition technique described above easily recognizes where a nested topic starts, but it has difficulty recognizing where the topic ends. As a result, topic levels in the recognition result tend to grow deeper toward the end of the language data. This causes problems when the result is used, for example, as a table of contents for human readers. The table becomes hard to follow, or topics end up at levels very different from the levels a human would assign. The effect is especially pronounced when the language data is long or covers many diverse topics.

[0016] As an example, consider the proceedings (minutes) of a parliamentary session. FIG. 5 shows an excerpt from such proceedings (representative questions). It is typical of language data that contains many diverse topics. FIG. 6 shows the topic structure that the conventional method recognizes from the language data of FIG. 5, output as a table of contents. Unlike an ordinary table of contents, FIG. 6 gives each chapter title together with the text just before and after the place where that title appears in the language data. In FIG. 6, chapter nesting deepens toward the end of the table of contents. The last entry, "5.11.1.11.1.2.2.3.3.10 PKO bill," is nested ten levels deep. At this depth a human can hardly follow the relationships among chapters, and the output no longer works well as a table of contents.
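The depth quoted above can be checked by counting the components of the dotted chapter number, since each component adds one level of nesting. This is a trivial sketch. The helper name is hypothetical.

```python
def nesting_depth(chapter_number: str) -> int:
    """Nesting depth of a dotted chapter number: one level per component."""
    return len(chapter_number.split("."))


entry = "5.11.1.11.1.2.2.3.3.10 PKO bill"  # last entry of FIG. 6
print(nesting_depth(entry.split()[0]))    # 10
```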

[0017] An object of the present invention is to provide a topic level control method and a topic structure recognition apparatus that keep topic levels in the recognition result from becoming unnecessarily deep, so that the output topic structure is easy for humans to understand.

[0018]

[Means for Solving the Problems] The topic level control method of the present invention determines topic levels in a topic structure recognition apparatus that recognizes the topic structure of language data using rules prepared in advance. The method uses three kinds of topic-level-changing factors: a factor that creates a new topic level, a factor that continues the current topic level, and a factor that ends a topic level. Two tables are stored in advance as rules:
- A topic level increase/decrease table records how much the topic level goes up or down. It is a matrix indexed by the factor that appeared previously and the factor that appears now.
- A topic level setting table records conditions under which the topic level is changed to an absolute value, paired with the absolute topic level that applies after the change.

The topic structure recognition apparatus determines topic levels using the topic level increase/decrease table and the topic level setting table.
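The two tables can be summarized as a relative update followed by an optional absolute override. The notation below is ours, not the patent's. $\ell_t$ is the topic level after the $t$-th topic-level-changing factor, $c_t$ is that factor's kind, $\Delta$ is the increase/decrease table, and each pair $(C_k, v_k)$ is one row of the setting table, a condition and its absolute level. This passage does not say which row wins when several conditions hold at once.

```latex
\ell'_t = \ell_{t-1} + \Delta(c_{t-1}, c_t),
\qquad c_{t-1}, c_t \in \{\mathrm{new}, \mathrm{continue}, \mathrm{end}\}

\ell_t =
\begin{cases}
  v_k     & \text{if condition } C_k \text{ of the setting table holds},\\
  \ell'_t & \text{if no condition holds.}
\end{cases}
```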

[0019] The topic structure recognition apparatus of the present invention has five parts:
- an input unit for inputting language data;
- a dictionary/rule unit that stores rules for topic structure recognition;
- a processing unit that performs processing using the rules in the dictionary/rule unit;
- a storage unit that stores the results of the processing unit;
- a display unit that displays the results of the processing unit.

Clue phrases are classified into three types: nest-start, topic-shift, and nest-end.

The dictionary/rule unit includes the topic level increase/decrease table and the topic level setting table. The increase/decrease table covers three kinds of topic-level-changing factors: a factor that creates a new topic level, a factor that continues the topic level, and a factor that ends a topic level. It records the increase or decrease in topic level as a matrix indexed by the factor that appeared previously and the factor that appears now. The setting table pairs conditions for changing the topic level to an absolute value with the absolute topic level that applies after the change.

The storage unit includes a language data storage unit, which stores information about the language data received from the input unit, and a topic structure storage unit, which stores information about the topic structure. The language data storage unit includes a word information table and a simple sentence information table. The word information table stores the character string and part of speech of each word in the language data. The simple sentence information table stores, for each simple sentence of the language data, the words it contains. The topic structure storage unit includes a table that stores the topic establishment sections (the ranges in which topics are introduced and established), topic words, topic levels, and topic scopes.

[0020]

[Operation] The present invention uses a topic level increase/decrease table, which describes increases and decreases in topic level, and a topic level setting table, which describes pairs of a condition and a new topic level value. Together they keep the topic levels of the recognized topic structure from becoming deep. When the topic structure is used as a table of contents, for example, the user can therefore follow the flow of discussion in the language data more easily.

[0021]

[Embodiments] An embodiment of the present invention is described below with reference to the drawings. FIG. 7 is a block diagram of a topic structure recognition apparatus according to one embodiment of the present invention. It differs from the conventional apparatus of FIG. 2 mainly in the configuration of the tables in the dictionary/rule unit 105.

[0022] [Configuration of the Topic Structure Recognition Apparatus of This Embodiment] The apparatus of this embodiment has five parts:
- a data input unit 101 that receives language data;
- a processing unit 102 that executes the various processes;
- a display unit 103 that displays results;
- a storage unit 104 that holds processing results and data needed during processing;
- a dictionary/rule unit 105 that stores the dictionaries and rules used in topic structure recognition.

The storage unit 104 contains a language data storage unit 110, which stores the preprocessed language data, and a topic structure storage unit 111, which holds intermediate and final processing results. The language data storage unit 110 contains a simple sentence information table 115, a word information table 116, and an utterance number information table 117. The topic structure storage unit 111 contains a base development storage unit 112, a semantic development storage unit 113, and an integrated topic storage unit 114. The dictionary/rule unit 105 contains:
- a preprocessing dictionary 121;
- semantic development processing rules 122;
- base development processing rules 123;
- integration processing rules 124;
- a topic level increase/decrease table 125;
- a topic level setting table 126;
- a nest-start clue phrase table 127;
- a topic-shift clue phrase table 128;
- a nest-end clue phrase table 129.

[0023] When this apparatus analyzes the topic structure of language data, processing follows the same flow as the conventional processing of FIG. 3. The difference lies in how topic levels are determined during base development processing: this apparatus determines them using the topic level increase/decrease table 125 and the topic level setting table 126. Both tables are described in detail later. In brief, the increase/decrease table 125 specifies how much the topic level changes relative to its current value when a given condition holds. The setting table 126 specifies the value to which the topic level is forcibly set (an absolute change) when a specific condition holds. The clue phrase tables 127 to 129 are used to evaluate the conditions in tables 125 and 126. The rest of this description explains how the apparatus of this embodiment analyzes topic structure, focusing on the topic level determination method.

[0024] [Overall Flow of Topic Level Determination in Base Development] The flowchart of FIG. 8 shows how topic levels are determined in base development:
1. For a topic establishment section of the base development, the topic level is updated according to the topic level increase/decrease table 125 (step 201).
2. The apparatus checks whether a condition of the topic level setting table 126 is satisfied (step 202). If none is satisfied, topic level determination ends.
3. If a condition is satisfied, the topic level is updated according to the topic level setting table 126 (step 203), and topic level determination ends.
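The three steps map onto a relative update followed by a first-match override. This is a minimal sketch, not the embodiment's code. The function name, the representation of setting-table rows as (predicate, level) pairs, and the order in which rows are checked are all assumptions. The example condition ("a new speaker begins", forcing level 1) is hypothetical: this section does not list the conditions actually stored in table 126.

```python
from typing import Callable

# A setting-table row: (condition on the current context, absolute level to force).
SettingRow = tuple[Callable[[dict], bool], int]


def determine_topic_level(
    prev_level: int,
    prev_type: str,
    cur_type: str,
    delta_table: dict[tuple[str, str], int],
    setting_table: list[SettingRow],
    context: dict,
) -> int:
    """Topic level of one base-development topic establishment section (FIG. 8)."""
    # Step 201: relative update from the topic level increase/decrease table.
    level = prev_level + delta_table[(prev_type, cur_type)]
    # Step 202: check the conditions of the topic level setting table in order.
    for condition, absolute_level in setting_table:
        if condition(context):
            # Step 203: overwrite with the absolute level and finish.
            return absolute_level
    # No condition holds: keep the relatively updated level.
    return level


# Illustrative tables only; the condition below is hypothetical.
delta = {("topic-shift", "nest-start"): +1}
setting = [(lambda ctx: ctx.get("new_speaker", False), 1)]
print(determine_topic_level(3, "topic-shift", "nest-start", delta, setting, {}))  # 4
print(determine_topic_level(3, "topic-shift", "nest-start", delta, setting, {"new_speaker": True}))  # 1
```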

[0025] [Configuration of Each Table] FIG. 9 shows an example configuration of the topic level increase/decrease table 125. In this embodiment, the topic level goes up or down according to the transition from the type of the preceding clue phrase to the type of the current clue phrase. The increase/decrease table 125 records how each such transition maps to a change in topic level. Clue phrases such as "first" (まず) and "next" (次に) are classified into three types: nest-start, topic-shift, and nest-end. Table 1 gives example clue phrases of each type.

【００２６】[0026]

[Table 1]

As shown in FIG. 9, the table applies three rules:
- If the current clue phrase is nest-start, the topic level increases by 1, whatever the type of the preceding clue phrase.
- If the current clue phrase is topic-shift or nest-end and the preceding clue phrase is nest-start or topic-shift, the topic level stays the same.
- If the current clue phrase is topic-shift or nest-end and the preceding clue phrase is nest-end, the topic level decreases by 1.
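The three rules fill the 3x3 increase/decrease matrix completely, so they can be encoded directly and replayed over a sequence of clue phrase types. This is a minimal sketch. The type labels and `levels_for` are hypothetical names. The treatment of the first clue phrase (as if preceded by a topic-shift phrase) is our own assumption, because the text does not say what happens when there is no preceding clue phrase. The sketch also does not clamp levels at 1; absolute resets belong to the setting table.

```python
NEST_START, TOPIC_SHIFT, NEST_END = "nest-start", "topic-shift", "nest-end"
TYPES = (NEST_START, TOPIC_SHIFT, NEST_END)

# DELTA[(previous type, current type)] = change in topic level:
#   current nest-start                                 -> +1 (any previous type)
#   current topic-shift or nest-end, previous start/shift -> 0
#   current topic-shift or nest-end, previous nest-end    -> -1
DELTA = {}
for prev in TYPES:
    DELTA[(prev, NEST_START)] = +1
    for cur in (TOPIC_SHIFT, NEST_END):
        DELTA[(prev, cur)] = -1 if prev == NEST_END else 0


def levels_for(clue_types: list[str], start_level: int = 1) -> list[int]:
    """Topic level after each clue phrase in `clue_types`.

    Assumption: before the first clue phrase the previous type is treated as
    topic-shift, so a leading nest-start raises the level and anything else keeps it.
    """
    level, prev, levels = start_level, TOPIC_SHIFT, []
    for cur in clue_types:
        level += DELTA[(prev, cur)]
        levels.append(level)
        prev = cur
    return levels


print(levels_for([NEST_START, TOPIC_SHIFT, NEST_START, NEST_END, TOPIC_SHIFT]))
# [2, 2, 3, 3, 2]
```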

[0027] FIGS. 10(a) to 10(c) show the configurations of the nest-start clue phrase table 127, the topic-shift clue phrase table 128, and the nest-end clue phrase table 129, respectively. The values shown are those in use while the example language data of FIG. 5 is being processed. The dictionary/rule unit 105 holds these clue phrase tables 127 to 129 to store the three types of clue phrases shown in Table 1. Each table has three fields: the clue phrase number, the words of the clue phrase, and the part of speech of each word. For example, in the nest-start clue phrase table 127, clue phrase number 0 consists of the single word "まず" (first), whose part of speech is adverb.
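Each clue phrase table row pairs a phrase number with parallel lists of words and parts of speech, so a small record type can enforce that every word has exactly one part of speech. This is a minimal sketch. `CluePhrase` and `CLUE_PHRASE_TABLES` are hypothetical names. Only row 0 of the nest-start table comes from the text; the other tables are left empty rather than invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CluePhrase:
    """One row of a clue phrase table (FIG. 10)."""
    number: int                 # clue phrase number
    words: tuple[str, ...]      # the words making up the phrase
    pos: tuple[str, ...]        # part of speech of each word

    def __post_init__(self) -> None:
        if len(self.words) != len(self.pos):
            raise ValueError("each word needs exactly one part of speech")


# Only row 0 of the nest-start table is given in the text; the rest is omitted.
CLUE_PHRASE_TABLES: dict[str, list[CluePhrase]] = {
    "nest-start": [CluePhrase(0, ("まず",), ("副詞",))],  # "first", adverb
    "topic-shift": [],
    "nest-end": [],
}
```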

[0028] As described above, the language data storage unit 110 is provided with the simple sentence information table 115 and the word information table 116. Configuration examples of these tables 115 and 116 are shown in FIGS. 11(a) and 11(b), respectively. The simple sentence information table 115 describes information about simple sentences, a simple sentence being a unit that has only one predicate. It consists of a simple sentence number field; a field giving, by clue phrase number, the clue phrase contained in each simple sentence; a field giving the type of that clue phrase; a field giving the utterance number (speaking turn) to which the simple sentence belongs, that is, which speaker's turn it is part of; and a field giving the start and end word numbers of the simple sentence (the word range field). The simple sentence number is a sequential number, starting from 0, that gives the position of the simple sentence in the language data. A value of "-1" in the clue phrase and clue phrase type fields indicates that no clue phrase is present. The word information table 116 has a word number field giving the position of the word in the language data, a field describing the word itself, and a field describing information such as the word's part of speech. As described above, morphological analysis and simple sentence delimiting are performed in the topic structure recognition preprocessing (see FIG. 3), and their results are stored in the simple sentence information table 115 and the word information table 116. The language data storage unit 110 is also provided with an utterance number information table 117, which consists of an utterance number field giving the order of the speakers' utterances, a speaker field giving the speaker's name, and a start simple sentence field giving the simple sentence number at which that speaker's turn begins.
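As an illustration only, the three tables of the language data storage unit 110 might be modelled as plain records. This is a minimal Python sketch: the class and field names are hypothetical, clue phrase types are assumed to be integer codes, and word ranges are assumed to be inclusive. The patent specifies the fields, not a data format.

```python
from dataclasses import dataclass
from typing import List

NO_CLUE = -1  # "-1" in the clue phrase / clue phrase type fields: no clue phrase


@dataclass
class Word:                 # one row of the word information table (116)
    word_no: int            # position of the word in the language data
    surface: str            # the word itself
    pos: str                # part of speech and similar information


@dataclass
class SimpleSentence:       # one row of the simple sentence information table (115)
    sent_no: int            # sequential number starting from 0
    clue_no: int            # clue phrase number, or NO_CLUE
    clue_type: int          # clue phrase type code (assumed encoding), or NO_CLUE
    turn_no: int            # utterance number the sentence belongs to
    first_word: int         # word range: start word number
    last_word: int          # word range: end word number (assumed inclusive)


@dataclass
class Turn:                 # one row of the utterance number information table (117)
    turn_no: int            # order of the utterance
    speaker: str            # speaker name
    start_sent: int         # simple sentence number at which the turn begins


def sentence_words(s: SimpleSentence, words: List[Word]) -> List[Word]:
    """Words of one simple sentence, looked up through its word range."""
    return [w for w in words if s.first_word <= w.word_no <= s.last_word]
```

For simple sentence number 0 with word range 0 to 12, `sentence_words` returns the 13 words numbered 0 to 12, which is the range searched for clue phrases.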

[0029] A clue phrase is detected in each simple sentence by taking the word range value from the simple sentence information table 115 and checking whether any item contained in the three clue phrase tables 127 to 129 occurs within that range of the word information table 116. For example, the word range of simple sentence number 0 is word numbers 0 to 12, so the range of word numbers 0 to 12 in the word information table 116 is checked for items contained in the clue phrase tables 127 to 129. Simple sentence number 0 contains no clue phrase, so its clue phrase and clue phrase type fields are both set to -1. If a clue phrase is detected, its clue phrase number is recorded in the clue phrase field and its type is written in the clue phrase type field.
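The detection described above can be sketched as a scan over each sentence's word range. This is a hypothetical sketch, not the patent's procedure: it assumes that a table entry matches when its words and parts of speech occur contiguously in the range, and that the first match found wins. The patent states neither rule, and the function and type names are illustrative.

```python
from typing import Dict, List, Tuple, Union

# clue phrase number -> sequence of (word, part of speech), as in tables 127-129
ClueTable = Dict[int, List[Tuple[str, str]]]


def detect_clue(words: List[Tuple[str, str]], first: int, last: int,
                tables: Dict[str, ClueTable]) -> Tuple[int, Union[str, int]]:
    """Look for a clue phrase inside the word range [first, last] of one simple
    sentence. `words` is indexed by word number. Returns (clue phrase number,
    clue phrase type), or (-1, -1) when no table entry occurs in the range."""
    span = words[first:last + 1]
    for clue_type, table in tables.items():
        for clue_no, phrase in table.items():
            n = len(phrase)
            # slide the phrase over the range; word and part of speech must both match
            for i in range(len(span) - n + 1):
                if span[i:i + n] == phrase:
                    return clue_no, clue_type
    return -1, -1
```

Returning the type along with the number keeps entries distinguishable, since each of the three tables numbers its own clue phrases.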

[0030] FIG. 12 shows a configuration example of the topic level setting table 126. The topic level setting table 126 consists of a condition field and a field describing the new topic level. When any of the conditions in the condition field is satisfied, the topic level is set to the value corresponding to that condition (the value in the column to its right). A transition made with the topic level increase/decrease table 125 described above is an increase or decrease relative to the preceding topic level, whereas an update made with the topic level setting table 126 specifies the absolute value of the topic level, regardless of the preceding value. In the example shown in FIG. 12, when the topic level exceeds 5, it is returned to 1. The topic level is also returned to 1 when a speaker change occurs, the new utterance number consists of three or more simple sentences, and five or more simple sentences exist before the current speaker change. If two or more conditions are satisfied at the same time, the one listed higher in the topic level setting table 126 may, for example, be adopted.
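A minimal sketch of the topic level setting table as an ordered list of (condition, absolute level) rows, using the two conditions of the FIG. 12 example and the "row listed higher wins" tie-break suggested above. The `LevelContext` record and the function names are hypothetical; how its fields are filled from the stored tables is a separate step.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class LevelContext:
    level: int                # topic level after the increase/decrease step
    speaker_changed: bool     # a speaker change occurred at this topic
    new_turn_len: int         # simple sentences in the new utterance number
    sents_before: int         # simple sentences before the speaker change


# Rows of the topic level setting table in table order (the FIG. 12 example).
Row = Tuple[Callable[[LevelContext], bool], int]
SETTING_TABLE: List[Row] = [
    (lambda c: c.level > 5, 1),
    (lambda c: c.speaker_changed and c.new_turn_len >= 3 and c.sents_before >= 5, 1),
]


def apply_setting_table(ctx: LevelContext, table: List[Row] = SETTING_TABLE) -> int:
    """Return the absolute level of the first satisfied row; otherwise keep the level."""
    for condition, new_level in table:
        if condition(ctx):
            return new_level  # absolute value, independent of ctx.level
    return ctx.level
```

Scanning the rows in order and returning on the first hit is one simple way to realize "adopt the condition listed higher"; other tie-break policies would only change the loop.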

[0031] Next, the relationship between each of the base development storage unit 112, the simple sentence information table 115 and the utterance number information table 117, and the conditions described in the topic level setting table 126, will be described with reference to FIG. 13. The values in the figure are those arising while the language data example shown in FIG. 5 is being processed.

[0032] For each topic number, the base development storage unit 112 contains a field giving the simple sentence numbers at which the topic establishment section (the section in which the topic is presented and established) starts and ends (the topic establishment section field), a field describing the topic word, a field describing the topic level, and a field describing the topic scope by its start and end word numbers. As described above, the utterance number information table 117 contains an utterance number field, a field describing the speaker, and a field giving the simple sentence number at which the utterance number starts. In the illustrated example, the speaker of utterance number 0 is the "chairperson"; this turn starts at simple sentence 0 and ends at simple sentence 1, the sentence immediately before the start simple sentence of the next utterance number, 1.
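Because the utterance number information table 117 stores only start sentences, the end of each turn has to be derived, as in the chairperson example. The sketch below uses hypothetical function names and assumes that start sentences are listed in ascending order and that the final turn runs to the last simple sentence.

```python
from bisect import bisect_right
from typing import List, Tuple


def turn_range(starts: List[int], turn_no: int, n_sentences: int) -> Tuple[int, int]:
    """First and last simple sentence numbers of one utterance number.

    starts[k] is the start simple sentence field of utterance number k
    in the utterance number information table (117)."""
    first = starts[turn_no]
    if turn_no + 1 < len(starts):
        last = starts[turn_no + 1] - 1   # one before the next turn's start sentence
    else:
        last = n_sentences - 1           # assumption: last turn runs to the end
    return first, last


def turn_of(starts: List[int], sent_no: int) -> int:
    """Utterance number containing a simple sentence (starts must be ascending)."""
    return bisect_right(starts, sent_no) - 1
```

With turn 1 starting at simple sentence 2, `turn_range` gives (0, 1) for the chairperson's turn 0. `turn_of` is only an alternative derivation; the simple sentence information table 115 already records each sentence's utterance number directly.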

[0033] Whether the condition "topic level > 5" of the topic level setting table 126 shown in FIG. 12 is satisfied can be determined by examining the value of the topic level field in the base development storage unit 112. Whether the condition "when a speaker change occurs" is satisfied can be determined by examining, in the simple sentence information table 115, the utterance number field of the start simple sentences of the topic establishment sections recorded in the base development storage unit 112. Whether the conditions "the new utterance number contains three or more simple sentences" and "five or more simple sentences are contained between the start of the language data and the speaker change" are satisfied can be determined from the values of the start simple sentence field of the utterance number information table 117.
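The determinations above can be sketched as a function that reads only stored values. Assumptions: the previous and current topics are compared through the utterance numbers of the first simple sentences of their topic establishment sections (as in the worked example of [0037]), the "five simple sentences before the change" test uses the new turn's start sentence number, and all names are hypothetical.

```python
from typing import Dict, Optional


def setting_conditions(prev_start: int, cur_start: int,
                       sent_turn: Dict[int, int], turn_start: Dict[int, int],
                       n_sentences: int) -> Dict[str, bool]:
    """Evaluate the speaker-change conditions of the FIG. 12 example.

    prev_start, cur_start: first simple sentences of the previous and current
        topic establishment sections (base development storage unit 112)
    sent_turn: simple sentence number -> utterance number (table 115)
    turn_start: utterance number -> start simple sentence (table 117)
    """
    prev_turn, cur_turn = sent_turn[prev_start], sent_turn[cur_start]
    new_start = turn_start[cur_turn]
    nxt: Optional[int] = turn_start.get(cur_turn + 1)
    # the new turn ends one sentence before the next turn starts
    # (assumption: a final turn runs to the last simple sentence)
    new_end = nxt - 1 if nxt is not None else n_sentences - 1
    return {
        "speaker_change": prev_turn != cur_turn,
        "new_turn_3_or_more": new_end - new_start + 1 >= 3,
        "5_or_more_before": new_start >= 5,  # sentences 0 .. new_start-1 precede it
    }
```

The "topic level > 5" condition is omitted here because it needs only the topic level field itself.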

[0034] [Explanation Using an Example of Language Data] Next, the topic level control method in base development in this embodiment will be described in detail using the example shown in FIG. 13. In the following description, the simple sentence with simple sentence number N is written simply as simple sentence #N.

[0035] The topic level of topic number 45 is already stored in the base development storage unit 112, and its value is 4. The topic establishment section of topic number 45 starts at simple sentence #720. According to the simple sentence information table 115, simple sentence #720 belongs to utterance number 4.

[0036] The topic level of topic number 46 has not yet been determined (it is blank in the figure), so it is determined according to the present invention. First, the topic level is updated according to the topic level increase/decrease table 125. The topic establishment section of topic number 46 runs from simple sentence #822 to #826, and that range of the simple sentence information table 115 shows a nesting start type clue phrase in simple sentence #822. As shown in FIG. 9, the topic level increase/decrease table 125 increments the topic level by 1 whenever the current clue phrase is of the nesting start type, whatever the previous clue phrase was, so the topic level goes from 4 (that of topic number 45) to 5.
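The relative update of this first step can be sketched as a lookup keyed by (previous clue type, current clue type). Only the nesting start column is stated in the text; the remaining cells of FIG. 9 are not given in this section, so the sketch leaves them out rather than guessing them. The type names and function name are hypothetical.

```python
from typing import Dict, Tuple

CLUE_TYPES = ("nesting_start", "topic_conversion", "nesting_end")

# (previous clue type, current clue type) -> change in topic level.
# Only the nesting start column is stated in the text: +1 whatever came before.
# The other cells of FIG. 9 are intentionally left out here, not guessed.
INC_DEC: Dict[Tuple[str, str], int] = {
    (prev, "nesting_start"): +1 for prev in CLUE_TYPES
}


def update_level(level: int, prev_type: str, cur_type: str,
                 table: Dict[Tuple[str, str], int] = INC_DEC) -> int:
    """Relative update: previous topic level plus the table's increment.
    Raises KeyError for a cell that has not been filled in."""
    return level + table[(prev_type, cur_type)]
```

For topic number 46, any previous type combined with the nesting start clue phrase at #822 takes the level from 4 to 5.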

[0037] Next, it is checked whether the conditions of the topic level setting table 126 are satisfied. The condition "topic level > 5" is not satisfied, because the topic level is 5. The topic establishment section of topic number 46 starts at simple sentence #822, which, according to the simple sentence information table 115, belongs to utterance number 6. As described above, topic number 45 belongs to utterance number 4, so a speaker change has occurred. According to the utterance number information table 117, utterance number 6 starts at simple sentence #785 and continues up to simple sentence #1046, the one immediately before #1047, which is the start simple sentence of the next utterance number 7. Utterance number 6 therefore contains 262 simple sentences, and the condition "the new utterance number contains three or more simple sentences" is also satisfied. Moreover, because utterance number 6 starts at simple sentence #785, the condition "five or more simple sentences are contained between the start of the language data and the speaker change" is satisfied as well. Hence the second condition of the topic level setting table 126 shown in FIG. 12 is satisfied, and the topic level of topic number 46 finally becomes 1.
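The arithmetic of this worked example can be traced in a few lines. The values are those quoted in [0035] to [0037]; the variable names are hypothetical, and the two setting table rows are those of the FIG. 12 example, checked in table order.

```python
# Values read from the tables in the worked example (FIG. 13).
prev_level = 4                     # topic 45, base development storage unit 112
sent_turn = {720: 4, 822: 6}       # simple sentence -> utterance number (table 115)
turn_start = {6: 785, 7: 1047}     # utterance number -> start sentence (table 117)

# Step 1: increase/decrease table; nesting start clue phrase at #822 -> +1.
level = prev_level + 1

# Step 2: topic level setting table, rows checked in table order.
speaker_change = sent_turn[720] != sent_turn[822]
new_turn_len = turn_start[7] - turn_start[6]      # sentences #785 .. #1046
before_change = turn_start[6]                     # sentences #0 .. #784
if level > 5:
    level = 1
elif speaker_change and new_turn_len >= 3 and before_change >= 5:
    level = 1

print(level, new_turn_len)  # 1 262
```

Step 1 alone would leave topic number 46 at level 5; it is the absolute reset in step 2 that returns it to 1.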

[0038] FIG. 14 shows the result of topic structure recognition performed on the language data illustrated in FIG. 5 using the topic level control method described in this embodiment. In the table of contents shown in FIG. 14, chapters are nested at most four deep, and the nesting level is reset at appropriate points, so the table of contents is much easier for a human to understand than the one in FIG. 6 produced by the conventional method.

[0039]

[Effects of the Invention] As described above, by using the topic level increase/decrease table and the topic level setting table, the present invention keeps the topic levels in a topic structure recognition result from becoming too deep. It is highly effective for long language data such as meeting minutes and lecture transcripts, and particularly so for language data, such as meeting minutes, in which a single utterance number contains many sentences. With the present invention, the recognized topic structure is easy for the user to understand.

[Brief description of the drawings]

FIG. 1 is an example of topic structure recognition by a human.

FIG. 2 is a block diagram showing the structure of an example of a conventional topic structure recognition device.

FIG. 3 is a flowchart showing conventional processing for topic structure recognition.

FIG. 4 is an example of the processing after preprocessing in conventional topic structure recognition.

FIG. 5 is a diagram showing an example of language data.

FIG. 6 is a diagram showing the result of extracting a topic structure by applying a conventional topic structure recognition method to the language data shown in FIG. 5.

FIG. 7 is a block diagram showing the structure of a topic structure recognition device according to an embodiment of the present invention.

FIG. 8 is a flowchart showing the topic level determination processing in base development performed by the device of FIG. 7.

FIG. 9 is a diagram showing a configuration example of a topic level increase/decrease table.

FIGS. 10(a) to 10(c) are diagrams showing configuration examples of a nesting start type clue phrase table, a topic conversion type clue phrase table and a nesting end type clue phrase table, respectively.

FIGS. 11(a) and 11(b) are diagrams showing configuration examples of a simple sentence information table and a word information table, respectively.

FIG. 12 is a diagram showing a configuration example of a topic level setting table.

FIG. 13 is a diagram explaining the relationship between each of a base development storage unit, a simple sentence information table and an utterance number information table, and the conditions described in a topic level setting table.

FIG. 14 is a diagram showing an example of a topic structure recognition result obtained by applying the method of the present invention with the device of FIG. 7.

[Explanation of symbols]

101 Data input unit
102 Processing unit
103 Display unit
104 Storage unit
105 Dictionary/rule unit
110 Language data storage unit
111 Topic structure storage unit
112 Base development storage unit
113 Semantic development storage unit
114 Integrated topic storage unit
115 Simple sentence information table
116 Word information table
117 Utterance number information table
121 Preprocessing dictionary
122 Semantic development processing rules
123 Base development processing rules
124 Integration processing rules
125 Topic level increase/decrease table
126 Topic level setting table
127 Nesting start type clue phrase table
128 Topic conversion type clue phrase table
129 Nesting end type clue phrase table
201-203 Steps

Continuation of front page
(56) References cited:
JP-A-7-160710 (JP, A)
JP-A-7-160711 (JP, A)
JP-A-7-160712 (JP, A)
JP-A-6-236410 (JP, A)
JP-A-6-139276 (JP, A)
JP-A-4-332084 (JP, A)
Atsushi Takeshita, "Video Retrieval System Using Topic Structure Recognition", Information Processing Society of Japan Research Report 94-IM-15, Japan, March 11, 1994, Vol. 94, No. 24, pp. 1-8
Atsushi Takeshita et al., "D-67 Detection of Topic Introduction Parts in Monologues", 1994 IEICE Autumn Conference, Japan, September 5, 1994, p. 70
(58) Fields searched (Int.Cl.⁷, DB name): G06F 17/21-17/30

Claims

(57) [Claims]

1. A topic level control method for determining a topic level in a topic structure recognition device that recognizes the topic structure of language data using rules prepared in advance, characterized in that a topic level increase/decrease table and a topic level setting table are stored in advance as rules, the topic level increase/decrease table describing, for three topic level change factors (a factor that generates a new topic level, a factor that continues a topic level, and a factor that terminates a topic level), the increase or decrease of the topic level by a matrix of the topic level change factor that appeared last time and the topic level change factor that appeared this time, and the topic level setting table describing the correspondence between conditions for changing the value of the topic level to an absolute value and the absolute topic level value after the change; and in that the topic structure recognition device determines the topic level using the topic level increase/decrease table and the topic level setting table.

2. The topic level control method according to claim 1, wherein the topic level is updated according to the topic level increase/decrease table, it is then determined whether a condition of the topic level setting table is satisfied, and, if the condition is satisfied, the topic level is updated according to the topic level setting table.

3. A topic level control method for determining a topic level in a topic structure recognition device that recognizes the topic structure of language data using rules prepared in advance, characterized in that a topic level increase/decrease table, describing for three topic level change factors (a factor that generates a new topic level, a factor that continues a topic level, and a factor that terminates a topic level) the increase or decrease of the topic level by a matrix of the topic level change factor that appeared last time and the topic level change factor that appeared this time, and a topic level setting table, describing the correspondence between conditions for changing the value of the topic level to an absolute value and the absolute topic level value after the change, are stored in advance as rules; the topic structure recognition device performs topic structure recognition preprocessing that includes dividing the language data into simple sentences, each a unit having only one predicate, and extracting clue phrases from each simple sentence; and the topic structure recognition device then determines the topic establishment section in which a topic is presented and established, determines the topic word in the topic establishment section, obtains the topic level change factor based on the extracted clue phrases and determines the topic level using the topic level increase/decrease table and the topic level setting table, and further determines the topic scope.

4. The topic level control method according to claim 3, characterized in that, in the topic structure recognition preprocessing, information capable of identifying the speaker is associated with each simple sentence, a speaker change is included as a condition of the topic level setting table, and whether a speaker change has occurred is determined based on the information identifying the speaker associated with each simple sentence.

5. The topic level control method according to claim 3, wherein the condition of the topic level setting table includes information on a topic level value updated by the topic level increase / decrease table.

6. A topic structure recognition device comprising an input unit for inputting language data, a dictionary/rule unit that stores a dictionary and rules for topic structure recognition, a processing unit that performs processing using the dictionary and rules contained in the dictionary/rule unit, a storage unit that stores the results of the processing unit, and a display unit that displays the processing results of the processing unit, characterized in that the dictionary/rule unit includes a topic level increase/decrease table, describing for three topic level change factors (a factor that generates a new topic level, a factor that continues a topic level, and a factor that terminates a topic level) the increase or decrease value of the topic level by a matrix of the topic level change factor that appeared last time and the topic level change factor that appeared this time, and a topic level setting table, describing the correspondence between conditions for changing the value of the topic level to an absolute value and the absolute topic level value after the change; the storage unit includes a language data storage unit that stores information about the language data input from the input unit and a topic structure storage unit that stores information about the topic structure; the language data storage unit stores a word information table, which stores information about the character string and part of speech of each word contained in the language data, and a simple sentence information table, which stores, for each simple sentence of the language data, the words contained in that simple sentence; and the topic structure storage unit includes a table that stores information including the topic establishment section, which is the range in which a topic is presented and established, the topic word, the topic level and the topic scope.

7. The topic structure recognition device according to claim 6, wherein the processing unit updates the topic level according to the topic level increase/decrease table and stores the updated topic level in the topic structure storage unit, then checks whether a condition of the topic level setting table is satisfied and, if it is, updates the topic level according to the topic level setting table and stores the result in the topic structure storage unit.

8. The topic structure recognition device according to claim 1, characterized in that clue phrases are classified into a nesting start type, a topic conversion type and a nesting end type; a nesting start type clue phrase table registering the words and parts of speech of nesting start type clue phrases, a topic conversion type clue phrase table registering the words and parts of speech of topic conversion type clue phrases, and a nesting end type clue phrase table registering the words and parts of speech of nesting end type clue phrases are included, together with a preprocessing dictionary for topic structure recognition preprocessing, which divides the language data into simple sentences, each a unit having only one predicate, extracts clue phrases from each simple sentence, and identifies the type of each clue phrase; and the topic level increase/decrease table describes the increase or decrease value of the topic level by a matrix of the type of the clue phrase that appeared last time and the type of the clue phrase that appeared this time.

9. The topic structure recognition device according to claim 8, characterized in that, in the simple sentence information table, information identifying the speaker is associated with each simple sentence, a speaker change is included as a condition of the topic level setting table, and whether a speaker change has occurred is determined based on the information identifying the speaker associated with each simple sentence.

10. The topic structure recognition apparatus according to claim 8, wherein the condition of the topic level setting table includes a condition using a topic level value stored in the topic structure storage unit.