JPH08329106A

JPH08329106A - Method for generating dictionary, method for constructing word set, method for constructing document set, and conception supporting system using these methods

Info

Publication number: JPH08329106A
Application number: JP7134241A
Authority: JP
Inventors: Hidekazu Arita; 英一有田; Terumasa Yasui; 照昌安井; Shinichiro Tsudaka; 新一郎津高
Original assignee: GIJUTSU KENKYU KUMIAI SHINJOHO; GIJUTSU KENKYU KUMIAI SHINJOHO SHIYORI KAIHATSU KIKO; Mitsubishi Electric Corp
Current assignee: GIJUTSU KENKYU KUMIAI SHINJOHO; GIJUTSU KENKYU KUMIAI SHINJOHO SHIYORI KAIHATSU KIKO; Mitsubishi Electric Corp
Priority date: 1995-05-31
Filing date: 1995-05-31
Publication date: 1996-12-13

Abstract

PURPOSE: To obtain a dictionary better suited to each given text by extracting specific character strings from the character strings appearing in the text. CONSTITUTION: This dictionary generation method generates a dictionary by extracting, from the texts stored in a database of one or more texts, each character string R for which there exists no character string S that contains R, is longer than R, and appears with the same frequency as R. When the method is executed by the dictionary generation means 2, the texts in the database are the documents stored in the information storage means 1. Keywords based on a structured word set can be attached to documents automatically, and relations between documents can be presented based on an automatically structured document set.

Description

Detailed Description of the Invention

【０００１】[0001]

【Industrial Field of Application】 The present invention relates to automatic classification of document information and to visualization of the information space of document information. More particularly, it relates to a dictionary generation method that automatically generates a dictionary from the words contained in documents, a word set structuring method that automatically structures a set of words, a document set structuring method that automatically structures a set of documents, and a bottom-up idea support system that links these methods to visualize words and documents, thereby providing an environment in which the information space of a document set can be viewed freely.

【０００２】[0002]

【Prior Art】 An idea support system provides an environment that can handle information such as words, images, and sounds in order to support human intellectual creative activity. Among conventional idea support systems that handle words and documents, some collect the ideas given by a user as keywords and arrange them spatially to prompt the user toward further ideas. Others prompt the user by association, using a thesaurus prepared in advance. In every case, however, the user is prompted at the level of individual keywords, and the processing is not linked to a text database.

[0003]

As a concrete example of a conventional idea support system, there is the one described in "Thinking support system by spatially arranging text objects," Journal of the Japanese Society for Artificial Intelligence, Vol. 9, No. 1 (January 1994, published by the Japanese Society for Artificial Intelligence), pp. 139-147. This system lets the user write down ideas and text fragments that come to mind as virtual cards laid out in a space.

[0004]

Next, the operation will be described. The system first takes as input the ideas and document fragments that come to the user's mind and arranges them in a space, specifically on the screen, as virtual cards called text objects. What is displayed in the space is not the body of each document but the name the user has given to each text object. The user also subjectively attaches several keywords to each text object. Next, the system reconstructs the space, taking into account the relations between text objects, and presents the reconstructed space to the user. The system determines the degree of association between any two text objects from the degree to which they share keywords: if two text objects share the same keyword, the system raises their degree of association accordingly.
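The keyword-sharing rule of this prior-art system can be pictured with a small Python sketch. The measure below (a plain count of shared keywords) and the name `association_degree` are assumptions for illustration; the cited paper's actual formula is not given here.

```python
def association_degree(keywords_a, keywords_b):
    """Hypothetical measure: the number of keywords two text objects share."""
    return len(set(keywords_a) & set(keywords_b))


# Text objects that share more keywords are treated as more closely related.
print(association_degree(["発想", "支援"], ["支援", "検索"]))  # 1
```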

[0005]

By viewing the reconstructed space, the user can discover the meaning of the axes of the space or of its clusters. Such discoveries stimulate new ideas in the user.

【０００６】[0006]

【Problems to Be Solved by the Invention】 Because the conventional idea support system is configured as described above, it has the following problems.
(1) The user must attach keywords to every text, which is inconvenient.
(2) Because the keywords are chosen subjectively by the user, they are not controlled and tend to be unrelated to one another. Moreover, when there are many text objects, it becomes difficult to keep the keywords consistent.
(3) Because there is no function for searching text objects, it is difficult for the user to grasp their mutual relationships.
(4) The text objects consist only of memos written by the user, so the material available for idea generation is limited.

[0007]

The present invention has been made to solve the above problems. An object of the invention is to provide an idea support system that can automatically attach to documents keywords drawn from a structured word set, and that can further present relations between documents based on an automatically structured document set. Another object is to provide a dictionary generation method, a word set structuring method, and a document set structuring method suitable for application to such an idea support system.

【０００８】[0008]

【Means for Solving the Problems】 The dictionary generation method according to the invention of claim 1 computes the frequency of appearance of character strings in one or more texts; extracts, from among the character strings appearing in the texts, each character string for which there exists no longer character string that contains it and whose frequency of appearance is not lower than its own; and generates a dictionary by adding the extracted character strings as entries.
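The extraction condition of claim 1 can be checked directly, if inefficiently, with the Python sketch below. It assumes that frequencies count overlapping occurrences across all texts; under that counting a longer string never occurs more often than a string it contains, so "not lower" amounts to "equal". The function names and the `max_len` cap are hypothetical.

```python
from collections import Counter


def count_substrings(texts, max_len):
    """Count every substring of length 1..max_len, overlapping occurrences included."""
    counts = Counter()
    for text in texts:
        for i in range(len(text)):
            for j in range(i + 1, min(i + max_len, len(text)) + 1):
                counts[text[i:j]] += 1
    return counts


def extract_entries(texts, max_len):
    """Keep R unless a longer string S containing R occurs at least as often as R."""
    counts = count_substrings(texts, max_len)
    entries = set()
    for r, freq_r in counts.items():
        dominated = any(
            len(s) > len(r) and r in s and freq_s >= freq_r
            for s, freq_s in counts.items()
        )
        if not dominated:
            entries.add(r)
    return entries


print(sorted(extract_entries(["花の万博に行く", "花の万博が開幕"], 7)))
# ['花の万博', '花の万博が開幕', '花の万博に行く']
```

The scan over all pairs of substrings is quadratic and only meant to make the condition concrete; the stepwise procedure of FIG. 2 organizes the same search by working from the longest length downward.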

[0009]

The word set structuring method according to the invention of claim 2 obtains the immediate hyponyms and immediate hypernyms of a word, associates the word with them, and generates a word network after excluding those associations that satisfy a predetermined condition.

[0010]

The word set structuring method according to the invention of claim 3 operates on a set of texts each having a headline and a body. Suppose a word KW1 contained in the headline of a text DOC1 is contained in the body of a text DOC2, a word KW2 in the headline of DOC2 is contained in the body of a text DOC3, KW1 is also contained in the body of DOC3, and a word KW3 is contained in the headline of DOC3. In that case, the method generates from the three words KW1, KW2, and KW3 a basic chain KW1 → KW2 → KW3, and then joins generated basic chains that share a common part to generate a word network.
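A minimal Python sketch of claim 3 follows, assuming each text is given as a pair of word sets (headline words, body words). Requiring the three words to be distinct, and joining chains simply by merging their edges into one adjacency map, are assumptions; the claim does not spell out how chains with a common part are combined. All names are hypothetical.

```python
from itertools import product


def basic_chains(docs):
    """docs: list of (headline_words, body_words) set pairs; returns KW1->KW2->KW3 chains."""
    chains = set()
    for (h1, _), (h2, b2), (h3, b3) in product(docs, repeat=3):
        for kw1 in h1 & b2 & b3:      # KW1: headline of DOC1, bodies of DOC2 and DOC3
            for kw2 in h2 & b3:       # KW2: headline of DOC2, body of DOC3
                for kw3 in h3:        # KW3: headline of DOC3
                    if len({kw1, kw2, kw3}) == 3:  # assumed: three distinct words
                        chains.add((kw1, kw2, kw3))
    return chains


def word_network(chains):
    """Join chains that share words by merging their edges into one adjacency map."""
    network = {}
    for kw1, kw2, kw3 in chains:
        network.setdefault(kw1, set()).add(kw2)
        network.setdefault(kw2, set()).add(kw3)
    return network


docs = [({"A"}, set()), ({"B"}, {"A"}), ({"C"}, {"A", "B"})]
print(word_network(basic_chains(docs)))
```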

[0011]

The word set structuring method according to the invention of claim 4 takes the word network of claim 2 as a first word network and the word network of claim 3 as a second word network, and integrates the two networks using a designated word as the key.

[0012]

The document set structuring method according to the invention of claim 5 converts each text into a word vector; sets a plurality of clusters, each containing several of the texts and placed discretely using the distances between word vectors; and obtains a word representative of each cluster.
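As an illustration of the first step of claim 5, the sketch below turns each text into a vector of dictionary-word counts and measures the distance between vectors with the Euclidean metric. Both the count weighting and the use of `math.dist` (Python 3.8+) are assumptions; the claim fixes neither.

```python
import math


def to_word_vector(text, vocabulary):
    """Assumed weighting: non-overlapping count of each dictionary word in the text."""
    return [text.count(word) for word in vocabulary]


vocabulary = ["原子力", "安全", "白書"]
v1 = to_word_vector("原子力安全白書と原子力", vocabulary)  # [2, 1, 1]
v2 = to_word_vector("安全センタ", vocabulary)              # [0, 1, 0]
print(math.dist(v1, v2))  # Euclidean distance between the two texts
```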

[0013]

In the document set structuring method according to the invention of claim 6, the plurality of clusters is set as follows. The centroid of all word vectors is computed, and the vector farthest from that centroid becomes the center of the first cluster. For each word vector, the distance to its nearest cluster center is computed, and the word vector for which this distance is largest is added as a new cluster center. Each word vector is then assigned to the cluster whose center is nearest, and the mean of the word vectors in each cluster becomes that cluster's new center; this operation is repeated until the cluster centers no longer change.
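The clustering of claim 6 amounts to farthest-point seeding followed by a k-means-style update loop. The Python sketch below assumes Euclidean distance, a caller-supplied number of clusters `k` (the claim does not say when to stop adding centers), and an iteration cap as a safety guard; the names are hypothetical.

```python
import math


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cluster(vectors, k, max_iter=100):
    """Farthest-point seeding (claim 6) followed by k-means-style refinement."""
    g = mean(vectors)
    # First center: the vector farthest from the centroid of all vectors.
    centres = [max(vectors, key=lambda v: math.dist(v, g))]
    # Keep adding the vector whose distance to its nearest center is largest.
    while len(centres) < k:
        centres.append(
            max(vectors, key=lambda v: min(math.dist(v, c) for c in centres))
        )
    groups = []
    for _ in range(max_iter):
        # Assign each vector to the cluster with the nearest center.
        groups = [[] for _ in centres]
        for v in vectors:
            nearest = min(range(len(centres)), key=lambda i: math.dist(v, centres[i]))
            groups[nearest].append(v)
        # The mean of each cluster becomes its new center (an empty cluster keeps its own).
        new_centres = [mean(grp) if grp else c for grp, c in zip(groups, centres)]
        if new_centres == centres:  # stop once the centers no longer change
            break
        centres = new_centres
    return centres, groups


centres, groups = cluster([[0, 0], [0, 1], [10, 10], [10, 11]], 2)
print(centres)  # [[0.0, 0.5], [10.0, 10.5]]
```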

[0014]

The document set structuring method according to the invention of claim 7 extracts a predetermined number of the largest elements of the word vector at each cluster center, and defines the words corresponding to the extracted elements as the words representing that cluster.
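Claim 7 picks representative words from the largest components of a cluster center. A minimal sketch, assuming the center is a list aligned with a vocabulary list and that `n` is the predetermined count (both names are hypothetical):

```python
def representative_words(centre, vocabulary, n):
    """Return the words whose components in the cluster center are the n largest."""
    ranked = sorted(range(len(centre)), key=lambda j: centre[j], reverse=True)
    return [vocabulary[j] for j in ranked[:n]]


print(representative_words([0.5, 2.0, 1.0], ["安全", "原子力", "白書"], 2))
# ['原子力', '白書']
```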

[0015]

The idea support system according to the invention of claim 8 comprises: dictionary generation means that generates a dictionary, by the method of claim 1, from the texts stored in information storage means; word set structuring means that takes the dictionary generated by the dictionary generation means as a word set and structures it by the method of claim 4; information selection means that selects predetermined texts from the text set in the information storage means; document set structuring means that structures, by the method of claim 7, both the text set in the information storage means and the text set selected by the information selection means; and information space display means that displays the processing results of the word set structuring means and of the document set structuring means.

[0016]

In the idea support system according to the invention of claim 9, when a search keyword is input, the information selection means selects texts on the basis of that keyword, and the document set structuring means structures the document set formed by the texts the information selection means has selected.

[0017]

In the idea support system according to the invention of claim 10, when a related-display request is made, the information space display means highlights the elements corresponding to that request on each of the screens displaying the processing result of the word set structuring means, the processing result of the document set structuring means for the text set in the information storage means, and the processing result of the document set structuring means for the text set selected by the information selection means.

【００１８】[0018]

【Operation】 The character-string extraction step of claim 1 makes it possible to extract meaningful character strings automatically from the given texts, such as topical terms not found in ordinary dictionaries (for example, 「花の万博」, the "Flower Expo") and expressions that depend on the type of text (for example, when newspaper articles are processed, 「ゴ大統領」, an abbreviation meaning President Gorbachev).

[0019]

The word-network generation step of claim 2 automatically extracts the hypernyms and hyponyms of any word, making it possible to offer suitable search terms that can serve as keywords for information retrieval.

[0020]

The word-network generation step of claim 3 automatically extracts the related words of any word, making it possible to offer suitable search terms that can serve as keywords for information retrieval.

[0021]

The step of claim 4 that integrates the word networks automatically links any word with its hypernyms, hyponyms, and related words, making it possible to offer even more suitable search terms for information retrieval.

[0022]

The step of claim 5 that sets a plurality of clusters makes it possible to classify a text set automatically according to its content. The step that obtains words representing the clusters makes it possible to supply a suitable word representing each group after classification.

[0023]

The step of claim 6 that adds the word vector with the maximum distance as a cluster center allows the clusters to be placed well apart from one another. The step that repeatedly takes the mean of the word vectors in each cluster as the new cluster center until the centers no longer change optimizes the composition of the clusters containing the word vectors.

[0024]

The step of claim 7 that obtains the words representing a cluster makes it possible to select, as representative words, the more important words in the texts corresponding to the word vectors that make up the cluster.

[0025]

The information space display means of claim 8 presents to the user the relations among the words in the texts, and also presents the classified texts.

[0026]

The information space display means of claim 9 presents to the user the texts classified on the basis of the keyword the user has entered.

[0027]

The information space display means of claim 10 displays, in an easily understood form, how a word designated by the user relates to other words and to each text.

【００２８】[0028]

【Example】

Embodiment 1. FIG. 1 is a block diagram showing the configuration of an idea support system according to an embodiment of the present invention. In the figure, 1 is information storage means that stores document information; 2 is dictionary generation means that generates a dictionary from the document information stored in the information storage means 1; 3 is word set structuring means that structures the set of words in the document information; 4 is information selection means that selects document information in the information storage means 1 according to the user's instructions; 5 is selected-information storage means that stores the document information selected from the information storage means 1; and 6 is document set structuring means that structures both the document set formed by the documents in the information storage means 1 and the document set formed by the documents in the selected-information storage means 5. 7 is information space display means that presents the user with an information space for idea support; it has structured word set storage means 8, which stores the processing results of the word set structuring means 3, and structured document set storage means 9 and structured selected-document set storage means 10, which store the processing results of the document set structuring means 6.

[0029]

When the idea support system is implemented on a computer, the information storage means 1, the selected-information storage means 5, the structured word set storage means 8, the structured document set storage means 9, and the structured selected-document set storage means 10 are implemented by the storage unit of the computer system. The dictionary generation means 2, the word set structuring means 3, and the document set structuring means 6 can be implemented in software. The information selection means 4 can be implemented by the input unit of the computer system together with software, and the information space display means 7 by its display unit together with software.

[0030]

FIG. 2 is a flowchart showing a dictionary generation method according to the first embodiment of the present invention. This method can be applied in the dictionary generation means 2 of the system shown in FIG. 1, and can of course also be applied to other systems.

[0031]

Next, the operation will be described. This dictionary generation method generates a dictionary by extracting, from the texts in a database of one or more texts, each character string R that satisfies the condition that there exists no character string S that contains R, is longer than R, and appears with the same frequency as R. When the dictionary generation means 2 of FIG. 1 executes this method, the texts in the database are the documents stored in the information storage means 1, and the flowchart of FIG. 2 shows the processing executed by the dictionary generation means 2.

[0032]

To generate the dictionary, areas are prepared for a temporary file T-FILE, a dictionary DICT, a temporary dictionary T-DICT, and a variable I. First, T-FILE, DICT, and T-DICT are set to the "empty" state, and the text length L is determined (step ST130). The text length is the number of characters in the text being handled, or a smaller value. The value L is then assigned to the variable I (step ST131). While the search for character strings satisfying the above condition is running, T-FILE temporarily stores each substring appearing in the texts together with its number of occurrences, and T-DICT temporarily stores, for the current length I, each substring satisfying the condition together with its number of occurrences.

[0033]

Thereafter, steps ST132 to ST136 are repeated until the value of the variable I becomes 1. First, it is checked whether I equals 1; if it does not, the search for character strings satisfying the above condition is performed (step ST132). In this search, the number of occurrences of every substring of length I in the texts is counted first, and each substring is stored in T-FILE together with its number of occurrences (step ST133).

[0034]

Next, every character string registered in DICT is compared with each substring in T-FILE. The substrings in T-FILE that are not substrings of any character string in DICT, and those that are substrings of a character string in DICT but whose frequency of appearance exceeds that of the DICT string, are recorded in T-DICT (step ST134). In the first pass, that is, when I = L, DICT is empty, so every substring in T-FILE is recorded in T-DICT. All character strings in T-DICT are then added to DICT (step ST135).

[0035]

Next, T-FILE and T-DICT are set to the empty state, the value of I is decremented by 1, and processing returns to step ST132 (step ST136). Steps ST132 to ST136 are thus repeated until I = 1, at which point processing ends. DICT is then an automatically generated dictionary consisting of the character strings that satisfy the above condition.
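Steps ST130 to ST136 can be sketched in Python as follows, with T-FILE as a `Counter` and DICT and T-DICT as dicts mapping each string to its occurrence count. Two points are interpretations rather than statements of the source: occurrences are counted with overlap, and a substring contained in several DICT entries is kept only if its frequency exceeds that of every one of them. The function and variable names are hypothetical.

```python
from collections import Counter


def generate_dictionary(texts, length_l):
    """Sketch of steps ST130-ST136: examine substring lengths from L down to 2."""
    dict_ = {}                                   # ST130: DICT starts empty
    i = length_l                                 # ST131: I <- L
    while i > 1:                                 # ST132: repeat until I reaches 1
        # ST133: T-FILE holds every length-I substring with its occurrence count.
        t_file = Counter(
            text[k:k + i] for text in texts for k in range(len(text) - i + 1)
        )
        # ST134: keep substrings contained in no DICT entry (all() of an empty
        # list is True), or whose frequency exceeds every DICT entry containing them.
        t_dict = {}
        for sub, freq in t_file.items():
            containing = [f for entry, f in dict_.items() if sub in entry]
            if all(freq > f for f in containing):
                t_dict[sub] = freq
        dict_.update(t_dict)                     # ST135: add T-DICT to DICT
        i -= 1                                   # ST136: clear T-FILE/T-DICT, I <- I - 1
    return dict_


print(generate_dictionary(["花の万博に行く", "花の万博が開幕"], 7))
# {'花の万博に行く': 1, '花の万博が開幕': 1, '花の万博': 2}
```

As in [0035], the loop ends when I = 1, so single-character strings are never examined.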

[0036]

In the above processing, DICT is initially set to empty in step ST130, but a general-purpose dictionary may be set instead. In that case, the character strings extracted automatically as described above are added to the contents of the general dictionary. Alternatively, a dictionary created for one set of texts may first be set as DICT, and new character strings extracted from other texts may be added to it. By setting the dictionary so created as DICT in step ST130 and repeating the above processing, the dictionary can be built up in stages.

[0037]

Embodiment 2. FIGS. 3 and 4 are flowcharts showing the process of creating the first word network in a word set structuring method according to the second embodiment of the present invention. This processing can be applied in the word set structuring means 3 of the system shown in FIG. 1, and can of course also be applied to other systems.

[0038]

This method yields a first word network that reflects, for a word A, its relations to its immediate hyponyms and immediate hypernyms. An immediate hyponym of A is a word B that contains A; however, a word C that contains B is not an immediate hyponym of A. An immediate hypernym of A is each word Dn that makes up A; however, a word Dnn that makes up Dn is not an immediate hypernym of A. For example, 「原子力安全」 ("nuclear safety") is an immediate hyponym of the word 「安全」 ("safety"), but 「原子力安全白書」 ("nuclear safety white paper") is not. Likewise, 「16ビット」 ("16-bit") is an immediate hypernym of the word 「16ビットパソコン」 ("16-bit personal computer"), but 「ビット」 ("bit") is not.
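The immediate-hypernym definition can be made concrete with a short Python sketch. It approximates "the words that make up A" by dictionary words occurring as substrings of A; this is an assumption, since the hypernym flowchart (FIG. 4) is not reproduced in this section. The function name is hypothetical.

```python
def immediate_hypernyms(word, words):
    """Words Dn that make up `word`, minus any Dnn that only makes up some Dn."""
    parts = [w for w in words if w != word and w in word]  # assumed: substring = "makes up"
    return [d for d in parts if not any(d != other and d in other for other in parts)]


words = ["16ビットパソコン", "16ビット", "ビット", "パソコン"]
print(immediate_hypernyms("16ビットパソコン", words))  # ['16ビット', 'パソコン']
```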

[0039]

Next, the operation will be described. When the word set structuring means 3 of FIG. 1 executes this method, immediate hyponyms and immediate hypernyms are extracted from the documents stored in the information storage means 1 or from the dictionary generated by the dictionary generation means 2.

[0040]

First, the process of extracting the immediate hyponyms of a word WORD, within the process of generating the first word network, will be described with reference to the flowchart of FIG. 3. For this extraction, areas are prepared for the word WORD, a list WORDS, an area WORD2, a list WORDS-KAI, a list WORDS-KAI-SAKUJO, and a list 1-LEVEL-KAI (KAI is romanized Japanese for "lower" and SAKUJO for "deletion"). When the word set structuring means 3 of FIG. 1 executes this method, WORD is a word designated by the user or each word of the list, and the list of words from which immediate hyponyms are extracted is, for example, the dictionary generated by the dictionary generation means 2.

[0041]

First, the list of words is assigned to the list WORDS (step ST51). Next, all words that contain WORD as a character string are selected from WORDS, and a list WORDS-KAI consisting of the selected words is created (step ST52). For example, when WORD is 「安全」 ("safety"), WORDS-KAI becomes [「安全センタ」 ("safety center"), 「原子力安全」 ("nuclear safety"), 「原子力安全白書」 ("nuclear safety white paper"), ...].

[0042]

Next, hyponyms other than the immediate hyponyms are deleted from WORDS-KAI. To do so, the list WORDS-KAI-SAKUJO is first set to the empty state (step ST53). Next, one word of WORDS-KAI is set in the area WORD2, and it is checked whether the word in WORD2 contains another word present in WORDS-KAI. If it does, the word is added to WORDS-KAI-SAKUJO (step ST54). For example, 「原子力安全白書」 ("nuclear safety white paper") contains the string 「原子力安全」 ("nuclear safety"), which is in WORDS-KAI, so it is added to WORDS-KAI-SAKUJO. This processing is performed for every word in WORDS-KAI.

[0043] Then, after the words in WORDS-KAI-SAKUJO have been removed from WORDS-KAI, each remaining word of WORDS-KAI is added to the list 1-LEVEL-KAI (step ST55). In this way, the immediate subordinate words are set in 1-LEVEL-KAI. For example, 1-LEVEL-KAI for the word "安全" (safety) becomes ["安全センタ" (safety center), "原子力安全" (nuclear safety), ...].
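Steps ST51 to ST55 amount to two substring filters. The Python sketch below illustrates them with the example words of [0041] to [0043], under simple assumptions that the text does not state: words are plain strings, "contains" means substring containment, and the function name `immediate_subordinates` is hypothetical.

```python
def immediate_subordinates(word: str, words: list[str]) -> list[str]:
    """Sketch of steps ST51-ST55: immediate subordinate words of `word`."""
    # ST52: WORDS-KAI = every other word containing `word` as a substring
    words_kai = [w for w in words if word in w and w != word]
    # ST53-ST54: WORDS-KAI-SAKUJO = candidates that contain another candidate
    words_kai_sakujo = {
        w2 for w2 in words_kai
        if any(w != w2 and w in w2 for w in words_kai)
    }
    # ST55: 1-LEVEL-KAI = candidates that survive the deletion step
    return [w for w in words_kai if w not in words_kai_sakujo]


dictionary = ["安全", "安全センタ", "原子力", "原子力安全", "原子力安全白書"]
print(immediate_subordinates("安全", dictionary))  # ['安全センタ', '原子力安全']
```

"原子力安全白書" is dropped because it contains the other candidate "原子力安全", exactly as in the example of [0042].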

[0044] Further, the words in 1-LEVEL-KAI that may not actually be immediate subordinate words are marked. Here, rules for detecting false superordinate/subordinate relationships are applied. One example rule is as follows: among the words in 1-LEVEL-KAI, a word whose difference from WORD is a single alphabetic letter, a single katakana character or a single hiragana character is marked as having a false superordinate/subordinate relationship (step ST56). This heuristic rule excludes pairs that are obviously unrelated in meaning, such as the fish "アジ" (aji, horse mackerel) and the neighboring region "アジア" (Asia).

[0045] A false superordinate/subordinate relationship also exists when the difference from WORD is longer than one character but begins with a character denoting a contracted sound (a small kana such as "ゅ" or "ュ"), a geminate consonant ("っ", "ッ"), the moraic nasal ("ん", "ン") or a long vowel. For example, the difference between "カルテ" (karte, medical chart) and "カルテット" (quartet) is "ット", so the relationship is false. Likewise, the difference between "コンビ" (combi, duo) and "コンビーフ" (corned beef) is "ーフ", so that relationship is also false. Such rules may be applied in addition.
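The two heuristics of [0044] and [0045] could be expressed as follows. This is only a sketch: the text does not say exactly how the "difference" is computed. Here it is assumed to be whatever remains of the longer word on either side of the shorter one, with each side checked separately, which catches both アジ/アジア and ガン/ウガンダ. The syllable-boundary rule is assumed to apply only to the part that follows the shorter word. The names `is_false_relation`, `is_one_letter_or_kana` and `BAD_START` are hypothetical.

```python
# Characters that cannot begin a new syllable: small kana (contracted sounds),
# the geminate marks, the moraic nasal and the long-vowel mark
BAD_START = set("ぁぃぅぇぉゃゅょゎァィゥェォャュョヮっッんンー")


def is_one_letter_or_kana(part: str) -> bool:
    """True if `part` is exactly one ASCII letter, hiragana or katakana."""
    if len(part) != 1:
        return False
    return (part.isascii() and part.isalpha()) or (
        "\u3040" <= part <= "\u309f" or "\u30a0" <= part <= "\u30ff"
    )


def is_false_relation(shorter: str, longer: str) -> bool:
    """Flag a candidate superordinate/subordinate pair as possibly false."""
    pos = longer.find(shorter)
    if pos < 0:
        raise ValueError("shorter must be a substring of longer")
    prefix, suffix = longer[:pos], longer[pos + len(shorter):]
    # Rule of [0044]/[0050]: the difference is a single letter or kana
    if is_one_letter_or_kana(prefix) or is_one_letter_or_kana(suffix):
        return True
    # Rule of [0045]: the added part starts in the middle of a syllable
    return suffix[:1] in BAD_START


print(is_false_relation("カルテ", "カルテット"))  # True
print(is_false_relation("安全", "原子力安全"))    # False
```

Kanji differences such as "原子力" are deliberately not flagged, since the rules only cover letters and kana.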

[0046] Next, the process of obtaining the immediate superordinate words of WORD will be described with reference to the flowchart of FIG. 4. For this extraction, the word WORD, the list WORDS, the area WORD2, and the lists WORDS-JOUI, WORDS-JOUI-SAKUJO and 1-LEVEL-JOUI are prepared (JOUI is romanized Japanese for "upper"). When the word set structuring means 3 in the system shown in FIG. 1 executes this method, WORD is a word specified by the user or each word in the list in turn, and the list of words from which the immediate superordinate words are extracted is, for example, the dictionary generated by the dictionary generating means 2.

[0047] First, the list of words is assigned to the list WORDS (step ST51). Next, the words that WORD contains as substrings are selected from WORDS, and a list WORDS-JOUI consisting of the selected words is created (step ST58). For example, when WORD is "１６ビットパソコン" (16-bit personal computer), WORDS-JOUI becomes ["１６ビット" (16-bit), "ビット" (bit), "パソコン" (personal computer)].

[0048] Next, the superordinate words other than the immediate superordinate words are removed from WORDS-JOUI. First, the list WORDS-JOUI-SAKUJO is emptied (step ST59). Then one word of WORDS-JOUI is set in the area WORD2, and it is checked whether the string in WORD2 is contained in another word present in WORDS-JOUI. If it is, the word in WORD2 is added to WORDS-JOUI-SAKUJO (step ST60). For example, "ビット" (bit) is contained in "１６ビット" (16-bit), which is in WORDS-JOUI, so it is added to WORDS-JOUI-SAKUJO. This check is performed for every word in WORDS-JOUI.

[0049] Then, after the words in WORDS-JOUI-SAKUJO have been removed from WORDS-JOUI, each remaining word of WORDS-JOUI is added to the list 1-LEVEL-JOUI (step ST61). In this way, the immediate superordinate words are set in 1-LEVEL-JOUI. For example, 1-LEVEL-JOUI for the word "１６ビットパソコン" (16-bit personal computer) is ["１６ビット" (16-bit), "パソコン" (personal computer)].
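The superordinate extraction mirrors the subordinate one with the direction of containment reversed. Again, this is a hypothetical Python sketch assuming plain substring containment, not the patent's own code; `immediate_superordinates` is an invented name.

```python
def immediate_superordinates(word: str, words: list[str]) -> list[str]:
    """Sketch of steps ST58-ST61: immediate superordinate words of `word`."""
    # ST58: WORDS-JOUI = every other word that `word` contains as a substring
    words_joui = [w for w in words if w in word and w != word]
    # ST59-ST60: WORDS-JOUI-SAKUJO = candidates contained in another candidate
    words_joui_sakujo = {
        w2 for w2 in words_joui
        if any(w != w2 and w2 in w for w in words_joui)
    }
    # ST61: 1-LEVEL-JOUI = candidates that survive the deletion step
    return [w for w in words_joui if w not in words_joui_sakujo]


dictionary = ["１６ビット", "ビット", "パソコン", "１６ビットパソコン"]
print(immediate_superordinates("１６ビットパソコン", dictionary))
# ['１６ビット', 'パソコン']
```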

[0050] Further, the words in 1-LEVEL-JOUI that may not actually be immediate superordinate words are marked, again by applying rules for detecting false superordinate/subordinate relationships. One example rule is as follows: among the words in 1-LEVEL-JOUI, a word whose difference from WORD is a single alphabetic letter, a single katakana character or a single hiragana character is marked as having a false superordinate/subordinate relationship (step ST62). This heuristic rule excludes pairs that are obviously unrelated in meaning, such as the disease "ガン" (gan, cancer) and the country name "ウガンダ" (Uganda).

[0051] In this way, the superordinate/subordinate relationships of each word are obtained. The relationship between a word and its immediate subordinate word is called the immediate subordinate relationship, and the relationship between a word and its immediate superordinate word is called the immediate superordinate relationship. The structure consisting of the words and the links representing these immediate subordinate and superordinate relationships is called the first word network. In the first word network, links for relationships marked as false superordinate/subordinate relationships can be given a distinguishable link type as needed.

[0052] When the first word network is applied to the system shown in FIG. 1, the word set structuring means 3 creates the first word network and stores it in the structured word set storage means 8. The information space display means 7 can also display the first word network at the user's request, in which case the superordinate/subordinate relationships of each word are displayed as the first word network.

[0053] FIG. 5 is an explanatory diagram showing an example of the first word network. The first word network can be used to raise the recall of information retrieval in the idea generation support system. For example, when a user wants to retrieve documents about "腎臓移植" (kidney transplantation) from a database, the first word network created from that database shows that the strings "じん臓移植" and "腎移植" (variant spellings of kidney transplantation) should be used as keywords together with "腎臓移植". Without the first word network, the user might search with "腎臓移植" alone, and recall could suffer. Here, recall is (the number of retrieved documents that the user wanted) ÷ (the number of documents that should be provided to the database user).

[0054] The first word network can also be used to raise the precision of information retrieval in the idea generation support system. For example, when a user wants to retrieve documents about "ウィルス" (virus) in the microbiological sense, the first word network created from the database shows that "ウィルス" should be used as a keyword and "コンピュータウィルス" (computer virus) as an exclusion keyword. Without the first word network, the user might search with "ウィルス" alone, and computer-related documents could also be retrieved; that is, precision could suffer. Here, precision is (the number of retrieved documents that the user wanted) ÷ (the number of retrieved documents).
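The recall and precision definitions in [0053] and [0054] can be checked on a small example. The corpus, the relevance judgments and the helper names below are all invented for illustration. `search` is assumed to return the documents that contain any keyword and none of the exclusion strings.

```python
def search(docs, keywords, excluded=()):
    """IDs of documents containing any keyword and none of the excluded strings."""
    return {
        doc_id for doc_id, text in docs.items()
        if any(k in text for k in keywords) and not any(x in text for x in excluded)
    }


def recall(retrieved: set, relevant: set) -> float:
    # (retrieved documents the user wanted) / (documents that should be provided)
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved: set, relevant: set) -> float:
    # (retrieved documents the user wanted) / (retrieved documents)
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical toy corpus
docs = {
    1: "腎臓移植の成績",
    2: "じん臓移植の新技術",
    3: "腎移植と免疫抑制",
    4: "ウィルスの増殖機構",
    5: "コンピュータウィルス対策",
}
kidney = {1, 2, 3}  # documents wanted for "kidney transplantation"
virus = {4}         # documents wanted for the biological "virus"

print(recall(search(docs, ["腎臓移植"]), kidney))                          # 0.3333333333333333
print(recall(search(docs, ["腎臓移植", "じん臓移植", "腎移植"]), kidney))  # 1.0
print(precision(search(docs, ["ウィルス"]), virus))                        # 0.5
print(precision(search(docs, ["ウィルス"], ["コンピュータウィルス"]), virus))  # 1.0
```

The exclusion in the last query is crude: a document discussing both biological and computer viruses would also be dropped.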

[0055] Next, the second word network in the word set structuring method will be described. Processing by this word set structuring method can be applied in the word set structuring means 3 in the system shown in FIG. 1, and of course also in other systems. The second word network is constructed as follows.

[0056] The fact that the headline of a document DOCi contains a keyword KWj is written KWj(DOCi). A single word can have several meanings; the keyword KWj here denotes the word in the sense in which it is used in the context of the document DOCi.

[0057] The fact that a keyword KWk contained in the headline of a document DOCm appears in the body of a document DOCi whose headline contains a keyword KWj is written KWk(DOCm) => KWj(DOCi). This relationship is called a co-occurrence relationship. As shown in FIG. 6, when the relationships

KW1(DOC1) => KW2(DOC2)
KW2(DOC2) => KW3(DOC3)
KW1(DOC1) => KW3(DOC3)

all hold, KW1, KW2 and KW3 are regarded as being in an associative relationship. If the association were inferred from the first two co-occurrence relationships alone, cases in which those two relationships arise by chance could not be excluded; the third relationship is therefore also required. The co-occurrence relationships of these three terms are written KW1(DOC1) → KW2(DOC2) → KW3(DOC3). This notation is the basic chain of an association.

[0058] When several basic chains have parts in common, they are combined into a network. This network is the second word network. For example, the three basic chains

KW1(DOC1) → KW2(DOC2) → KW3(DOC3)
KW2(DOC2) → KW3(DOC3) → KW4(DOC4)
KW1(DOC1) → KW2(DOC2) → KW5(DOC5)

are combined as follows.

[0059]

[Table 1]

[0060] Further, when DOCi and DOCj are documents of the same type, KWk(DOCi) and KWk(DOCj) are merged. For example, given

KW1(DOC1) → KW2(DOC2) → KW3(DOC3)
KW4(DOC4) → KW1(DOC6) → KW5(DOC5)

where DOC1 and DOC6 are documents of the same type, the chains are combined as shown in [Table 2] to generate the second word network. Documents of the same type are documents sharing a theme, for example documents to which the user has given the same index, or documents classified into the same cluster by the automatic classification described later.

[0061] In summary, as shown in FIG. 6, consider a set of documents each having a headline and a body. When the word KW1 in the headline of document DOC1 appears in the body of document DOC2, the word KW2 in the headline of DOC2 appears in the body of document DOC3, the word KW1 also appears in the body of DOC3, and the word KW3 is in the headline of DOC3, the three words KW1, KW2 and KW3 form the basic chain KW1 → KW2 → KW3. The second word network is then generated by combining such basic chains.

[0062] For example, suppose the newspaper article data contain articles headed フロン (CFCs, DOCa), 炭酸ガス (carbon dioxide, DOCb), 温室効果 (greenhouse effect, DOCc), 温暖化 (global warming, DOCd) and 環境外交 (environmental diplomacy, DOCe), with the basic chains

CFCs (DOCa) → carbon dioxide (DOCb) → greenhouse effect (DOCc)
carbon dioxide (DOCb) → greenhouse effect (DOCc) → global warming (DOCd)
greenhouse effect (DOCc) → global warming (DOCd) → environmental diplomacy (DOCe)
carbon dioxide (DOCb) → CFCs (DOCa) → global warming (DOCd)

In this case a second word network such as [Table 3] is generated. Document IDs are omitted there.

[0063] As another example, suppose the newspaper article data contain articles headed CFCs (DOCa), carbon dioxide (DOCb), greenhouse effect (DOCc), global warming (DOCd), 冷蔵庫 (refrigerator, DOCf), オゾン層 (ozone layer, DOCg) and CFCs (DOCh), with the basic chains

carbon dioxide (DOCb) → greenhouse effect (DOCc) → global warming (DOCd)
CFCs (DOCa) → greenhouse effect (DOCc) → global warming (DOCd)
refrigerator (DOCf) → CFCs (DOCh) → ozone layer (DOCg)

and documents DOCa and DOCh are of the same type. In this case a second word network such as [Table 4] is generated.

[0064] When the second word network is applied to the system shown in FIG. 1, the word set structuring means 3 creates the second word network and stores it in the structured word set storage means 8. The information space display means 7 displays the second word network at the user's request; that is, the associative relationships of each word are displayed as the second word network. The displayed network can be used for the knowledge extraction support function of the idea generation support system: the relationship between concepts (words) that interest the user is obtained by searching for a path between those items (words) in the presented second word network. For example, for the relationship between "refrigerator" and "global warming", the second word network presented as in (1) above lets the user recall knowledge such as: "Refrigerators use CFCs as a refrigerant; the CFCs diffuse into the atmosphere and cause a greenhouse effect that warms the earth. The greenhouse effect is also caused by carbon dioxide." The information space display means 7 can also display the first and second word networks combined.
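The text does not specify how the path between two items is searched. Assuming the second word network is held as a directed adjacency mapping at the keyword level, a breadth-first search is one straightforward choice, sketched below with the chains of [0063] (冷蔵庫 = refrigerator, フロン = CFCs, 温室効果 = greenhouse effect, 温暖化 = global warming, 炭酸ガス = carbon dioxide, オゾン層 = ozone layer). The graph layout and `find_path` are hypothetical.

```python
from collections import deque


def find_path(graph: dict[str, set[str]], start: str, goal: str):
    """Breadth-first search for a shortest directed path, or None."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


# Keyword-level edges from the basic chains of [0063]; フロン (DOCa) and
# フロン (DOCh) are already merged because they are of the same type
graph = {
    "炭酸ガス": {"温室効果"},
    "フロン": {"温室効果", "オゾン層"},
    "温室効果": {"温暖化"},
    "冷蔵庫": {"フロン"},
}
print(find_path(graph, "冷蔵庫", "温暖化"))
# ['冷蔵庫', 'フロン', '温室効果', '温暖化']
```

The path found is the chain of concepts behind the knowledge quoted in [0064].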

[0065] Next, the concrete process of creating the second word network will be described with reference to the flowchart of FIG. 7. First, the headlines of all documents are divided into words (step ST110). Among all the words obtained, the mutually distinct ones are extracted, and each is denoted KEYWORDi (i = 1 to p) (step ST111); here, p distinct words are assumed to have been extracted. In the example shown in FIG. 6, p = 3, that is, KEYWORD1 = KW1, KEYWORD2 = KW2 and KEYWORD3 = KW3.

[0066] Next, for each KEYWORDi, the documents containing it in their headline are searched for, and the documents found are denoted IDMij (j = 1 to q) (step ST112); here, q documents are assumed to have been found. In the example shown in FIG. 6, q = 1 for each of i = 1, 2 and 3, that is, IDM11 = DOC1, IDM21 = DOC2 and IDM31 = DOC3.

[0067] Further, for each KEYWORDi, the documents containing it in their body are searched for, and the documents found are denoted IDHik (k = 1 to r) (step ST113); here, r documents are assumed to have been found. In the example shown in FIG. 6, r = 2 for i = 1 and r = 1 for i = 2, that is, IDH11 = DOC2, IDH12 = DOC3 and IDH21 = DOC3.

[0068] Further, for each KEYWORDi, the values of j satisfying IDHib = IDMja are obtained (step ST114), where a and b are free indices. In the example shown in FIG. 6, IDH11 = DOC2 = IDM21 and IDH12 = DOC3 = IDM31, so j = 2, 3 for i = 1. Since IDH21 = DOC3 = IDM31, j = 3 for i = 2.

[0069] Then, for each pair i, j obtained, the values of k satisfying both IDHid = IDMkc and IDHje = IDMkc are obtained (step ST115), where d, c and e are free indices. In the example shown in FIG. 6, for i = 1, j = 2 there exists k = 3 satisfying both equations, while for i = 1, j = 3 and for i = 2, j = 3 no such k exists. That is, both equations are satisfied for i = 1, j = 2, k = 3.

[0070] Next, KEYWORDi(*) → KEYWORDj(IDMja) → KEYWORDk(IDMkc) is taken as a basic chain (basic fragment) of the associative relationship (concept chain), where (*) denotes the document corresponding to KEYWORDi. In the example shown in FIG. 6, i = 1, j = 2 and k = 3, so the result is KEYWORD1(*) → KEYWORD2(IDM2a) → KEYWORD3(IDM3c), that is, the basic chain KW1(DOC1) → KW2(DOC2) → KW3(DOC3).
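Steps ST110 to ST116 can be sketched as a search over ordered keyword triples. The sketch assumes that each headline has already been segmented into words (Japanese word segmentation is not shown), that "contains" in a body means substring containment, and that the three keywords and the three documents of a chain must be distinct; none of these details are fixed by the text, and all names are hypothetical.

```python
from itertools import product


def basic_chains(docs: dict[str, tuple[list[str], str]]):
    """Sketch of steps ST110-ST116.

    docs maps a document ID to (headline words, body text).
    Returns chains as ((KWi, doc), (KWj, doc), (KWk, doc)) tuples.
    """
    # ST110-ST111: distinct headline words, KEYWORD1..KEYWORDp
    keywords = sorted({w for head, _ in docs.values() for w in head})
    # ST112: IDM, documents whose headline contains each keyword
    idm = {kw: {d for d, (head, _) in docs.items() if kw in head} for kw in keywords}
    # ST113: IDH, documents whose body contains each keyword
    idh = {kw: {d for d, (_, body) in docs.items() if kw in body} for kw in keywords}

    chains = []
    for i, j, k in product(keywords, repeat=3):
        if len({i, j, k}) < 3:
            continue
        # ST114: a document headed by KWj whose body contains KWi
        for y in sorted(idh[i] & idm[j]):
            # ST115: a document headed by KWk whose body contains KWi and KWj
            for z in sorted((idh[i] & idh[j] & idm[k]) - {y}):
                # ST116: KWi(*) -> KWj(y) -> KWk(z), (*) a document headed by KWi
                for x in sorted(idm[i] - {y, z}):
                    chains.append(((i, x), (j, y), (k, z)))
    return chains


fig6 = {
    "DOC1": (["KW1"], ""),
    "DOC2": (["KW2"], "text mentioning KW1"),
    "DOC3": (["KW3"], "text mentioning KW1 and KW2"),
}
print(basic_chains(fig6))
# [(('KW1', 'DOC1'), ('KW2', 'DOC2'), ('KW3', 'DOC3'))]
```

The brute-force loop over all triples is kept for readability; an index-driven search would scale better on a real corpus.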

[0071] Further, if any of the basic chains obtained in step ST116 share a matching partial chain, they are combined (step ST117). This process is executed for all basic chains.
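One simple way to realize step ST117 together with the same-type merge of [0060] is to key the nodes of a directed graph by (keyword, document) pairs, so that chains sharing a node are joined automatically. This representation is an assumption (Tables 1 and 2 are not reproduced here), and the type label "T" in the example is invented.

```python
from collections import defaultdict


def build_network(chains, doc_type=None):
    """Sketch of step ST117 and [0060]: merge basic chains into one graph.

    Nodes are (keyword, document) pairs, so chains that share a node are
    combined automatically. `doc_type` optionally maps documents of the
    same type to a shared label, which merges KWk(DOCi) with KWk(DOCj).
    """
    doc_type = doc_type or {}
    graph = defaultdict(set)
    for chain in chains:
        nodes = [(kw, doc_type.get(doc, doc)) for kw, doc in chain]
        # Add a directed edge between consecutive links of the chain
        for src, dst in zip(nodes, nodes[1:]):
            graph[src].add(dst)
    return dict(graph)


# The two chains of [0060]; DOC1 and DOC6 are assumed to share the type "T"
chains = [
    (("KW1", "DOC1"), ("KW2", "DOC2"), ("KW3", "DOC3")),
    (("KW4", "DOC4"), ("KW1", "DOC6"), ("KW5", "DOC5")),
]
net = build_network(chains, {"DOC1": "T", "DOC6": "T"})
print(sorted(net[("KW1", "T")]))  # [('KW2', 'DOC2'), ('KW5', 'DOC5')]
```

Without the `doc_type` mapping, KW1(DOC1) and KW1(DOC6) stay separate nodes and the two chains remain disconnected.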

[0072] The processes shown in FIGS. 3, 4 and 7 yield the first word network and the second word network, which are integrated using a specified word as the key; this completes the word set structuring. In the system shown in FIG. 1, the word set structuring means 3 integrates the first and second word networks around the word specified by the user. FIG. 8 is an explanatory diagram showing an example of the result of the word set structuring method. In the system shown in FIG. 1, the word set structuring means 3 performs this structuring, and the information space display means 7 displays the result as shown in FIG. 8.

[0073] The example in FIG. 8 shows, for the specified word "温室効果" (greenhouse effect), the links of the first word network (dotted lines) with the immediate superordinate and subordinate words, and the links of the second word network (solid lines) with the associated words. The user can see that, for "greenhouse effect", there is a set of related words in the second word network, such as "CFCs", "carbon dioxide" and "global warming", and a set of related words in the first word network, such as "温室" (greenhouse) and "効果" (effect). In this way, when a word is specified, the set of words related to it can easily be recognized.

[0074] In this embodiment, each word has been described as having only one meaning; a polysemous word is therefore treated as a separate word for each of its senses.

[0075] Embodiment 3. FIG. 9 is a flowchart showing a document set structuring method according to the third embodiment of the present invention. This method can be applied in the document set structuring means 6 in the system shown in FIG. 1, and of course also in other systems. It divides a plurality of documents into groups and obtains words representing the content of each group.

[0076] Here, the weight $V_{ij}$ of a word $j$ contained in a document $i$, that is, the importance of word $j$ in document $i$, is defined as

$$V_{ij} = F_{ij} \times \log\left(\frac{N}{N_j}\right) \times \log(L_j) \qquad (2)$$

where $N$ is the number of documents in the document set, $N_j$ is the number of documents containing word $j$, $L_j$ is the number of characters in word $j$, and $F_{ij}$ is the frequency with which word $j$ appears in document $i$. Equation (2) expresses the following.

[0077] The more often a word appears in a document, the more important it is considered; and the fewer documents it appears in, the more important it is considered for classification. In both cases the weight $V_{ij}$ of word $j$ becomes large. When word $j$ appears in every document of the document set, $N_j = N$, so $\log(N/N_j) = 0$ and the weight $V_{ij}$ is also 0. That is, a word appearing in all documents cannot distinguish between them, so its importance for document classification is 0. The factor $\log(L_j)$ expresses that a word is more important the more characters it has. For example, one-character words such as "山" (mountain) and "川" (river) are often very common and are considered to be of low importance. A word with more characters is often a compound noun encompassing more concepts, and since compound nouns are important for document classification, their weight $V_{ij}$ is large.
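Equation (2) translates directly into code. The logarithm base is not given in the text, so the natural logarithm is assumed here; another base would only rescale all weights by a constant factor. Under this formula a one-character word (L_j = 1) receives weight 0, consistent with the low importance attributed to words such as "山". The function name `word_weight` is hypothetical.

```python
import math


def word_weight(f_ij: int, n_docs: int, n_docs_with_word: int, n_chars: int) -> float:
    """V_ij = F_ij * log(N / N_j) * log(L_j), equation (2).

    The natural logarithm is an assumption; the text does not fix the base.
    """
    return f_ij * math.log(n_docs / n_docs_with_word) * math.log(n_chars)


# A word appearing in every document gets weight 0
print(word_weight(5, 100, 100, 4))  # 0.0
# A one-character word also gets weight 0, since log(1) = 0
print(word_weight(5, 100, 10, 1))   # 0.0
```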

[0078] Next, the operation will be described with reference to the flowchart of FIG. 9. First, each document is converted, according to equation (2), into a vector whose elements are the weights of the words it contains (step ST149), giving one vector per document. Next, the desired number of clusters is set in the variable K (step ST150); when the document set structuring means 6 in the system shown in FIG. 1 executes this method, the user inputs this number. Then the centroid of all the vectors is obtained, and the vector farthest from it is made a cluster center (step ST151).

[0079] Steps ST152 and ST153 are then repeated until K clusters are obtained. In step ST152, the distance from each vector to its nearest cluster center is obtained, and the vector with the largest such distance is made an additional cluster center. When K cluster centers, that is, K clusters, have been obtained, processing moves on to cluster classification (step ST153).

[0080] In the cluster classification process, each vector is first assigned to the cluster with the nearest cluster center (step ST154). Then, in each cluster, the mean of the vectors in the cluster becomes the new cluster center (step ST155). If any cluster center was changed by step ST155, step ST154 is executed again (step ST156). When the cluster centers no longer change, representative words are obtained for each cluster: the L largest elements of the cluster center vector are extracted, and the words corresponding to those L elements are taken as representative words expressing the content of the cluster (step ST157).

[0081] In this way, the documents are classified into K clusters and L representative words are set for each cluster; that is, the document set has been structured. In step ST157, an element of the cluster center vector whose value is smaller than a certain threshold may be excluded from the representative words, since it is considered not to represent a feature of the cluster.
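Steps ST151 to ST157 describe a farthest-point initialization followed by a k-means-style iteration. The sketch below assumes Euclidean distance (the text does not name a metric), takes document vectors already built with equation (2), and adds a `max_iter` safety cap; the function names, the toy vectors and the vocabulary are hypothetical.

```python
import math


def mean(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster_documents(vectors, k, max_iter=100):
    """Sketch of steps ST151-ST156 (Euclidean distance assumed)."""
    # ST151: the vector farthest from the centroid of all vectors
    centroid = mean(vectors)
    centers = [max(vectors, key=lambda v: math.dist(v, centroid))]
    # ST152-ST153: add the vector farthest from its nearest center until K exist
    while len(centers) < k:
        centers.append(max(vectors, key=lambda v: min(math.dist(v, c) for c in centers)))
    for _ in range(max_iter):
        # ST154: assign every vector to the cluster of its nearest center
        labels = [min(range(k), key=lambda c: math.dist(v, centers[c])) for v in vectors]
        # ST155: each new center is the mean of the cluster's vectors
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            new_centers.append(mean(members) if members else centers[c])
        # ST156: stop once no center changes
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def representative_words(center, vocab, top_l, threshold=0.0):
    """Sketch of step ST157: words of the L largest center elements,
    dropping elements not above `threshold` as permitted in [0081]."""
    ranked = sorted(range(len(center)), key=lambda t: center[t], reverse=True)
    return [vocab[t] for t in ranked[:top_l] if center[t] > threshold]


vectors = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
vocab = ["index", "interest rate", "crude oil"]
labels, centers = cluster_documents(vectors, k=2)
print(labels)  # [0, 0, 1, 1]
for center in centers:
    print(representative_words(center, vocab, top_l=2, threshold=0.1))
```

Adding a document later, as noted in [0084], would amount to resuming from the assignment step ST154 with the existing centers.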

[0082] Examples of clusters and representative words obtained by the above method are shown below.

(a) Economic newspaper article data, with the articles containing the word "テレビ" (television) as the document set, automatically classified:

Cluster 1: index
Cluster 2: service, JR West, Sanyo Shinkansen
Cluster 3: NHK, satellite broadcasting, development
…
Cluster 6: CM, commercials, consumers
Cluster 7: oversized garbage, fees, introduction of charges
Cluster 8: energy, fiscal 2010, energy conservation
Cluster 9: France, public notice, tobacco
…
Cluster 11: Moulmein, foreigners, government
Cluster 12: automobiles, exports, semiconductors
…

[0083] (b) Economic newspaper article data, with the articles containing the term "公定歩合" (official discount rate) as the document set, automatically classified:

Cluster 1: fixed-amount savings, high interest rates, interest rates
Cluster 2: finance minister, crude oil prices, Deutsche mark
Cluster 3: monetary tightening, borrowing, manufacturing
Cluster 4: long-term prime rate, lending rates, rate cuts
Cluster 5: money supply, growth rate, lending
Cluster 6: Shotoku Taishi, high-denomination banknotes, Ministry of Finance
Cluster 7: budget requests, crude oil prices, LDP
Cluster 8: government bonds, SAMA, investment
…

The headlines of the documents classified into cluster 4 in (b) above were as follows.

[0084] (c)
1  Personal asset management, deposits and savings …
2  Interest-rate "cap law" abolished …
3  Long-term credit banks, long-term prime rate …
4  City banks, this time a sharp drop …
5  Long-term prime rate raised 0.3% …

Although this embodiment has described the case where the cluster centers are determined from the beginning, when documents are added after the document set has already been structured, processing may start from step ST154. In this way, documents can be added one after another until all documents are structured.

[0085] Embodiment 4. FIGS. 10 and 11 are flowcharts showing the operation of the idea generation support system according to the fourth embodiment of the present invention. The configuration of this system is as shown in FIG. 1.

[0086] Next, the operation will be described. The information storage means 1 stores document information including a plurality of documents (texts) input from an input means (not shown) (step ST400). The dictionary generation means 2 executes the method described in the first embodiment and uses it to generate a dictionary (step ST410).

[0087] The word set structuring means 3 executes the method described in the second embodiment and uses it to structure the words in the dictionary generated by the dictionary generation means 2 (step ST412). The processing results are, for example, the first and second word networks described above, and the word set structuring means 3 stores them in the structured word set storage means 8 (step ST414). The document set structuring means 6 performs the document set structuring process on the documents in the information storage means 1 (step ST420), as described in the third embodiment, and stores the results in the structured document set storage means 9 (step ST422).

[0088] When the user inputs information indicating the documents to be selected, the information selection means 4 selects those documents from the information storage means 1 (step ST430). The user can use keywords for the selection: the information selection means 4 searches the document set in the information storage means 1 by the logical sum (OR) or logical product (AND) of the input keywords and selects the matching documents. Alternatively, documents can be selected by a vector search using word vectors. The selected documents are stored in the selection information storage means 5 (step ST432). The document set structuring means 6 performs the document set structuring process on the documents in the selection information storage means 5 (step ST434) and stores the results in the structured selected document set storage means 10 (step ST436).
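The two selection modes described for the information selection means 4 can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: documents are plain strings keyed by a document ID, a keyword "matches" when it occurs as a substring (a choice that suits unsegmented Japanese text), and the vector search ranks documents by cosine similarity. The names `keyword_select`, `vector_select` and the parameter `k` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def keyword_select(docs: Dict[str, str], keywords: Sequence[str],
                   mode: str = "and") -> List[str]:
    """Return the IDs of documents matching the keywords.

    mode="and": logical product, every keyword must occur.
    mode="or":  logical sum, at least one keyword must occur.
    Substring matching is an assumption that suits unsegmented Japanese text.
    """
    if mode not in ("and", "or"):
        raise ValueError("mode must be 'and' or 'or'")
    combine = all if mode == "and" else any
    return [doc_id for doc_id, text in docs.items()
            if combine(kw in text for kw in keywords)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def vector_select(doc_vecs: Dict[str, Sequence[float]],
                  query: Sequence[float], k: int) -> List[str]:
    """Return the IDs of the k documents most similar to the query vector."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(doc_vecs[d], query),
                    reverse=True)
    return ranked[:k]
```

The selected IDs would then be handed to the document set structuring step (ST434). Cosine similarity is only one plausible measure for the vector search.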

[0089] The information space display means 7 reads out the contents of the structured word set storage means 8, the structured document set storage means 9 and the structured selected document set storage means 10, and displays them in association with one another, using a word specified by the user as the key (step ST438). For example, displays such as those in the upper and middle parts of FIG. 12 are produced.

[0090] The user can make various processing requests to the system via the input means (not shown) (step ST440). When the user requests a new "keyword search", the system re-executes the processing from step ST430 onward using the input keyword; that is, it searches for documents with the new keyword, structures the retrieved documents, and displays them.

[0091] When the user requests "display cluster contents", the information space display means 7 reads the document IDs and headlines belonging to the cluster designated by the user from the structured document set storage means 9 and the structured selected document set storage means 10, and displays them (step ST442). When the user requests "display body" together with a document ID, the information space display means 7 reads the corresponding document from the information storage means 1 and displays its contents (step ST444).

[0092] When the user requests "related display", the information space display means 7 highlights the elements related to the element designated by the user (step ST446). FIG. 12 shows an example of the related display when the user inputs the word word-il as the element. The information space display means 7 obtains the relationships between word-il and other words from the word networks stored in the structured word set storage means 8, and highlights, for example, the nearest higher-level and nearest lower-level words of word-il. From the displayed screen, the user can readily see that the nearest words related to word-il are word-a, word-b and word-c, and that the words conceptually above word-il include word-d, word-e and so on. The user can also readily see that words with meanings close to these include word-f, word-g, word-h, word-j, word-k, word-p, word-m, word-n and others.
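As one possible way to gather the elements to highlight for a "related display" request, the sketch below assumes that the first word network is stored as a map from each word to its nearest higher-level words. The nearest lower-level words come from inverting that map, and all conceptually higher words come from following parent links upward. The representation and the function name `related_words` are assumptions, and the toy network used in testing is hypothetical, not the one in FIG. 12.

```python
from collections import defaultdict
from typing import Dict, List, Set


def related_words(parents: Dict[str, List[str]], word: str) -> Dict[str, Set[str]]:
    """Collect words to highlight for `word` in a hypernym network.

    `parents` maps each word to its nearest higher-level words.
    """
    # Invert the parent links to obtain nearest lower-level words.
    children: Dict[str, Set[str]] = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)

    # Walk upward to collect every conceptually higher word.
    ancestors: Set[str] = set()
    stack = list(parents.get(word, []))
    while stack:
        w = stack.pop()
        if w not in ancestors:
            ancestors.add(w)
            stack.extend(parents.get(w, []))

    return {
        "nearest_higher": set(parents.get(word, [])),
        "nearest_lower": set(children[word]),
        "all_higher": ancestors,
    }
```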

[0093] From these relationships and the displayed structured documents, the user can tell that the cluster having word-il as a representative word is cluster number i. Requesting "display cluster contents" for cluster number i then displays the document IDs and headlines belonging to that cluster. From display contents such as those shown in FIG. 12, the user can therefore readily see that cluster number i contains the documents doc-a1, doc-a2, …, doc-ai, …, doc-an, and that among them doc-ai has word-il in its headline. If necessary, the user can view that document by requesting "display body". In this way, the user can easily learn the relationship between the specified word word-il and each document.
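The lookup the user performs by hand in this example (finding the cluster whose representative words include the specified word, then spotting the documents whose headline contains it) can be sketched as follows. The three input maps (representative words per cluster, document IDs per cluster, headline per document ID) are hypothetical data shapes chosen for illustration.

```python
from typing import Dict, List, Tuple


def locate_word(word: str,
                rep_words: Dict[int, List[str]],
                members: Dict[int, List[str]],
                headlines: Dict[str, str]) -> List[Tuple[int, List[str], List[str]]]:
    """For each cluster having `word` as a representative word, return
    (cluster number, its document IDs, IDs whose headline contains `word`)."""
    hits = []
    for cluster, words in rep_words.items():
        if word in words:
            doc_ids = members.get(cluster, [])
            in_headline = [d for d in doc_ids if word in headlines.get(d, "")]
            hits.append((cluster, doc_ids, in_headline))
    return hits
```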

[0094] When the user requests "end", the system's processing ends. Although the information storage means 1 has been described as a single device into which information is set from outside, it may consist of a plurality of storage means distributed over a computer network. The contents of the selection information storage means 5 and the structured document set storage means 9 may also be saved when the system ends. In that case, the system can read them back at restart and resume processing from step ST438.

[0095] Next, a concrete example of using the idea generation support system will be described. As already explained, the word set structuring means 3 can generate a second word network such as the following.

[Table 5]

[0096] This second word network is displayed visually by the information space display means 7. The user can obtain the relationships among concepts of interest by searching for paths between items in the second word network. For example, as the relationship between "refrigerator" and "global warming", the user can obtain knowledge such as: "Refrigerators use CFCs as a refrigerant, and the greenhouse effect caused by those CFCs diffusing into the atmosphere warms the earth. The greenhouse effect is also caused by carbon dioxide."
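Searching for a path between two items of the second word network can be sketched as a breadth-first search over an undirected adjacency map. Breadth-first search is an assumed choice (it returns a shortest path), and the toy edges below only loosely mirror the refrigerator example; Table 5 itself is not reproduced here.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency map from word pairs."""
    graph: Dict[str, Set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def find_path(graph: Dict[str, Set[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for a shortest path between two words."""
    prev: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start.
            path = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None  # no connection between the two words
```

For the hypothetical edges refrigerator–CFC, CFC–greenhouse effect, greenhouse effect–warming and carbon dioxide–greenhouse effect, the path from "refrigerator" to "warming" runs through "CFC" and "greenhouse effect".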

[0097] The user can also access information of interest from structured document sets such as the following, already described above.
(a) An example in which, from economic newspaper article data, the articles containing the word "TV" were taken as the document set and that document set was automatically classified.

Cluster no.  Representative words
1   index
2   service, JR West, Sanyo Shinkansen
3   NHK, satellite broadcasting, development
…
6   CM, commercial, consumer
7   oversized garbage, fee, introduction of charges
8   energy, fiscal 2010, energy conservation
9   France, public notice, tobacco
…
11  Moulmein (モールメイン), foreigner, government
12  automobile, export, semiconductor
…

[0098] This structured document set was produced when the user requested a "keyword search" with the keyword "TV": the information selection means 4 extracted the matching documents from the information storage means 1 (steps ST430, ST432), the document set structuring means 6 structured them, and the information space display means 7 displayed the result.

[0099] By looking at this automatic classification result and the representative words of the clusters, a user who started from the vague keyword "TV" can obtain the following angles on "TV":
(1) The broadcasting business, such as NHK and satellite broadcasting, from cluster number 3.
(2) Commercial television broadcasting, such as CMs and commercials, from cluster number 6.
(3) Disposal of television receivers as oversized garbage, from cluster number 7.
(4) Television receivers as commodities, alongside automobiles, exports and semiconductors, from cluster number 12.

[0100] The negative angle of disposing of television receivers as oversized garbage is usually easy to forget. As shown above, however, covering a large volume of source material makes it possible to present angles the user may have forgotten. In other words, this system is an effective idea generation support system that lets the user explore and access information of interest while viewing the system's processing results.

[0101] The user can likewise access information of interest from the structured document sets (b) and (c) described above. (b) is an example in which, from economic newspaper article data, the articles containing the word "official discount rate" were taken as the document set and that document set was automatically classified. (c) lists the headlines of the documents in cluster number 4 of classification (b). As the cluster's representative words suggest, documents related to long-term interest rates and lending rates are gathered there.

[0102] As described above, according to this embodiment, the structured document set obtained by structuring the document set stored in the information storage means 1 can be associated with the structured selected document set obtained by structuring the information selected by the information selection means 4. The user can therefore search for information while grasping the whole information space, taking the initiative in exploring it. Furthermore, because the user can look around the information space freely, the user may discover something different from what was originally intended.

[0103]

[Effects of the Invention] As described above, according to the invention of claim 1, the dictionary generation method extracts, from among the character strings appearing in a text, each character string for which no longer character string containing it has an appearance frequency not lower than its own, and adds the extracted strings to a dictionary. Character strings absent from ordinary dictionaries can therefore be extracted automatically to suit each given text, yielding a dictionary better suited to each given text.
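The extraction condition of claim 1 can be sketched as below: keep a string R unless some longer string containing R occurs at least as often as R. Several assumptions are made. Frequency here means the total number of (possibly overlapping) occurrences across all texts, candidate lengths are capped at `max_len`, and strings rarer than `min_freq` are ignored. All three are hypothetical parameters, and the patent's own procedure (FIG. 2) may differ. Testing only one-character extensions is sufficient: if a longer superstring S has freq(S) ≥ freq(R), the one-character extension T of R inside S satisfies freq(R) ≥ freq(T) ≥ freq(S) ≥ freq(R).

```python
from collections import Counter
from typing import Dict, Iterable


def substring_counts(texts: Iterable[str], max_len: int) -> Counter:
    """Count every (possibly overlapping) substring of length 1..max_len."""
    counts: Counter = Counter()
    for text in texts:
        for i in range(len(text)):
            for n in range(1, max_len + 1):
                if i + n > len(text):
                    break
                counts[text[i:i + n]] += 1
    return counts


def extract_dictionary(texts: Iterable[str], max_len: int = 8,
                       min_freq: int = 2) -> Dict[str, int]:
    """Keep strings R for which no longer string containing R is at least as frequent."""
    texts = list(texts)
    counts = substring_counts(texts, max_len + 1)  # +1 so extensions are counted
    alphabet = {ch for t in texts for ch in t}
    result = {}
    for s, f in counts.items():
        if len(s) > max_len or f < min_freq:
            continue
        # A one-character extension on either side suffices for the test.
        absorbed = any(counts.get(c + s, 0) >= f or counts.get(s + c, 0) >= f
                       for c in alphabet)
        if not absorbed:
            result[s] = f
    return result
```

For the texts "ABCABC" and "ABD", "A", "B" and "C" are absorbed by longer strings of equal frequency. "AB" survives because it also occurs in "ABD", which makes it more frequent than "ABC".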

[0104] According to the invention of claim 2, the word set structuring method associates each word with its nearest lower-level and nearest higher-level words and generates a word network after excluding the associations that meet a predetermined condition. The higher-level and lower-level words of a given word are thus obtained automatically. This provides an environment for finding appropriate search terms in information retrieval and improves search recall and precision.
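How the higher/lower relation is obtained is defined in the second embodiment, which is outside this section. The sketch below therefore makes a loudly labeled assumption: word A counts as higher-level than word B when A is a proper substring of B (a compound-noun heuristic, such as "interest rate" inside "lending interest rate"). Only nearest links are kept, meaning a link A→C is dropped when some B lies between them. The "preset condition" is left as a caller-supplied predicate `exclude`, which is hypothetical.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (higher-level word, lower-level word)


def nearest_links(words: Iterable[str]) -> Set[Edge]:
    """Link each word to its nearest lower-level words.

    Assumption: A is higher-level than B when A is a proper substring of B.
    A link A -> C is kept only if no word B lies strictly between them.
    """
    ws = set(words)
    broader = {(a, b) for a in ws for b in ws if a != b and a in b}
    return {(a, c) for a, c in broader
            if not any((a, b) in broader and (b, c) in broader for b in ws)}


def word_network(words: Iterable[str],
                 exclude: Callable[[str, str], bool]) -> Dict[str, Set[str]]:
    """Build a higher -> {lower} map, dropping links the preset condition rejects."""
    net: Dict[str, Set[str]] = {}
    for higher, lower in nearest_links(words):
        if not exclude(higher, lower):
            net.setdefault(higher, set()).add(lower)
    return net
```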

[0105] According to the invention of claim 3, the word set structuring method generates, in a set of documents each having a headline and a body, basic chains of three words based on the co-occurrence relationships between texts, and combines the generated basic chains that share a common part into a word network. This provides an environment in which the set of words related to a given word can easily be understood. When this method is applied to the idea generation support system, the user can easily understand the relationships among concepts of interest.
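A sketch of basic-chain generation and chain joining. It follows the condition of claim 3 as read here: w1 is in the headline of T1 and in the bodies of T2 and T3, w2 is in the headline of T2 and the body of T3, and w3 is in the headline of T3. Each document is assumed to be a pair of word sets (headline, body). Chains that share a word are joined simply by merging them into one undirected graph; this is one interpretation of "combining basic chains that have a common part", not necessarily the patent's exact procedure.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

Doc = Tuple[Set[str], Set[str]]  # (headline words, body words)


def basic_chains(docs: List[Doc]) -> Set[Tuple[str, str, str]]:
    """Enumerate (w1, w2, w3) chains under the claim-3 condition as read here."""
    chains = set()
    for (h1, _), (h2, b2), (h3, b3) in permutations(docs, 3):
        for w1 in h1 & b2 & b3:          # headline of T1, bodies of T2 and T3
            for w2 in h2 & b3:           # headline of T2, body of T3
                for w3 in h3:            # headline of T3
                    if len({w1, w2, w3}) == 3:
                        chains.add((w1, w2, w3))
    return chains


def join_chains(chains: Set[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Merge chains into one undirected network; chains sharing a word meet there."""
    net: Dict[str, Set[str]] = {}
    for w1, w2, w3 in chains:
        for a, b in ((w1, w2), (w2, w3)):
            net.setdefault(a, set()).add(b)
            net.setdefault(b, set()).add(a)
    return net
```

The triple loop over document permutations is cubic in the number of texts, which is acceptable for a sketch but not for a large corpus.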

[0106] According to the invention of claim 4, the word set structuring method integrates the first word network and the second word network using a designated word as a key. This provides an environment in which the set of words related to a given word can easily be understood.
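One hypothetical reading of "integrating the two networks with the designated word as a key" is to gather, from each network, the edges within a small radius of the key word and present them together, tagged by their source. The sketch assumes that both networks are symmetric adjacency maps. The function names and the `radius` parameter are illustrative only.

```python
from collections import deque
from typing import Dict, Set, Tuple

Graph = Dict[str, Set[str]]


def neighborhood(graph: Graph, key: str, radius: int) -> Set[str]:
    """Words reachable from `key` within `radius` hops."""
    seen = {key}
    frontier = deque([(key, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == radius:
            continue
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return seen


def integrate(first: Graph, second: Graph, key: str,
              radius: int = 1) -> Set[Tuple[str, str, str]]:
    """Edges of both networks around `key`, tagged with their source network."""
    edges = set()
    for tag, graph in (("first", first), ("second", second)):
        near = neighborhood(graph, key, radius)
        for a in near:
            for b in graph.get(a, ()):
                if b in near and a < b:  # keep one copy of each undirected edge
                    edges.add((tag, a, b))
    return edges
```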

[0107] According to the invention of claim 5, the document set structuring method sets a plurality of clusters, each containing a plurality of texts, placed discretely using the distances between the word vectors corresponding to the texts, and obtains a word representing each cluster. A text set is thus classified automatically according to its content, and a word representing each class is obtained. When this method is applied to the idea generation support system, it provides an environment in which the user can discover attributes of a theme that the user had not noticed.
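Converting each text into a word vector can be sketched as counting the dictionary entries in the text. The weighting used in the patent is not stated in this section. Raw occurrence counts via `str.count` (non-overlapping substring matches), optional L2 normalization, and Euclidean distance are all assumptions.

```python
import math
from typing import List, Sequence


def word_vector(text: str, dictionary: Sequence[str],
                normalize: bool = True) -> List[float]:
    """One element per dictionary entry: its (non-overlapping) count in the text."""
    counts = [float(text.count(w)) for w in dictionary]
    if normalize:
        norm = math.sqrt(sum(c * c for c in counts))
        if norm:
            counts = [c / norm for c in counts]
    return counts


def distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two word vectors."""
    return math.dist(u, v)
```

Because substring counting also matches an entry inside a longer one (for example "rate" inside "prime rate"), a real system would more likely count over segmented words.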

[0108] According to the invention of claim 6, the document set structuring method first takes the vector farthest from the center of gravity of all word vectors as the center of the first cluster. It then obtains, for each word vector, the distance to the nearest cluster center and adds the word vector with the maximum distance as a cluster center. Finally, it repeats two operations, classifying each word vector into the cluster with the nearest center and taking the average of the word vectors in each cluster as the new center, until the cluster centers no longer change. As a result, the clusters are placed effectively apart from one another in the document set space.
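The clustering procedure summarized for claim 6 (farthest-point seeding followed by k-means-style mean updates) can be sketched as follows. The sketch assumes that the center-adding step repeats until the number of clusters K mentioned above is reached. `max_iter` is an added safeguard that does not appear in the claim, and an empty cluster keeps its previous center, which is also an assumption.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise average of the vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster(vectors: Sequence[Sequence[float]], k: int,
            max_iter: int = 100) -> Tuple[List[Vector], List[int]]:
    """Farthest-point seeding, then assign-and-average until centers stop changing."""
    # 1. The vector farthest from the overall center of gravity seeds cluster 1.
    g = mean(vectors)
    centers = [list(max(vectors, key=lambda v: math.dist(v, g)))]
    # 2. Add the vector whose distance to its nearest center is largest.
    while len(centers) < k:
        far = max(vectors, key=lambda v: min(math.dist(v, c) for c in centers))
        centers.append(list(far))
    # 3. Assign to the nearest center and recompute means until nothing changes.
    labels = [0] * len(vectors)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: math.dist(v, centers[j]))
                  for v in vectors]
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

For the points (0,0), (0,1), (10,10) and (10,11) with K = 2, the sketch seeds (0,0) and (10,11) and then settles on the centers (0, 0.5) and (10, 10.5).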

[0109] According to the invention of claim 7, the document set structuring method extracts a predetermined number of the largest values from the elements of the word vectors in a cluster and defines the words corresponding to the extracted elements as the words representing that cluster. Words of higher importance in the documents that make up the cluster can therefore be selected as the cluster's representative words.

[0110] According to the invention of claim 8, the idea generation support system displays the processing results of the word set structuring means and of the document set structuring means. The information space, made up of the related word set and the structured text set, can thus be visualized, and the user can extract clues to associative relationships from the word network.

[0111] According to the invention of claim 9, the idea generation support system selects texts based on a search keyword and structures the document set of the selected texts. The user can therefore confirm, through information retrieval, ideas that arise from clues to knowledge.

[0112] Finally, according to the invention of claim 10, when a related display is requested, the idea generation support system highlights the elements corresponding to the request on each display screen for the structured word set, the structured text set and the structured selected text set. The user can therefore easily understand both the relationships between the word the user specified and other words, and the relationships between that word and each text.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of an idea generation support system according to an embodiment of the present invention.

FIG. 2 is a flowchart showing a dictionary generation method according to the first embodiment of the present invention.

FIG. 3 is a flowchart showing a process of creating the first word network in the word set structuring method according to the second embodiment of the present invention.

FIG. 4 is a flowchart showing a process of creating the first word network in the word set structuring method according to the second embodiment of the present invention.

FIG. 5 is an explanatory diagram showing an example of the first word network.

FIG. 6 is an explanatory diagram for explaining the co-occurrence relationship.

FIG. 7 is a flowchart showing a process of creating the second word network in the word set structuring method according to the second embodiment of the present invention.

FIG. 8 is an explanatory diagram showing an example of a processing result of the word set structuring method.

FIG. 9 is a flowchart showing a document set structuring method according to the third embodiment of the present invention.

FIG. 10 is a flowchart showing the operation of the idea generation support system according to the fourth embodiment of the present invention.

FIG. 11 is a flowchart showing the operation of the idea generation support system according to the fourth embodiment of the present invention.

FIG. 12 is an explanatory diagram showing an example of display by the information space display means.

[Explanation of symbols]

1 information storage means, 2 dictionary generation means, 3 word set structuring means, 4 information selection means, 6 document set structuring means, 7 information space display means.

Continuation of front page
(72) Inventor: Terumasa Yasui, c/o Mitsubishi Electric Corporation Central Research Laboratory, 8-1-1 Tsukaguchi-honmachi, Amagasaki
(72) Inventor: Shinichiro Tsudaka, c/o Mitsubishi Electric Corporation Central Research Laboratory, 8-1-1 Tsukaguchi-honmachi, Amagasaki

Claims

1. A dictionary generation method comprising: a step of obtaining the appearance frequency of character strings in one or more texts; a step of extracting, from among the character strings appearing in the text, each character string satisfying the condition that there is no character string that contains it, is longer than it, and has an appearance frequency not lower than its own; and a step of generating a dictionary by adding each extracted character string as an entry.

2. A word set structuring method comprising: a step of obtaining, for a word, its nearest lower-level word and its nearest higher-level word, and associating the word with the nearest lower-level word and the nearest higher-level word; and a step of generating a word network after excluding, from those associations, any association that satisfies a preset condition.

3. A word set structuring method comprising: a step of generating a basic chain connecting a first word, a second word and a third word when the first word, included in the headline of a first text, is included in the body of a second text, the second word, included in the headline of the second text, is included in the body of a third text, the first word is included in the body of the third text, and the third word is included in the headline of the third text; and a step of generating a word network by combining, among the generated basic chains, basic chains that have a common part.

4. A word set structuring method comprising: a step of obtaining, from a plurality of texts, the nearest lower-level word and the nearest higher-level word of a word, associating the word with them, and generating a first word network from which the associations satisfying a preset condition are excluded; a step of generating a basic chain connecting a first word, a second word and a third word when the first word, included in the headline of a first text, is included in the body of a second text, the second word, included in the headline of the second text, is included in the body of a third text, the first word is included in the body of the third text, and the third word is included in the headline of the third text; a step of generating a second word network by combining, among the generated basic chains, basic chains that have a common part; and a step of integrating the first word network and the second word network using a designated word as a key.

5. A document set structuring method comprising: a step of converting each text into a word vector; a step of setting a plurality of clusters, each containing a plurality of the texts, placed discretely using the distances between the word vectors; and a step of obtaining a word representing each of the clusters.

6. The document set structuring method according to claim 5, wherein the step of setting a plurality of clusters comprises: a step of obtaining the center of gravity of all word vectors; a step of taking the vector farthest from the obtained center of gravity as the center of the first cluster; a step of obtaining, for each word vector, the distance to the nearest cluster center and adding the word vector with the maximum distance as a cluster center; and a step of repeating, until the cluster centers no longer change, an operation of classifying each word vector into the cluster whose center is nearest and taking the average of the word vectors in each cluster as the new cluster center.

7. The document set structuring method according to claim 6, wherein the step of obtaining a word representing a cluster comprises: a step of extracting a predetermined number of the largest values from the elements of the word vector at the center of the cluster; and a step of defining the words corresponding to the extracted elements as the words representing the cluster.

8. An idea generation support system comprising: information storage means for storing document information; dictionary generation means for generating a dictionary, by the method according to claim 1, from the texts stored in the information storage means; word set structuring means for structuring a word set, by the method according to claim 4, using the dictionary generated by the dictionary generation means as the word set; information selection means for selecting predetermined texts from the text set in the information storage means; document set structuring means for structuring, by the method according to claim 7, the text set in the information storage means and the text set selected by the information selection means; and information space display means for displaying the processing results of the word set structuring means and of the document set structuring means.

9. The idea generation support system according to claim 8, wherein, when a search keyword is input, the information selection means selects texts based on the keyword, and the document set structuring means structures the set of texts selected by the information selection means.

10. The idea generation support system according to claim 9, wherein, when a related display is requested, the information space display means highlights the elements corresponding to the request on each of the display screens for the processing result of the word set structuring means, the processing result of the document set structuring means for the text set in the information storage means, and the processing result of the document set structuring means for the text set selected by the information selection means.