JP2000227917A

JP2000227917A - Thesaurus browsing system and method therefor and recording medium recording its processing program

Info

Publication number: JP2000227917A
Application number: JP2810199A
Authority: JP
Inventors: Toshiko Aizono; 敏子相薗; Hiroyuki Kaji; 博行梶; Yasutsugu Morimoto; 康嗣森本; Noriyuki Yamazaki; 山崎　　紀之; Keiko Iida; 恵子飯田; Yasuhiko Uchida; 安彦内田
Original assignee: Agency of Industrial Science and Technology
Current assignee: National Institute of Advanced Industrial Science and Technology AIST
Priority date: 1999-02-05
Filing date: 1999-02-05
Publication date: 2000-08-15
Anticipated expiration: 2019-02-05
Also published as: JP4404323B2

Abstract

PROBLEM TO BE SOLVED: To improve the efficiency of using a related thesaurus by generating and displaying several sets of representative terms, letting the user select any one of the sets, extracting from the thesaurus the terms related to the general terms belonging to the selected set, and generating and displaying several further sets from the extracted terms. SOLUTION: Pairs of terms that co-occur in the document data stored in a document data storage part 2a are stored in a cooccurrence term table 3a. The terms extracted from each document and their frequencies are stored, document by document, in a document term table 3b. A related thesaurus generation part 1a generates a related thesaurus from the document data. A term vector extraction part 1b extracts a term vector for each document, and an overview of the related thesaurus is generated. When a term cluster list is selected from the overview, the related terms of the terms belonging to that list are extracted. Several sets are generated from the extracted terms and displayed.

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a technique for displaying the terms stored in a thesaurus, and more particularly to a thesaurus browsing system and method suitable for making searches of the terms stored in a network-type thesaurus more efficient, and to a recording medium on which the processing program is recorded.

【０００２】[0002]

2. Description of the Related Art

First, an outline of the thesaurus is given briefly with reference to the JIS standard. Next, the features of a thesaurus that stores the terms targeted by the present invention (related terms) are described. Conventional techniques for browsing such a thesaurus are then presented.

[0003] According to JIS X 0901, "Thesaurus construction and method of creation", a thesaurus is "the vocabulary of a controlled indexing language, organized so that the a priori relationships between concepts are made explicit in advance". Here the "vocabulary of the indexing language", that is, an "index term", is "a suitable expression of a concept in the form of a noun or a noun-equivalent phrase" (JIS X 0901), and is commonly called a "term". It is therefore called a "term" below.

[0004] The relationships between terms described in a thesaurus are the "synonymous relationship", the "hierarchical relationship", and the "associative relationship" (JIS X 0901). The synonymous relationship holds between terms that denote the same concept, for example "camera" (カメラ) and "photographic apparatus" (写真機). The hierarchical relationship holds between a superordinate concept and a subordinate concept and is of three kinds: the generic relationship (a class and its members, e.g. "primates" and "monkeys"), the whole-part relationship (the name of a part and the name of the whole to which it belongs, e.g. "digestive organs" and "intestines"), and the instance relationship (a category and an instance of it, e.g. "high-speed railway" and "Shinkansen"). A thesaurus that stores terms in such hierarchical relationships is hereinafter called a "hierarchical thesaurus".

[0005] By contrast, the "associative relationship" is "a relationship between words that are related in a way other than the synonymous or hierarchical relationship" (JIS X 0901). It covers cases where the meanings partially overlap (e.g. "passenger car" and "private car") and cases where one term strongly implies the other (e.g. "publishing" and "books"). A thesaurus that stores terms in such an associative relationship, that is, related terms (hereinafter a "related thesaurus"), is characterized by being of the network type. This is explained with an example.

[0006] FIG. 2 is an explanatory diagram showing an example of a related thesaurus. In FIG. 2, nodes represent terms and links between nodes represent associative relationships. As FIG. 2 shows, in the related thesaurus "official discount rate" and "interest rate" are related, "interest rate" and "loan" are in turn linked by an associative relationship, and "loan" and "reluctance to lend" are related as well. In a related thesaurus the terms are thus related to one another and form a network.
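The network structure of FIG. 2 can be pictured as an undirected graph. The following is only a minimal sketch under that reading; the function name `build_network` and the adjacency-set representation are illustrative assumptions, not something the patent specifies.

```python
from collections import defaultdict

def build_network(links):
    """Build an undirected adjacency map from (term, term) pairs."""
    network = defaultdict(set)
    for x, y in links:
        # An associative relationship has no direction, so record both ways.
        network[x].add(y)
        network[y].add(x)
    return network

# The chain of related terms described for FIG. 2.
links = [
    ("official discount rate", "interest rate"),
    ("interest rate", "loan"),
    ("loan", "reluctance to lend"),
]
network = build_network(links)
neighbors_of_interest_rate = network["interest rate"]
```

Following neighbors repeatedly from any node is what lets a user wander through the network, which is also how a user can lose track of the path taken.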

[0007] There are two conventional techniques for browsing such a related thesaurus. First, the "information retrieval device" described in Japanese Patent Application Laid-Open No. 5-233717 responds to a term input by the user by retrieving and displaying the related terms whose relevance falls within a range specified by the user. The relevance range has N levels according to the strength of the relevance (e.g. three levels: high, medium, and low). This allows the user to refer efficiently even to related terms of low relevance.

[0008] Second, the "data retrieval device" described in Japanese Patent Application Laid-Open No. 9-44525 responds to a term input by the user by determining the display position of each related term according to the strength of its relevance to the input term. More specifically, it concerns a method of displaying the related thesaurus two-dimensionally as a network structure like that of FIG. 2, in which terms more strongly related to the input term are displayed closer to it. This lets the user grasp visually how strongly each term is related to the input term.

[0009] These conventional techniques are useful when the user is interested in only one part of the related thesaurus. They have two problems, however: (1) they cannot provide the user with an overview of the related thesaurus, and (2) even when the user is interested in only one part of the related thesaurus, reaching the desired term takes effort. Each is explained below.

[0010] (1) An overview of the related thesaurus cannot be provided to the user: the conventional techniques described above display only the related terms of the key term input by the user. Users, however, also want to grasp the overall picture of the related thesaurus. For example, a user may want to consult the related thesaurus without knowing clearly what to look up until actually examining it.

[0011] In such a case the user will probably want first to get an overview of what terms the related thesaurus stores, and then to follow the partial structure of the network in the direction of interest. In a hierarchical thesaurus, for example, the user can get an overview by looking at the list of top-level terms. With the conventional techniques above, by contrast, the user must decide in advance which part of the related thesaurus network to look at. They therefore cannot satisfy a user who wants to browse the thesaurus after first getting an overview of it.

[0012] (2) Reaching the desired term takes effort: as described above, the conventional techniques require the user to input a key term. Choosing an appropriate key, however, is known to be difficult. In particular, a user unfamiliar with the related thesaurus being searched often inputs whatever term happens to come to mind, for example entering "economy" as the key term when wanting to know about the economy.

[0013] When a term that merely came to mind is input as the key, related terms matching the user's interest are unlikely to be obtained directly. The user will therefore need to select a new key term from the displayed related terms and then search for and follow further related terms. Because the related thesaurus is a network-type thesaurus, as noted above, a user who follows related terms one after another may become "lost", that is, lose track of the path taken through the thesaurus. Examining a thesaurus by entering whatever term comes to mind is thus inefficient.

【００１４】[0014]

PROBLEMS TO BE SOLVED BY THE INVENTION

The problems to be solved are that the conventional techniques cannot provide the user with the overall structure (overview) of a related thesaurus, and that even when the user is interested in only one part of the related thesaurus, reaching the desired term takes effort.

[0015] An object of the present invention is to solve these problems of the prior art and to provide a thesaurus browsing system and method that make it possible to use a related thesaurus more efficiently, together with a recording medium on which the processing program is recorded.

【００１６】[0016]

SUMMARY OF THE INVENTION

To achieve the above object, the thesaurus browsing system and method of the present invention do not display only one part of a network-type related thesaurus, as the prior art does. Instead, from the terms stored in the related thesaurus they generate and display several sets of terms (representative terms) that convey the overall structure (overview) of the related thesaurus. When the user selects one of the sets, the terms related to the general terms belonging to the representative terms of the selected set are extracted from the thesaurus, and several sets are generated in the same way from the extracted terms and displayed.

[0017] This makes it possible to navigate the user from the overall structure of the related thesaurus, which is composed of general terms, to partial structures that also contain more specific terms. For example, starting from a screen that displays sets of terms giving an overview of the thesaurus as the initial state of navigation, the user selects a part of interest (a set of terms) and requests zooming, and can then refer to the detailed structure of that part, which contains more specific terms.

【００１８】[0018]

DESCRIPTION OF THE EMBODIMENTS

Embodiments of the present invention are described below in detail with reference to the drawings. FIG. 1 is a block diagram showing a first embodiment of the configuration of the thesaurus browsing system according to the present invention. FIG. 2 is an explanatory diagram showing the contents of the related thesaurus processed by the thesaurus browsing system of FIG. 1. FIG. 3 is an explanatory diagram outlining the processing of the thesaurus browsing system of FIG. 1 according to the present invention. FIG. 4 is a block diagram showing an example hardware configuration of the thesaurus browsing system of FIG. 1.

[0019] First, the thesaurus browsing system of this example is outlined with reference to FIGS. 2 and 3. The system automatically generates the related thesaurus shown in FIG. 2 from a database of documents belonging, in this example, to the field of economics, and presents its overview to the user in units of sets of terms (term clusters), each gathering general terms as shown in FIG. 3.

[0020] Browsing of the related thesaurus in the field of economics is explained with reference to FIG. 3. The top level of FIG. 3 is the overview of the thesaurus, which consists of sets of general terms. As FIG. 3 shows, the overview consists of a set of terms including "tax system", "income", and "National Tax Agency", a set of terms including "Bank of Japan", "foreign currency", "yen", and "exchange", and so on.

[0021] When the user selects, from among these, the set of terms including "Bank of Japan", "foreign currency", "yen", and "exchange", sets are displayed that consist of terms related to the terms in the selected set and that include more specific terms. As an example, the middle level of FIG. 3 shows a set of terms including "yen", "yen selling", "yen buying", and "Tokyo foreign exchange market", a set of terms including "foreign currency", "dollar", "IMF", and "euro", and so on.

[0022] If the user then selects a set of terms of further interest from among these, sets consisting of still more specific terms can be consulted. The set of terms including "yen buying", "yen appreciation", "trade surplus", and "imbalance" at the bottom level of FIG. 3 is one example.

[0023] By consulting sets of terms composed of general terms in this way, the user can get an overview of the related thesaurus. In addition, the interactive process of displaying, from a set of general terms selected by the user, subsets that include more specific terms (hereinafter "zooming") allows the thesaurus to be browsed efficiently.

[0024] To realize this kind of thesaurus browsing, this example provides a thesaurus overview generation function and a term cluster zooming function. The thesaurus overview generation function generates an overview of the related thesaurus and consists of two processing functions: (a) a representative term extraction function that extracts general terms from the field covered by the associative relationships stored in the thesaurus, and (b) a term cluster generation function that groups the highly related terms among those extracted by the representative term extraction function into clusters.

[0025] Hereinafter, a term that is general in the field covered by the thesaurus and serves as an element of the thesaurus overview is called a "representative term", the process of generating clusters from a set of terms is called "term clustering", and a cluster whose elements are terms is called a "term cluster".

[0026] The term cluster zooming function generates and displays, from a set of general terms, sets that include more specific terms. It consists of two processing functions: (a) a related term acquisition function that, when the user selects a term cluster, acquires the related terms of the terms belonging to that cluster, and (b) a term cluster generation function that clusters the related terms acquired by the related term acquisition function.

[0027] The hardware configuration on which the functions for realizing this thesaurus browsing are implemented is described with reference to FIG. 4. As FIG. 4 shows, the thesaurus browsing system of this example comprises a CPU (Central Processing Unit) 1, a hard disk 2, a memory 3, a display 4a, a display control unit 4b, a keyboard 5a, a keyboard control unit 5b, a mouse 6a, a mouse control unit 6b, and a bus 7.

[0028] The CPU 1 performs data input/output, reading, storage, and various other processing under program control, and thereby carries out the thesaurus browsing processing according to the present invention. The hard disk 2 is a device that stores data, and the memory 3 is a device into which programs and data are loaded and held. The display 4a is a device that displays data to the user and is controlled by the display control unit 4b. The keyboard 5a and the mouse 6a are devices that accept input from the user and are controlled by the keyboard control unit 5b and the mouse control unit 6b, respectively. The bus 7 transfers data between the components.

[0029] In this hardware configuration, the thesaurus browsing system shown in FIG. 1 is formed by storing the processing program for the thesaurus browsing method of the present invention on the hard disk 2 from a recording medium such as an optical disk (not shown), loading it into the memory 3, and starting it.

[0030] The thesaurus browsing system of FIG. 1 is divided into three kinds of modules: processing units that execute the various processes (shown as rectangles in the figure), data storage units that store data (shown as drums), and data (shown as parallelograms). Each is described below.

[0031] First, there are five processing units: a related thesaurus generation unit 1a, a term vector extraction unit 1b, a representative term acquisition unit 1c, a related term acquisition unit 1d, and a term cluster generation unit 1e. The related thesaurus generation unit 1a generates the related thesaurus from the document data stored in the document data storage unit 2a.

[0032] The term vector extraction unit 1b extracts term vectors from the document data stored in the document data storage unit 2a. The representative term acquisition unit 1c acquires representative terms from the term vectors stored in the term vector storage unit 2c. The related term acquisition unit 1d acquires the related terms of the terms belonging to the term cluster selected by the user. The term cluster generation unit 1e clusters the terms stored in the representative term list 3c or the related term list 3d. The detailed processing procedure of each unit is described later with reference to flowcharts.

[0033] Next, there are four data storage units: a document data storage unit 2a, a related thesaurus storage unit 2b, a term vector storage unit 2c, and a thesaurus overview storage unit 2d. The document data storage unit 2a stores the data of documents belonging to a given field, for example newspaper articles published on the economy page, or the specifications of patents assigned a given classification number.

[0034] The related thesaurus storage unit 2b stores the related thesaurus generated from the document data stored in the document data storage unit 2a. The term vector storage unit 2c stores the term vectors extracted from the document data stored in the document data storage unit 2a. The thesaurus overview storage unit 2d stores the overview of the related thesaurus stored in the related thesaurus storage unit 2b.

[0035] The detailed structure of each data storage unit is described with reference to FIGS. 5 to 8. FIG. 5 is an explanatory diagram showing an example structure of the document data storage unit of FIG. 1. As FIG. 5 shows, the document data storage unit 2a consists of document data 2a01, which holds the text data of a document. The first row of FIG. 5 holds "The yen exchange rate on the Tokyo foreign exchange market on the 21st ..." as an example of a newspaper article on the economy.

[0036] FIG. 6 is an explanatory diagram showing an example structure of the related thesaurus storage unit of FIG. 1. As FIG. 6 shows, the related thesaurus storage unit 2b consists of term X 2b01, term Y 2b02, and relevance 2b03. Term X 2b01 and term Y 2b02 hold terms that are in an associative relationship, and relevance 2b03 holds their degree of relevance. The first row of FIG. 6 records that "yen" and "Tokyo foreign exchange market" are related terms with a relevance of "11.5". The relevance in this example is the value of the mutual information of the pair of terms, which is described later.
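This part of the description says only that the relevance is the mutual information of a term pair and defers the details. As a rough illustration, the sketch below assumes the common pointwise form log2(P(x, y) / (P(x) P(y))) estimated from raw counts. The function name, its parameters, and the counts are hypothetical and are not taken from the patent.

```python
import math

def pointwise_mutual_information(pair_count, x_count, y_count, total):
    """PMI in bits, estimated from counts over `total` units (e.g. sentences).

    Assumed formula: log2(P(x, y) / (P(x) * P(y))).
    """
    p_xy = pair_count / total
    p_x = x_count / total
    p_y = y_count / total
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical counts: x and y each occur in 4 of 16 sentences, together in 2.
score = pointwise_mutual_information(pair_count=2, x_count=4, y_count=4, total=16)
```

Under this assumed formula a pair that co-occurs more often than chance predicts gets a positive score, which matches the role of the relevance column in FIG. 6.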

[0037] FIG. 7 is an explanatory diagram showing an example structure of the term vector storage unit of FIG. 1. As FIG. 7 shows, the term vector storage unit 2c consists of document ID 2c01 and important term list 2c02. Document ID 2c01 holds the ID of a document stored in the document data storage unit 2a, and important term list 2c02 holds a list of the important terms among those appearing in that document. The first row of FIG. 7 records that the important terms of the document with document ID 1 are "yen appreciation", "Tokyo foreign exchange market", "dollar depreciation", and so on.

[0038] FIG. 8 is an explanatory diagram showing an example structure of the thesaurus overview storage unit of FIG. 1. As FIG. 8 shows, the thesaurus overview storage unit 2d consists of term list 2d01, which holds the list of terms belonging to a term cluster. The first row of FIG. 8 holds a term cluster consisting of "business conditions", "sales", "consumers", and so on. As an example, the thesaurus overview shown in FIG. 8 is assumed to consist of ten term clusters.

[0039] The data storage units in the module configuration of FIG. 1 have been described above with reference to FIGS. 5 to 8. The data are described next. As FIG. 1 shows, the data comprise a co-occurrence term table 3a, a document term table 3b, a representative term list 3c, a related term list 3d, a correlation matrix 3e, an output term cluster list 3f, and an input term cluster list 3g.

[0040] The co-occurrence term table 3a stores the pairs of terms that appear together within a certain range in the document data stored in the document data storage unit 2a. The document term table 3b stores, for each document, the terms extracted from the documents stored in the document data storage unit 2a together with their frequencies. The representative term list 3c stores the representative terms extracted from the term vector storage unit 2c. The related term list 3d stores the related terms of the terms stored in the input term cluster list 3g.

[0041] The correlation matrix 3e stores the relevance between the terms stored in the representative term list 3c or the related term list 3d. The output term cluster list 3f stores the result of clustering the terms stored in the representative term list 3c or the related term list 3d. The input term cluster list 3g stores the term cluster selected by the user.
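One way to picture the correlation matrix 3e is as a square table filled from the rows of the related thesaurus. This is a minimal sketch, not the patent's procedure: the function name `build_correlation_matrix` is hypothetical, and using 0.0 for pairs absent from the thesaurus (including the diagonal) is an assumption made for illustration.

```python
def build_correlation_matrix(terms, thesaurus):
    """Return matrix[i][j] = relevance between terms[i] and terms[j].

    `thesaurus` is a list of (term_x, term_y, relevance) rows as in FIG. 6.
    Pairs with no row are given 0.0 (an assumption of this sketch).
    """
    relevance = {}
    for x, y, value in thesaurus:
        # The relationship is symmetric, so index it in both directions.
        relevance[(x, y)] = value
        relevance[(y, x)] = value
    return [[relevance.get((a, b), 0.0) for b in terms] for a in terms]

terms = ["yen", "tokyo foreign exchange market", "dollar"]
thesaurus = [("yen", "tokyo foreign exchange market", 11.5)]  # row from FIG. 6
matrix = build_correlation_matrix(terms, thesaurus)
```

A matrix of this shape is the usual input to a clustering step, which is consistent with its terms coming from either the representative term list 3c or the related term list 3d.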

[0042] The details of each item of data are described with reference to FIGS. 9 to 15. FIG. 9 is an explanatory diagram showing an example structure of the co-occurrence term table of FIG. 1. As FIG. 9 shows, the co-occurrence term table 3a consists of term X 3a01, term Y 3a02, and co-occurrence frequency 3a03. Term X 3a01 and term Y 3a02 hold a pair of terms that appear together within a certain range in the document data stored in the document data storage unit 2a, and co-occurrence frequency 3a03 holds the number of times that pair appeared together.

[0043] "Co-occurrence" means that a term appears together with another term within a certain range, and "co-occurrence frequency" is the number of times they appear together. A term that co-occurs with a given term is called a "co-occurring term". Here "the same sentence" is used as an example of the range. For example, the first row of FIG. 9 records that "yen exchange rate" and "Tokyo foreign exchange market" appeared in the same sentence 259 times.
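Counting co-occurrences with "the same sentence" as the range can be sketched as follows. The sketch assumes that term extraction has already reduced each sentence to a list of terms and that a pair is counted at most once per sentence; the function name `count_cooccurrences` and the sample sentences are illustrative, not taken from the patent.

```python
from collections import Counter
from itertools import combinations

def count_cooccurrences(sentences):
    """Count how many sentences contain each unordered pair of terms."""
    counts = Counter()
    for terms in sentences:
        # Deduplicate and sort so each pair has a single canonical key.
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts

sentences = [
    ["yen exchange rate", "tokyo foreign exchange market"],
    ["tokyo foreign exchange market", "yen exchange rate", "dollar"],
    ["dollar"],
]
cooccurrence_table = count_cooccurrences(sentences)
```

Each key-value pair corresponds to one row of the co-occurrence term table 3a (term X, term Y, co-occurrence frequency).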

[0044] FIG. 10 is an explanatory diagram showing an example structure of the document term table of FIG. 1. As FIG. 10 shows, the document term table 3b consists of document ID 3b01, term 3b02, and occurrence frequency 3b03. Document ID 3b01 holds the ID of a document stored in the document data storage unit 2a. Term 3b02 holds a term that appeared in the document with that ID, and occurrence frequency 3b03 holds the number of times the term appeared in that document. The first row of FIG. 10 records that "Tokyo foreign exchange market" appeared twice in the document with document ID "1".
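The document term table 3b is essentially a per-document term frequency count. A minimal sketch, assuming each document has already been reduced to a list of extracted terms (the function name and sample data are hypothetical):

```python
from collections import Counter

def build_document_term_table(documents):
    """Return rows of (document ID, term, occurrence frequency)."""
    rows = []
    for doc_id, terms in documents.items():
        for term, frequency in Counter(terms).items():
            rows.append((doc_id, term, frequency))
    return rows

documents = {
    1: ["tokyo foreign exchange market", "yen", "tokyo foreign exchange market"],
    2: ["stock"],
}
document_term_table = build_document_term_table(documents)
```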

[0045] FIG. 11 is an explanatory diagram showing an example structure of the representative term list of FIG. 1. FIG. 11(a) shows the list while important terms are being acquired from the term vectors, and FIG. 11(b) shows the list sorted after the important terms have been acquired. As FIG. 11(a) shows, the representative term list 3c consists of representative term 3c01 and document count 3c02.

【００４６】The representative term 3c01 stores a representative term extracted from the term vector storage unit 2c. The document count 3c02 stores the number of documents in which that term was an important term. The first row of FIG. 11(a) records that "exchange" was an important term in 451 documents. FIG. 11(b) shows the list of FIG. 11(a) sorted in descending order of document count; its first row records that "stock" was an important term in 498 documents.

【００４７】FIG. 12 is an explanatory diagram showing a configuration example of the related term list in FIG. 1. FIG. 12(a) shows the list while the related terms of the terms belonging to the term cluster selected by the user are being acquired, and FIG. 12(b) shows the list sorted after the related terms have been acquired. As shown in FIG. 12(a), the related term list 3d consists of a related term 3d01 and a rank 3d02.

【００４８】The related term 3d01 stores a related term of a term belonging to the term cluster selected by the user. The rank 3d02 stores the rank of that related term, that is, its position when the related terms of the term in question are ordered by decreasing relevance. In FIG. 12(a), the 15th entry stores "Tokyo forex market" with rank 1. This means that "Tokyo forex market" had the highest relevance among the related terms of some term.

【００４９】FIG. 12(b) shows the list of FIG. 12(a) sorted in ascending order of rank. In FIG. 12(b), rows 1 to 10 store the terms that had the highest relevance to each term belonging to the input term cluster, that is, the rank-1 terms.

【００５０】FIG. 13 is an explanatory diagram showing a configuration example of the correlation matrix in FIG. 1. As shown in FIG. 13, the correlation matrix 3e is an n × n matrix, where n is the number of terms stored in the representative term list 3c or the related term list 3d. The element in row i, column j of the correlation matrix 3e shown in FIG. 13 is the relevance between the i-th term and the j-th term in the representative term list 3c or the related term list 3d.

【００５１】In FIG. 13, the relevance "7.3" is stored in the second column of the first row. The correlation matrix 3e is symmetric: elements mirrored across the diagonal hold the same value. That is, in FIG. 13, the first column of the second row stores the same value "7.3" as the second column of the first row.

【００５２】FIG. 14 is an explanatory diagram showing a configuration example of the output term cluster list in FIG. 1. FIG. 14(a) shows the initial state of the output term cluster list 3f, and FIG. 14(b) shows its state after the term clusters have been generated. As shown in FIG. 14(a), the output term cluster list 3f consists of a cluster ID of maximum relevance 3f01, a maximum relevance 3f02, and a term list 3f03.

【００５３】Each row of the output term cluster list 3f represents one term cluster, and the index of the row serves as the cluster ID. The cluster ID of maximum relevance 3f01 stores the ID of the term cluster that has the highest relevance to the term cluster in that row, and the maximum relevance 3f02 stores that relevance value. The relevance between term clusters is described later. The term list 3f03 stores the list of terms belonging to the term cluster.

【００５４】As shown in FIG. 14(a), in the initial state the output term cluster list 3f stores term clusters whose term list 3f03 contains a single term each. FIG. 14(b) shows the state after these single-term clusters have been merged until a predetermined number of clusters remains. The first row of FIG. 14(b) stores a term cluster consisting of "IMF", "dollar", "foreign currency", "euro", and so on.

【００５５】FIG. 15 is an explanatory diagram showing a configuration example of the input term cluster list in FIG. 1. As shown in FIG. 15, the input term cluster list 3g consists of a term list 3g01. As with the output term cluster list 3f, each row of the input term cluster list 3g represents one term cluster. The first row of FIG. 15 stores a term cluster consisting of "yen", "exchange", "Bank of Japan", "foreign currency", and so on.

【００５６】The module configuration of the thesaurus browsing system of this example and the details of its components have been described above. Next, the detailed processing procedure of the thesaurus browsing system of this example is described with reference to the flowcharts shown in FIGS. 16 to 26.

【００５７】First, the overall processing procedure of the thesaurus browsing system is described with reference to FIG. 16. FIG. 16 is a flowchart showing an example of the processing procedure of the thesaurus browsing method of the present invention.

【００５８】As shown in FIG. 16, the thesaurus browsing method of this example consists broadly of two processes: the thesaurus browsing data generation processing shown in FIG. 16(a) and the thesaurus browsing processing shown in FIG. 16(b). The former generates the data needed for thesaurus browsing and is executed as a batch. The latter responds interactively to the user and is executed in real time. FIG. 16(a) and FIG. 16(b) are each described below, after which each step is described using a detailed flowchart.

【００５９】First, the thesaurus browsing data generation processing shown in FIG. 16(a) is described. As shown in FIG. 16(a), this processing first generates a related thesaurus from the document data (step 101), then extracts the term vector of each document (step 102), and then generates an overview of the related thesaurus (step 103).

【００６０】Next, the thesaurus browsing processing shown in FIG. 16(b) is described. As shown in FIG. 16(b), this processing first displays to the user the thesaurus overview stored in the thesaurus overview storage unit 2d (step 111). When the user then selects one of the displayed term clusters and requests zooming (step 112), the related terms of the terms belonging to the selected term cluster are acquired (step 113).

【００６１】These related terms are then clustered (step 114), and the generated term clusters are displayed to the user (step 115). If the user instructs the system to end thesaurus browsing (step 116), the processing ends; otherwise it returns to step 112.

【００６２】The following describes in detail the related thesaurus generation processing of step 101, the term vector extraction processing of step 102, and the thesaurus overview generation processing of step 103 in FIG. 16(a), as well as the related term acquisition processing of step 113 in FIG. 16(b). The term cluster generation processing of step 114 is the same as the term cluster generation processing included in the thesaurus overview generation processing shown in FIG. 16(a), so its description is omitted.

【００６３】First, the detailed procedure of the related thesaurus generation processing is described. Processing that extracts co-occurrence terms from document data and generates a related thesaurus is known from, for example, Japanese Patent Laid-Open No. 5-282367 ("Related keyword automatic generation device") and Japanese Patent Laid-Open No. 8-161343 ("Related word dictionary creation device"). As one example, the processing described here extracts co-occurrence terms from document data and computes their relevance based on mutual information.

【００６４】Mutual information is defined as "the amount of information about another event Y that can be inferred from a directly observed event X (here, a term)", but here it is treated as a normalized measure of how readily a pair of terms co-occurs. Equation 1 gives the formula for mutual information.

【数１】 (Equation 1)

【００６５】FIG. 17 is a flowchart showing a detailed processing procedure example of the related thesaurus generation processing in the thesaurus browsing data generation processing of FIG. 16(a). As shown in FIG. 17, the related thesaurus generation processing first initializes to 1 a variable i that indexes the documents stored in the document data storage unit 2a (step 1a01). It then splits the i-th document into words, extracts every pair of terms that appear in the same sentence, and stores the pairs in the co-occurrence term table 3a (step 1a02).

【００６６】At this point, if the same pair of terms is already in the co-occurrence term table 3a, its co-occurrence frequency is incremented by 1; if not, the pair is stored with a co-occurrence frequency of 1. Next, i is incremented by 1 (step 1a03), and if i does not exceed the number of documents (step 1a04), the processing returns to step 1a02.
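The counting loop of steps 1a01 to 1a04 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `build_cooccurrence_table` and the input shape (documents already split into sentences and terms) are assumptions, and each distinct pair is counted once per sentence.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_table(documents):
    """Count how often each unordered term pair shares a sentence.

    `documents` is a list of documents; each document is a list of
    sentences; each sentence is a list of already-extracted terms
    (word segmentation is assumed to have happened upstream).
    """
    table = Counter()
    for document in documents:                 # loop of steps 1a01-1a04
        for sentence in document:
            # Every distinct pair in the sentence is counted once (step 1a02).
            for pair in combinations(sorted(set(sentence)), 2):
                table[pair] += 1               # a new pair starts at 1
    return table


docs = [
    [["yen", "dollar", "market"], ["yen", "market"]],
    [["dollar", "market"]],
]
table = build_cooccurrence_table(docs)
```

Sorting the terms inside each pair makes ("market", "yen") and ("yen", "market") share one table entry, which mirrors the single row per pair in the co-occurrence term table 3a.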

【００６７】When i exceeds the number of documents, a variable j indicating the index into the co-occurrence term table 3a is initialized to 1 (step 1a05). The relevance of the j-th term pair in the co-occurrence term table 3a is calculated using Equation 1, and term pairs whose relevance is at or above a certain threshold are stored in the related thesaurus storage unit 2b (step 1a06). Next, j is incremented by 1 (step 1a07). If j does not exceed the number of term pairs stored in the co-occurrence term table 3a (step 1a08), the processing returns to step 1a06; when j exceeds the number of pairs, the processing ends.
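As a rough sketch of steps 1a05 to 1a08, the code below scores each pair and keeps those at or above a threshold. Equation 1 is not reproduced in this text, so the scoring formula is an assumption: the widely used pointwise mutual information, log2(P(x, y) / (P(x) P(y))), with probabilities estimated per sentence. The function name, argument names, and sample counts are all hypothetical.

```python
import math


def related_pairs(pair_counts, term_counts, total_sentences, threshold):
    """Keep term pairs whose pointwise mutual information reaches threshold.

    Assumed form (not taken from the patent):
        PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) )
    with each probability estimated as count / total_sentences.
    """
    thesaurus = {}
    for (x, y), n_xy in pair_counts.items():       # loop of steps 1a05-1a08
        p_xy = n_xy / total_sentences
        p_x = term_counts[x] / total_sentences
        p_y = term_counts[y] / total_sentences
        relevance = math.log2(p_xy / (p_x * p_y))
        if relevance >= threshold:                 # threshold test of step 1a06
            thesaurus[(x, y)] = relevance
    return thesaurus


# Invented counts, chosen only to make the arithmetic easy to follow.
pair_counts = {("dollar", "yen"): 4, ("market", "yen"): 1}
term_counts = {"dollar": 4, "yen": 8, "market": 8}
thesaurus = related_pairs(pair_counts, term_counts, total_sentences=16, threshold=0.5)
```

With these counts the pair ("dollar", "yen") scores 1.0 and is kept, while ("market", "yen") scores -2.0 and is dropped.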

【００６８】Next, the detailed procedure of the term vector extraction processing is described. FIG. 18 is a flowchart showing a detailed processing procedure example of the term vector extraction processing in the thesaurus browsing data generation processing of FIG. 16(a). As shown in FIG. 18, the term vector extraction processing first initializes to 1 a variable i that indexes the documents stored in the document data storage unit 2a (step 1b01). It then splits the i-th document into words and stores each word and its number of occurrences in the document term table 3b (step 1b02).

【００６９】Then i is incremented by 1 (step 1b03), and if i does not exceed the number of documents (step 1b04), the processing returns to step 1b02. When i exceeds the number of documents, the number of documents in the whole document database in which each term stored in the document term table 3b appears is counted (step 1b05), and i is reset to 1 (step 1b06). The weight of each term appearing in the i-th document is calculated based on Equation 2; the terms whose weight is at or above a certain threshold are taken as the important terms of the i-th document, and a term vector is generated from them and stored in the term vector storage unit 2c (step 1b07).

【００７０】Then i is incremented by 1 (step 1b08), and if i does not exceed the number of documents stored in the document data storage unit 2a (step 1b09), the processing returns to step 1b07. When i exceeds the number of documents, the processing ends.
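The two passes of FIG. 18 (count terms per document, then weight and filter them) might look like the sketch below. Equation 2 is not reproduced in this text, so the weight used here is an assumption based only on the prose description: term frequency, normalized by document length, multiplied by the reciprocal of the document frequency. The function name and sample data are hypothetical.

```python
from collections import Counter


def extract_term_vectors(documents, threshold):
    """Return, per document, the terms whose weight reaches threshold.

    `documents` maps a document ID to its list of terms.
    Assumed weight: (tf / document length) * (1 / document frequency).
    """
    # First pass: per-document term counts (steps 1b01-1b04).
    term_freqs = {doc_id: Counter(terms) for doc_id, terms in documents.items()}
    # Number of documents in which each term appears (step 1b05).
    doc_freq = Counter()
    for counts in term_freqs.values():
        doc_freq.update(counts.keys())
    # Second pass: weight each term and keep the important ones (steps 1b06-1b09).
    vectors = {}
    for doc_id, counts in term_freqs.items():
        length = sum(counts.values())
        weights = {term: (tf / length) / doc_freq[term] for term, tf in counts.items()}
        vectors[doc_id] = {t: w for t, w in weights.items() if w >= threshold}
    return vectors


documents = {
    1: ["yen", "yen", "market", "economy"],
    2: ["stock", "stock", "economy", "market"],
}
vectors = extract_term_vectors(documents, threshold=0.3)
```

In this toy data "market" and "economy" appear in both documents, so their weights fall to 0.125 and only "yen" and "stock" survive as important terms.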

【００７１】Next, the detailed procedure of the thesaurus overview generation processing is described. FIG. 19 is a flowchart showing a detailed processing procedure example of the thesaurus overview generation processing in the thesaurus browsing data generation processing of FIG. 16(a). As shown in FIG. 19, the thesaurus overview generation processing first extracts the representative terms that will form the elements of the thesaurus overview (step 1031), clusters the extracted representative terms to generate term clusters (step 1032), and stores the generated term clusters in the thesaurus overview storage unit 2d (step 1033).

【００７２】The representative term extraction processing of step 1031 and the term cluster generation processing of step 1032 are described in detail below. The representative term extraction processing in the thesaurus overview generation processing is first described in outline and then in detail using a flowchart.

【００７３】Data such as the occurrence frequency of a term and the number of documents in which it appears can serve as clues for extracting the general terms of a field. One could use these values directly and extract the terms with a high occurrence frequency, or those appearing in many documents, as general terms, but this risks extracting meaningless terms. For example, "economy" is expected to occur frequently in the field of economics, yet it is unlikely to be a meaningful term there.

【００７４】In contrast, the representative term extraction processing of this example extracts meaningful general terms by using the importance of terms. A specific description follows. The representative term extraction processing of this example uses "term vectors" extracted in advance from a document database belonging to the field in question. A "term vector" is a list of terms that characterize a document, and it can be extracted by using the tf·idf method (Term Frequency inverse Document Frequency) described in "Salton, G., et al.: A Vector Space Model for Automatic Indexing, Communications of the ACM, Vol. 18, No. 11 (1975)."

【００７５】The tf·idf method is one of the best-known document indexing methods. It multiplies the occurrence frequency of a term in a document (tf) by the reciprocal of the number of documents in which the term appears (idf), and takes the product as the importance of the term in that document. The terms with high importance in the document (hereinafter, important terms) are then extracted to form the term vector.

【００７６】Equation 2 gives the formula for calculating the importance of a term in a document.

【数２】 (Equation 2) Note that the denominator in Equation 2 normalizes the tf·idf value by the size of the document in which the term appears.

【００７７】The representative term extraction processing extracts the terms that make up the term vectors, counts for each term the number of documents in which it was an important term, and extracts the terms that were important in many documents as general terms. Next, a detailed processing procedure example of the representative term extraction processing is described with reference to FIG. 20.

【００７８】FIG. 20 is a flowchart showing a detailed processing procedure example of the representative term extraction processing. As shown in FIG. 20, the representative term extraction processing first initializes to 1 a variable i that indexes the documents stored in the document data storage unit 2a (step 1c01). It then obtains from the term vector storage unit 2c the important terms that make up the term vector of the i-th document; for each important term already in the representative term list 3c it increments the document count 3c02 by 1, and otherwise it stores the term in the representative term list 3c with a document count 3c02 of 1 (step 1c02).

【００７９】Then i is incremented by 1 (step 1c03), and if i does not exceed the number of documents (step 1c04), the processing returns to step 1c02. When i exceeds the number of documents, the representative term list 3c is sorted in descending order of document count (step 1c05), the specified number of entries is kept from the top of the representative term list 3c, the remaining important terms are deleted, and the processing ends (step 1c06).
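The count, sort, and truncate sequence of steps 1c01 to 1c06 can be sketched as below. The function name, the `keep` parameter, and the sample term vectors are hypothetical; breaking ties alphabetically is an assumption added only to make the output deterministic.

```python
from collections import Counter


def extract_representative_terms(term_vectors, keep):
    """Rank terms by the number of documents in which they are important."""
    doc_counts = Counter()
    for important_terms in term_vectors:       # loop of steps 1c01-1c04
        # A term counts once per document, however often it is listed.
        doc_counts.update(set(important_terms))
    # Descending document count (step 1c05); ties broken alphabetically.
    ranked = sorted(doc_counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:keep]                       # keep only the top entries (step 1c06)


term_vectors = [
    ["exchange", "yen"],
    ["stock", "exchange"],
    ["stock", "bond"],
    ["stock"],
]
representative = extract_representative_terms(term_vectors, keep=2)
```

Here "stock" is important in three documents and "exchange" in two, so those two terms remain, analogous to the sorted list of FIG. 11(b).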

【００８０】Next, the term cluster generation processing in the thesaurus overview generation processing is described, first in outline and then in detail using flowcharts. To generate clusters from a set of terms, the term cluster generation processing first takes the terms to be clustered as {t1, t2, ..., tm} and creates, as the initial state, m clusters C1 = {t1}, C2 = {t2}, ..., Cm = {tm}.

【００８１】It then repeatedly selects the pair of term clusters with the highest relevance and merges them into one term cluster, until the total number of clusters equals the specified number. The relevance rel(C, D) of two clusters C and D can be defined, for example, as the maximum of the relevance R(t, s) over terms t belonging to C and terms s belonging to D.
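Written as a formula, the example cluster relevance described above (the largest term-to-term relevance across the two clusters) is:

```latex
\mathrm{rel}(C, D) = \max_{t \in C,\; s \in D} R(t, s)
```

This corresponds to what is commonly called single-linkage clustering when R is read as a similarity; that label is an observation about the formula, not terminology used in the source.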

【００８２】Next, the detailed processing procedure of the term cluster generation processing is described with reference to FIG. 21. FIG. 21 is a flowchart showing a detailed processing procedure example of the term cluster generation processing in the thesaurus browsing data generation processing of FIG. 16(a).

【００８３】As shown in FIG. 21, the term cluster generation processing first generates a correlation matrix that stores the relevance between terms (step 1e01). Next, it generates term clusters consisting of one term each to initialize the output term cluster list 3f (step 1e02), and initializes the variable N, which holds the number of term clusters, to the number of terms to be clustered (step 1e03).

【００８４】From the N term clusters, the most closely related pair of term clusters C1 and C2 is obtained (step 1e04) and merged (step 1e05), and N is decremented by 1 (step 1e06). If N is not equal to the specified number of term clusters (step 1e07), the processing returns to step 1e04. If it is equal, the processing ends.
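The overall loop of FIG. 21 can be illustrated with the simplified sketch below. It recomputes the cluster relevance directly from term pairs on every pass instead of maintaining the correlation matrix 3e and the output term cluster list 3f, so it is a readable stand-in rather than the procedure of the flowcharts. The function name, the `frozenset`-keyed relevance map, and the sample values are assumptions.

```python
def cluster_terms(terms, relevance, target_count):
    """Merge single-term clusters until target_count clusters remain.

    `relevance` maps frozenset({t, s}) to R(t, s); missing pairs count as 0.
    Terms are assumed to be unique and target_count to be at least 1.
    """
    clusters = [[term] for term in terms]            # one cluster per term (step 1e02)

    def rel(c, d):
        # Cluster relevance: the largest term-to-term relevance.
        return max(relevance.get(frozenset((t, s)), 0.0) for t in c for s in d)

    while len(clusters) > target_count:              # counter test of steps 1e06-1e07
        # Select the most closely related pair of clusters (step 1e04).
        a, b = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: rel(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[a].extend(clusters.pop(b))          # merge b into a (step 1e05)
    return clusters


# Invented relevance values for illustration.
terms = ["IMF", "dollar", "euro", "stock"]
relevance = {
    frozenset(("IMF", "dollar")): 7.3,
    frozenset(("IMF", "euro")): 14.4,
    frozenset(("dollar", "euro")): 5.0,
    frozenset(("dollar", "stock")): 1.0,
}
clusters = cluster_terms(terms, relevance, target_count=2)
```

The first pass merges "IMF" with "euro" (14.4), the second adds "dollar" (7.3), and "stock" is left as its own cluster.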

【００８５】The correlation matrix generation processing of step 1e01, the term cluster initialization processing of step 1e02, the term cluster selection processing of step 1e04, and the term cluster merge processing of step 1e05 are each described in detail below using flowcharts. First, the correlation matrix generation processing in the term cluster generation processing is described in detail.

【００８６】FIG. 22 is a flowchart showing a detailed processing procedure example of the correlation matrix generation processing in the term cluster generation processing of FIG. 21. In the following description of this flowchart, "term list" refers to the representative term list when the input is the representative term list 3c, and to the related term list when the input is the related term list 3d.

【００８７】As shown in FIG. 22, the correlation matrix generation processing first initializes to 1 a variable i indicating the row number of the correlation matrix (step 1e011) and sets the element in row i, column i of the correlation matrix to 0 (step 1e012). Next, it sets j, which indicates the column (or row) number of the correlation matrix, to i + 1 (step 1e013), obtains the relevance between the i-th and j-th terms of the term list from the related thesaurus (step 1e014), and sets that value in row i, column j and in row j, column i of the correlation matrix (step 1e015).

【００８８】Then j is incremented by 1 (step 1e016). If j exceeds the number of terms stored in the term list (step 1e017), the processing proceeds to step 1e018; otherwise it returns to step 1e014. In step 1e018, i is incremented by 1; if i exceeds the number of terms stored in the term list (step 1e019), the processing ends, and otherwise it returns to step 1e012.
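A compact sketch of the matrix construction in FIG. 22 follows. It uses 0-based indices rather than the 1-based counters of the flowchart, and it treats pairs that are missing from the related thesaurus as relevance 0, which is an assumption. The function name and the `frozenset`-keyed thesaurus are hypothetical.

```python
def build_correlation_matrix(term_list, thesaurus):
    """Return a symmetric n x n matrix of relevance values with a zero diagonal.

    `thesaurus` maps frozenset({t, s}) to a relevance value; pairs that are
    absent are treated as 0 (an assumption of this sketch).
    """
    n = len(term_list)
    matrix = [[0.0] * n for _ in range(n)]           # diagonal stays 0 (step 1e012)
    for i in range(n):
        for j in range(i + 1, n):                    # j starts at i + 1 (step 1e013)
            pair = frozenset((term_list[i], term_list[j]))
            value = thesaurus.get(pair, 0.0)         # thesaurus lookup (step 1e014)
            matrix[i][j] = value                     # fill both mirrored
            matrix[j][i] = value                     # positions (step 1e015)
    return matrix


terms = ["IMF", "dollar", "euro"]
thesaurus = {frozenset(("IMF", "dollar")): 7.3, frozenset(("IMF", "euro")): 14.4}
matrix = build_correlation_matrix(terms, thesaurus)
```

Only the upper triangle is looked up; writing each value to both (i, j) and (j, i) is what makes the matrix symmetric.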

【００８９】Next, the term cluster initialization processing in the term cluster generation processing is described in detail. FIG. 23 is a flowchart showing a detailed processing procedure example of the term cluster initialization processing in the term cluster generation processing of FIG. 21. As shown in FIG. 23, the term cluster initialization processing first initializes to 1 a variable i indicating the row number of the correlation matrix (step 1e021) and sets the i-th term of the term list in the i-th term list 3f03 of the output term cluster list 3f (step 1e022).

【００９０】The largest relevance stored in row i of the correlation matrix is set in the i-th maximum relevance 3f02 of the output term cluster list 3f (step 1e023), and the number of the column holding that maximum is set in the i-th cluster ID 3f01 of the output term cluster list 3f (step 1e024). Then i is incremented by 1 (step 1e025); if i exceeds the number of terms stored in the term list (step 1e026), the processing ends, and otherwise it returns to step 1e022.

【００９１】FIG. 14 shows a concrete result of this processing. The example in FIG. 14 is based on the related term list 3d shown in FIG. 12(b). That is, the term "IMF" in the first row of the related term list 3d is set in the term list 3f03 in the first row of the output term cluster list 3f; "14.4", the maximum relevance in the first row of the correlation matrix 3e, is set in the maximum relevance 3f02; and "3", the column number of that maximum, is set in the cluster ID 3f01. In the same way, "dollar", "7.3", and "1" are set in the second row of the output term cluster list 3f, and "euro", "14.4", and "1" in the third row.
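The initialization of steps 1e021 to 1e026 amounts to a per-row maximum and its column index. The sketch below uses 0-based indices, so the column numbers 3, 1, 1 of the worked example become 2, 0, 0. The values 7.3 and 14.4 come from that example; the zero entries, the dictionary field names, and the function name are assumptions.

```python
def init_output_clusters(term_list, matrix):
    """Build one single-term cluster row per term."""
    rows = []
    for i, term in enumerate(term_list):             # loop of steps 1e021-1e026
        # Column holding the largest relevance in row i (steps 1e023-1e024).
        best_col = max(range(len(term_list)), key=lambda j: matrix[i][j])
        rows.append({
            "max_cluster_id": best_col,              # role of field 3f01
            "max_relevance": matrix[i][best_col],    # role of field 3f02
            "terms": [term],                         # role of field 3f03 (step 1e022)
        })
    return rows


terms = ["IMF", "dollar", "euro"]
matrix = [
    [0.0, 7.3, 14.4],
    [7.3, 0.0, 0.0],
    [14.4, 0.0, 0.0],
]
rows = init_output_clusters(terms, matrix)
```

Each row records its nearest neighbor once, so the later selection step only has to scan the list instead of the whole matrix.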

【００９２】Next, the term cluster selection processing in the term cluster generation processing is described in detail. FIG. 24 is a flowchart showing a detailed processing procedure example of the term cluster selection processing in the term cluster generation processing of FIG. 21.

【００９３】As shown in FIG. 24, the term cluster selection processing first initializes the following variables (step 1e041): the variable i indicating the index into the output term cluster list 3f is set to 1, the variables C1 and C2 indicating the indices of the term clusters to be selected as the most closely related pair are each set to 0, and the variable Max holding their relevance is set to 0. It then compares the maximum relevance stored in the i-th entry of the output term cluster list 3f with the value of Max (step 1e042).

【００９４】If Max is larger, the processing proceeds to step 1e044. If Max is smaller, the i-th maximum relevance of the output term cluster list 3f is set in Max, i is set in C1, and the i-th cluster ID of the output term cluster list 3f is set in C2 (step 1e043).

【００９５】In step 1e044, i is incremented by 1. If i exceeds the number of term clusters stored in the output term cluster list 3f (step 1e045), the processing ends; otherwise it returns to step 1e042.
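The scan of FIG. 24 is a running-maximum search over the cluster list. The sketch below is a minimal illustration under assumptions: the row dictionaries and their field names are hypothetical, indices are 0-based, `None` replaces the initial value 0 for C1 and C2, and a tie keeps the earlier row (the source does not say what happens when the two values are equal).

```python
def select_best_pair(rows):
    """Return (c1, c2, relevance) for the most closely related cluster pair."""
    c1, c2, best = None, None, 0.0                   # initialization (step 1e041)
    for i, row in enumerate(rows):                   # loop of steps 1e042-1e045
        if row["max_relevance"] > best:              # comparison with Max (step 1e042)
            best = row["max_relevance"]              # update Max, C1, C2 (step 1e043)
            c1, c2 = i, row["max_cluster_id"]
    return c1, c2, best


rows = [
    {"max_cluster_id": 2, "max_relevance": 14.4, "terms": ["IMF"]},
    {"max_cluster_id": 0, "max_relevance": 7.3, "terms": ["dollar"]},
    {"max_cluster_id": 0, "max_relevance": 14.4, "terms": ["euro"]},
]
c1, c2, best = select_best_pair(rows)
```

The first and third rows tie at 14.4; because the comparison is strict, the first row wins and the pair (0, 2) is selected.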

【００９６】Next, the term cluster merge processing in the term cluster generation processing is described in detail. FIG. 25 is a flowchart showing a detailed processing procedure example of the term cluster merge processing in the term cluster generation processing of FIG. 21. In the following description of the flowchart, C1 and C2 are the indices of the term clusters to be merged.

[0097] As shown in FIG. 25, the term cluster merge process first initializes the variable i, which indicates the column number of the correlation matrix 3e, to 1 (step 1e051). Next, the relevance at row C1, column i of the correlation matrix is compared with the relevance at row C2, column i (step 1e052). If the relevance at row C2, column i is smaller, the process proceeds to step 1e054; if it is larger, that value is set in the element at row C1, column i (step 1e053).

[0098] For example, in the fourth column of the correlation matrix 3e in FIG. 13, the relevance values in the first row (C1) and the third row (C2) are "4.1" and "8.6", respectively. In the first row (C1) of the fourth column, "4.1" is therefore replaced with "8.6", and the "8.6" in the third row (C2) is set to "0".

[0099] In step 1e054, the element at row C2, column i of the correlation matrix is set to 0, and i is incremented by 1 (step 1e055). If i exceeds the number of terms in the term list (step 1e056), the process proceeds to step 1e057; otherwise, it returns to step 1e052.

[0100] In step 1e057, the terms stored in the C2-th term list of the output term cluster list 3f are added to the C1-th term list of the output term cluster list 3f. In addition, the number of the column holding the maximum relevance in row C1 of the correlation matrix 3e is stored in the C1-th "cluster ID of maximum relevance 3f01" of the output term cluster list 3f, and that maximum value is stored in the "maximum relevance 3f02". The contents of the C2-th "term list 3f03" of the output term cluster list 3f are then deleted, its "cluster ID of maximum relevance 3f01" and "maximum relevance 3f02" are set to "0" (step 1e058), and the process ends.

[0101] For example, in the output term cluster list 3f of FIG. 14, "dollar" from the term list in the second row is added to the first (first-row) term list 3f03 ("IMF"), "3" is stored in the cluster ID of maximum relevance 3f01, and "14.4" is stored in the maximum relevance 3f02. "Dollar" is then deleted from the term list in the second row, and "0" is stored in that row's cluster ID of maximum relevance 3f01 and maximum relevance 3f02. Repeating this operation, and deleting the rows whose term list has been deleted and whose cluster ID of maximum relevance 3f01 and maximum relevance 3f02 hold "0", yields the contents shown in FIG. 14(b). The number of terms in each row is a predetermined number, such as "10".
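The merge of steps 1e051 to 1e058 can be sketched as below. This is a rough illustration under stated assumptions: the correlation matrix 3e is modeled as a list of rows, each row of the output term cluster list 3f as a dictionary with the hypothetical keys `terms`, `max_id`, and `max_rel`, and the matrix values used to exercise it would be invented rather than taken from FIG. 13. The description does not say how columns C1 and C2 are treated when the new maximum of row C1 is looked up; skipping them here is an assumption.

```python
def merge_clusters(matrix, clusters, c1, c2):
    """Merge cluster c2 into cluster c1 (1-based indices), modifying both arguments."""
    row1, row2 = matrix[c1 - 1], matrix[c2 - 1]
    for i in range(len(row1)):        # steps 1e051, 1e055, 1e056: walk the columns
        if row2[i] > row1[i]:         # step 1e052: compare the two rows
            row1[i] = row2[i]         # step 1e053: keep the larger relevance in row C1
        row2[i] = 0.0                 # step 1e054: clear row C2

    # Step 1e057: move the terms and refresh the maximum of row C1.
    clusters[c1 - 1]["terms"].extend(clusters[c2 - 1]["terms"])
    # Assumption: columns C1 and C2 are ignored so the merged cluster
    # never points at itself or at the cluster that was just absorbed.
    candidates = [k for k in range(len(row1)) if k + 1 not in (c1, c2)]
    best = max(candidates, key=lambda k: row1[k], default=None)
    clusters[c1 - 1]["max_id"] = 0 if best is None else best + 1
    clusters[c1 - 1]["max_rel"] = 0.0 if best is None else row1[best]

    # Step 1e058: empty the C2 entry.
    clusters[c2 - 1]["terms"] = []
    clusters[c2 - 1]["max_id"] = 0
    clusters[c2 - 1]["max_rel"] = 0.0
```

Rows emptied in step 1e058 are left in place so that cluster indices stay stable during the loop; they would be removed afterwards, as the example of FIG. 14(b) describes.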

[0102] Next, the related term acquisition process in the thesaurus browsing process is described, first in outline and then in detail with reference to a flowchart. The related terms of the terms belonging to the term cluster that the user selected from the input term cluster list 3g (hereinafter called "seed terms") can be obtained by searching the related thesaurus.

[0103] If the total number of related terms obtained is small (for example, 300 or fewer), all of them can be clustered. If the number is large, one approach is to take related terms evenly from each seed term; that is, X related terms are taken from every seed term in descending order of relevance. Because this keeps the number of terms to be clustered within a predetermined range, the amount of computation needed to generate the term clusters can be limited.
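The outline above amounts to a simple capping rule, sketched here under stated assumptions. The thesaurus is modeled as a dictionary mapping each term to a dictionary of related terms and relevance values, which is a hypothetical stand-in for the related thesaurus storage unit 2b. The description does not say how X is chosen; deriving it as the limit divided by the number of seed terms is an assumption.

```python
def collect_related_terms(seed_terms, thesaurus, limit=300, per_seed=None):
    """Return {seed term: related terms}, strongest relations first."""
    ranked = {
        seed: sorted(thesaurus.get(seed, {}).items(),
                     key=lambda item: item[1], reverse=True)
        for seed in seed_terms
    }
    total = sum(len(pairs) for pairs in ranked.values())
    if total <= limit:
        # Few related terms: all of them become clustering targets.
        return {seed: [term for term, _ in pairs] for seed, pairs in ranked.items()}

    # Many related terms: take the top X from every seed term.
    # Assumption: X defaults to limit // number of seeds, and is at least 1.
    x = per_seed if per_seed is not None else max(1, limit // len(seed_terms))
    return {seed: [term for term, _ in pairs[:x]] for seed, pairs in ranked.items()}
```

Taking the same number of terms from every seed keeps one heavily connected seed term from crowding out the others, which appears to be the intent of taking terms evenly.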

[0104] The detailed procedure of the related term acquisition process is described with reference to FIG. 26. FIG. 26 is a flowchart showing a detailed example of the processing procedure of the related term acquisition process in the thesaurus browsing process of FIG. 16(b). As shown in FIG. 26, the related term acquisition process first stores the terms held in the term list 3g01 of the input term cluster list 3g in the related term list 3d. The rank 3d02 of each row storing such a term is set to 1 (step 1d01). Examples are "Tokyo foreign exchange market" in row 15 and "dollar" in row 20.

[0105] Next, the variable i, which indexes the terms stored in the term list 3g01 of the input term cluster list 3g, is initialized to 1 (step 1d02). The related terms of the i-th term in the term list 3g01 of the input term cluster list 3g are obtained by searching the related thesaurus storage unit 2b, sorted in descending order of relevance, ranked, and appended to the end of the related term list 3d (step 1d03). The result looks like, for example, rows 15 to 18 of the related term list 3d in FIG. 12(a).

[0106] Then i is incremented by 1 (step 1d04). If i exceeds the total number of terms stored in the term list 3g01 (step 1d05), the process proceeds to step 1d06; otherwise, it returns to step 1d03. In step 1d06, the related term list 3d is sorted in ascending order of rank. The result looks like the related term list 3d in FIG. 12(b). At this point, however, duplicate related terms may still be present, so the following processing is also performed.

[0107] Specifically, the variable i, which indexes the related term list 3d, is set to 1, and the variable j, which counts the related terms selected for clustering, is set to 0 (step 1d07). If the i-th term of the related term list 3d does not appear among the 1st to (i-1)-th entries of the list (step 1d08), j is incremented by 1 (step 1d09) and the process proceeds to step 1d11. If it does appear there, the rank of the i-th entry of the related term list 3d is set to 0 (step 1d10). In step 1d11, i is incremented by 1; if i exceeds the number of related terms stored in the related term list 3d (step 1d12), the process proceeds to step 1d15.

[0108] Otherwise, if the rank of the (i-1)-th entry of the related term list 3d is the same as the rank of the i-th entry (step 1d13), the process returns to step 1d08. If the ranks differ, the process proceeds to step 1d14: if j equals the number of terms to be clustered, the related terms from the i-th entry onward and the related terms whose rank is 0 are deleted (step 1d15), and the process ends. If j does not equal the number of terms to be clustered, the process returns to step 1d08. In this way, the related term list 3d of FIG. 12(b) becomes free of duplicates.
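Steps 1d01 to 1d15 can be condensed into the following sketch, which builds the ranked list, sorts it by rank, and removes duplicates while stopping at a rank boundary once enough terms are collected. Several details are assumptions made for illustration: the thesaurus is a plain dictionary of dictionaries, related terms receive ranks starting at 2 (the description only fixes rank 1 for the seed terms), duplicates are dropped directly instead of being marked with rank 0 first, and the cutoff uses "at least" rather than strict equality so that an overshoot still terminates.

```python
def build_related_term_list(seed_terms, thesaurus, target_count):
    """Return a duplicate-free list of (term, rank) pairs for clustering."""
    rows = [(seed, 1) for seed in seed_terms]            # step 1d01: seeds get rank 1
    for seed in seed_terms:                              # steps 1d02-1d05
        ranked = sorted(thesaurus.get(seed, {}).items(),
                        key=lambda item: item[1], reverse=True)
        # Assumption: related terms are ranked 2, 3, ... after the seed itself.
        rows.extend((term, rank) for rank, (term, _) in enumerate(ranked, start=2))

    rows.sort(key=lambda row: row[1])                    # step 1d06: stable sort by rank

    seen, kept = set(), []                               # step 1d07: len(kept) plays the role of j
    for i, (term, rank) in enumerate(rows):
        # Steps 1d13-1d15: stop only where the rank changes, so every
        # seed term contributes the same number of related terms.
        if i > 0 and rank != rows[i - 1][1] and len(kept) >= target_count:
            break
        if term not in seen:                             # steps 1d08-1d10: skip duplicates
            seen.add(term)
            kept.append((term, rank))
    return kept
```

Because the cutoff is checked only at rank boundaries, the returned list can be slightly longer than `target_count`.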

[0109] The processing procedure of the thesaurus browsing method of this example has been described above. The following describes concrete screens involved in this thesaurus browsing. FIG. 27 is an explanatory diagram showing an example of the screen layout displayed by the thesaurus browsing system of FIG. 1. The screen shows the related thesaurus of the economic field in FIG. 6 being browsed: FIG. 27(a) shows the thesaurus overview, and FIG. 27(b) shows a term cluster after zooming.

[0110] As shown in FIG. 27(a), the thesaurus overview for the economic field displays, among others, a term cluster consisting of "economy", "sales", "consumer", and so on, and a term cluster consisting of "tax system", "income", "National Tax Agency", and so on. When the user takes an interest in the third term cluster, selects it, and clicks the <Zoom> button to request zooming, the screen changes to the state shown in FIG. 27(b).

[0111] FIG. 27(b) shows the state after the third term cluster in FIG. 27(a) has been selected and zoomed. It displays, among others, a term cluster consisting of "IMF", "dollar", "foreign currency", and so on, and a term cluster consisting of "yen", "Tokyo foreign exchange market", "yen buying", and so on. In this way, the user can browse the related thesaurus efficiently without entering any specific term.

[0112] Next, first to third variations of this embodiment are described. The first variation modifies the representative term extraction process. The hardware configuration of the thesaurus browsing system according to the first variation is the same as that of the embodiment shown in FIG. 4. The module configuration is the same as that of FIG. 1 with the term vector extraction unit 1b and the term vector storage unit 2c removed.

[0113] In the first variation, the modified representative term extraction unit 1c extracts representative terms using the structural elements of a document as clues. Specifically, it extracts terms from the structural elements in which terms characterizing the document tend to appear, and stores the terms that appear in many documents in the representative term list 3c as representative terms. For example, if the data stored in the document data storage unit 2a consists of newspaper articles, the terms appearing in the first sentence, that is, the headline, are extracted; if it consists of patent specifications, terms are extracted from the abstract or from the prior art and problem sections and treated as important terms.
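For newspaper-style data, the first variation could be sketched as follows. This is only an illustration: each document is assumed to be a string whose first line is the headline, and `extract_terms` is a hypothetical placeholder for whatever term extractor the system uses (here, naive lowercase whitespace splitting).

```python
from collections import Counter


def extract_terms(text):
    """Hypothetical term extractor: lowercase whitespace tokens."""
    return text.lower().split()


def representative_terms(documents, top_n=10):
    """Return (term, document count) pairs for the most widespread headline terms."""
    doc_counts = Counter()
    for doc in documents:
        lines = doc.splitlines()
        headline = lines[0] if lines else ""
        # Count each term once per document, mirroring the
        # "number of documents" field 3c02 of the representative term list.
        doc_counts.update(set(extract_terms(headline)))
    return doc_counts.most_common(top_n)
```

For patent specifications, the same counting would apply to the abstract or the prior art section in place of the first line.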

[0114] According to this modified representative term extraction unit 1c of the first variation, a thesaurus overview can be generated efficiently when, as with newspaper articles, it is clear which structural elements of a document contain general terms.

[0115] Next, as a second variation, a thesaurus browsing system that includes editing of term clusters by the user is described. The hardware configuration and module configuration of the thesaurus browsing system according to the second variation are the same as those shown in FIG. 4 and FIG. 1. In the second variation, the input term cluster 3g supplied to the related term acquisition unit 1d is a displayed term cluster that the user has selected and then edited by adding or deleting terms.

[0116] For example, in the screen example of FIG. 27, an "Edit" button is provided next to the "Zoom" button. When the user selects the third term cluster ("yen, exchange, Bank of Japan, foreign currency", ...) and the "Edit" button, an "edit screen" containing a list of the terms of the third term cluster ("yen", "exchange", "Bank of Japan", "foreign currency", ...) together with an "Add" button and a "Delete" button is displayed in a separate window.

[0117] On this "edit screen", the user operates the "Add" and "Delete" buttons to create a new term cluster that keeps only the terms needed from the list, for example only "yen", "exchange", and "Bank of Japan". When the user then selects an "End" button or the like, zooming is performed on the new term cluster.

[0118] Thus, according to the second variation, the user can edit the input term cluster 3g and can therefore zoom the thesaurus in a way that better matches the user's own interests, which makes thesaurus browsing easier to use.

[0119] As a third variation, a thesaurus browsing technique is described in which, when the zooming result for a term cluster selected by the user is displayed, the seed terms (the terms belonging to the originally selected term cluster) are displayed so that they can be distinguished from the other related terms.

[0120] The hardware configuration and the outline of the processing procedure of the thesaurus browsing system according to the third variation are the same as those described with reference to FIGS. 1 to 27. In the third variation, the generated output term clusters 3f are displayed with the seed terms rendered in a different color, font, or the like so that they can be distinguished from other terms. Here, a seed term is a term that belongs to a term cluster used as input to term clustering; it belongs to one of the term clusters the user selected on the way from the thesaurus overview to the currently displayed term clusters.
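The distinction between seed terms and other terms can be illustrated with a small text-only sketch. The function name, the `seed_history` parameter (one collection of terms per term cluster the user has selected so far), and the asterisk markers standing in for a color or font change are all assumptions made for this example.

```python
def render_cluster(terms, seed_history):
    """Render one output term cluster, marking seed terms with asterisks."""
    # A seed term belongs to any cluster selected on the way from the
    # thesaurus overview to the current view.
    seeds = set().union(*seed_history)
    return ", ".join(f"*{term}*" if term in seeds else term for term in terms)


line = render_cluster(
    ["yen", "Tokyo foreign exchange market", "yen buying"],
    [["economy", "yen"], ["yen", "dollar"]],
)
print(line)
```

In a graphical interface, the same membership test would select the color or font for each term.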

[0121] Unlike a hierarchical thesaurus, in which the browsing history can be displayed hierarchically, a network-type thesaurus makes it easy to get "lost". With this approach, even in a network-type thesaurus the user can easily see which terms have been followed so far simply by looking at the current state.

[0122] The above has described a thesaurus browsing system and method for browsing a related thesaurus generated automatically from document data belonging to a given field. Next, as a second embodiment of the present invention, a thesaurus browsing technique for the case where the user enters a key term is described.

[0123] First, the module configuration of the thesaurus browsing system according to the second embodiment is shown with reference to FIG. 28; next, the processing procedure of the thesaurus browsing method is described with reference to FIG. 29; finally, an example of the thesaurus browsing screen in the second embodiment is shown with reference to FIG. 30. The hardware configuration of the thesaurus browsing system according to the second embodiment may be the same as that of the first embodiment shown in FIG. 4.

[0124] FIG. 28 is a block diagram showing a second embodiment of the configuration of the thesaurus browsing system of the present invention. As shown in FIG. 28, this thesaurus browsing system comprises the related term acquisition unit 1d and the term cluster generation unit 1e as processing units, the related thesaurus storage unit 2b as a data storage unit, and the term 3h, the related term list 3d, the correlation matrix 3e, and the output term cluster list 3f as data.

[0125] Of these, the term 3h is a term entered by the user. The other components are, in outline, the same as in the example shown in FIG. 1. The processing of a thesaurus browsing system configured in this way is described with reference to FIGS. 29 and 30.

[0126] FIG. 29 is a flowchart showing an example of the processing procedure of the thesaurus browsing system of FIG. 28. As shown in FIG. 29, the related thesaurus is first searched to obtain the related terms of the term 3h entered by the user, and these are stored in the related term list (step 201). The terms stored in the related term list are then clustered (step 202), and the resulting term clusters are displayed to the user (step 203).

[0127] Of these steps, the related term acquisition process of step 201 is the same as the related term acquisition process of the first embodiment if a term cluster consisting of a single term is regarded as having been selected by the user, so its description is omitted. Likewise, the term cluster generation process of step 202 is the same as that of the first embodiment, so its description is omitted.

[0128] FIG. 30 is an explanatory diagram showing an example of the screen layout displayed by the thesaurus browsing system of FIG. 28. The screen in FIG. 30 shows browsing after the key term "yen appreciation" has been entered against the related thesaurus of the economic field shown in FIG. 6. As shown in FIG. 30, the related terms of "yen appreciation" include "dollar depreciation", "Tokyo foreign exchange market", and so on, and strongly related terms are displayed grouped together.

[0129] For example, the first term cluster contains "dollar depreciation", "trade imbalance", "surplus reduction", and so on, and the second term cluster contains "Tokyo foreign exchange market", "yen buying", and "exchange gain". Because the related terms are displayed divided into several term clusters, the user can consult them efficiently. The above has described, as the second embodiment, browsing of the related thesaurus when the user enters a term.

[0130] As described above with reference to FIGS. 1 to 30, the thesaurus browsing system and method of this embodiment display an overview (overall structure) of the thesaurus composed of general terms and navigate the user from this overview to substructures containing more specific terms. The user can thus easily grasp the overview of the related thesaurus and, by zooming into term clusters of interest, browse the related thesaurus efficiently.

[0131] That is, even a user who cannot enter an appropriate search term, for example because the search request is vague or because the user is not familiar with the related thesaurus being searched, can grasp the overall structure by consulting the displayed thesaurus overview and can browse the thesaurus by selecting a term cluster of interest and requesting zooming. This makes it possible to consult efficiently a network-type related thesaurus, whose overall structure is hard to grasp and in which users easily get lost while browsing.

[0132] Furthermore, when the related thesaurus is generated from a document database, it becomes possible to browse the database itself, which was not possible with conventional document database search. Conventional techniques for consulting a document database include searching for stored documents and reading their text, and viewing a list of stored bibliographic information such as titles. All of these techniques consult data corresponding to individual stored documents, so they were not useful for a user who wants to grasp the characteristics of the database as a whole rather than individual items stored in it (for example, a user who wants to know the trends in a patent database for a given year).

[0133] In contrast, a related thesaurus generated automatically from a document database, as in this example, can capture the relationships characteristic of that database, so browsing the related thesaurus makes it possible to grasp the characteristics of the database. For example, a related thesaurus generated from a patent database for a given year can reveal that "speech recognition" and "car navigation" are strongly related. In addition, whereas the prior art displayed only a partial structure of the related thesaurus, this example allows the entire related thesaurus to be browsed.

[0134] The present invention is not limited to the embodiments described with reference to FIGS. 1 to 30 and can be modified in various ways without departing from its gist. For example, although this example describes a related thesaurus for document data in the economic field, the invention can be applied in the same way to related thesauri for technical document databases, patent databases, and the like.

[0135]

[Effects of the Invention] According to the present invention, the user is provided with an overview of the thesaurus composed of the general terms in the related thesaurus, and substructures that interest the user can be displayed in more detail. This makes it possible to offer the user a new information processing technique, "browsing a document database", that differs from conventional document database search. The user can grasp the overall characteristics of the document database from the displayed thesaurus overview and learn its detailed characteristics by zooming into the parts of interest, which improves the efficiency with which the related thesaurus is used.

[Brief description of the drawings]

【図１】本発明のシソーラスブラウジングシステムの本
発明に係る構成の第１の実施形態例を示すブロック図で
ある。FIG. 1 is a block diagram showing a first embodiment of a configuration according to the present invention of a thesaurus browsing system of the present invention.

【図２】関連シソーラスの一例を示す説明図である。FIG. 2 is an explanatory diagram illustrating an example of a related thesaurus.

【図３】図１におけるシソーラスブラウジングシステム
の本発明に係わる処理の概要を示す説明図である。FIG. 3 is an explanatory diagram showing an outline of a process according to the present invention of the thesaurus browsing system in FIG. 1;

【図４】図１におけるシソーラスブラウジングシステム
のハードウェア構成例を示するブロック図である。FIG. 4 is a block diagram illustrating a hardware configuration example of the thesaurus browsing system in FIG. 1;

【図５】図１における文書データ格納部の構成例を示す
説明図である。FIG. 5 is an explanatory diagram showing a configuration example of a document data storage unit in FIG. 1;

【図６】図１における関連シソーラス格納部の構成例を
示す説明図である。FIG. 6 is an explanatory diagram showing a configuration example of a related thesaurus storage unit in FIG. 1;

【図７】図１におけるタームベクトル格納部の構成例を
示す説明図である。FIG. 7 is an explanatory diagram showing a configuration example of a term vector storage unit in FIG. 1;

【図８】図１におけるシソーラス概観格納部の構成例を
示す説明図である。8 is an explanatory diagram illustrating a configuration example of a thesaurus overview storage unit in FIG. 1. FIG.

【図９】図１における共起タームテーブルの構成例を示
す説明図である。FIG. 9 is an explanatory diagram showing a configuration example of a co-occurrence term table in FIG. 1;

【図１０】図１における文書タームテーブルの構成例を
示す説明図である。FIG. 10 is an explanatory diagram showing a configuration example of a document term table in FIG. 1;

【図１１】図１における代表タームリストの構成例を示
す説明図である。FIG. 11 is an explanatory diagram showing a configuration example of a representative term list in FIG. 1;

【図１２】図１における関連タームリストの構成例を示
す説明図である。FIG. 12 is an explanatory diagram showing a configuration example of a related term list in FIG. 1;

【図１３】図１における相関行列の構成例を示す説明図
である。FIG. 13 is an explanatory diagram showing a configuration example of a correlation matrix in FIG. 1;

【図１４】図１における出力タームクラスタリストの構
成例を示す説明図である。FIG. 14 is an explanatory diagram showing a configuration example of an output term cluster list in FIG. 1;

【図１５】図１における入力タームクラスタリストの構
成例を示す説明図である。FIG. 15 is an explanatory diagram showing a configuration example of an input term cluster list in FIG. 1;

【図１６】本発明のシソーラスブラウジング方法の処理
手順例を示すフローチャートである。FIG. 16 is a flowchart illustrating a processing procedure example of the thesaurus browsing method of the present invention.

【図１７】図１６（ａ）のシソーラスブラウジング用デ
ータ生成処理における関連シソーラス生成処理の詳細な
処理手順例を示すフローチャートである。FIG. 17 is a flowchart illustrating a detailed processing procedure example of a related thesaurus generation process in the thesaurus browsing data generation process of FIG.

【図１８】図１６（ａ）のシソーラスブラウジング用デ
ータ生成処理におけるタームベクトル抽出処理の詳細な
処理手順例を示すフローチャートである。FIG. 18 is a flowchart illustrating a detailed processing procedure example of a term vector extraction process in the thesaurus browsing data generation process of FIG. 16A.

【図１９】図１６（ａ）のシソーラスブラウジング用デ
ータ生成処理におけるシソーラス概観生成処理の詳細な
処理手順例を示すフローチャートである。FIG. 19 is a flowchart illustrating a detailed example of a thesaurus overview generation process in the thesaurus browsing data generation process of FIG. 16A.

【図２０】代表ターム抽出処理の詳細な処理手順例を示
すフローチャートである。FIG. 20 is a flowchart illustrating a detailed processing example of a representative term extraction process.

【図２１】図１６（ａ）のシソーラスブラウジング用デ
ータ生成処理におけるタームクラスタ生成処理の詳細な
処理手順例を示すフローチャートである。FIG. 21 is a flowchart illustrating a detailed processing procedure example of a term cluster generation process in the thesaurus browsing data generation process of FIG. 16A.

【図２２】図２１のタームクラスタ生成処理における相
関行列生成処理の詳細な処理手順例を示すフローチャー
トである。FIG. 22 is a flowchart illustrating a detailed processing example of a correlation matrix generation process in the term cluster generation process of FIG. 21;

【図２３】図２１のタームクラスタ生成処理におけるタ
ームクラスタ初期化処理の詳細な処理手順例を示すフロ
ーチャートである。23 is a flowchart illustrating a detailed processing procedure example of a term cluster initialization process in the term cluster generation process of FIG. 21;

【図２４】図２１のタームクラスタ生成処理におけるタ
ームクラスタ選択処理の詳細な処理手順例を示すフロー
チャートである。FIG. 24 is a flowchart illustrating a detailed processing procedure example of a term cluster selection process in the term cluster generation process of FIG. 21;

【図２５】図２１のタームクラスタ生成処理におけるタ
ームクラスタマージ処理の詳細な処理手順例を示すフロ
ーチャートである。FIG. 25 is a flowchart illustrating a detailed processing procedure example of a term cluster merge process in the term cluster generation process of FIG. 21;

【図２６】図１６（ｂ）のソーラスブラウジング処理に
おける関連ターム取得処理の詳細な処理手順例を示すフ
ローチャートである。FIG. 26 is a flowchart showing a detailed processing procedure example of a related term acquisition process in the solar browsing process of FIG. 16 (b).

【図２７】図１におけるシソーラスブラウジングシステ
ムで表示される画面の構成例を示す説明図である。FIG. 27 is an explanatory diagram showing a configuration example of a screen displayed by the thesaurus browsing system in FIG. 1;

【図２８】本発明のシソーラスブラウジングシステムの
本発明に係る構成の第２の実施形態例を示すブロック図
である。FIG. 28 is a block diagram showing a second embodiment of the configuration of the thesaurus browsing system of the present invention according to the present invention.

【図２９】図２８におけるシソーラスブラウジングシス
テムの本発明に係わる処理手順例を示すフローチャート
である。FIG. 29 is a flowchart illustrating an example of a processing procedure according to the present invention of the thesaurus browsing system in FIG. 28;

FIG. 30 is an explanatory diagram showing an example of the screen layout displayed by the thesaurus browsing system of FIG. 28.

[Explanation of symbols]

1: CPU, 2: hard disk, 3: memory, 4a: display, 4b: display control unit, 5a: keyboard, 5b: keyboard control unit, 6a: mouse, 6b: mouse control unit, 7: bus,
1a: related thesaurus generation unit, 1b: term vector extraction unit, 1c: representative term acquisition unit, 1d: related term acquisition unit, 1e: term cluster generation unit,
2a: document data storage unit, 2b: related thesaurus storage unit, 2c: term vector storage unit, 2d: thesaurus overview storage unit,
3a: co-occurrence term table, 3b: document term table, 3c: representative term list, 3d: related term list, 3e: correlation matrix, 3f: output term cluster list, 3g: input term cluster list, 3h: term,
2a01: document data, 2b01: term X, 2b02: term Y, 2b03: degree of relevance, 2c01: document ID, 2c02: important term list, 2d01: term list,
3a01: term X, 3a02: term Y, 3a03: co-occurrence frequency, 3b01: document ID, 3b02: term, 3b03: occurrence frequency, 3c01: representative term, 3c02: number of documents, 3d01: related term, 3d02: rank,
3f01: cluster ID of maximum relevance, 3f02: maximum relevance, 3f03: term list, 3g01: term list.
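As a reading aid, the tables and lists named by the reference numerals above could be modeled roughly as follows. This is only a sketch: the field meanings come from the numerals, but the class names, field names, and Python types are assumptions made for illustration and are not specified in the document.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CooccurrenceEntry:
    """One row of the co-occurrence term table (3a)."""
    term_x: str                 # 3a01: term X
    term_y: str                 # 3a02: term Y
    cooccurrence_freq: int      # 3a03: co-occurrence frequency


@dataclass
class DocumentTermEntry:
    """One row of the document term table (3b)."""
    doc_id: str                 # 3b01: document ID
    term: str                   # 3b02: term
    frequency: int              # 3b03: occurrence frequency


@dataclass
class ThesaurusEntry:
    """One record of the related thesaurus storage unit (2b)."""
    term_x: str                 # 2b01: term X
    term_y: str                 # 2b02: term Y
    relevance: float            # 2b03: degree of relevance


@dataclass
class OutputCluster:
    """One entry of the output term cluster list (3f)."""
    best_cluster_id: Optional[int] = None            # 3f01: cluster ID of maximum relevance
    best_relevance: float = 0.0                      # 3f02: maximum relevance
    terms: List[str] = field(default_factory=list)   # 3f03: term list
```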

Continued from the front page
(72) Inventor: Yasutsugu Morimoto, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo
(72) Inventor: Noriyuki Yamazaki, c/o Software Division, Hitachi, Ltd., 5030 Totsuka-cho, Totsuka-ku, Yokohama-shi, Kanagawa
(72) Inventor: Keiko Iida, c/o Software Division, Hitachi, Ltd., 5030 Totsuka-cho, Totsuka-ku, Yokohama-shi, Kanagawa
(72) Inventor: Yasuhiko Uchida, c/o Software Division, Hitachi, Ltd., 5030 Totsuka-cho, Totsuka-ku, Yokohama-shi, Kanagawa
F-terms (reference): 5B075 ND02 ND35 NR02 NR12 PP13 PQ02 PQ42 PQ46 QP03; 5E501 AC33 AC34 BA05 CA02 CB09 EA10 EB05 EB20 FA03 FA06

Claims

1. A thesaurus browsing system for searching and displaying a thesaurus that stores a plurality of terms having related relationships, comprising: thesaurus overview generating means for displaying overview term clusters obtained by extracting and clustering a plurality of general terms that represent an overview of the thesaurus; and thesaurus zooming means for extracting terms related to each of the general terms belonging to an overview term cluster selected by a user, and displaying limited term clusters obtained by clustering the extracted terms.

2. The thesaurus browsing system according to claim 1, further comprising second thesaurus zooming means for extracting and clustering terms related to each of the limited terms belonging to a limited term cluster selected by the user, and displaying the newly obtained limited term clusters.

3. The thesaurus browsing system according to claim 1 or 2, wherein the thesaurus overview generating means comprises: representative term acquisition means for extracting, from the terms having related relationships, representative terms that serve as the general terms; and term cluster generating means for generating the overview term clusters by grouping together those representative terms, extracted by the representative term acquisition means, that have a high degree of relevance to one another.

4. The thesaurus browsing system according to claim 1, wherein the thesaurus zooming means comprises: related term acquisition means for extracting, from the thesaurus, terms related to the general terms belonging to the overview term cluster selected by the user; and term cluster generating means for acquiring, from the thesaurus, the degree of relevance between the terms extracted by the related term acquisition means, and generating the limited term clusters by grouping together terms having a high degree of relevance.
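As a rough illustration of the two means recited in claim 4, the following sketch extracts related terms for a selected cluster and then groups them by relevance. Everything here is an assumption made for illustration: the thesaurus is modeled as a dictionary of pairwise relevance scores, and the function names, the `top_n` and `threshold` parameters, and the single-link greedy merge rule are hypothetical choices, not the procedure disclosed in the patent.

```python
from itertools import combinations


def relevance(thesaurus, a, b):
    """Look up the relevance of a term pair; unknown pairs score 0.0."""
    return thesaurus.get(frozenset((a, b)), 0.0)


def related_terms(thesaurus, seeds, top_n=5):
    """Related term acquisition (assumed form): for each seed term,
    keep the top_n terms with the highest relevance to it."""
    found = set()
    for seed in seeds:
        scored = []
        for pair, score in thesaurus.items():
            if seed in pair and len(pair) == 2:
                (other,) = pair - {seed}
                scored.append((score, other))
        scored.sort(reverse=True)
        found.update(term for _, term in scored[:top_n])
    return found


def cluster_terms(thesaurus, terms, threshold=0.5):
    """Term cluster generation (assumed form): start with one cluster per
    term and repeatedly merge the most relevant pair of clusters until no
    pair exceeds the threshold (single-link agglomerative clustering)."""
    clusters = [{t} for t in sorted(terms)]
    while len(clusters) > 1:
        best, best_pair = threshold, None
        for i, j in combinations(range(len(clusters)), 2):
            score = max(relevance(thesaurus, a, b)
                        for a in clusters[i] for b in clusters[j])
            if score > best:
                best, best_pair = score, (i, j)
        if best_pair is None:
            break
        i, j = best_pair
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


def zoom(thesaurus, selected_cluster, top_n=5, threshold=0.5):
    """Zooming: related terms of the selected cluster, then clustered."""
    candidates = related_terms(thesaurus, selected_cluster, top_n)
    return cluster_terms(thesaurus, candidates, threshold)
```

Calling `zoom` again on one of the returned clusters corresponds to the repeated zooming of claim 2, since each call takes a cluster and yields narrower clusters drawn from the same thesaurus.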

5. A thesaurus browsing system for searching and displaying a thesaurus that stores a plurality of terms having related relationships, comprising: related term acquisition means for extracting, from the thesaurus, terms related to a term input by a user; and term cluster generating means for acquiring, from the thesaurus, the degree of relevance between the terms extracted by the related term acquisition means, and generating term clusters by grouping together terms having a high degree of relevance, wherein the term clusters generated by the term cluster generating means are displayed.

6. The thesaurus browsing system according to claim 5, further comprising thesaurus zooming means for extracting terms related to each of the terms belonging to a term cluster selected by the user, and displaying limited term clusters obtained by clustering the extracted terms.

7. A thesaurus browsing method for an apparatus that searches and displays a thesaurus storing a plurality of terms having related relationships, the method comprising: a thesaurus overview generation processing step of displaying overview term clusters obtained by extracting and clustering a plurality of general terms that represent an overview of the thesaurus; and a thesaurus zooming processing step of extracting terms related to each of the general terms belonging to an overview term cluster selected by a user, and displaying limited term clusters obtained by clustering the extracted terms.

8. The thesaurus browsing method according to claim 7, further comprising a second thesaurus zooming processing step of extracting and clustering terms related to each of the limited terms belonging to a limited term cluster selected by the user, and displaying the newly obtained limited term clusters.

9. The thesaurus browsing method according to claim 7, wherein the thesaurus overview generation processing step comprises: a representative term acquisition processing step of extracting, from the terms having related relationships, representative terms that serve as the general terms; and a term cluster generation processing step of generating the overview term clusters by grouping together those representative terms, extracted in the representative term acquisition processing step, that have a high degree of relevance to one another.

10. The thesaurus browsing method according to any one of claims 7 to 9, wherein the thesaurus zooming processing step comprises: a related term acquisition processing step of extracting, from the thesaurus, terms related to the general terms belonging to the overview term cluster selected by the user; and a term cluster generation processing step of acquiring, from the thesaurus, the degree of relevance between the terms extracted in the related term acquisition processing step, and generating the limited term clusters by grouping together terms having a high degree of relevance.

11. The thesaurus browsing method according to any one of claims 7 to 10, further comprising a term cluster editing processing step of adding or deleting terms belonging to the overview term cluster based on an editing instruction from the user, wherein the processing of the thesaurus zooming processing step is performed on the overview term cluster selected and edited by the user.

12. The thesaurus browsing method according to claim 7, further comprising a term cluster display processing step of distinguishably displaying, among the terms belonging to the limited term cluster displayed in the thesaurus zooming processing step, those terms that belong to the overview term cluster selected by the user.

13. A thesaurus browsing method for an apparatus that searches and displays a thesaurus storing a plurality of terms having related relationships, the method comprising: a related term acquisition processing step of extracting, from the thesaurus, terms related to a term input by a user; and a term cluster generation processing step of acquiring, from the thesaurus, the degree of relevance between the terms extracted in the related term acquisition processing step, and generating term clusters by grouping together terms having a high degree of relevance, wherein the term clusters generated in the term cluster generation processing step are displayed.

14. The thesaurus browsing method according to claim 13, further comprising a thesaurus zooming processing step of extracting terms related to each of the terms belonging to a term cluster selected by the user, and displaying limited term clusters obtained by clustering the extracted terms.

15. A computer-readable recording medium on which a program and data are recorded, wherein the recorded program causes a computer to execute the processing of each step of the thesaurus browsing method according to any one of claims 7 to 14.