JP2000242662A

JP2000242662A - Data base preparation device and data base retrieval device

Info

Publication number: JP2000242662A
Application number: JP11045312A
Authority: JP
Inventors: Norihiro Minegishi; 則宏嶺岸; Ikuko Takanashi; 郁子高梨; Satoshi Tanaka; 聡田中
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1999-02-23
Filing date: 1999-02-23
Publication date: 2000-09-08
Anticipated expiration: 2019-02-23
Also published as: JP4037001B2

Abstract

PROBLEM TO BE SOLVED: To create an index automatically without affecting the index criterion, by automatically generating degrees of relevance between words contained in the data based on an index criterion supplied to the database, and by further applying appropriate weighting to those degrees of relevance. SOLUTION: A word relevance map creating device 7 examines, in the data 1, the frequency of appearance of the words used in the index criterion 4, treats words that appear together as related to each other, calculates their relevance according to a prescribed formula, and obtains a word relevance map 8. A word importance assigning device 10 divides the number of words appearing in an abstract by the number of words appearing in the title and weights the words in the title by the resulting value. An index creating device 6 temporarily modifies the relevance values of the word relevance map 8 using this weighting and generates an index 9 for the data 1 using the modified relevance values.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a database creation device that automatically creates an index for a database, and to a database search device that searches the created database while narrowing down categories.

[0002]

2. Description of the Related Art

FIG. 13 is a block diagram showing a conventional similarity search device. This search device comprises: a database 17 that stores data expressed as pairs of attributes and attribute values; a first search unit 15 that, before similar data is searched for, uses a first index to narrow down the amount of data in the database 17; a second search unit 16 that calculates a similarity range from the range of similarity between attribute values and the importance of the attributes, and performs a similarity search, based on a second index, on the data retrieved by the first search unit 15; an inference processing unit 14 provided with first-index changing means that changes the first index according to the similarity range; a first index generation unit 18 that sets a similarity value for the first index and determines the level of the first index based on that value; an input device 11; an output device 12; and an input/output control unit 13.

[0003] This search device sets a similarity value for the first index, determines the level of the first index based on that value, calculates a similarity from the range of similarity between attribute values and the second index, changes the first index according to the calculated similarity range, and then performs a similarity search. Such a search device is disclosed in, for example, Japanese Unexamined Patent Publication No. Hei 6-176072.

[0004] FIG. 14 is a block diagram showing a conventional information retrieval device. This retrieval device comprises: an arithmetic unit 19 that takes the inner product of the index keyword matrix from a keyword memory 24 and a search vector corrected according to the degree of correlation; a partial linearizer 20 that converts the result into partially linear form; a second arithmetic unit 21 that multiplies the output vector of the partial linearizer 20 by the keyword matrix X; a normalizer 22 that normalizes each element of the product to 0 or 1; a controller 23 that compares the result with the output vector of the normalizer 22 from before one round of feedback to the arithmetic unit 19; a reader 26 that reads the desired data from a database 25 based on the addresses corresponding to positions in the corrected output vector of the partial linearizer 20; and a display 27 that displays the desired data.

[0005] Specifically, this retrieval device has the configuration shown in FIG. 15. A keyword entered from a keyword input unit 28 is stored in a storage unit 33 and converted into a numerical vector by an arithmetic unit 24. A keyword correlation table 35 is then consulted to find a more closely related keyword according to the degree of correlation, and a converter 29 converts the keyword originally entered into the keyword that was found. The retrieval device then uses a searcher 30 to search a database 36 with that keyword as the new search condition, reads the data with a reader 31, and shows the result selected via a selector 37 on a display 32. Such a retrieval device is disclosed in, for example, Japanese Unexamined Patent Publication No. Hei 8-87508.

[0006]

PROBLEMS TO BE SOLVED BY THE INVENTION

The similarity search device described above, however, has two problems. First, to obtain similarity on a per-word basis, the data must be stored as pairs of attributes and attribute values. Second, as with the importance of attributes, targets or samples that are based on the user's intuition and experience, or that reflect the user's intent, must be set, so the results obtained differ from user to user.

[0007] The information retrieval device described above converts an entered keyword into another keyword using a keyword correlation table that indicates general degrees of correlation calculated by some method. Consequently, data from different fields receive the same correlation whenever the character string is the same word, and results appropriate to the field cannot be obtained. If the correlation table were modified to avoid this, the entire keyword space would be affected, and performance would not necessarily improve for every search.

[0008] The present invention has been made to solve the above problems. Its object is to obtain a database creation device that automatically generates degrees of relevance between words contained in the data based on an index criterion given to the database, and further applies appropriate weighting to those degrees of relevance, thereby creating an index automatically without affecting the index criterion, and to obtain a database search device that searches that database while narrowing down categories.

[0009]

MEANS FOR SOLVING THE PROBLEMS

To achieve the above object, the present invention comprises: a data input device for inputting data into a database; an index criterion reading device for inputting an index criterion that defines the structure serving as the basis of an index; a word relevance map creating device that examines, in the input data, the frequency of appearance of the words used in the index criterion, calculates a relevance for words that appear together, and creates a word relevance map based on that relevance and the index criterion; a word importance assigning device that divides the number of words appearing in a first document of the input data by the number of words appearing in a second document that is a summary or heading of the first document, and weights the words in the second document by the resulting value; and an index creating device that uses the weighting obtained by the word importance assigning device to temporarily modify the relevance values of the word relevance map created by the word relevance map creating device, and creates an index for the input data using the modified word relevance map.

[0010] According to this invention, when data is input by the data input device and an index criterion is input by the index criterion reading device, the word relevance map creating device examines, in the input data, the frequency of appearance of the words used in the index criterion, calculates a relevance for words that appear together, and creates a word relevance map based on that relevance and the index criterion. The word importance assigning device divides the number of words appearing in a first document of the input data by the number of words appearing in a second document that is a summary or heading of the first document, and weights the words in the second document by the resulting value. The index creating device uses that weighting to temporarily modify the relevance values of the word relevance map, and creates an index for the input data using the modified word relevance map.

[0011] The present invention also comprises: a data input device for inputting data into a database; an index criterion reading device for inputting an index criterion that defines the structure serving as the basis of an index; a word relevance map creating device that examines, in the input data, the frequency of appearance of the words used in the index criterion, calculates a relevance for words that appear together, and creates a word relevance map based on that relevance and the index criterion; a word importance assigning device that divides the number of words appearing in a first document of the input data by the number of words appearing in a second document that is a summary or heading of the first document, and weights the words in the second document by the resulting value; an index creating device that uses the weighting obtained by the word importance assigning device to temporarily modify the relevance values of the word relevance map created by the word relevance map creating device, and creates an index for the input data using the modified word relevance map; a data search device that performs a search based on the index created by the index creating device; and a result display device that displays the search result.

[0012] According to this invention, when data is input by the data input device and an index criterion is input by the index criterion reading device, the word relevance map creating device examines, in the input data, the frequency of appearance of the words used in the index criterion, calculates a relevance for words that appear together, and creates a word relevance map based on that relevance and the index criterion. The word importance assigning device divides the number of words appearing in a first document of the input data by the number of words appearing in a second document that is a summary or heading of the first document, and weights the words in the second document by the resulting value. The index creating device uses that weighting to temporarily modify the relevance values of the word relevance map, and creates an index for the input data using the modified word relevance map. The data search device then performs a search based on the created index, and the result display device displays the search result.

[0013]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Embodiments of the database creation device and the database search device according to the present invention are described in detail below with reference to the accompanying drawings.

[0014] Embodiment 1. FIG. 1 is a block diagram showing an example of a database creation device according to the present invention. This database creation device comprises: a database 3; a data input device 2 for inputting data 1 into the database 3 from outside the system; an index criterion reading device 5 for inputting an index criterion 4 from outside the system; a word relevance map creating device 7 that creates a word relevance map 8 based on the input data 1 and the index criterion 4; an index creating device 6 that generates an index 9 based on the input data 1, the word relevance map 8, and the index criterion 4; and a word importance assigning device 10 that calculates the importance of words when the index is created.

[0015] The database 3, the word relevance map 8, and the index 9 are stored in a storage device such as a hard disk. The data input device 2, the index criterion reading device 5, the index creating device 6, the word relevance map creating device 7, and the word importance assigning device 10 are each realized by executing, on a computer system, for example a data input program, an index criterion reading program, an index creation program, a word relevance map creation program, and a word importance assigning program, respectively.

[0016] The index criterion 4 indicates the structure of the index. As an example, FIG. 2 shows an index criterion 100 for a database of medical books. In this medical-book index criterion 100 the index forms a tree structure: for example, the top layer holds the title "medicine"; one layer below are the three titles "basic medicine", "internal medicine", and "surgery"; one layer below "basic medicine" are "anatomy" and "physiology"; one layer below "internal medicine" are "circulatory organs" and "digestive organs"; and one layer below "surgery" are "local surgery" and "orthopedics". Like this medical-book index criterion 100, the index criterion 4 is also structured as a tree.
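As an illustration only, the tree of FIG. 2 as described above could be held in a nested dictionary. The English node names, the variable `INDEX_CRITERION`, and the helper `paths_to_root` are assumptions made for this sketch; the patent does not specify any data format.

```python
# Hypothetical in-memory form of the medical-book index criterion 100 (FIG. 2).
# Each key is a title; its value holds the titles one layer below it.
INDEX_CRITERION = {
    "medicine": {
        "basic medicine": {"anatomy": {}, "physiology": {}},
        "internal medicine": {"circulatory organs": {}, "digestive organs": {}},
        "surgery": {"local surgery": {}, "orthopedics": {}},
    }
}


def paths_to_root(tree, trail=()):
    """Yield (node, path from the root down to that node) for every node."""
    for name, children in tree.items():
        here = trail + (name,)
        yield name, here
        yield from paths_to_root(children, here)


# Node names are unique in this example, so a plain dict lookup is enough.
PATHS = dict(paths_to_root(INDEX_CRITERION))
print(PATHS["circulatory organs"])
```

Keeping the full path for every node makes it straightforward to walk from any word back to the root, which is what the later evaluation steps rely on.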

[0017] The following description uses this medical-book index criterion 100 as a concrete example, but the present invention is of course not limited to databases of medical books or to the medical-book index criterion 100.

[0018] The word relevance map 8 is a map in which the relevance between words is attached to the hierarchical relationships of the index criterion 4. For the input data 1, the word relevance map creating device 7 examines the frequency of appearance of the words used in, for example, the medical-book index criterion 100, performs a prescribed calculation based on it, and obtains a word relevance map 104 such as that shown in FIG. 3. The method by which the word relevance map creating device 7 creates the word relevance map 8 is described concretely below, using the data 101 shown in FIG. 4 as an example.

[0019] For example, the data 101 consists of three documents. The title of document 1 is "The story of the circulatory organs", and its abstract is "... The most frightening circulatory diseases are, anatomically, angina and heart failure. ...". The title of document 2 is "Concurrent circulatory disease and hernia", and its abstract is "... Anatomically, the circulatory organs ... For the hernia, be examined by a doctor of surgery. ...". The title of document 3 is "Digestive organs and circulatory organs", and its abstract is "... If you do not chew well, the digestive organs may become inflamed and you may vomit. Vomiting puts a strain on the heart, and if you have a circulatory disease such as angina, ...".

[0020] Extracting only the words from each of documents 1 to 3 yields the word strings 102 shown in FIG. 4. That is, for document 1 the word string 102 is "circulatory organs" for the title and "circulatory organs, anatomy, angina, heart failure" for the abstract; for document 2 it is "circulatory organs, hernia" for the title and "anatomy, circulatory organs, hernia, surgery" for the abstract; and for document 3 it is "digestive organs, circulatory organs" for the title and "chewing, digestive organs, vomiting, heart, angina, circulatory organs" for the abstract.

[0021] Words that appear together in one document are regarded as related to each other and are taken as co-occurring word pairs 103; this is done for all of the data. For the co-occurring word pairs 103, the relevance is then defined using, for example, the ratio of co-occurrences to the total number of appearances, as in equation (1) below. Here, N1 is the total appearance frequency of one word ("KW1"), N2 is the total appearance frequency of another word ("KW2"), N12 is the co-occurrence frequency with which "KW1" and "KW2" appear together, and μ12 is the relevance between "KW1" and "KW2".

[0022]

$$\mu_{12} = \frac{N_{12}}{N_1 + N_2 - N_{12}} \qquad (1)$$

[0023] For example, for document 1 described above, the co-occurring word pairs 103 are, as shown in FIG. 4, "circulatory organs, anatomy", "circulatory organs, angina", "circulatory organs, heart failure", "anatomy, angina", "anatomy, heart failure", "angina, heart failure", and so on. For the co-occurring pair "circulatory organs, angina", for example, equation (1) gives the value (co-occurrence frequency of "circulatory organs" and "angina") / {(total appearance frequency of "circulatory organs") + (total appearance frequency of "angina") - (co-occurrence frequency of "circulatory organs" and "angina")}.
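The calculation of equation (1) can be sketched as follows for the three documents of data 101. This is a minimal sketch under stated assumptions: each document is reduced to the set of words in its title and abstract, and "appearance frequency" is counted as the number of documents containing the word. The text does not say whether documents or individual occurrences are counted, and the names `DOCS` and `relevance_map` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Word sets per document, taken from the word strings 102 (title and abstract merged).
DOCS = {
    1: {"circulatory organs", "anatomy", "angina", "heart failure"},
    2: {"circulatory organs", "hernia", "anatomy", "surgery"},
    3: {"digestive organs", "circulatory organs", "chewing",
        "vomiting", "heart", "angina"},
}


def relevance_map(docs):
    """Return {frozenset({kw1, kw2}): mu12} following equation (1)."""
    word_count = Counter()   # N1, N2: documents containing each word (assumption)
    pair_count = Counter()   # N12: documents containing both words
    for words in docs.values():
        word_count.update(words)
        pair_count.update(frozenset(p) for p in combinations(sorted(words), 2))
    return {
        pair: n12 / (sum(word_count[w] for w in pair) - n12)
        for pair, n12 in pair_count.items()
    }


mu = relevance_map(DOCS)
# "circulatory organs" is in 3 documents, "angina" in 2, both together in 2:
# 2 / (3 + 2 - 2)
print(round(mu[frozenset({"circulatory organs", "angina"})], 3))
```

Under these assumptions the measure is the same as the Jaccard coefficient of the two sets of documents in which each word appears, so it always lies between 0 and 1.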

[0024] Attaching that value, namely the relevance 105, to the hierarchical relationships of the index criterion 100 yields the word relevance map 104 shown in FIG. 3. Besides equation (1), various other formulas can be used to calculate the relevance between words, such as calculating the co-occurrence ratio after weighting by the number of times a word appears in one document.

[0025] When the data to be indexed has, for example, a title, an abstract, and a body, the word importance assigning device 10 applies weighting as appropriate, according to their respective value, among the words appearing in the title, the words appearing in the abstract, and the words appearing in the body. In general, the abstract is a concise condensation of the body, and the title is a further condensation of the abstract. On the assumption that the title, the abstract, and the body all carry an equal volume of content to be expressed, the value of the words appearing in each is quantified and used as a weight. The method by which the word importance assigning device 10 determines the weighting is described concretely below with reference to FIG. 6, using the data 108 shown in FIG. 5 as an example.

[0026] The data shown in FIG. 5 has titles and abstracts. First, the number of words that appear is counted for the title and for the abstract (steps S1 and S2 in FIG. 6). For document 1, for example, the abstract contains four words: "circulatory organs", "angina", "heart failure", and "anatomy". The title, by contrast, contains one word: "circulatory organs". A word in the title is therefore considered to have four times the value of a word in the abstract. Accordingly, the words in the title are given a weight of four relative to the words in the abstract (step S3 in FIG. 6). This is done for each item of data.

[0027] For example, document 2 contains the two words "hernia" and "circulatory organs" in its title and the four words "anatomy", "circulatory organs", "hernia", and "surgery" in its abstract, so the words in its title are weighted by a factor of two. Document 3 contains the two words "chewing" and "circulatory organs" in its title and the six words "chewing", "digestive organs", "vomiting", "anatomy", "physiology", and "circulatory organs" in its abstract, so the words in its title are weighted by a factor of three. As a result of this weighting, in the example shown in FIG. 7, the relevance of "circulatory organs", originally 0.2, becomes 0.8 for document 1, 0.4 for document 2, and 0.6 for document 3, so the value of "circulatory organs" differs from one item of data to another, that is, among documents 1, 2, and 3 (step S4 in FIG. 6).
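Steps S1 to S4 can be sketched as below, using the word lists given for the data 108 and the base relevance 0.2 quoted for "circulatory organs". The names `DATA_108`, `title_weight`, and `WEIGHTS` are assumptions introduced for this sketch, not terms from the patent.

```python
# (title words, abstract words) per document, as listed in the text for data 108.
DATA_108 = {
    1: (["circulatory organs"],
        ["circulatory organs", "angina", "heart failure", "anatomy"]),
    2: (["hernia", "circulatory organs"],
        ["anatomy", "circulatory organs", "hernia", "surgery"]),
    3: (["chewing", "circulatory organs"],
        ["chewing", "digestive organs", "vomiting", "anatomy",
         "physiology", "circulatory organs"]),
}


def title_weight(title_words, abstract_words):
    """S1-S3: abstract word count divided by title word count."""
    return len(abstract_words) / len(title_words)


BASE_RELEVANCE = 0.2  # relevance of "circulatory organs" before weighting (FIG. 7)

WEIGHTS = {doc_id: title_weight(t, a) for doc_id, (t, a) in DATA_108.items()}
for doc_id, weight in WEIGHTS.items():
    # S4: the relevance is scaled only while this document is being processed.
    print(doc_id, weight, round(BASE_RELEVANCE * weight, 2))
```

Because the weight is computed per document and applied to a copy of the value, the stored word relevance map itself is left unchanged, which matches the "temporary" modification described in the text.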

[0028] The same applies when weighting is performed between, for example, the abstract and the body, or between other items of document data. FIG. 8 shows an example of weighting between the abstract and the body. In the data 110 shown in FIG. 8, for the body, for example, when the same word appears repeatedly, the number of appearances is taken into account. Because the documents and books concerned differ in length, simply adding up the number of appearances is not enough, and normalization is desirable. That is, the amount of body text differs from document to document and from book to book, and in general more words appear when there is more text. It is therefore advisable to normalize so that the weighting is applied over a fixed amount of text or a unit amount of text, such as per page or per 1000 characters.
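One possible reading of the "per 1000 characters" normalization is sketched below. The patent gives no formula, so the function `per_unit_frequency`, its parameters, and the sample strings are purely illustrative assumptions.

```python
def per_unit_frequency(text, word, unit_chars=1000):
    """Occurrences of `word` per `unit_chars` characters of `text`."""
    if not text:
        return 0.0
    return text.count(word) * unit_chars / len(text)


# Two bodies with the same word density but very different lengths.
short_body = "heart lung " * 10
long_body = "heart lung " * 100

# Raw counts differ by a factor of ten; the normalized values agree.
print(short_body.count("heart"), long_body.count("heart"))
print(per_unit_frequency(short_body, "heart"), per_unit_frequency(long_body, "heart"))
```

A per-page variant would divide by a page count instead of a character count; either way the aim is that a long document does not outweigh a short one merely because it contains more text.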

[0029] Next, the flow of the index creation process is described. When the data 1 is input into the database 3 by the data input device 2 and, for example, the index criterion 100 shown in FIG. 2 is input by the index criterion reading device 5, the word relevance map creating device 7 creates a word relevance map 8 such as that shown in FIG. 3, based on the words appearing in the input data 1 and the index criterion 4.

[0030] After that, the index creating device 6 creates the index 9 for the data to be indexed, based on the index criterion 4 and the word relevance map 8, following, for example, the flowchart shown in FIG. 9. First, the nodes (words) of the classification items contained in each document are picked up and mapped onto the word relevance map (step S11). As an example, FIG. 7 shows the words appearing in document 3 of the data 106 marked in the word relevance map 8. In the illustrated example, the marking is indicated by showing the corresponding words, namely "digestive organs", "circulatory organs", "chewing", "vomiting", "heart", and "angina", in underlined bold type.

[0031] Next, weighting is performed by the word importance assigning device 10, and the word relevance map 8 is temporarily modified (step S12). In the example shown in FIG. 7, for document 3 the two words "digestive organs" and "circulatory organs" appear in the title, whereas six words appear in the abstract. Therefore, only while document 3 is being processed, the relevance values in the word relevance map 8 for "digestive organs" and "circulatory organs" are temporarily tripled, to 0.6 (0.2 × 3) each.

[0032] Next, the words that appear in the data to be indexed are checked as terminal words, and a classification evaluation value is calculated by tracing back from each terminal word to the root node ("medicine") (step S13). The purpose is to calculate and evaluate each word mapped onto the word relevance map 8 according to a certain calculation procedure, so that each word is evaluated not in isolation at its mapped position but with regard to where it is positioned within the classification scheme as a whole.

[0033] In the example shown in FIG. 7, the classification item "heart" does not simply mean the word "heart"; it means the concept of "heart" in relation to "circulatory organs", in relation to "internal medicine", in relation to "medicine". To reflect this, the mapped classification items are traced back in order from, for example, the terminal-word node "heart" to the root node "medicine", the relevance values along the way are added, and the accumulated total is divided by the number of layers traversed to obtain an average, which is taken as the classification evaluation value.

[0034] In the example shown in FIG. 7, for document 3, the relevance between "heart" and "circulatory organs", one layer above it, is 0.9; the relevance between "circulatory organs" and "internal medicine", one layer above it, is originally 0.2 but is temporarily 0.6 because of the weighting; and the relevance between "internal medicine" and "medicine", one layer above it, is 0.3. The classification evaluation value of the word "heart" is therefore 0.6, obtained by adding 0.9, 0.6, and 0.3 and dividing by 3. That is, the degree to which document 3 is classified under "heart" is 0.6.
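The averaging of step S13 can be sketched as a walk from the terminal word up to the root. The relevance values are the ones quoted in the text for FIG. 7; the names `EDGES` and `path_average` are hypothetical, and the sketch deliberately ignores the handling of unmapped words that the text introduces later as step S14.

```python
# child -> (parent, relevance), using the values quoted for FIG. 7.
EDGES = {
    "heart": ("circulatory organs", 0.9),
    "circulatory organs": ("internal medicine", 0.2),
    "internal medicine": ("medicine", 0.3),
}


def path_average(leaf, edges, weights):
    """Average the temporarily weighted relevance values from leaf to root."""
    values = []
    node = leaf
    while node in edges:                      # the root has no parent entry
        parent, relevance = edges[node]
        values.append(relevance * weights.get(node, 1))
        node = parent
    return sum(values) / len(values)


# Document 3: "circulatory organs" is a title word, so its relevance is tripled (S12).
score = path_average("heart", EDGES, {"circulatory organs": 3})
print(round(score, 2))
```

Passing the weights as a separate argument keeps `EDGES` untouched between documents, so each document sees its own temporary modification.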

[0035] In document 3 shown in FIG. 7, however, the words "heart" and "circulatory organ" appear, but the words "internal medicine" and "medicine", which lie on the way back to the root node, do not. When a word that does not appear in the document is encountered on the way to the root node "medicine" and the chain is interrupted in this way, the relevance in the word relevance map 8 is not added as it is; instead, the processing of the following step S14 is performed.

[0036] For example, in the example shown in FIG. 7, tracing upward from the terminal word node "angina pectoris" for document 1 reaches "heart", a word that does not appear in document 1. In that case, the average of the relevance values of the lower nodes of "heart" is obtained. Specifically, the average of the relevance 0.5 of "angina pectoris" and the relevance 0.5 of "heart failure", both lower nodes of "heart", is (0.5 + 0.5) / 2 = 0.5. This average is then multiplied by 0.9, the relevance of "heart" to "circulatory organ", and the product 0.45 (0.9 × 0.5) is added as a provisional relevance (step S14).

[0037] It was stated above that the degree to which document 3 is classified under "heart" is 0.6. When the processing of step S14 is applied, however, the provisional relevance of "internal medicine" to "medicine" is 0.18 ((0.6 + 0.6) / 2 × 0.3), so the classification evaluation value of the word "heart" in document 3 is actually 0.56 ((0.9 + 0.6 + 0.18) / 3).
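The provisional relevance of step S14 can be sketched as below. The function name `provisional_relevance` and its argument names are assumptions made for illustration; the numbers are the ones given in the two examples above for document 1 and document 3.

```python
from statistics import mean

def provisional_relevance(child_relevances, relevance_to_parent):
    """Stand-in value for a node whose word is missing from the document:
    the mean relevance of its lower nodes times its own relevance to its parent."""
    return mean(child_relevances) * relevance_to_parent

# Document 1: "heart" is missing; its lower nodes are "angina pectoris" and "heart failure".
heart_doc1 = provisional_relevance([0.5, 0.5], 0.9)      # 0.5 * 0.9

# Document 3: "internal medicine" is missing; both lower-node relevances are 0.6.
internal_doc3 = provisional_relevance([0.6, 0.6], 0.3)   # 0.6 * 0.3

# Re-evaluate "heart" for document 3 with the provisional value in place of 0.3.
heart_doc3 = (0.9 + 0.6 + internal_doc3) / 3

print(round(heart_doc1, 2), round(internal_doc3, 2), round(heart_doc3, 2))
```

Only the edge from the missing node to its parent is replaced; the edge from the lower node up to the missing node is still added unchanged, as in the 0.9 + 0.6 + 0.18 sum.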

[0038] Step S13 described above, and step S14 when the chain of nodes is interrupted, are repeated for all terminal word nodes (step S15). For the data shown in FIG. 7, for example, document 3 is evaluated for "chewing", "digestive organ", "vomiting", "heart", "angina pectoris", and "heart failure", in each case over all the nodes on the way to the root node. The calculation formula and result of the classification evaluation value for each item are shown below. In the formulas, a value enclosed in double quotation marks is a provisional relevance, and the values are added in order from the lower node toward the upper node. "Physiology" and "basic medicine" are omitted.

[0039]
"Chewing": (0.7 + "0.23" + "0.14" + "0.09") / 4 = 0.29
"Vomiting": (0.8 + "0.23" + "0.14" + "0.09") / 4 = 0.32
"Digestion": ("0.23" + "0.14" + "0.09") / 3 = 0.15
"Angina pectoris": (0.5 + 0.9 + 0.6 + "0.18") / 4 = 0.55
"Heart failure": (0.5 + 0.9 + 0.6 + "0.18") / 4 = 0.55
"Heart": (0.9 + 0.6 + "0.18") / 3 = 0.56
"Circulatory organ": (0.6 + "0.18") / 2 = 0.39
"Digestive organ": (0.6 + "0.18") / 2 = 0.39
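The eight results above can be checked mechanically. The sketch below is an illustration only: the names `PATHS` and `score` are hypothetical, the provisional values are taken as given from the formulas rather than derived, and half-up rounding to two decimals is an assumption that happens to reproduce the listed figures.

```python
from decimal import Decimal, ROUND_HALF_UP

# Relevance values per item for document 3, lower node first, copied from the formulas.
PATHS = {
    "chewing": ["0.7", "0.23", "0.14", "0.09"],
    "vomiting": ["0.8", "0.23", "0.14", "0.09"],
    "digestion": ["0.23", "0.14", "0.09"],
    "angina pectoris": ["0.5", "0.9", "0.6", "0.18"],
    "heart failure": ["0.5", "0.9", "0.6", "0.18"],
    "heart": ["0.9", "0.6", "0.18"],
    "circulatory organ": ["0.6", "0.18"],
    "digestive organ": ["0.6", "0.18"],
}

def score(values):
    """Arithmetic mean of the path values, rounded half-up to two decimal places."""
    total = sum(Decimal(v) for v in values)
    return (total / len(values)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

for item, values in PATHS.items():
    print(item, score(values))
```

Decimal arithmetic is used instead of binary floats so that borderline values such as 0.315 and 0.545 round the same way as in the listed results.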

[0040] When all the classification items that are candidate classification destinations for the index creation target data have been evaluated in this way, the item with the highest evaluation value is determined as the classification destination, and the data is classified accordingly (step S16). In the example shown in FIG. 7, the classification item "heart" has the highest evaluation value (0.56), so the classification destination is determined to be "heart". The word relevance map 8, which was temporarily corrected by the weighting, is then returned to its initial values (step S17), and the same processing is repeated for every document for which an index is to be created (step S18).
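Steps S16 to S18 can be sketched roughly as follows. The function names, and the callables `apply_weighting` and `evaluate` passed in by the caller, are hypothetical placeholders for the weighting and evaluation described earlier. Working on a copy of the map stands in for the explicit reset of step S17; that is a substitution made for this sketch, not the procedure stated in the text.

```python
import copy

def pick_destination(scores):
    """Step S16: choose the classification item with the highest evaluation value."""
    return max(scores, key=scores.get)

def index_documents(documents, base_map, apply_weighting, evaluate):
    """Steps S16 to S18; apply_weighting and evaluate are hypothetical callables."""
    index = {}
    for name, document in documents.items():
        # Temporary correction is applied to a copy of the word relevance map.
        working_map = apply_weighting(copy.deepcopy(base_map), document)
        index[name] = pick_destination(evaluate(working_map, document))
        # Step S17: only the copy was modified, so base_map still holds the initial values.
    return index

# Evaluation values listed for document 3.
doc3_scores = {
    "chewing": 0.29, "vomiting": 0.32, "digestion": 0.15, "angina pectoris": 0.55,
    "heart failure": 0.55, "heart": 0.56, "circulatory organ": 0.39, "digestive organ": 0.39,
}
print(pick_destination(doc3_scores))  # heart
```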

[0041] As the evaluation method for determining the classification destination of a document, a method that normalizes the values by the number of levels or the size of the map before adding them can also be applied. For example, taking into account the level in the word relevance map 8 of each word that appears in the document, the relevance values of deep levels (that is, lower levels) may be uniformly weighted. This prevents the arithmetic mean of the relevance from becoming excessively low when the classification system has a very large number of levels, and so prevents the document from being classified under an upper-level word that happened to appear.

[0042] According to Embodiment 1, the relevance between words contained in the data is generated automatically on the basis of the index criterion 4, and that relevance is then appropriately weighted. A word-level index can therefore be created automatically, without affecting the index criterion 4, even if the index creator does not specify samples or typical examples.

[0043] The data for which an index is created is not limited to documents; any data that is stored in the database and in which words can be recognized may be used. For example, the index creation target data may be the data of a web page on the Internet that contains tags corresponding to control codes.

[0044] Embodiment 2. FIG. 10 is a block diagram showing an example of the database search device according to the present invention. This database search device is obtained by adding, to the database creation device of Embodiment 1 shown in FIG. 1, a data search device 38 that performs a search based on the index created by the index creation device 6, and a result display device 39 that displays the search result. The data input device 2, the database 3, the index criterion reading device 5, the index creation device 6, the word relevance map creation device 7, and the word importance assigning device 10 that make up the database creation device, as well as the index criterion 4 and the word relevance map 8, are therefore the same as in Embodiment 1, and their description is omitted.

[0045] The data search device 38 is realized in a computer system by, for example, executing a data search program. The data search device 38 provides an interface that displays, on the result display device 39, an index menu 111 created on the basis of an index criterion 100 such as that shown in FIG. 2 so that the menu can be viewed at a glance, together with, for example, a mouse cursor 112 for selecting an appropriate item from the menu. Accordingly, although not shown, a pointing device such as a mouse and a keyboard are connected to the data search device 38 as input devices. The result display device 39 is, for example, a cathode ray tube or a liquid crystal display serving as the display device of the computer system.

[0046] To search the index created by the index creation device 6, the searcher moves the mouse cursor 112 over the menu displayed on the result display device 39 and points to and selects an appropriate item. In this way the searcher can look up the index and find the target book.

[0047] As shown in FIG. 12, the index criterion may instead be divided node by node: from the "medicine" node 114, the searcher points to "internal medicine" with the mouse cursor 112 to open the "internal medicine" node 115, then points to "circulatory organ" to open the "circulatory organ" node 116, and finally selects "lymph gland" with the mouse cursor 112 to find the target book. A menu-style interface that narrows down the classification items one after another over such a tree-structured index criterion allows searches to be carried out effectively.
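The node-by-node narrowing of FIG. 12 can be pictured with a small sketch. The nested dictionary `MENU` is a hypothetical fragment of the index criterion containing only items named in the text, and `open_node` is an illustrative helper, not part of the described device.

```python
# Hypothetical fragment of the tree-structured index criterion as nested dictionaries.
MENU = {
    "medicine": {
        "internal medicine": {
            "circulatory organ": {"heart": {}, "lymph gland": {}},
        },
    },
}

def open_node(tree, path):
    """Return the items shown after opening each node named in path, in order."""
    node = tree
    for label in path:
        node = node[label]  # descend one level, as when the searcher clicks an item
    return sorted(node)

print(open_node(MENU, ["medicine"]))
print(open_node(MENU, ["medicine", "internal medicine"]))
print(open_node(MENU, ["medicine", "internal medicine", "circulatory organ"]))
```

Each call corresponds to one click in the menu: the searcher sees only the items under the node just opened, which is what narrows the categories step by step.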

[0048]

[Effects of the Invention] As described above, according to the present invention, when data is input by the data input device and an index criterion is input by the index criterion reading device, the word relevance map creation device examines, for the input data, the frequency of appearance of the words used in the index criterion, calculates a relevance for words that appear simultaneously, and creates a word relevance map based on that relevance and the index criterion. The word importance assigning device divides the number of words appearing in a first document of the input data by the number of words appearing in a second document that serves as a summary or headline of the first document, and weights the words in the second document by the obtained value. The index creation device temporarily corrects the relevance of the word relevance map using that weighting and creates an index for the input data using the corrected word relevance map. A word-level index can therefore be created automatically, without affecting the index criterion, even if the index creator does not specify samples or typical examples.
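The weight computed by the word importance assigning device can be illustrated as follows. This is a rough sketch under stated assumptions: words are counted as whitespace-separated tokens, the sample strings are invented, and `importance_weight` is a hypothetical name. How the weight is then applied to the relevance values in the map is not shown here.

```python
def importance_weight(first_document, second_document):
    """Number of words in the first document divided by the number of words
    in the second document (its summary or headline)."""
    return len(first_document.split()) / len(second_document.split())

# Invented sample texts, used only to exercise the ratio.
body = "angina pectoris and heart failure are disorders of the heart and circulation"
headline = "heart failure overview"

weight = importance_weight(body, headline)
print(weight)  # 12 words / 3 words
```

A shorter summary or headline yields a larger weight for its words, which matches the intent that words in a condensed text count for more.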

[0049] According to the next invention, when data is input by the data input device and an index criterion is input by the index criterion reading device, the word relevance map creation device examines, for the input data, the frequency of appearance of the words used in the index criterion, calculates a relevance for words that appear simultaneously, and creates a word relevance map based on that relevance and the index criterion. The word importance assigning device divides the number of words appearing in a first document of the input data by the number of words appearing in a second document that serves as a summary or headline of the first document, and weights the words in the second document by the obtained value. The index creation device temporarily corrects the relevance of the word relevance map using that weighting and creates an index for the input data using the corrected word relevance map. The data search device then performs a search based on the created index, and the result display device displays the search result. The index can therefore be looked up efficiently and the target data can be found.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an example of a database creation device according to the present invention.

FIG. 2 is a system diagram showing a configuration example of the index criterion used in the database creation device.

FIG. 3 is a schematic diagram showing an example of a word relevance map created by the database creation device.

FIG. 4 is an explanatory diagram for explaining a method of creating a word relevance map.

FIG. 5 is an explanatory diagram for explaining a method of weighting a word relevance map.

FIG. 6 is a flowchart showing an example of a method for determining the weighting.

FIG. 7 is a schematic diagram showing an example of a weighted word relevance map.

FIG. 8 is a schematic diagram showing an example of weighting between an abstract and a body text.

FIG. 9 is a flowchart showing an example of an index creation method.

FIG. 10 is a block diagram showing an example of a database search device according to the present invention.

FIG. 11 is a schematic diagram showing an example of a search menu used in the database search device.

FIG. 12 is a schematic diagram showing another example of a search menu used in the database search device.

FIG. 13 is a block diagram showing a conventional database search device.

FIG. 14 is a block diagram showing a conventional database search device.

FIG. 15 is a block diagram showing a conventional database search device.

[Explanation of symbols]

1 data, 2 data input device, 3 database, 4 index criterion, 5 index criterion reading device, 6 index creation device, 7 word relevance map creation device, 8 word relevance map, 9 index, 10 word importance assigning device.

Continued from the front page
(72) Inventor: Satoshi Tanaka, c/o Mitsubishi Electric Corporation, 2-2-3 Marunouchi, Chiyoda-ku, Tokyo
F-term (reference): 5B075 ND03 PP02 PP03 PP13 PP22 PQ02 PQ38 PR04 PR06 PR08 QM08; 5B082 EA05

Claims

[Claims]

1. A database creation device comprising: a data input device for inputting data to a database; an index criterion reading device for inputting an index criterion that defines the structure serving as the basis of an index; a word relevance map creation device that examines, for the input data, the frequency of appearance of words used in the index criterion, calculates a relevance for words that appear simultaneously, and creates a word relevance map based on the relevance and the index criterion; a word importance assigning device that divides the number of words appearing in a first document of the input data by the number of words appearing in a second document serving as a summary or headline of the first document, and weights the words in the second document by the obtained value; and an index creation device that temporarily corrects the relevance of the word relevance map created by the word relevance map creation device using the weighting obtained by the word importance assigning device, and creates an index for the input data using the corrected word relevance map.

2. A database search device comprising: a data input device for inputting data to a database; an index criterion reading device for inputting an index criterion that defines the structure serving as the basis of an index; a word relevance map creation device that examines, for the input data, the frequency of appearance of words used in the index criterion, calculates a relevance for words that appear simultaneously, and creates a word relevance map based on the relevance and the index criterion; a word importance assigning device that divides the number of words appearing in a first document of the input data by the number of words appearing in a second document serving as a summary or headline of the first document, and weights the words in the second document by the obtained value; an index creation device that temporarily corrects the relevance of the word relevance map created by the word relevance map creation device using the weighting obtained by the word importance assigning device, and creates an index for the input data using the corrected word relevance map; a data search device that performs a search based on the index created by the index creation device; and a result display device that displays the search result.