JPH1091645A - Method for retrieving information

Info

Publication number: JPH1091645A
Application number: JP8261275A
Authority: JP
Inventors: Hiromi Haniyuda; 博美羽生田
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1996-09-10
Filing date: 1996-09-10
Publication date: 1998-04-10

Abstract

PROBLEM TO BE SOLVED: When required information is retrieved from a database using a keyword together with words that are related to the keyword or regarded as important, to prevent a word that is closely related to the keyword, but generally regarded as unimportant, from being left out of the words used for retrieval. SOLUTION: For each word within a fixed distance of the input keyword (S10), a relative importance is calculated from the absolute importance of the word and the absolute importance of the words adjacent to it, rather than from the word's absolute importance alone (S12), and the relative importance is used to decide whether the word is used for retrieval (S13).

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a method for retrieving desired information from a database, and more particularly to a method that retrieves information based not only on a keyword for the desired information but also on words related to that keyword.

[0002]

2. Description of the Related Art

Recently, a widely used way of retrieving desired information from a database of documents and the like is to combine a search keyword with words related to that keyword. In this method, the searcher first selects a keyword. Next, words whose importance is at or above a certain level, or whose nearness to the keyword is at or above a certain level, are extracted from a thesaurus file that stores words, the importance of each word, the distance (degree of nearness or farness) between words, and so on. The desired information is then retrieved from the database based on the keyword and the extracted words.

[0003]

PROBLEMS TO BE SOLVED BY THE INVENTION

However, in such an information retrieval method, the importance of each word and the distance between words stored in the thesaurus file are determined in advance, and fixed, from the viewpoint of the linguistic meaning the words inherently have, such as the meaning and frequency of use of each word and the grammatical positional relationship between words. Consequently, when a word is of low importance in general but strongly related to the selected keyword, it is at first a candidate for extraction from the thesaurus file because of that strong relation, yet in the end it is not extracted because the word itself is not considered important. As a result, the intention of the searcher could not be sufficiently reflected in the group of words on which the search is based.

[0004]

MEANS FOR SOLVING THE PROBLEMS

To solve the above problem, in the information retrieval method according to the present invention, a word used to retrieve desired information from a database, that is, a keyword, is first selected and input. Next, words within a certain distance of the keyword are detected, based on distance, from a thesaurus file that stores words, the distance indicating how near or far words are from each other, and the absolute importance indicating how important each word is in general. Then, for each detected word, a relative importance indicating how important the word is in relation to the keyword is calculated from the absolute importance of the detected word itself and the absolute importance of the words located around it. Words whose relative importance exceeds a certain threshold are then extracted from the detected words. Finally, the desired information is retrieved from the database using the extracted words and the keyword.

[0005] Thus, the information retrieval method according to the present invention uses the relative importance of words to extract the words to be used for retrieval. This makes it possible to extract, without omission, words that are not very important in general but should be recognized as important in relation to the selected keyword or, more precisely, in relation to the words related to the selected keyword. As a result, the searcher's intention can be sufficiently reflected in the search.

[0006]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The information retrieval method of the present invention is described below by way of embodiments, namely Specific Examples 1, 2, and 3. The main feature of Specific Example 1 is that, when words related to an input keyword are extracted from the thesaurus file, the words extracted are those whose distance from the keyword lies within a certain range and whose relative importance exceeds a certain value. The main feature of Specific Example 2, a more desirable form of Specific Example 1, is that a weighting dependent on distance is applied when the relative importance is calculated. The main feature of Specific Example 3, likewise a more desirable form of Specific Example 1, is that the relative importance from past searches is referenced when the relative importance is calculated.

[0007] <Configuration of Specific Example 1> FIG. 1 is a search flowchart of Specific Example 1, and FIG. 2 is a block diagram of an information retrieval device that implements the information retrieval method of Specific Example 1. The configuration is described first with reference to FIG. 2, and the search procedure is then described with reference to FIG. 1.

[0008] <Configuration of the Information Retrieval Device> As shown in FIG. 2, the information retrieval device comprises a display/operation unit 1, a thesaurus file unit 2, a database unit 3, and a control unit 4. The display/operation unit 1 is used to input a keyword, that is, a word needed to retrieve the information the searcher wants, and to display the results of the search.

[0009] The thesaurus file unit 2 stores data such as a plurality of words, the absolute importance of each word, and the distance between words. Here, a "word" is a term such as computer, IC, vacuum tube, or gear. The "absolute importance of a word" is the importance of the word as calculated from its meaning, frequency of appearance, and the like. For example, the absolute importance of the frequently used word "IC" is greater than that of the somewhat less frequently used word "vacuum tube". The "distance between words" is the degree of association between words as calculated from the general meaning of each word. For example, "computer" and "IC" are near each other, whereas "computer" and "gear" are far apart (hereinafter, the distance is said to be small when words are near and large when they are far apart). The distance between words is desirably also calculated from the co-occurrence degree, that is, whether the words are used together in a specific field such as electricity or chemistry. For example, if the field is limited to "early computers", the words "computer" and "gear" appear together more often, that is, their co-occurrence degree increases, so the distance between the two words becomes smaller.

[0010] The database unit 3 stores the data of documents, such as academic journals and journals of learned societies, in a coded format such as JIS code. The control unit 4, which plays the central role in information retrieval, comprises a detection unit 4A, a calculation unit 4B, an extraction unit 4C, and a search unit 4D. The detection unit 4A detects related words in the thesaurus file unit 2 based on the input keyword. The calculation unit 4B calculates the relative importance of each detected word, as described later. The extraction unit 4C extracts, from the detected words, those whose relative importance is at or above a certain level. The search unit 4D searches the database unit 3 for documents containing the keyword and the extracted words.

[0011] <Structure of the Thesaurus File> FIG. 3 shows the data structure of the thesaurus file unit. In this figure, each node represents a word (k1, k2, ...) and the absolute importance of that word (0.8, 0.5, ...). Each link represents a relationship between words, such as a broader/narrower relationship, a synonym or near-synonym relationship, or a co-occurrence degree; accordingly, the number of links in a chain connecting two words represents the distance between them. Hereinafter, a word is denoted ki, the distance between the word ki and a specific word K is denoted D(ki, K), and the relative importance of the word ki with respect to the specific word K is denoted R(ki, K), where i is an arbitrary integer. In the thesaurus file unit 2 configured in this way, words that lie within a certain range of the input keyword and have at least a certain importance are extracted, using a distance threshold and a relative importance threshold as the criteria.
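As an illustration only, the node-and-link structure described for FIG. 3 could be held in memory roughly as follows. The class name `Thesaurus`, its methods, and the sample words and values are assumptions made for this sketch; the text itself gives only the example importances 0.8 and 0.5 and does not specify a storage format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Thesaurus:
    """Words are nodes carrying an absolute importance; links are undirected edges."""

    importance: dict[str, float] = field(default_factory=dict)
    links: dict[str, set[str]] = field(default_factory=dict)

    def add_word(self, word: str, absolute_importance: float) -> None:
        self.importance[word] = absolute_importance
        self.links.setdefault(word, set())

    def add_link(self, a: str, b: str) -> None:
        # One link stands for any relation between two words
        # (broader/narrower, synonym, co-occurrence).
        self.links[a].add(b)
        self.links[b].add(a)


# Hypothetical toy data, not the contents of FIG. 3.
thesaurus = Thesaurus()
thesaurus.add_word("k1", 0.8)
thesaurus.add_word("k2", 0.5)
thesaurus.add_word("k3", 0.7)
thesaurus.add_link("k1", "k2")
thesaurus.add_link("k2", "k3")
```

In this toy graph k1 and k3 are joined by a chain of two links, so their distance would be 2.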

[0012] <Operation of the Specific Example> As shown in FIG. 1, the search procedure consists of an input step, a detection step, a calculation step, an extraction step, a search step, and an answer step. These steps are described in detail below. In the following description, the input keyword is K, the distance threshold is 2, and the relative importance threshold is 0.6. K and k1 are assumed to be the same word.

[0013] <Input Step> Step S10: The searcher inputs, from the display/operation unit 1, a keyword K that the searcher expects to be needed to obtain the desired documents. The input keyword K is sent to the detection unit 4A in the control unit 4.

[0014] <Detection Step> Step S11: The detection unit 4A detects, in the thesaurus file unit 2, the words whose distance from the keyword K is 2 or less. More specifically, it detects the words that can be reached from the keyword K through a chain of two or fewer links. FIG. 4 shows the range of the detected words. Denoting the set of detected words PR(K), PR(K) = {k1, k2, k3, k4, k5, k6, k7, k8, k9, k10, k11}. Since k1 and K are the same word, k1 is written as K in the figure.
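The detection step can be sketched as a breadth-first traversal that stops at the distance threshold. This is a minimal sketch that assumes the distance D(ki, K) is the length of the shortest chain of links; the function name `detect` and the toy graph are hypothetical and do not reproduce FIG. 4.

```python
from collections import deque


def detect(links, keyword, max_distance=2):
    """Return {word: distance} for every word within max_distance links of keyword."""
    distance = {keyword: 0}
    queue = deque([keyword])
    while queue:
        word = queue.popleft()
        if distance[word] == max_distance:
            continue  # words beyond the threshold are not expanded
        for neighbor in links.get(word, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[word] + 1
                queue.append(neighbor)
    return distance


# Hypothetical graph: "d" is three links away from "K" and is therefore not detected.
links = {"K": ["a", "b"], "a": ["K", "c"], "b": ["K"], "c": ["a", "d"], "d": ["c"]}
pr = detect(links, "K")
```

The keys of `pr` play the role of the set PR(K), and the keyword itself is included at distance 0, as in the text.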

[0015] <Calculation Step> Step S12: The calculation unit 4B calculates the relative importance of each detected word. For a word ki whose absolute importance is greater than the relative importance threshold of 0.6, the absolute importance is taken as the relative importance R(ki, K). For a word ki whose absolute importance is smaller than the threshold of 0.6, let A(ki) denote the set of words kj that belong to the set PR(K) and are adjacent to ki; as shown in FIG. 12 (a1), the relative importance is obtained by adding, to the absolute importance of ki, the average over A(ki) of the difference between each word's absolute importance and the threshold of 0.6. If necessary, as shown in FIG. 12 (a2), a reflection coefficient ε, which indicates how strongly the absolute importance of the adjacent words kj is reflected, is applied as a multiplier to increase or decrease the influence of the adjacent words' absolute importance.
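The calculation described for step S12 can be sketched as below. The formulas of FIG. 12 are not reproduced in this text, so several details are assumptions of the sketch: the coefficient ε is taken to multiply the averaged term, a word whose absolute importance equals the threshold is handled by the same branch as a smaller one, and a word with no detected neighbors keeps its absolute importance. The function and variable names are hypothetical.

```python
def relative_importance(word, importance, links, detected, threshold=0.6, epsilon=1.0):
    """Relative importance R(word, K) from the word's own and its neighbors' absolute importance."""
    own = importance[word]
    if own > threshold:
        return own  # already important enough: used as is
    # A(ki): adjacent words that also belong to the detected set PR(K).
    neighbors = [n for n in links.get(word, ()) if n in detected]
    if not neighbors:
        return own  # assumption: nothing to add when no neighbor was detected
    mean_excess = sum(importance[n] - threshold for n in neighbors) / len(neighbors)
    return own + epsilon * mean_excess


# Hypothetical values: "a" is unimportant in general but sits between important words.
importance = {"K": 0.8, "a": 0.5, "b": 0.9}
links = {"K": ["a"], "a": ["K", "b"], "b": ["a"]}
detected = {"K", "a", "b"}
r_a = relative_importance("a", importance, links, detected)
r_a_damped = relative_importance("a", importance, links, detected, epsilon=0.2)
```

With ε = 1 the word "a" rises from 0.5 to about 0.75 and would pass the 0.6 threshold; with ε = 0.2 it reaches only about 0.55 and would not.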

[0016] <Extraction Step> Step S13: The extraction unit 4C extracts, from the set PR(K), the words whose calculated relative importance is greater than the relative importance threshold. FIG. 5 shows the range of the extracted words. Denoting the set of extracted words RW(K), RW(K) = {K, k2, k3, k4, k5, k6, k8, k9, k10, k11}.
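The extraction step amounts to a threshold filter over the calculated values. The sketch below uses hypothetical scores chosen for illustration; they are not the values behind FIG. 5.

```python
def extract(relative_importance, threshold=0.6):
    """Keep the words whose relative importance is strictly greater than the threshold."""
    return {word for word, r in relative_importance.items() if r > threshold}


# Hypothetical scores: "k7" stays below the threshold and is dropped.
scores = {"K": 0.8, "k2": 0.75, "k7": 0.45}
rw = extract(scores)
```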

[0017] <Search Step> Step S14: The search unit 4D combines the extracted words and the originally input keyword K with a logical OR and searches the database unit 3 for documents containing any of those words or the keyword.

<Answer Step> Step S15: The display/operation unit 1 displays the content of the retrieved documents to the searcher.
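A minimal sketch of the OR search in step S14 follows. It assumes the database is a mapping from a document identifier to plain text and that a document matches when it contains any term as a substring; both are simplifications made for illustration, and the sample documents are hypothetical.

```python
def search(documents, keyword, extracted_words):
    """Return the ids of documents that contain the keyword OR any extracted word."""
    terms = {keyword, *extracted_words}
    return [
        doc_id
        for doc_id, text in documents.items()
        if any(term in text for term in terms)
    ]


# Hypothetical database contents.
documents = {
    "d1": "history of the vacuum tube",
    "d2": "gear design for early calculators",
    "d3": "a cooking journal",
}
hits = search(documents, "computer", {"vacuum tube", "gear"})
```

Here no document contains the keyword "computer" itself, yet d1 and d2 are found through the extracted words, which is the point of expanding the query.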

[0018] <Effects of Specific Example 1> As described above, in the information retrieval method of Specific Example 1, a keyword for retrieving information is first input. Next, words at a certain distance from the input keyword, that is, words located within a certain range centered on the keyword, are detected. The relative importance of each detected word with respect to the keyword is then calculated using the absolute importance the word itself inherently has and the absolute importance inherently held by the one or more words adjacent to it. From the detected words, only those whose calculated relative importance is greater than a predetermined threshold, that is, those that are more important than a certain level, are extracted. Finally, documents and the like containing the keyword or the extracted words are retrieved.

[0019] Thus, at the word extraction stage, where it is decided whether a word close to the input keyword is useful for the search, the method differs from the conventional information retrieval method: instead of using the absolute importance of the words themselves, it uses a relative importance calculated from the absolute importance of the adjacent words and the like. This makes it possible to extract words that the conventional method did not extract but that should be recognized as important in relation to the surrounding words. As a result, the intention of the search can be expressed more clearly than with the conventional method, so information can be retrieved with higher precision.

[0020] In addition, when the absolute importance of a detected word is already greater than the threshold, that is, when the word is always highly important, the relative importance is not calculated anew; it is calculated only when the absolute importance of the word is smaller than the threshold. This makes it possible to determine quickly whether the word is useful for the search.

[0021] In addition, when the relative importance of a detected word is calculated, a reflection coefficient is used that indicates how strongly the absolute importance of the adjacent words is reflected. The influence of the adjacent words' absolute importance on a detected word can therefore be adjusted freely, so the intention of the search can be expressed more clearly.

[0022] The distance between words is determined based on the meaning of the words. This makes it possible to detect words that are close to the keyword in terms of the general meaning the words inherently have. The distance between words is also determined based on the co-occurrence degree, which indicates the extent to which the words appear together in a specific field. This makes it possible to detect even words that are only weakly associated with the selected keyword in general but strongly associated with it in a specific field.

[0023] <Configuration of Specific Example 2> The information retrieval method of Specific Example 2 is now described. The configuration of the information retrieval device that implements Specific Example 2 is largely the same as that of Specific Example 1. The main difference lies in the configuration of the control unit 4. FIG. 6 is a block diagram of the control unit in Specific Example 2. This control unit 4 is the control unit 4 of Specific Example 1 with a weighting unit 4E added.

[0024] <Operation of Specific Example 2> The search procedure of Specific Example 2 is largely the same as that of Specific Example 1. The main addition is that a weighting dependent on distance is applied in the calculation step that calculates the relative importance. FIG. 7 is a search flowchart of Specific Example 2. The description below follows this flowchart and details the weighting step in particular. The distance threshold and the relative importance threshold are the same as in Specific Example 1 (the distance threshold is 2 and the relative importance threshold is 0.6).

[0025] <Input Step to Detection Step> Steps S20 to S21: The searcher inputs a keyword K, which is passed to the detection unit 4A in the control unit 4. The detection unit 4A then generates the set PR(K) by detecting, in the thesaurus file unit 2, the words whose distance from the keyword K is 2 or less.

[0026] <Calculation Step> Step S22a: The calculation unit 4B calculates the relative importance R(ki, K) of each word in the set PR(K). Step S22b: The weighting unit 4E calculates a weighted relative importance using the relative importance R(ki, K) and a weighting coefficient δ(D(ki, K)) that is a function of distance. The weighted relative importance, denoted Rp(ki, K), is calculated by the formula shown in FIG. 13 (b1). In particular, when the weight is to be made smaller as the distance grows, that is, the farther the word is from the keyword K, Rp(ki, K) can be calculated using a fixed weighting coefficient δ and the distance D(ki, K), as shown in FIG. 13 (b2) and (b3). Here, the formula of FIG. 13 (b2) is used with δ = 0.1.
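The formulas of FIG. 13 are not reproduced in this text, so the exact form of the weighting is unknown here. Purely as an illustration of a weight that shrinks with distance, the sketch below assumes a linear decay, Rp = R × (1 − δ × D); this form, the function name, and the sample values are assumptions of the sketch and should not be read as the formula of FIG. 13 (b2).

```python
def weighted_relative_importance(r, distance, delta=0.1):
    """Assumed linear decay: Rp = R * (1 - delta * D), never below zero."""
    return r * max(0.0, 1.0 - delta * distance)


# Hypothetical word with R = 0.7, evaluated at distance 1 and distance 2.
near = weighted_relative_importance(0.7, 1)
far = weighted_relative_importance(0.7, 2)
```

Under this assumed form the same relative importance of 0.7 gives about 0.63 at distance 1, which stays above the 0.6 threshold, and about 0.56 at distance 2, which falls below it.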

[0027] <Extraction Step> Step S23: The extraction unit 4C extracts, from the words of the set PR(K), those whose calculated weighted relative importance Rp(ki, K) is greater than the relative importance threshold of 0.6. FIG. 8 shows the range of the extracted words. Denoting the set of extracted words RpW(K), RpW(K) = {k2, k3, k4, k5, k8, k9, k11}.

[0028] <Search Step to Answer Step> Steps S24 to S25: The search unit 4D searches the database unit 3 for the relevant documents using the extracted words and the originally input keyword K. The display/operation unit 1 then presents the content of the retrieved documents as the answer.

[0029] <Effects of Specific Example 2> As described above, the information retrieval method of Specific Example 2 differs from that of Specific Example 1 in that, at the word extraction stage where it is decided whether a detected word is useful for the search, the calculated relative importance is not used as is but is corrected based on the distance between the keyword and the word. This allows the relative importance to reflect whether the keyword and the word are near to or far from each other in terms of inherent meaning, co-occurrence degree, and the like.

[0030] Furthermore, the calculated relative importance is corrected so that it becomes smaller as the distance grows, that is, the farther the detected word is from the keyword. This avoids the situation in which a word far from the input keyword nevertheless acquires a large relative importance because the absolute importance of its adjacent words is large, and is consequently extracted as useful for the search.

[0031] <Configuration of Specific Example 3> The information retrieval method of Specific Example 3 is now described. The configuration of the information retrieval device that implements Specific Example 3 is also largely the same as that of Specific Example 1. The main difference lies in the configuration of the control unit 4. FIG. 9 is a block diagram of the characteristic part of the information retrieval device of Specific Example 3. This device is the information retrieval device of Specific Example 1 with the addition of a history reference unit 4F, which refers to the history of past searches, and a history storage unit 5, which stores that history.

[0032] <Operation of Specific Example 3> The search procedure of Specific Example 3 is largely the same as that of Specific Example 1. The main addition is that the relative importance from past searches is referenced in the calculation step that calculates the relative importance. FIG. 10 is a search flowchart of Specific Example 3. The description below follows this flowchart and details the calculation step in particular. It is assumed that searches S, T, and U, with keywords Ks, Kt, and Ku, were executed in that order in the past, and that the history of the words detected in each search and their relative importance is accumulated in the history storage unit 5 in the form shown in FIG. 11.

[0033] <Input Step to Detection Step> Steps S30 to S31: Having completed searches S, T, and U, the searcher inputs a further keyword K, which is passed to the detection unit 4A in the control unit 4. The detection unit 4A then creates the set PR(K) by detecting, in the thesaurus file unit 2, the words whose distance from the keyword K is 2 or less.

[0034] <Calculation Step> Step S32a: The calculation unit 4B calculates the relative importance R(ki, K) of each word in the set PR(K). Step S32b: The history reference unit 4F calculates a relative importance with history using the calculated relative importance R(ki, K) together with the words ki detected in the past searches S, T, and U and their relative importances R(ki, Ks), R(ki, Kt), and R(ki, Ku), which are accumulated in the history storage unit 5.

[0035] The relative importance with history, denoted Rh(ki, K), is calculated, as shown in FIG. 14 (c1), as the average of the products of the relative importance R in each search and a history reflection coefficient α, which is a function of the search time and indicates how strongly that search is reflected. More specifically, letting the keyword also stand for the time of its search, the relative importance with history Rh(ki, K) of the word ki for a search performed at time K with keyword K can be obtained as {R(ki, K) * α(K) + R(ki, Ks) * α(Ks) + R(ki, Kt) * α(Kt) + R(ki, Ku) * α(Ku)} / 4.
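The averaging formula given in the text can be written directly as code. In this sketch each search is a pair of a search time and the relative importance of the word in that search, and `alpha` is any function of the search time; the function name, the data layout, and the sample values are assumptions, and the sketch further assumes that the word was detected in every past search.

```python
def relative_importance_with_history(current, past, alpha):
    """Average of R * alpha(time) over the current search and all past searches.

    current: (time, R) for the current search with keyword K
    past:    list of (time, R) for the earlier searches (Ks, Kt, Ku, ...)
    alpha:   function mapping a search time to a history reflection coefficient
    """
    terms = [current] + list(past)
    return sum(r * alpha(time) for time, r in terms) / len(terms)


# Hypothetical values; a constant coefficient of 1 reduces the formula to a plain mean.
rh = relative_importance_with_history(
    current=(4, 0.4),
    past=[(1, 0.8), (2, 0.6), (3, 0.6)],
    alpha=lambda time: 1.0,
)
```

With four searches the divisor is 4, matching the expression in the text, and the result here is about 0.6.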

[0036] In formula (c1), when the original relative importance R(ki, K) is greater than the relative importance threshold, R(ki, K) is taken as the relative importance with history Rh(ki, K) as is. When the original relative importance R(ki, K) is smaller than the threshold, Rh(ki, K) is obtained by calculating the average of the relative importances R(ki, Ks) and so on from the past searches. In particular, when more weight is to be placed on the relative importance R of searches closer to the present, Rh(ki, K) can be obtained, as shown in FIG. 14 (c2) and (c3), using a fixed history reflection coefficient α and a search age t that indicates how new or old the search is (a larger t indicates an older search).
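The threshold shortcut and the preference for recent searches can be sketched together. The formulas of FIG. 14 (c2) and (c3) are not reproduced in this text, so the geometric decay α ** t used below, with t = 0 for the current search, is only an assumed example of a weight that shrinks with age; the function name and the sample values are likewise hypothetical.

```python
def rh_with_decay(current_r, past, threshold=0.6, alpha=0.5):
    """Relative importance with history, weighting older searches less.

    current_r: relative importance R(ki, K) in the current search (age 0)
    past:      list of (age t, R) pairs with t >= 1; larger t means older
    """
    if current_r > threshold:
        return current_r  # important enough already: history is not consulted
    terms = [(0, current_r)] + list(past)
    # Assumed decay: each search is weighted by alpha ** t.
    return sum(r * alpha ** t for t, r in terms) / len(terms)


# Hypothetical values.
kept = rh_with_decay(0.7, [(1, 0.2)])
adjusted = rh_with_decay(0.4, [(1, 0.8), (2, 0.8)])
```

In the first call the current value 0.7 exceeds the threshold and is returned unchanged; in the second, the two past values of 0.8 contribute 0.4 and 0.2 respectively, giving an average of about 0.33.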

[0037] <Extraction Step> Step S33: The extraction unit 4C extracts, from the words of the set PR(K), the words whose calculated relative importance with history Rh(ki, K) is greater than the relative importance threshold of 0.6.
<Search Step to Answer Step> Steps S34 to S35: The search unit 4D searches the database unit 3 for documents using the extracted words and the keyword K. The display operation unit 1 then displays the contents of the retrieved documents.
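Steps S33 to S35 can be sketched as a filter followed by a lookup. The threshold of 0.6 comes from the text; everything else is hypothetical, including the function names, the in-memory list standing in for the database unit, and the assumption that a document matches when it contains the keyword or any extracted word.

```python
from typing import Dict, List


def extract_words(rh_by_word: Dict[str, float], threshold: float = 0.6) -> List[str]:
    """Step S33: keep words whose Rh is strictly greater than the threshold."""
    return [word for word, rh in rh_by_word.items() if rh > threshold]


def search_documents(documents: List[str], keyword: str, extracted: List[str]) -> List[str]:
    """Steps S34 to S35: return documents containing the keyword or any
    extracted word. The OR semantics are an assumption for illustration."""
    terms = [keyword.lower()] + [word.lower() for word in extracted]
    return [doc for doc in documents if any(term in doc.lower() for term in terms)]


# Illustrative data only.
rh_values = {"thesaurus": 0.72, "index": 0.6, "weather": 0.1}
extracted = extract_words(rh_values)
docs = [
    "Building a thesaurus for search",
    "Weather report for Tuesday",
    "Keyword retrieval basics",
]
hits = search_documents(docs, "retrieval", extracted)
```

A word whose value equals the threshold exactly is not extracted here, since the text specifies "greater than".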

[0038] <Effect of Specific Example 3> As described above, unlike the information search method of Specific Example 1, the information search method of Specific Example 3 does not simply use the relative importance calculated at the present time in the extraction stage, where it is decided whether each detected word is useful for the search. Instead, it also refers to the relative importance in past searches, corrects the currently calculated relative importance, and then uses the corrected value. This makes it possible to reflect the relative importance based on the keywords entered in the past series of searches, which can be regarded as footprints that clarify the intention of the search, so the intention of the search can be made clearer. Note that the relative importance in past searches is referred to only when the current relative importance is smaller than the threshold; when it is larger, the current relative importance is used as is, without referring to the relative importance in past searches. This makes it possible to decide quickly whether a word is useful for the search.

[0039] In addition, the relative importance calculated in each search is weighted according to when that search was performed, that is, whether it was performed close to or far from the present time. Because the degree to which the relative importance from a given search is referred to can thus be adjusted to match the time of that search, the intention of the search can be made even clearer.

[0040] Furthermore, the closer the time of a search is to the present, the greater the degree to which the relative importance calculated in that search is referred to. In general, the intention of a search becomes clearer the closer it is to the present time, so doing this makes it possible to extract words that better reflect the intention of the search.

[Brief description of the drawings]

FIG. 1 is a search flowchart of Specific Example 1.

FIG. 2 is a block diagram of the information search device of Specific Example 1.

FIG. 3 is a diagram showing the data structure of the thesaurus file unit.

FIG. 4 is a diagram showing the range of words detected in Specific Example 1.

FIG. 5 is a diagram showing the range of words extracted in Specific Example 1.

FIG. 6 is a block diagram of the control unit in Specific Example 2.

FIG. 7 is a search flowchart of Specific Example 2.

FIG. 8 is a diagram showing the range of words extracted in Specific Example 2.

FIG. 9 is a block diagram of the characteristic portion of the information search device of Specific Example 3.

FIG. 10 is a search flowchart of Specific Example 3.

FIG. 11 is a diagram showing the history stored in the history storage unit of Specific Example 3.

FIG. 12 is a diagram (part 1) showing the formulas used in the calculation step.

FIG. 13 is a diagram (part 2) showing the formulas used in the calculation step.

FIG. 14 is a diagram (part 3) showing the formulas used in the calculation step.

[Explanation of symbols]

S10: input step; S11: detection step; S12: calculation step; S13: extraction step; S14: search step; S15: answer step

Claims

[Claims]

1. An information retrieval method comprising: a step of inputting a keyword used to search a database for desired information; a step of detecting words within a certain range of perspective from the keyword, from a thesaurus file storing a plurality of words, the degree of perspective between words, and the absolute importance of the words themselves; a step of calculating the relative importance of each detected word with respect to the keyword based on the absolute importance of the word itself and the absolute importance of words adjacent to the word; a step of extracting, from the detected words, words whose relative importance exceeds a predetermined threshold; and a step of searching the database for the desired information based on the keyword and the extracted words.

2. The information retrieval method according to claim 1, wherein, in the calculating step, the relative importance of the detected word with respect to the keyword is calculated based on the absolute importance of the word itself, the absolute importance of words adjacent to the word, and a reflection coefficient indicating the degree to which the absolute importance of the words adjacent to the word is reflected.

3. The information retrieval method according to claim 1, wherein, in the calculating step, when the absolute importance of the detected word itself does not exceed a predetermined threshold, the relative importance of the detected word with respect to the keyword is calculated based on the absolute importance of the word itself and the absolute importance of words adjacent to the word, and when the absolute importance of the detected word itself exceeds the predetermined threshold, the relative importance of the detected word with respect to the keyword is represented by the absolute importance of the detected word itself.

4. An information retrieval method comprising: a step of inputting a keyword used to search a database for desired information; a step of detecting words within a certain range of perspective from the keyword, from a thesaurus file storing a plurality of words, the degree of perspective between words, and the absolute importance of the words themselves; a step of calculating the relative importance of each detected word with respect to the keyword based on the absolute importance of the word itself, the absolute importance of words adjacent to the word, and a weighting of the relative importance that depends on the degree of perspective from the keyword; a step of extracting, from the detected words, words whose relative importance exceeds a predetermined threshold; and a step of searching the database for the desired information based on the keyword and the extracted words.

5. The information retrieval method according to claim 4, wherein the weight of the relative importance of the detected word is reduced as the detected word is farther from the keyword.

6. The information retrieval method according to claim 1, further comprising, between the calculating step and the extracting step, a step of correcting the calculated relative importance using the relative importance of the word with respect to a keyword in a search performed in the past.

7. The information retrieval method according to claim 6, wherein the correction is performed using a weighting of the relative importance that depends on the point in time at which the search was performed.

8. The information retrieval method according to claim 7, wherein the closer the point in time of the search is to the present, the greater the weight given to the relative importance of the word with respect to the keyword in that search.

9. The information retrieval method according to claim 6, wherein, when the absolute importance of the detected word does not exceed a predetermined threshold, the relative importance of the word with respect to the keyword is corrected using the relative importance of the word in the past, and when the relative importance of the detected word exceeds the predetermined threshold, the relative importance of the word with respect to the keyword is used as is.

10. The information retrieval method according to claim 1, wherein the degree of perspective between words is determined based on a normal meaning inherent to the word.

11. The information retrieval method according to claim 10, wherein the degree of perspective between words is determined based on the degree of co-occurrence indicating the degree to which words occur simultaneously in a specific field.