JP2002215672A - Retrieval expression extension method, retrieval system and retrieval expression extension computer program

Info

Publication number: JP2002215672A
Application number: JP2001013839A
Authority: JP
Inventors: Keiichiro Hoashi; 啓一郎帆足; Kazunori Matsumoto; 一則松本; Naoki Inoue; 直己井ノ上; Kazuo Hashimoto; 和夫橋本
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2001-01-22
Filing date: 2001-01-22
Publication date: 2002-08-02
Anticipated expiration: 2021-01-22
Also published as: JP3862059B2

Abstract

PROBLEM TO BE SOLVED: To provide a search expression expansion technique that achieves high search accuracy by using a collaborative filtering technique. SOLUTION: A search expression vector Q is input, and its similarity Sim or correlation coefficient Cor with every document in the search target document group 26 is computed. Documents with a high computed similarity or correlation coefficient are extracted from the search target document group, and the score of each word for the expanded search expression is computed using the scores of the words contained in the extracted documents. Expansion target words are selected on the basis of the newly computed word scores to create an expanded search expression. The search target document group D is then searched again on the basis of the expanded search expression Q_new to extract relevant documents.

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Technical Field of the Invention] The present invention relates to a search expression expansion method, a search system, and a search expression expansion computer program.

【０００２】[0002]

[Prior Art] In a document search system, a large number of documents are generally registered in a document database as vectors whose elements are the words contained in each document together with values such as their frequency of occurrence. When a user enters a natural-language sentence, the search system analyzes it to create a search expression, searches the documents registered in the document database, extracts the documents whose vectors have a high similarity to the search expression, and outputs them as matching documents.

[0003] In such document search systems, search expression expansion is a known way of obtaining more accurate search results. In this method, a search expression is created from the natural-language sentence entered by the user, the document database is searched once to extract a group of candidate documents, the search expression is expanded using the vector information of the extracted documents, and the document database is searched again with the expanded search expression to extract a more suitable group of documents.

[0004] A representative example of conventional search expression expansion is Rocchio's method. Rocchio's method is a search expression expansion method developed for similarity search based on the vector space model. It rests on the basic principle of expanding the search expression so as to maximize its similarity to the relevant documents while minimizing its similarity to the non-relevant documents.

[0005] More specifically, words are extracted from each of the relevant and non-relevant documents selected as a result of the initial search, and the search expression is expanded according to Equation 1 below.

【０００６】[0006]

[Equation 1]
In Equation 1, the vector Q_org is the original search expression vector that was input, the vector Q_new is the expanded search expression vector, R is the number of relevant documents registered in the document database, N is the number of non-relevant documents registered in the document database, and the vector D is the document vector described above. α, β, and γ are coefficients, set for example to values such as 2, 3, and −2.
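The image of Equation 1 is not reproduced in this text. As a hedged reconstruction only, the standard Rocchio form written with the symbols defined above would look as follows. The additive sign on the γ term is an assumption inferred from the negative example value γ = −2, and the assignment of α to Q_org follows the usual Rocchio convention rather than anything stated explicitly here.

```latex
\[
\vec{Q}_{new} \;=\; \alpha\,\vec{Q}_{org}
  \;+\; \beta\,\frac{1}{R}\sum_{\vec{D}\,\in\,\mathrm{relevant}} \vec{D}
  \;+\; \gamma\,\frac{1}{N}\sum_{\vec{D}\,\in\,\mathrm{non\text{-}relevant}} \vec{D}
\]
```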

[0007] In this way, the expanded search expression vector Q_new is obtained by multiplying the mean of the vectors of the relevant documents selected from the search target documents by the initial search, the mean of the vectors of the non-relevant documents, and the original search expression vector Q_org by the predetermined coefficients (weights) α, β, and γ.
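The weighted combination described for the conventional Rocchio method can be sketched in a few lines of Python. This is a minimal illustration, not code from the patent: the function names are hypothetical, the default weights reuse the example values 2, 3, and −2, and it assumes that α applies to the original vector, β to the relevant mean, and γ to the non-relevant mean.

```python
def mean_vector(vectors, size):
    """Component-wise mean of a list of equal-length vectors (zeros if the list is empty)."""
    if not vectors:
        return [0.0] * size
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(size)]


def rocchio(q_org, relevant, non_relevant, alpha=2.0, beta=3.0, gamma=-2.0):
    """Assumed Rocchio form: alpha*Q_org + beta*mean(relevant) + gamma*mean(non_relevant)."""
    n = len(q_org)
    rel_mean = mean_vector(relevant, n)
    non_mean = mean_vector(non_relevant, n)
    return [alpha * q_org[i] + beta * rel_mean[i] + gamma * non_mean[i] for i in range(n)]


# Hypothetical two-word vocabulary, two relevant documents, one non-relevant document.
q_new = rocchio([1, 0], relevant=[[1, 1], [1, 3]], non_relevant=[[0, 2]])
```

Every relevant document contributes equally to the mean, regardless of how similar it was to the query. That is the limitation discussed in the paragraphs that follow.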

【０００８】[0008]

[Problems to Be Solved by the Invention] However, existing search expression expansion methods, including the conventional Rocchio method, extract expansion target words automatically from the documents obtained by the initial search and then expand the search expression. They therefore cannot be applied in situations where the user wants to specify arbitrary expansion target words, for example when using an interactive search system.

[0009] In addition, conventional search expression expansion methods do not take into account, when expanding the search expression, how high or low the similarity was between the search expression and each search target document in the initial search. Words extracted from documents with high similarity and words extracted from documents with low similarity are therefore treated exactly alike, which ultimately lowers search accuracy.

[0010] The present invention has been made to solve these conventional problems, and its object is to provide a search expression expansion technique that achieves high search accuracy by using a collaborative filtering technique in search expression expansion.

【００１１】[0011]

[Means for Solving the Problems] The search expression expansion method of the invention of claim 1 comprises: a step of accepting input of a search expression; a step of calculating the similarity between the input search expression and every search target document in an existing search target document group; a step of extracting documents with a high calculated similarity from the search target document group; a step of calculating the score of each word for the expanded search expression using the scores of the words contained in the extracted documents; and a step of, after the word scores have been calculated, selecting expansion target words and creating an expanded search expression.

[0012] The search system of the invention of claim 2 comprises: input means for inputting a search expression; storage means storing a search target document group; similarity calculation means for calculating the similarity between the input search expression and every search target document in the search target document group; document extraction means for extracting documents with a high calculated similarity from the search target document group; score calculation means for calculating the score of each word for the expanded search expression using the scores of the words contained in the extracted documents; expanded search expression creation means for selecting expansion target words on the basis of the word scores calculated by the score calculation means and creating an expanded search expression; and re-search means for searching the search target document group again on the basis of the expanded search expression.

[0013] The search expression expansion computer program of the invention of claim 3 executes: a process of accepting input of a search expression; a process of calculating the similarity between the input search expression and every search target document in an existing search target document group; a process of extracting documents with a high calculated similarity from the search target document group; a process of calculating the score of each word for the expanded search expression using the scores of the words contained in the extracted documents; and a process of, after the word scores have been calculated, selecting expansion target words and creating an expanded search expression.

[0014] The search expression expansion method of the invention of claim 4 comprises: a step of accepting input of a search expression; a step of calculating the correlation coefficient between the input search expression and every search target document in an existing search target document group; a step of extracting documents with a high calculated correlation coefficient from the search target document group; a step of calculating the score of each word for the expanded search expression using the scores of the words contained in the extracted documents; and a step of, after the word scores have been calculated, selecting expansion target words and creating an expanded search expression.

[0015] The search system of the invention of claim 5 comprises: input means for inputting a search expression; storage means storing a search target document group; correlation coefficient calculation means for calculating the correlation coefficient between the input search expression and every search target document in the search target document group; document extraction means for extracting documents with a high calculated correlation coefficient from the search target document group; score calculation means for calculating the score of each word for the expanded search expression using the scores of the words contained in the extracted documents; expanded search expression creation means for selecting expansion target words on the basis of the word scores calculated by the score calculation means and creating an expanded search expression; and re-search means for searching the search target document group again on the basis of the expanded search expression.

[0016] The search expression expansion computer program of the invention of claim 6 executes: a process of accepting input of a search expression; a process of calculating the correlation coefficient between the input search expression and every search target document in an existing search target document group; a process of extracting documents with a high calculated correlation coefficient from the search target document group; a process of calculating the score of each word for the expanded search expression using the scores of the words contained in the extracted documents; and a process of, after the word scores have been calculated, selecting expansion target words and creating an expanded search expression.

[0017] According to the present invention, a collaborative filtering technique is used in search expression expansion, and the similarity or correlation coefficient between the search expression entered by the user and the search target documents is taken into account during expansion. The importance of each search target document is therefore reflected in the expanded search expression, making it possible to retrieve documents that better match the user's request. In addition, the search expression can be expanded using arbitrary words, such as words specified by the user.

【００１８】[0018]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 shows the functional configuration of a search system according to one embodiment of the present invention. This search system is implemented as a single computer system or as a computer network system of multiple computers connected by an information network such as a LAN or a communication line. Each of the components described below may be hardware, such as an input interface, an output interface, or an external storage device, or may be a processing function of a software program installed on a computer. For simplicity, all of them are described here as functional elements.

[0019] The search system of this embodiment comprises an input unit 1 through which the user enters commands, search sentences, and other necessary information; a search expression expansion processing unit 2, which is the characteristic feature of the present invention; a search processing unit 3, which performs the final search; an output unit 4, which outputs various information and search results; and a document database 5, which stores a large number of document data items and document vector data.

[0020] The input unit 1 is an input interface for receiving data over a network or for entry of various data by the user. In this embodiment specifically, a keyboard and a mouse or other pointing device serve as the functional elements needed to enter a search sentence and have the search executed.

[0021] The search expression expansion processing unit 2 is a functional element that creates a search expression vector for the search sentence entered through the input unit 1 and then performs search expression expansion. As shown in FIG. 2, it comprises a search expression creation processing unit 21, a similarity calculation unit 22, a score calculation unit 23, an expansion target word selection unit 24, an expanded search expression creation unit 25, and a search target document group 26 registered in the document database 5.

[0022] The search expression creation processing unit 21 creates a search expression vector for the search sentence entered by the user. The similarity calculation unit 22 calculates the similarity between the search expression vector created by the search expression creation processing unit 21 and each document vector in the search target document group 26, and extracts the document vectors with high similarity. The score calculation unit 23 calculates the score of each word for the expanded search expression vector using the scores of the words contained in each of the document vectors extracted by the similarity calculation unit 22. The expansion target word selection unit 24 selects expansion target words on the basis of the word scores calculated by the score calculation unit 23. The expanded search expression creation unit 25 then expands the search expression by adding the words selected by the expansion target word selection unit 24 to the original search expression vector, and outputs the expanded search expression vector.

[0023] The search processing unit 3 searches the document vectors in the document database 5 again on the basis of the expanded search expression vector produced by the search expression expansion processing unit 2, and extracts the relevant documents. The output unit 4 outputs the relevant documents extracted by the search processing unit 3 by displaying them, printing them, or transmitting them as data.

[0024] Next, the search operation of the search system of the above embodiment is described. The user enters a search sentence through the input unit 1. The search sentence may be typed in by the user, or it may be entered by specifying a document stored on an external storage device such as a floppy (registered trademark) disk or a hard disk.

[0025] When a search sentence is entered through the input unit 1, the search expression expansion processing unit 2 receives it and performs the search expression expansion process shown in the flowchart of FIG. 3.

[0026] First, the search expression creation processing unit 21 in the search expression expansion processing unit 2 creates, for the input search sentence, a search expression in the form of a vector Q whose scores q_i (i = 1 to n) are the frequencies of occurrence of the predetermined words contained in the sentence. The predetermined words are, for example, nouns and verbs obtained by morphological analysis for a Japanese sentence, or nouns and the base forms or stems of verbs for an English sentence. The words registered in this search system number n in total, w_1 to w_n, so n is the number of words in the word dictionary registered in the system, and for some words the score q_i is 0. A search target document vector group D is registered in the document database 5 in advance. This group D is a set whose elements are document vectors d_m (m = 1 to N) of the same form as the search expression vector Q. Like the search expression vector Q, each search target document vector d_m has as its scores d_mj (j = 1 to n) the frequency of occurrence of each predetermined word it contains, or a value obtained by applying a predetermined calculation to that frequency (step S01).
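As a rough illustration of step S01, the sketch below builds a term-frequency vector over a fixed word dictionary. The vocabulary, the token list, and the function name are hypothetical. Tokenization (morphological analysis or stemming) is assumed to have been done already and is not shown.

```python
from collections import Counter


def query_vector(tokens, vocabulary):
    """Return the vector Q: one term-frequency score q_i per dictionary word w_i."""
    counts = Counter(tokens)
    # Words absent from the query get a score of 0, as the text notes.
    return [counts.get(word, 0) for word in vocabulary]


vocabulary = ["search", "expression", "expansion", "document"]  # hypothetical dictionary w_1..w_n
tokens = ["search", "expression", "search"]                     # hypothetical pre-tokenized query
Q = query_vector(tokens, vocabulary)
```

Document vectors d_m can be produced the same way from each document's tokens, optionally followed by a weighting step such as TF*IDF.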

[0027] A commonly known way of calculating the score d_mj is TF*IDF. TF*IDF stands for Term Frequency * Inverse Document Frequency: the frequency of occurrence of a word multiplied by the reciprocal of the number of documents in which it appears. If only the raw frequency of a word were considered, words of little significance, such as "the" in English, would be given too much weight. Multiplying the term frequency TF by IDF raises the importance of words that appear in fewer documents. A logarithmic form of TF*IDF, such as the following, can also be adopted.

【００２８】[0028]

[Equation 2]
Here, TF(j) is the frequency of occurrence of the word w_j, DF(j) is the number of documents in which the word w_j appears, and M is the total number of registered documents.
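The image of Equation 2 is not reproduced in this text, so its exact logarithmic form cannot be recovered here. The sketch below shows the plain TF*IDF described in the prose and, for comparison only, one commonly used logarithmic variant. That variant is an assumption chosen for illustration and should not be read as the patent's formula. The function names are hypothetical.

```python
import math


def tf_idf(tf, df):
    """Plain TF*IDF as described in the text: term frequency times 1/DF."""
    return tf / df if df > 0 else 0.0


def log_tf_idf(tf, df, total_docs):
    """Assumed logarithmic variant (1 + log TF) * log(M / DF); not taken from the source."""
    if tf <= 0 or df <= 0:
        return 0.0
    return (1.0 + math.log(tf)) * math.log(total_docs / df)
```

In both versions, a word that appears in many documents gets a smaller weight than an equally frequent word that appears in few.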

[0029] Once the search expression creation processing unit 21 has created the search expression vector Q from the input search sentence, the vector Q is passed to the similarity calculation unit 22. The similarity calculation unit 22 calculates the similarity Sim between the search expression vector Q and every search target document vector d_m (m = 1 to N) in the search target document vector group D. The similarity is calculated as follows.

【００３０】[0030]

[Equation 3]
Mathematically, Equation 3 is the cosine (cos θ) of the angle θ between the vector Q and the vector d. When the two vectors point in exactly the same direction, θ = 0 and the similarity is 1. The similarity calculation unit 22 calculates the cosine with the search expression vector Q for each of the search target document vectors d_m (m = 1 to N) in the group D, and takes it as the similarity Sim(Q, d) (step S02).
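The image of Equation 3 is missing from this text. Since the paragraph identifies it as the cosine of the angle between Q and d, it can be written with the scores q_j and d_mj defined earlier as follows. This is a reconstruction from that description, not a copy of the original figure.

```latex
\[
\mathrm{Sim}(Q, d_m) \;=\; \cos\theta \;=\;
\frac{\sum_{j=1}^{n} q_j\, d_{mj}}
     {\sqrt{\sum_{j=1}^{n} q_j^{2}}\;\sqrt{\sum_{j=1}^{n} d_{mj}^{2}}}
\]
```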

[0031] The similarity calculation unit 22 then extracts the k search target document vectors d_s1 to d_sk with the highest similarity to the search expression vector Q and takes them as the similar document vector group D_sim (step S03).
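Steps S02 and S03 can be sketched as a cosine similarity followed by a top-k selection. This is a minimal pure-Python illustration with hypothetical function names and dense list vectors. A real system would more likely use sparse vectors or an index.

```python
import math


def cosine_similarity(q, d):
    """Cosine of the angle between two equal-length score vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(q, d))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_d = math.sqrt(sum(b * b for b in d))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)


def top_k_similar(q, documents, k):
    """Return (similarity, document index) pairs for the k documents most similar to q."""
    scored = [(cosine_similarity(q, d), m) for m, d in enumerate(documents)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical two-word vocabulary and three documents.
top = top_k_similar([1, 0], [[2, 0], [0, 3], [1, 1]], k=2)
```

The similarities returned with the indices are kept because the score calculation that follows weights each similar document by them.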

[0032] Next, the score calculation unit 23 examines the scores q_1 to q_n of the first word w_1 through the n-th word w_n in the original search expression vector Q. If the score q_i for the i-th word w_i is not zero, that score is used as the score q'_i (= q_i) for the i-th word w_i (step S06). If the score q_i is zero (that is, the original search expression vector Q does not contain the i-th word w_i), the score for the i-th word w_i is calculated by Equation 4 below and used as the score of the i-th word w_i in the expanded search expression vector Q_new (step S07). In other words, the principle of collaborative filtering is applied to obtain new scores for all words registered in the system in order to expand the original search expression vector Q (steps S04 to S09).

【００３３】[0033]

[Equation 4]
Here, q-bar denotes the mean of the scores of the words with non-zero scores in the original search expression vector Q. Likewise, d_j-bar denotes the mean of the scores of the words with non-zero scores in the similar document vector d_sj within the similar document vector group D_sim. For example, if 100 words have non-zero scores and those 100 scores total 1500, the mean score q-bar is 1500/100 = 15.

[0034] Also in Equation 4, the scalar d_ji is the score of the i-th word w_i in the similar document vector d_sj. The coefficient κ is a normalization coefficient, set so that κ × ΣSim(Q, d) = 1.
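The image of Equation 4 is missing from this text. Based on the symbols defined above (q-bar, d_j-bar, d_ji, κ, and Sim) and on the usual memory-based collaborative filtering prediction formula, a plausible reconstruction is the following. The summation over the k similar documents d_s1 to d_sk is an assumption consistent with the normalization condition on κ.

```latex
\[
q'_i \;=\; \bar{q} \;+\; \kappa \sum_{j=1}^{k} \mathrm{Sim}(Q, d_{sj})\,\bigl(d_{ji} - \bar{d}_j\bigr),
\qquad
\kappa \sum_{j=1}^{k} \mathrm{Sim}(Q, d_{sj}) \;=\; 1
\]
```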

[0035] In this way, even a word whose score in the original search expression vector Q is zero, that is, a word not contained in the search expression vector, becomes an expansion target word if it is contained in the document vectors d_s1 to d_sk that are highly similar to the original search expression vector and has a high score in those similar document vectors.
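The score calculation of steps S04 to S09 can be sketched as follows. The code assumes the collaborative-filtering form q'_i = q-bar + κ Σ Sim(Q, d_sj)(d_ji − d_j-bar) with κ = 1 / Σ Sim, which is a reconstruction from the symbol definitions rather than a verified copy of Equation 4. All function and variable names are hypothetical.

```python
def nonzero_mean(vector):
    """Mean of the non-zero scores in a vector (0.0 if there are none)."""
    nonzero = [x for x in vector if x != 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0


def expanded_scores(q, similar_docs, sims):
    """Score every dictionary word for the expanded query.

    q            -- original query vector Q
    similar_docs -- the top-k document vectors d_s1..d_sk
    sims         -- their similarities Sim(Q, d_sj), in the same order
    """
    total_sim = sum(sims)
    kappa = 1.0 / total_sim if total_sim else 0.0   # normalization: kappa * sum(sims) == 1
    q_bar = nonzero_mean(q)
    doc_means = [nonzero_mean(d) for d in similar_docs]
    scores = []
    for i, q_i in enumerate(q):
        if q_i != 0:
            scores.append(float(q_i))               # word already in the query (step S06)
        else:
            weighted = sum(s * (d[i] - d_bar)
                           for s, d, d_bar in zip(sims, similar_docs, doc_means))
            scores.append(q_bar + kappa * weighted)  # predicted score (step S07)
    return scores


# Hypothetical three-word vocabulary; the third word is missing from the query.
new_scores = expanded_scores([4, 2, 0], [[3, 0, 5], [1, 0, 1]], [0.75, 0.25])
```

Because each document's deviation is weighted by its similarity to the query, words from highly similar documents influence the predicted score more than words from weakly similar ones.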

[0036] For all words w_j registered in the system, the expansion target word selection unit 24 selects a predetermined number of words, for example 5 or 10, in descending order of the score q'_j obtained in relation to the original search expression vector Q, and adds them to the words contained in the original search expression vector Q. The expanded search expression creation unit 25 then creates the expanded search expression vector Q_new (steps S10 and S11).

[0037] For example, suppose the words contained in the original search expression vector Q (that is, the words whose scores are not 0) are A(10), B(5), C(6), and D(15), where the numbers in parentheses are the scores. Suppose further that the search expression expansion process on the similar document vectors d_j in the similar document vector group D_sim yields the new words E(11), F(9), and G(7). In this case, the original search expression vector Q is

[Equation 5]

whereas the expanded search expression vector Q_new is as follows.

【００３８】[0038]

[Equation 6]
Using the expanded search expression vector Q_new obtained in this way, the search processing unit 3 searches the document database 5 again, extracts the group of relevant document vectors, and through the output unit 4 displays the results on a display, prints them, or transfers them to the user's computer over a network.
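The word selection and the worked example with words A to G can be sketched as follows. The images of Equations 5 and 6 are not reproduced in this text, so the dictionaries below simply restate the scores given in the example. The function name and the use of a dictionary keyed by word are assumptions made for illustration.

```python
def expand_query(original, candidate_scores, num_terms):
    """Add the num_terms highest-scoring new words to the original query's word scores."""
    # Only words not already in the query are candidates for expansion.
    candidates = {w: s for w, s in candidate_scores.items() if w not in original}
    best = sorted(candidates.items(), key=lambda item: item[1], reverse=True)[:num_terms]
    expanded = dict(original)
    expanded.update(best)
    return expanded


Q = {"A": 10, "B": 5, "C": 6, "D": 15}   # original query words and scores from the example
new_words = {"E": 11, "F": 9, "G": 7}    # newly scored words from the example
Q_new = expand_query(Q, new_words, num_terms=3)
```

With `num_terms=3` all three new words are added. A smaller value keeps only the highest-scoring ones, which corresponds to selecting a predetermined number of expansion target words.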

[0039] Thus, with the search system of the first embodiment, even when the user's input is insufficient as a search sentence, the search system automatically performs search expression expansion, searches the registered document database with a search expression that is expanded beyond the one based on the input search sentence, and can extract and output appropriate documents that match the user's intent.

[0040] Next, a search system according to a second embodiment of the present invention is described with reference to FIGS. 4 and 5. The second embodiment has the same basic operation and effects as the first embodiment, but is characterized by using a correlation coefficient in place of the similarity used in the first embodiment. The functional configuration of the search system of this embodiment is therefore the same as that of the first embodiment shown in FIG. 1, but the search expression expansion process performed by the search expression expansion processing unit 2 is changed as shown in FIGS. 4 and 5, as described below.

[0041] As shown in FIG. 4, the search expression expansion processing unit 2 of this embodiment comprises a search expression creation processing unit 21 that creates a search expression vector for the search sentence entered by the user, a correlation coefficient calculation unit 220, a score calculation unit 230, an expansion target word selection unit 24, and an expanded search expression creation unit 25. Of these, the search expression creation processing unit 21, the expansion target word selection unit 24, and the expanded search expression creation unit 25 are the same as in the first embodiment shown in FIG. 2.

[0042] The correlation coefficient calculation unit 220, which is the characteristic part of this embodiment, calculates the correlation coefficient described below between the search expression vector Q and each of the document vectors in the search target document vector group D, extracts the k search target document vectors with the highest correlation coefficients, and outputs them to the score calculation unit 230. The score calculation unit 230 calculates the score of each word for the expanded search expression vector using the scores of the words contained in each of the document vectors extracted by the correlation coefficient calculation unit 220.

[0043] The search expression expansion processing performed by the search expression expansion processing unit 2 of the search system according to the second embodiment is described below with reference to the flowchart shown in FIG. 5.

[0044] First, the processing of step S01 by the search expression creation processing unit 21 of the search expression expansion processing unit 2 is the same as in the first embodiment.

[0045] Then, in step S02', the correlation coefficient calculation unit 220 calculates the correlation coefficient Cor between the search expression vector Q and every search target document vector d_m (m = 1 to N) in the search target document vector group D. The correlation coefficient Cor is calculated as follows.

[0046]

(Equation 7)

Here, q-bar denotes the average of the scores of the words whose score in the search expression vector Q is non-zero, and d_m-bar denotes the average of the scores of the words whose score in the search target document vector d_m is non-zero. The scalar q_j is the score of the j-th word (j = 1 to n) in the search expression vector Q, and the scalar d_mj is the score of the j-th word (j = 1 to n) in the search target document vector d_m.
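The formula for Equation 7 is not reproduced in this text. Assuming it is the standard Pearson correlation coefficient, which is what the definitions of q-bar, d_m-bar, q_j and d_mj suggest, it would take roughly the following form. This is a reconstruction under that assumption and not the published formula; in particular, the range of the sums (all n words) is assumed.

```latex
\mathrm{Cor}(Q, d_m) =
  \frac{\sum_{j=1}^{n} (q_j - \bar{q})\,(d_{mj} - \bar{d}_m)}
       {\sqrt{\sum_{j=1}^{n} (q_j - \bar{q})^2}\;
        \sqrt{\sum_{j=1}^{n} (d_{mj} - \bar{d}_m)^2}}
```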

[0047] The correlation coefficient calculation unit 220 further extracts the k search target document vectors d_c1 to d_ck that have the highest correlation coefficients with the search expression vector Q, and takes them as the similar document vector group D_cor (step S03').
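Steps S02' and S03' can be sketched in Python as follows. This is a minimal illustration, assuming a Pearson-style correlation whose means are taken over non-zero scores only (as the definitions of q-bar and d_m-bar state) and whose sums run over all n words. The function names `nonzero_mean`, `cor` and `top_k_documents` are hypothetical and do not appear in the patent.

```python
import math


def nonzero_mean(vec):
    """Mean of the non-zero scores in vec (0.0 if there are none)."""
    nonzero = [x for x in vec if x != 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0


def cor(q, d):
    """Pearson-style correlation between query vector q and document vector d.

    Assumption: means use non-zero scores only; sums run over every word.
    """
    q_bar, d_bar = nonzero_mean(q), nonzero_mean(d)
    num = sum((qj - q_bar) * (dj - d_bar) for qj, dj in zip(q, d))
    den = (math.sqrt(sum((qj - q_bar) ** 2 for qj in q))
           * math.sqrt(sum((dj - d_bar) ** 2 for dj in d)))
    return num / den if den else 0.0


def top_k_documents(q, docs, k):
    """Return (index, correlation) pairs for the k documents most correlated with q."""
    scored = [(m, cor(q, d)) for m, d in enumerate(docs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    query = [1.0, 0.0, 2.0, 0.0]
    documents = [[1.0, 0.0, 2.0, 0.0], [0.0, 3.0, 0.0, 1.0], [2.0, 0.0, 4.0, 1.0]]
    for index, value in top_k_documents(query, documents, 2):
        print(index, round(value, 3))
```

The returned pairs play the role of the similar document vector group D_cor together with the Cor values that the score calculation needs afterwards.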

[0048] Next, the score calculation unit 230 examines the scores q_1 to q_n of the 1st to n-th words in the original search expression vector Q. If the score q_i of the i-th word is non-zero, that score is used as the score q'_i (= q_i) of the i-th word (step S06). If the score q_i is zero, the score of the i-th word is calculated by Equation 8 below and used as the score q'_i of the i-th word in the expanded search expression vector Q_new (step S07'). In other words, here too the principle of collaborative filtering is applied, and a new score is obtained for every word registered in the system in order to expand the original search expression vector Q (steps S04 to S09).

[0049]

(Equation 8)

Here, q-bar denotes the average of the non-zero scores in the original search expression vector Q. Likewise, d_j-bar denotes the average of the non-zero scores in the similar document vector d_cj in the similar document vector group D_cor, and the scalar d_ji denotes the score of the i-th word in the similar document vector d_cj. In Equation 8, the coefficient κ is a normalization coefficient, set so that κ × ΣCor(Q, d) = 1.
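The formula for Equation 8 is likewise not reproduced in this text. Assuming it follows the usual collaborative filtering prediction (the query mean plus a correlation-weighted sum of each similar document's deviation from its own mean), which matches the symbols defined above, it would read roughly as follows. This is a reconstruction under that assumption and not the published formula.

```latex
q'_i = \bar{q} + \kappa \sum_{j=1}^{k} \mathrm{Cor}(Q, d_{cj})\,(d_{ji} - \bar{d}_j),
\qquad
\kappa = \frac{1}{\sum_{j=1}^{k} \mathrm{Cor}(Q, d_{cj})}
```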

[0050] In this way, even a word whose score in the original search expression vector Q is zero, that is, a word not contained in the search expression, becomes an expansion target word if it is contained in the document vectors d_c1 to d_ck that correlate highly with the original search expression vector and has a high score in those similar document vectors.
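The scoring of steps S04 to S09 can be sketched as below. The sketch assumes the collaborative filtering form for Equation 8 (query mean plus a κ-weighted sum of deviations) and takes κ as 1 / ΣCor, as the text states. The names `nonzero_mean` and `expanded_scores` are hypothetical, and the correlation values are passed in rather than computed here.

```python
def nonzero_mean(vec):
    """Mean of the non-zero scores in vec (0.0 if there are none)."""
    nonzero = [x for x in vec if x != 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0


def expanded_scores(q, similar_docs, cors):
    """Compute q'_i for every word.

    q            -- original search expression vector Q
    similar_docs -- the top-k similar document vectors d_c1..d_ck
    cors         -- Cor(Q, d_cj) for each similar document, in the same order
    """
    q_bar = nonzero_mean(q)
    total = sum(cors)
    kappa = 1.0 / total if total else 0.0   # normalisation: kappa * sum(Cor) = 1
    d_bars = [nonzero_mean(d) for d in similar_docs]

    new_scores = []
    for i, q_i in enumerate(q):
        if q_i != 0:
            # Step S06: a word already in the query keeps its score.
            new_scores.append(q_i)
        else:
            # Step S07' (assumed form): predict a score from the similar documents.
            adjustment = sum(c * (d[i] - d_bar)
                             for c, d, d_bar in zip(cors, similar_docs, d_bars))
            new_scores.append(q_bar + kappa * adjustment)
    return new_scores


if __name__ == "__main__":
    q = [1.0, 0.0, 2.0, 0.0]
    docs = [[1.0, 3.0, 2.0, 0.0], [2.0, 4.0, 0.0, 0.0]]
    print(expanded_scores(q, docs, [0.5, 0.5]))
```

In this toy input the second word is absent from the query but scores above average in both similar documents, so it receives a predicted score higher than q-bar, while the fourth word, absent everywhere, is pushed below it.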

[0051] As in the first embodiment, the expansion target word selection unit 24 selects, from all the words w_1 to w_n registered in the system, a predetermined number of words in descending order of the score q'_j obtained in relation to the original search expression vector Q, and adds them to the words contained in the original search expression vector Q. The expanded search expression creation unit 25 then creates the expanded search expression vector Q_new (steps S10 and S11).
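Steps S10 and S11 might look like the following sketch. It assumes that candidates are limited to words absent from the original query (words already in Q are kept regardless) and that unselected words get a score of zero in Q_new. The patent text does not spell out either detail, and the name `build_expanded_query` is hypothetical.

```python
def build_expanded_query(q, new_scores, num_expand):
    """Create Q_new by adding the num_expand best-scoring new words to Q.

    q          -- original search expression vector Q
    new_scores -- q'_i for every word, as produced by the score calculation
    num_expand -- the predetermined number of expansion target words
    """
    # Assumption: only words missing from the original query are candidates.
    candidates = [i for i, q_i in enumerate(q) if q_i == 0]
    candidates.sort(key=lambda i: new_scores[i], reverse=True)
    chosen = set(candidates[:num_expand])

    # Original words keep their score; chosen words get q'_i; the rest stay 0.
    return [new_scores[i] if (q_i != 0 or i in chosen) else 0.0
            for i, q_i in enumerate(q)]


if __name__ == "__main__":
    q = [1.0, 0.0, 2.0, 0.0, 0.0]
    q_prime = [1.0, 2.5, 2.0, -1.0, 0.7]
    print(build_expanded_query(q, q_prime, 1))
```

The resulting vector can then be handed to the same search routine that handled the original query, which is how the re-search in the next paragraph uses Q_new.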

[0052] Using the expanded search expression vector Q_new obtained in this way, the search processing unit 3 searches the document database 5 again and extracts a group of matching document vectors. The output unit 4 then displays them on a display, prints them out, or transfers them to the user's computer over a network.

[0053] Thus, with the search system of the second embodiment as well, even when the user's input is insufficient as a search sentence, the search system automatically performs search expression expansion processing, searches the registered document database with a search expression expanded beyond the one based on the input search sentence, and can extract and output appropriate documents that match the user's intent.

[0054] In both embodiments above, the coefficients and other numerical values used in the equations are examples and are not limiting. They may be changed depending on the system and on the type of documents to be searched, and the user may also set or change them from the input unit.

[0055] In both embodiments above, the user inputs a search sentence in natural-language form in order to create the search expression, but the user may instead input a search expression directly from the start.

[0056] Furthermore, although both embodiments above describe a search system, the technical scope of the present invention also covers the search expression expansion computer program installed in that system, and the search expression expansion method executed by a computer system in which that program is installed.

[0057]

[Effects of the Invention] According to the present invention, a collaborative filtering technique is used for search expression expansion, and the similarity or correlation coefficient between the search expression input by the user and the search target documents is taken into account during expansion. The importance of each search target document is thereby reflected in the expansion, making it possible to retrieve documents that better match the user's request. In addition, the search expression can be expanded using arbitrary words, such as words specified by the user.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the functional configuration of the search system according to the first embodiment of the present invention.

FIG. 2 is a block diagram showing the detailed functional configuration of the search expression expansion processing unit in the first embodiment.

FIG. 3 is a flowchart of the search expression expansion processing according to the first embodiment.

FIG. 4 is a block diagram showing the detailed functional configuration of the search expression expansion processing unit in the second embodiment of the present invention.

FIG. 5 is a flowchart of the search expression expansion processing according to the second embodiment.

[Explanation of symbols]

1 Input unit
2 Search expression expansion processing unit
3 Search processing unit
4 Output unit
5 Document database
21 Search expression creation processing unit
22 Similarity calculation unit
23 Score calculation unit
24 Expansion target word selection unit
25 Expanded search expression creation unit
26 Search target document group
220 Correlation coefficient calculation unit
230 Score calculation unit

Continued from the front page
(72) Inventor: Naoki Inoue, c/o KDD R&D Laboratories Inc., 2-1-15 Ohara, Kamifukuoka-shi, Saitama
(72) Inventor: Kazuo Hashimoto, c/o KDD R&D Laboratories Inc., 2-1-15 Ohara, Kamifukuoka-shi, Saitama
F-term (reference): 5B075 ND03 NK02 NK31 PP02 PP03 PP12 PP23 PP26 PQ02 PR04 PR06 QM08

Claims

[Claims]

1. A search expression expansion method comprising: a step of receiving input of a search expression; a step of calculating a similarity between the input search expression and every search target document of an existing search target document group; a step of extracting documents with a high calculated similarity from the search target document group; a step of calculating the score of each word for the expanded search expression using the scores of the words contained in the extracted documents; and a step of selecting expansion target words based on the calculated word scores and creating an expanded search expression.

2. A search system comprising: input means for inputting a search expression; storage means for storing a search target document group; similarity calculation means for calculating a similarity between the input search expression and every search target document of the search target document group; document extraction means for extracting documents with a high calculated similarity from the search target document group; score calculation means for calculating the score of each word for the expanded search expression using the scores of the words contained in the extracted documents; expanded search expression creation means for selecting expansion target words based on the word scores calculated by the score calculation means and creating an expanded search expression; and re-search means for searching the search target document group again based on the expanded search expression.

3. A search expression expansion computer program that causes a computer to execute: a process of receiving input of a search expression; a process of calculating a similarity between the input search expression and every search target document of an existing search target document group; a process of extracting documents with a high calculated similarity from the search target document group; a process of calculating the score of each word for the expanded search expression using the scores of the words contained in the extracted documents; and a process of selecting expansion target words based on the calculated word scores and creating an expanded search expression.

4. A search expression expansion method comprising: a step of receiving input of a search expression; a step of calculating a correlation coefficient between the input search expression and every search target document of an existing search target document group; a step of extracting documents with a high calculated correlation coefficient from the search target document group; a step of calculating the score of each word for the expanded search expression using the scores of the words contained in the extracted documents; and a step of selecting expansion target words based on the calculated word scores and creating an expanded search expression.

5. A search system comprising: input means for inputting a search expression; storage means for storing a search target document group; correlation coefficient calculation means for calculating a correlation coefficient between the input search expression and every search target document of the search target document group; document extraction means for extracting documents with a high calculated correlation coefficient from the search target document group; score calculation means for calculating the score of each word for the expanded search expression using the scores of the words contained in the extracted documents; expanded search expression creation means for selecting expansion target words based on the word scores calculated by the score calculation means and creating an expanded search expression; and re-search means for searching the search target document group again based on the expanded search expression.

6. A search expression expansion computer program that causes a computer to execute: a process of receiving input of a search expression; a process of calculating a correlation coefficient between the input search expression and every search target document of an existing search target document group; a process of extracting documents with a high calculated correlation coefficient from the search target document group; a process of calculating the score of each word for the expanded search expression using the scores of the words contained in the extracted documents; and a process of selecting expansion target words based on the calculated word scores and creating an expanded search expression.