JP2002123546A

JP2002123546A - Apparatus and method for document retrieval and recording medium

Info

Publication number: JP2002123546A
Application number: JP2000317006A
Authority: JP
Inventors: Koji Maekawa; 浩司前川
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2000-10-17
Filing date: 2000-10-17
Publication date: 2002-04-26

Abstract

PROBLEM TO BE SOLVED: To perform relatively fast document retrieval. SOLUTION: This document retrieval apparatus which retrieves a document related to a given character string from plural documents is equipped with a means storing 1st word feature information which is generated for each document and shows information related to independent words included in the documents, an extracting means extracting independent words included in the character string, a means generating 2nd word feature information which shows information related to the independent words extracted by the extracting means, a means retrieving a document including all the independent words extracted by the extracting means, and a means sequencing retrieved documents according to the 1st word feature information on the independent word corresponding to the independent word extracted by the extracting means and the 2nd word feature information.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to document retrieval technology.

[0002]

[Description of the Related Art] Various methods have been proposed for searching documents managed in a document database. FIG. 9 is a flowchart showing an example of a conventional document search method.

[0003] In step S201, a search condition is set. In step S202, the search condition is analyzed and the relationships between its words are obtained. For example, if the character string "rich experience" (豊富な経験) is set as the search condition in step S201, then in step S202 the words "rich" (豊富な) and "experience" (経験) are extracted and the relationship between them is obtained.

[0004] In step S203, each document to be searched is analyzed in the same way as the query was analyzed in step S202. As shown in FIG. 10, all managed documents 1 to n are treated as search targets, and document analysis and matching against the search condition are performed on each document in turn.

[0005] As an example of document analysis, suppose that documents A to D shown in FIG. 11 are the documents to be searched. The sentences in document A, "Company A will partner with a company that has rich experience." and "Through this partnership, it will gain a foothold for entering the information-related field.", are then analyzed as shown in FIG. 12.

[0006] In step S204, the documents are ranked by comparing the relationships found in each document to be searched with those in the character string set as the search condition. Here, the resulting order is document A > document B > document C > document D, so the documents can be presented as search results in descending order of relevance to the search condition.

[0007] Another conventional document search method is shown in the flowchart of FIG. 13.

[0008] In step S301, a search condition is set. Here, assume that the character string "rich experience" (豊富な経験) is set. In step S302, the search condition is analyzed. Here, the relationship between the words "rich" (豊富な) and "experience" (経験) is obtained, and "rich" (豊富) and "experience" (経験) are extracted as keywords for the full-text search.

[0009] In step S303, a full-text search is executed based on the keywords extracted in step S302, retrieving the documents that contain both "rich" and "experience".

[0010] In step S304, the full-text search results are obtained. In step S305, a document in the search results is analyzed to determine its sentence structure and the like. In step S306, the structure of the search condition is compared with the sentence structure, and a priority is assigned to the retrieved document.

[0011] In step S307, the processing of steps S305 and S306 is repeated for every document obtained by the full-text search.

[0012] As a result, for example, documents A to D illustrated in FIG. 9 are ranked document A > document B > document C, and document D is excluded from the search results. The documents are then presented as search results in this order.

[0013] Thus, some conventional search techniques can produce search results that reflect the search condition better than a simple full-text search does.

[0014]

[Problems to be Solved by the Invention] However, sentence structure analysis is generally slow, so for large amounts of text such as documents the processing time becomes very long, which has made practical use difficult. Implementing sentence structure analysis in a document search system, which must analyze the sentence structure of many documents, has therefore been even more difficult.

[0015] Accordingly, an object of the present invention is to provide a document search apparatus, a document search method, and a recording medium capable of relatively fast document retrieval.

[0016]

[Means for Solving the Problems] According to the present invention, there is provided a document search apparatus for retrieving a document related to a given character string from among a plurality of documents, the apparatus comprising: means for storing first word feature information that is created in advance for each document and indicates information on independent words included in the document; extracting means for extracting independent words included in the character string; means for creating second word feature information indicating information on the independent words extracted by the extracting means; means for retrieving, from among the documents, documents that include all of the independent words extracted by the extracting means; and means for ranking the retrieved documents based on the second word feature information and on the first word feature information of those independent words, among the independent words included in the retrieved documents, that correspond to the independent words extracted by the extracting means.

[0017] According to the present invention, there is also provided a document search method for retrieving a document related to a given character string from among a plurality of documents, the method comprising: a step of creating in advance, for each document, first word feature information indicating information on independent words included in the document, and storing it in a recording medium; an extracting step of extracting independent words included in the character string; a step of creating second word feature information indicating information on the independent words extracted in the extracting step; a step of retrieving, from among the documents, documents that include all of the independent words extracted in the extracting step; and a step of reading from the recording medium the first word feature information of those independent words, among the independent words included in the retrieved documents, that correspond to the independent words extracted by the extracting means, and ranking the retrieved documents based on that first word feature information and the second word feature information.

[0018] According to the present invention, there is further provided a recording medium on which is recorded a program that, in order to retrieve a document related to a given character string from among a plurality of documents, causes a computer to function as: means for storing first word feature information that is created in advance for each document and indicates information on independent words included in the document; extracting means for extracting independent words included in the character string; means for creating second word feature information indicating information on the independent words extracted by the extracting means; means for retrieving, from among the documents, documents that include all of the independent words extracted by the extracting means; and means for ranking the retrieved documents based on the second word feature information and on the first word feature information of those independent words, among the independent words included in the retrieved documents, that correspond to the independent words extracted by the extracting means.

[0019]

[Description of the Preferred Embodiments] Preferred embodiments of the present invention are described below with reference to the drawings.

[0020] FIG. 1 is a block diagram showing the configuration of a document search apparatus according to one embodiment of the present invention. The apparatus is implemented on a general-purpose computer comprising an input device 1, a CPU 2, an output device 3, and a storage device 4. The input device 1 is, for example, a keyboard, and is used to enter search conditions and the like. The CPU 2 executes the document search processing described later in accordance with a processing program 41 stored in the storage device 4. The output device 3 is a display or the like, and displays search results and the like. The storage device 4 is, for example, a hard disk, RAM, or a combination of these, and stores the processing program 41 executed by the CPU 2 as well as word feature data 42, the documents to be searched, and so on.

[0021] The document search apparatus of the present invention can be implemented not only on a standalone computer as shown in FIG. 1 but also in the local network environment shown in FIG. 2 or the Internet environment shown in FIG. 3. In those cases, search conditions are set and search results are displayed on the individual client computers, and the search processing is performed on the server.

[0022] Next, the document search processing is described. FIG. 4 is a flowchart showing the document search processing.

[0023] In step S101, input of a search condition is accepted from the input device 1. The search condition is entered as a character string, such as a set of keywords (which may include a logical expression) or a natural-language sentence. In step S102, independent words are extracted from the search condition and a logical expression for executing a full-text search is generated.

[0024] For example, if the character string entered as the search condition is "rich experience" (豊富な経験), the independent words "rich" (豊富) and "experience" (経験) are extracted, and "rich" ∩ "experience" becomes the conditional expression for the full-text search.
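
The extraction of independent words and the construction of the AND expression can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the search string has already been tokenized by some morphological analyzer into (base form, part of speech) pairs, and the tag names, `extract_independent_words`, and `build_and_query` are hypothetical.

```python
# Hypothetical part-of-speech tags treated as independent (content) words.
INDEPENDENT_POS = {"noun", "verb", "adjective", "adjectival_noun", "adverb"}


def extract_independent_words(tokens):
    """Return base forms of independent words, in order, without duplicates."""
    seen, words = set(), []
    for base_form, pos in tokens:
        if pos in INDEPENDENT_POS and base_form not in seen:
            seen.add(base_form)
            words.append(base_form)
    return words


def build_and_query(words):
    """Join the independent words into a conjunctive (AND) expression."""
    return " ∩ ".join(f'"{w}"' for w in words)


# "豊富な経験" as it might come out of an assumed morphological analyzer.
tokens = [("豊富", "adjectival_noun"), ("な", "auxiliary"), ("経験", "noun")]
words = extract_independent_words(tokens)
query = build_and_query(words)
```

The particle-like token is dropped because its tag is not in the independent set, leaving only the two content words for the conjunctive query.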

[0025] In step S103, a full-text search is executed using the conditional expression created in step S102, retrieving all documents that contain both "rich" and "experience". Any full-text search method may be used. In this embodiment, the five documents A to E shown in FIG. 5 are assumed to be the targets of the full-text search. Document D does not contain "rich" and "experience", so it is excluded from the search results. Documents A to C and document E contain "rich" and "experience", so they are included in the search results.
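
Since the text leaves the full-text search method open, the filtering step can be illustrated with the simplest possible stand-in: a substring test over every document. The function name `full_text_and_search` and the sample document texts below are invented for illustration; the actual contents of FIG. 5 are not reproduced here.

```python
def full_text_and_search(documents, words):
    """Return ids of documents whose text contains every query word."""
    return [
        doc_id
        for doc_id, text in documents.items()
        if all(word in text for word in words)
    ]


# Invented sample texts; only the presence or absence of the words matters.
documents = {
    "A": "A社は豊富な経験を持つ企業と提携する。",
    "D": "新製品を来月発売する。",
    "E": "経験は少ない。しかし資金は豊富だ。",
}
hits = full_text_and_search(documents, ["豊富", "経験"])
```

A production system would normally use an inverted index rather than scanning each text, but the result set is the same: only documents containing all query words survive.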

[0026] Steps S104 to S106 form a loop for ranking the retrieved documents; the processing of steps S105 and S106 is executed for each retrieved document.

[0027] In the document search apparatus of this embodiment, word feature data on the independent words contained in each document to be searched is stored in the storage device 4. This word feature data is created before any search, for example when the document is registered, and stored in the storage device 4.

[0028] As shown in FIG. 6, the word feature data first holds, in table form, information on the independent words contained in each document. The independent words are grouped by sentence (sentence number). In addition, each independent word is accompanied, in table form, by position information indicating where it occurs in the sentence, inflection information for the word, and adjunct information such as the particles attached to the word. FIG. 7 shows the information for some of the independent words contained in document A, and FIG. 8 shows the information for some of the independent words contained in document B.
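
One way to picture the word feature data is as a per-document list of records, one per occurrence of an independent word. The sketch below is an assumption about layout only: the class name `WordFeature`, its field names, and the sample values are hypothetical and do not reproduce the tables of FIG. 6 to FIG. 8.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WordFeature:
    word: str               # independent word (base form)
    sentence_no: int        # sentence in which the word occurs
    position: int           # position of the word within that sentence
    inflection: str         # inflected form, e.g. "attributive" or "stem"
    adjunct: Optional[str]  # attached particle or other adjunct, if any


def build_index(features):
    """Group the records of one document by word for fast lookup."""
    index = {}
    for feature in features:
        index.setdefault(feature.word, []).append(feature)
    return index


# Hypothetical records for one document, created at registration time.
doc_features = [
    WordFeature("豊富", sentence_no=1, position=3, inflection="attributive", adjunct=None),
    WordFeature("経験", sentence_no=1, position=5, inflection="stem", adjunct=None),
]
index = build_index(doc_features)
```

Because the records are built when a document is registered, the ranking step only needs dictionary lookups at search time.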

[0029] In step S105, the word feature data described above is created for the character string given as the search condition, and the word feature data for those independent words in the retrieved document that correspond to the independent words in the search condition is read from the storage device 4.

[0030] In step S106, the degree of match between the independent words in the search condition and the independent words in the retrieved document is calculated, and each retrieved document is assigned a rank.

[0031] In this embodiment, documents are ranked according to the following relationship.

[0032] Features of the constituent words match > words appear in the same sentence > words appear in the same document

Accordingly, in the word feature matching calculation of step S106, word feature data is first acquired only for sentences in which the words appear together.

[0033] Accordingly, of documents A to E, document D is excluded at the full-text search stage as described above, and document E is given the lowest rank in terms of independent word features.
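
The coarse ordering described above (words sharing a sentence rank above words that merely share a document) can be sketched with sentence numbers alone. The helper names and the sentence-number data are hypothetical; the finer "features match" comparison is left to the scoring step.

```python
def shared_sentences(sentences_by_word, words):
    """Sentence numbers in which every query word appears."""
    sets = [set(sentences_by_word.get(word, ())) for word in words]
    return set.intersection(*sets) if sets else set()


def coarse_tier(sentences_by_word, words):
    """1 if the words share at least one sentence, 0 if only the document."""
    return 1 if shared_sentences(sentences_by_word, words) else 0


words = ["豊富", "経験"]
# Hypothetical sentence numbers taken from stored word feature data.
doc_a = {"豊富": [1], "経験": [1, 2]}  # both words occur in sentence 1
doc_e = {"豊富": [2], "経験": [1]}     # words occur in different sentences
tier_a = coarse_tier(doc_a, words)
tier_e = coarse_tier(doc_e, words)
```

Only the sentences returned by `shared_sentences` need a detailed feature comparison, which is what keeps the per-document work small.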

[0034] The degree of match can be calculated from the following viewpoints.
- Distance score: based on the position information in the word feature data, the difference in distance between the independent words in the search condition and the independent words in the document is normalized by the size of the sentence. The maximum value is 1.

[0035] Distance score (SL) = 1 - (distance between words / sentence size)
- Inflection score: based on the inflection information in the word feature data, it is determined whether the inflected forms of the independent words in the search condition and in the document match. If they match, the score is 1. If the inflected forms differ, a cost for changing the inflected form is incurred.
- Adjunct: based on the adjunct information in the word feature data, it is determined whether the adjuncts match. An adjunct has an attribute and a modification type; if these match, the score is 1. If they differ, a cost for the change is incurred.
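
The scores above can be sketched as small functions. Several points are assumptions, since the text does not fix them: "distance" is read here as the absolute difference between the word gap in the search condition and the word gap in the document sentence, the change cost is an arbitrary 0.5, the sentence size is assumed to be positive, and the names `distance_score` and `match_score` are hypothetical.

```python
def distance_score(query_gap, doc_gap, sentence_size):
    """SL = 1 - (distance / sentence size), clamped to the range [0, 1]."""
    distance = abs(doc_gap - query_gap)  # assumed reading of "distance"
    return max(0.0, 1.0 - distance / sentence_size)


def match_score(query_value, doc_value, change_cost=0.5):
    """1 on an exact match; otherwise 1 minus an assumed change cost."""
    return 1.0 if query_value == doc_value else 1.0 - change_cost


# In "豊富な経験" the two words sit at positions 1 and 3, so the gap is 2.
same_gap = distance_score(query_gap=2, doc_gap=2, sentence_size=10)
wider_gap = distance_score(query_gap=2, doc_gap=6, sentence_size=10)
inflection = match_score("attributive", "attributive")
adjunct = match_score(None, "を")
```

How the individual scores are combined into one degree of match is not specified in the text, so the sketch keeps them separate.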

[0036] Here, the search condition "rich experience" (豊富な経験) is analyzed as follows.
- Modifier-side word: rich (豊富), position: 1, inflected form: attributive, adjunct information: none
- Head-side word: experience (経験), position: 3, inflected form: stem, adjunct information: none

[0037] Compared with the word feature data of document A (FIG. 7), all features match, so the degree of match is 1.

[0038] Compared with the word feature data of document B (FIG. 8), the inflection of the modifier-side word matches that of the search condition, but the head side differs, so the degree of match is lower than for document A. In this way, the degree of match is calculated for the full-text search results and each document is ranked. In this example, the document order is A > B > C > E. Note that in the above example only word position contributed to determining the priority. In step S107, the search results are output in descending order of the priority assigned in step S106. This completes the search processing.
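
The final ordering and output step can be sketched as a sort over per-document results. The (tier, score) pairs below are invented values chosen only to illustrate the mechanism; `rank_documents` is a hypothetical name, and combining a coarse tier with a degree of match in one tuple is an assumption about how the two criteria might be applied together.

```python
def rank_documents(scored):
    """scored maps doc_id -> (tier, score); return ids, best first."""
    return sorted(scored, key=lambda doc_id: scored[doc_id], reverse=True)


# Invented values: tier 1 = words share a sentence, tier 0 = document only.
scored = {
    "C": (1, 0.5),
    "E": (0, 0.0),
    "A": (1, 1.0),
    "B": (1, 0.8),
}
ranking = rank_documents(scored)
```

Tuples compare element by element, so the tier decides first and the degree of match breaks ties within a tier.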

[0039] Thus, by registering in advance feature data that represents the features of the words contained in the documents to be searched, a fast search is possible without performing sentence segmentation or sentence structure analysis.

[0040] Furthermore, because the degree of match is calculated from the features of the relationships between words simply by comparing the word-focused feature data with the feature data obtained from the conditional sentence given as the search condition, highly accurate search results in line with the user's search intent can be output at high speed. In addition, because the method is based on full-text search, search results with fewer omissions can be obtained than when structural analysis is performed.

[0041] Next, an example concerning paraphrased adjuncts is described.

[0042] Suppose the search condition is the character string "山に関する本" ("a book about mountains"). Analysis extracts the adjunct "に関して" (relation, adnominal) attached to the independent word "山" (mountain). For the following sentences in the documents, the feature data for the adjunct attached to the word "山" is set as shown.
1) 山についての本: について (relation, adnominal)
2) 山に関して書いた本: に関して (relation, adverbial)
3) 山にまつわる本: にまつわる (relation, adnominal)
4) 山に関した本: に (place, adverbial)
5) 山の本: の (--, adnominal)
6) 山へ本を: へ (place, adnominal)
Sentences (1) and (3) are phrased differently, but their adjunct attribute and modification type are the same, so they can be judged nearly synonymous. In sentence (2), the attribute matches, but the subject of the modification may be wrong, so the degree of match decreases. In sentence (4), the adjunct attribute differs, so the degree of match decreases further. In sentence (5), the adjunct attribute differs, but the target of the modification is the same. In sentence (6), the target of the modification is the same, but the adjunct attribute differs.

[0043] From the above, the ranking (1), (3) > (2) > (4), (5) > (6) can be obtained.

[0044] It goes without saying that the object of the present invention can also be achieved by supplying a system or apparatus with a storage medium (or recording medium) on which is recorded software program code that implements the functions of the embodiment described above, and having a computer (or CPU or MPU) of the system or apparatus read and execute the program code stored in the storage medium. In this case, the program code read from the storage medium itself implements the functions of the embodiment, and the storage medium storing the program code constitutes the present invention. It also goes without saying that the invention covers not only the case where the functions of the embodiment are implemented by the computer executing the program code it has read, but also the case where an operating system (OS) or the like running on the computer performs part or all of the actual processing in accordance with the instructions of the program code, and the functions of the embodiment are implemented by that processing.

[0045] It further goes without saying that the invention covers the case where the program code read from the storage medium is written into memory provided on a function expansion card inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on the function expansion card or in the function expansion unit performs part or all of the actual processing in accordance with the instructions of the program code, and the functions of the embodiment are implemented by that processing.

[Effect of the Invention] As described above, the present invention makes relatively fast document retrieval possible.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a document search apparatus according to one embodiment of the present invention.

FIG. 2 is a diagram showing another configuration example in which the document search apparatus of the present invention is implemented.

FIG. 3 is a diagram showing another configuration example in which the document search apparatus of the present invention is implemented.

FIG. 4 is a flowchart showing document search processing according to one embodiment of the present invention.

FIG. 5 is a diagram showing an example of documents to be searched.

FIG. 6 is a diagram showing word feature data.

FIG. 7 is a diagram showing word feature data.

FIG. 8 is a diagram showing word feature data.

FIG. 9 is a flowchart showing an example of a conventional document search method.

FIG. 10 is a diagram showing the processing applied to each document to be searched in the conventional document search method.

FIG. 11 is a diagram showing an example of search targets.

FIG. 12 is a diagram showing the analysis result for document A of FIG. 11.

FIG. 13 is a flowchart showing another example of a conventional document search method.

Claims

[Claims]

1. A document search apparatus for retrieving a document related to a given character string from among a plurality of documents, comprising: means for storing first word feature information that is created in advance for each document and indicates information on independent words included in the document; extracting means for extracting independent words included in the character string; means for creating second word feature information indicating information on the independent words extracted by the extracting means; means for retrieving, from among the documents, documents that include all of the independent words extracted by the extracting means; and means for ranking the retrieved documents based on the second word feature information and on the first word feature information of those independent words, among the independent words included in the retrieved documents, that correspond to the independent words extracted by the extracting means.

2. The document search device according to claim 1, wherein both the first word feature information and the second word feature information are syntactic information of the independent word.

3. The document search device according to claim 1, wherein the first word feature information includes information indicating the position of the independent word in the document, and the second word feature information includes information indicating the position of the independent word in the character string.

4. The document search apparatus according to claim 1, wherein the first word feature information and the second word feature information each include information on an adjunct to the independent word.

5. The document search device according to claim 1, wherein the first word feature information and the second word feature information each include information on the inflection of the independent word.

6. A document search method for retrieving a document related to a given character string from among a plurality of documents, comprising: a step of creating in advance, for each document, first word feature information indicating information on independent words included in the document, and storing it in a recording medium; an extracting step of extracting independent words included in the character string; a step of creating second word feature information indicating information on the independent words extracted in the extracting step; a step of retrieving, from among the documents, documents that include all of the independent words extracted in the extracting step; and a step of reading from the recording medium the first word feature information of those independent words, among the independent words included in the retrieved documents, that correspond to the independent words extracted by the extracting means, and ranking the retrieved documents based on that first word feature information and the second word feature information.

7. A recording medium on which is recorded a program that, in order to retrieve a document related to a given character string from among a plurality of documents, causes a computer to function as: means for storing first word feature information that is created in advance for each document and indicates information on independent words included in the document; extracting means for extracting independent words included in the character string; means for creating second word feature information indicating information on the independent words extracted by the extracting means; means for retrieving, from among the documents, documents that include all of the independent words extracted by the extracting means; and means for ranking the retrieved documents based on the second word feature information and on the first word feature information of those independent words, among the independent words included in the retrieved documents, that correspond to the independent words extracted by the extracting means.