JP2006235928A

JP2006235928A - Document search method, document search apparatus, and storage medium with document search program recorded thereon

Info

Publication number: JP2006235928A
Application number: JP2005048848A
Authority: JP
Inventors: Masateru Yotsuya (四ッ谷雅輝); Tadataka Matsubayashi (松林忠孝); Jugo Noda (野田十悟); Giyu Iijima (飯島岐勇); Yuichi Ogawa (小川祐一)
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2005-02-24
Filing date: 2005-02-24
Publication date: 2006-09-07
Anticipated expiration: 2025-02-24
Also published as: JP4634821B2

Abstract

PROBLEM TO BE SOLVED: To provide a document search method that evaluates the usefulness of a document group and of following links to other documents.

SOLUTION: At document registration, a registered document analyzing part 121 analyzes the keywords included in documents, and a document information acquiring part 122 registers the documents to be searched in a registered document management table 140. At document search, a hit document acquiring part 131 searches for documents that include a keyword given at search time, a document group deciding part 132 decides the document groups that include the hit documents, a document group matching degree calculating part 133 calculates document group matching degrees from information about the document groups and the hit documents, a document accessibility calculating part 134 calculates document accessibility from the list of hit documents, and a search result outputting part 135 outputs the documents in decreasing order of document group matching degree and document accessibility.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a computer-based document search method, a document search apparatus, and a storage medium on which a document search program is recorded.

In recent years, with the spread of personal computers and Internet technology, the number of electronic documents has been growing explosively. Under these circumstances, there is increasing demand for a way to efficiently find documents containing the required information among the vast number of stored electronic documents.

Full-text search is a basic technique for meeting this demand. One example is the technique disclosed in Patent Document 1. In this technique, every string of n consecutive characters in a document (hereinafter, an n-gram) is stored in an index when the document is registered. At search time, the n-grams that make up the specified character string (hereinafter, the search term) are looked up in the index to find the documents that contain the search term. Because the index is built in advance, documents containing the search term specified by the searcher can be retrieved without omission.
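The n-gram indexing idea described above can be sketched as follows. This is a minimal illustration, not the method of Patent Document 1: the function names, the choice of n = 2, the sample documents, and the final substring check that removes false candidates are all assumptions made for this example.

```python
from collections import defaultdict


def ngrams(text: str, n: int = 2) -> list[str]:
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_index(docs: dict[str, str], n: int = 2) -> dict[str, set[str]]:
    """Registration: map each n-gram to the IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in ngrams(text, n):
            index[gram].add(doc_id)
    return index


def search(term: str, docs: dict[str, str], index, n: int = 2) -> list[str]:
    """Search: intersect the postings of the term's n-grams, then verify."""
    grams = ngrams(term, n)
    if grams:
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        # Term shorter than n: fall back to scanning every document.
        candidates = set(docs)
    # Sharing all n-grams does not guarantee adjacency, so confirm the match.
    return sorted(d for d in candidates if term in docs[d])


# Hypothetical sample documents.
docs = {"D1": "car A engine", "D2": "car B stereo", "D3": "used cars"}
index = build_index(docs)
hits = search("car A", docs, index)  # ['D1']
```

The verification step keeps the result exact even when a document contains all of the term's n-grams in different places.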

However, it takes the searcher a great deal of time to find a document containing the desired information among a large number of displayed search results. There is therefore increasing demand for a way to obtain the desired information in a short time.

One technique that meets this demand is ranking, which displays the documents that best match the search condition at the top of the search results. Examples are the techniques disclosed in Patent Document 2 and Non-Patent Document 1. These techniques calculate the usefulness of documents on the Internet on the assumption that useful documents are linked to from many other documents, and display the documents that satisfy the search condition (hereinafter, hit documents) in descending order of usefulness. The searcher can then easily pick out the hit documents rated as highly useful, which shortens the time needed to find a document containing the desired information.
[Patent Document 1] JP-A-8-194718 (claim 1, etc.)
[Patent Document 2] US Pat. No. 6,799,176
[Non-Patent Document 1] Sergey Brin and Lawrence Page, “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” Proc. of the 7th International World Wide Web Conference, (USA) 1998.

However, the conventional techniques have the following problem when the information the searcher wants consists of multiple topics, each described in a different document. They evaluate a document's usefulness from the number of links pointing to it, but they do not evaluate the usefulness of following its links to other documents. As a result, a document that yields information on only some of the topics the searcher wants may still be rated highly, and documents containing a broader range of information may not be selected well.

For example, consider a site consisting of a document describing equipment that can be installed in car A, such as car stereos and car navigation systems (hereinafter, the “optional equipment” document), a document describing the engine performance of car A, such as fuel consumption and maximum output (hereinafter, the “engine performance” document), and a “product top” document that links to both. If many documents link to the “engine performance” document, that document appears near the top of the results for the search condition “car A”.

A searcher who entered “car A” in order to investigate car A can then obtain only information about engine performance from the “engine performance” document, and so obtains only part of the desired information. In other words, a document that links to documents containing information useful to the searcher, and that itself gives an overview of the information sought, is not necessarily judged to be an important document.
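The situation in this example can be reproduced with a simple inbound-link count. The sketch below is a simplification: it counts incoming links directly and is not the ranking algorithm of Patent Document 2 or Non-Patent Document 1. The three site documents come from the example above; the "review" pages and all link data are hypothetical.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "product top": ["optional equipment", "engine performance"],
    "optional equipment": [],
    "engine performance": [],
    "review 1": ["engine performance"],
    "review 2": ["engine performance"],
    "review 3": ["engine performance"],
}


def inbound_counts(links):
    """Count how many pages link to each page."""
    counts = Counter({page: 0 for page in links})
    for targets in links.values():
        counts.update(targets)
    return counts


def rank_by_inbound_links(hits, links):
    """Order hit documents by descending inbound-link count."""
    counts = inbound_counts(links)
    return sorted(hits, key=lambda page: counts[page], reverse=True)


ranking = rank_by_inbound_links(
    ["product top", "optional equipment", "engine performance"], links
)
# ['engine performance', 'optional equipment', 'product top']
```

Under this scoring the "product top" document, which leads to both topics, ranks last because nothing links to it.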

To address this problem, an object of the present invention is to provide a document search apparatus that can present a broad range of information on those topics even when the information the searcher wants consists of multiple topics, each described in a different document.

To solve the above problem, the document search apparatus of the present invention comprises: means for analyzing a document to be registered; means for registering a document in a database; means for analyzing and registering a document; means for searching for hit documents that contain a given keyword; means for determining a document group that contains useful documents; means for calculating a document group matching degree, which is the number of hit documents contained in a document group; means for calculating a document accessibility, which is the number of links from a hit document to useful documents; and means for outputting search results. At document registration, the means for analyzing a document to be registered analyzes the keywords contained in the document, and the means for registering a document in the database registers the document to be searched. At document search, the means for searching for hit documents searches for documents that contain the keyword given at search time; the means for determining a document group determines the document groups that contain the hit documents; the means for calculating the document group matching degree calculates that degree from the information on the document groups and the hit documents; the means for calculating the document accessibility calculates the document accessibility from the list of hit documents; and the means for outputting search results outputs the documents in order of importance, placing documents with a higher document group matching degree and document accessibility first.

With this configuration, documents that themselves contain useful information and that also link to documents containing useful information can be searched for efficiently.

According to the present invention, highly accurate search results can be presented using not only the information in a single document but also document groups and linked documents, which reduces the burden on the searcher.

Embodiments of the present invention are described below with reference to the drawings.
The present invention has multiple embodiments, depending on the method used to determine document groups, which is described later. The embodiment that determines document groups using directories is described as the first embodiment, and the embodiment that determines document groups using links is described as the second embodiment.

<<First embodiment>>
The first embodiment is the basic embodiment of the present invention. The configuration of the document search apparatus of the first embodiment is described first, followed by the processing performed by each unit.

[Configuration of the document search apparatus]
FIG. 1 is a diagram illustrating the apparatus configuration of the first embodiment. As shown in FIG. 1, the document search apparatus 10 comprises a display 100, a keyboard 101, a CPU (Central Processing Unit) 102, a magnetic disk device 103, a main memory 104, a bus 105 connecting these, and a network 106 that connects the system to other devices.

The magnetic disk device 103 is one example of a secondary storage device; another secondary storage device may be used instead. The magnetic disk device 103 stores the registered document management table 140.

The main memory 104 is a storage device such as a semiconductor memory. It stores the program that implements the functions of the system control processing unit 110, and a work area 150 is reserved in it. The system control processing unit 110 includes a registration control processing unit 120 and a search control processing unit 130.

Of these, the registration control processing unit 120 includes, and controls, a registered document analysis processing unit 121 that parses a document to be registered and a document information acquisition processing unit 122 that acquires information such as the keywords contained in the document.

The search control processing unit 130 comprises, and controls, the following: a hit document acquisition processing unit 131 that acquires hit documents containing the keyword given at search time; a document group determination processing unit 132 that determines the document groups on which the document group matching degree and document accessibility described later are based; a document group matching degree calculation processing unit 133 that calculates the document group matching degree from the determined document groups; a document accessibility calculation processing unit 134 that calculates the document accessibility from the determined document groups; and a search result output processing unit 135 that displays the documents with a high document group matching degree and document accessibility.
Each of these units is realized by loading a program that implements its function into the main memory 104.

In this embodiment, the registration control processing unit 120 and the search control processing unit 130 are started by the system control processing unit 110 in response to input from the keyboard 101 by a user of the search system.

In this embodiment, the registration control processing unit 120 and the search control processing unit 130 are started by a command entered from the keyboard 101, but they may instead be started by a command or event received through another input device.

Although the programs that execute these processes are described as being stored in the main memory 104, they may instead be stored on a storage medium (not shown in FIG. 1) such as the magnetic disk device 103, a floppy disk (registered trademark), an MO (Magneto-Optical disk), a CD-ROM (Compact Disk Read Only Memory), or a DVD (Digital Versatile Disk), read into the main memory 104 via a drive device, and executed by the CPU 102. Alternatively, the programs may be read into the main memory 104 via the network 106 and executed by the CPU 102.

Furthermore, although the registered document management table 140 is stored on the magnetic disk device 103 in this embodiment, it may instead be stored on a storage medium (not shown in FIG. 1) such as a floppy disk (registered trademark), MO, CD-ROM, or DVD and read into the main memory 104 via a drive device for use. It may also be stored on a storage medium (not shown in FIG. 1) connected to another computer system and read from there via the network 106.

[Processing of the document search apparatus]
The processing procedure of the document search apparatus 10 in this embodiment is described below. Processing in the document search apparatus 10 is executed by the system control processing unit 110.

[Processing of the system control processing unit]
FIG. 2 is a diagram illustrating the processing procedure of the system control processing unit. The processing procedure of the system control processing unit 110 is described first, using the PAD (Problem Analysis Diagram) of FIG. 2 (see also FIG. 1 as appropriate).

The system control processing unit 110 first analyzes the command entered from the keyboard 101 and determines its type (S200). If the command is a registration command (“registration processing” in S200), it starts the registration control processing unit 120 and registers the documents specified by the command (S210). If the command is a search command (“search processing” in S200), it starts the search control processing unit 130, searches for the documents that satisfy the search condition (S220), and ends processing.
This concludes the processing procedure of the system control processing unit 110.
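The command dispatch of steps S200 to S220 can be sketched as below. The command syntax ("register ..." and "search ...") and the callback signatures are assumptions made for illustration; the source does not specify the command format.

```python
def dispatch(command, register, search):
    """S200: determine the command type, then start the matching handler."""
    kind, _, argument = command.partition(" ")
    if kind == "register":
        return register(argument)  # S210: registration processing
    if kind == "search":
        return search(argument)    # S220: search processing
    raise ValueError(f"unknown command: {kind!r}")


# Stand-in handlers for the registration and search control processing units.
result = dispatch(
    "search car A",
    register=lambda path: ("registered", path),
    search=lambda query: ("searched", query),
)
# ('searched', 'car A')
```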

[Processing of the registration control processing unit]
FIG. 3 is a diagram illustrating the processing procedure of the registration control processing unit.
This section describes, using the PAD of FIG. 3, the processing procedure of the registration control processing unit 120, which is started by the system control processing unit 110 in step S210 of FIG. 2.

First, the registration control processing unit 120 starts the registered document analysis processing unit 121, which analyzes the document designated for registration (hereinafter, the registration target document), obtains its text and the URLs (Uniform Resource Locators), or other identification information, of the documents it links to (hereinafter, linked documents), and stores them in the work area 150 together with the URL (identification information) of the registration target document (S300).

Next, it starts the document information acquisition processing unit 122, which assigns a document ID to the registration target document and stores the document's URL, text, and linked-document URLs held in the work area, together with the assigned document ID, in the registered document management table 140 on the magnetic disk device 103 (S301). This concludes the processing procedure of the registration control processing unit 120.
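Steps S300 and S301 can be sketched with Python's standard `html.parser` module. This is an illustrative reading, not the patented implementation: the class and function names, the in-memory list standing in for the registered document management table 140, the "Dnnn" ID format, and the sample HTML are assumptions.

```python
from html.parser import HTMLParser


class DocumentAnalyzer(HTMLParser):
    """Collects the visible text and the link targets of an HTML document."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def analyze(url, html):
    """S300: extract the text and linked-document URLs (work-area record)."""
    parser = DocumentAnalyzer()
    parser.feed(html)
    parser.close()
    return {"url": url, "text": " ".join(parser.text_parts), "links": parser.links}


def register(table, analysis):
    """S301: assign a document ID and store the record in the table."""
    doc_id = f"D{len(table) + 1:03d}"
    table.append({"id": doc_id, **analysis})
    return doc_id


table = []  # stands in for the registered document management table 140
html = (
    "<html><body><h1>Top of used car sales</h1>"
    '<a href="/A/a2.htm">car A</a> <a href="/A/a3.htm">car B</a>'
    "</body></html>"
)
register(table, analyze("/A/a1.htm", html))
```

Here a document without links simply gets an empty list, whereas the table described in the source stores "-" in that field.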

The document registration procedure of the document search system shown in FIG. 3 is now described concretely with reference to FIG. 4.
FIG. 4 is a diagram illustrating a concrete example of the processing flow when documents D001 to D006, written in HTML (Hyper Text Markup Language), are registered.

In the example of FIG. 4, documents D001 to D003 are in directory A and documents D004 to D006 are in directory B. Document D001 is a file whose URL is “/A/a1.htm”; it contains the text “Top of used car sales” and lists “/A/a2.htm” and “/A/a3.htm” as the URLs of its linked documents.

Document D002 is a file whose URL is “/A/a2.htm”; it contains the text “car A” and has no linked-document URLs. Document D003 is a file whose URL is “/A/a3.htm”; it contains the text “car B” and has no linked-document URLs. Documents D004 to D006 in directory B are read in the same way.

First, for documents D001 to D006 in directories A and B, the registered document analysis processing unit 121 stores the text, URL, and linked-document URLs of each document in the work area 150 shown in FIG. 4 (corresponding to S300 in FIG. 3).

In the example of FIG. 4, document D001 is stored in the work area 150 as analysis result 501a, with the URL “/A/a1.htm”, the text “Top of used car sales”, and the linked-document URLs “/A/a2.htm” and “/A/a3.htm”. The same applies to documents D002 to D006.

Next, for the analysis results 501a to 513a of documents D001 to D006 stored in the work area 150, the document information acquisition processing unit 122 assigns a document ID to each document designated for registration and stores the document's URL, text, and linked-document URLs, together with its document ID, in the registered document management table 140 (corresponding to S301 in FIG. 3).

In the example of FIG. 4, the document ID “D001” together with the URL “/A/a1.htm”, the text “Top of used car sales”, and the linked-document URLs “/A/a2.htm” and “/A/a3.htm” from the analysis result 501a of document D001 in the work area 150 are stored as the first registered document 501b in the registered document management table 140. When a document has no linked documents, “-” is stored in the field of the registered document management table 140 that holds linked-document URLs. The same processing is performed for documents D002 to D006. This is the concrete flow of document registration in the document search system of this embodiment.

In step S300 of FIG. 3, the registered document analysis processing unit 121 uses the URLs of linked documents as the identification information of the documents associated with the registration target document. If the registration target document is of a type that supports attachments, such as e-mail, the attached file names may be used as that identification information instead. The kinds of identification information mentioned above may also be used in combination. This allows many types of documents to be registered, so the searcher can search many types of documents.

[Processing of the search control processing unit]
FIG. 5 is a diagram illustrating the processing of the search control processing unit. This section describes, using the PAD of FIG. 5, the processing procedure of the search control processing unit 130, which is started by the system control processing unit 110 in step S220 of FIG. 2 (see also FIG. 1 as appropriate).

First, the hit document acquisition processing unit 131 is started and acquires from the registered document management table 140 the documents that satisfy the specified search condition (hereinafter, hit documents) (S400).
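Step S400 can be sketched as a filter over the registered document table. The rows below for D001 to D003 follow the values given for FIG. 4; the dictionary layout and the plain substring test are assumptions (a real system would presumably use an index such as the n-gram index of Patent Document 1).

```python
# Rows of the registered document management table for directory A (FIG. 4).
table = [
    {"id": "D001", "url": "/A/a1.htm", "text": "Top of used car sales",
     "links": ["/A/a2.htm", "/A/a3.htm"]},
    {"id": "D002", "url": "/A/a2.htm", "text": "car A", "links": []},
    {"id": "D003", "url": "/A/a3.htm", "text": "car B", "links": []},
]


def find_hits(table, keyword):
    """S400: return ID, URL, and linked URLs of documents containing keyword."""
    return [
        {"id": row["id"], "url": row["url"], "links": row["links"]}
        for row in table
        if keyword in row["text"]
    ]


hits = find_hits(table, "car A")
# [{'id': 'D002', 'url': '/A/a2.htm', 'links': []}]
```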

Next, the document group determination processing unit 132 is started and, for all documents stored in the registered document management table 140, determines each set of documents under the same directory to be a document group (S401). Here the processing by the document group determination processing unit 132 is performed at search time, but it may instead be performed in advance at registration time, which shortens the search processing.
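Directory-based grouping (S401) can be sketched as follows, assuming that "the same directory" means the URL path up to and including its last slash. The function name and the dictionary output format are illustrative.

```python
from collections import defaultdict


def determine_groups(urls):
    """S401: group document URLs by their directory (path up to the last '/')."""
    groups = defaultdict(list)
    for url in urls:
        directory = url.rpartition("/")[0] + "/"
        groups[directory].append(url)
    return dict(groups)


# URLs of the six documents in the FIG. 4 example.
urls = ["/A/a1.htm", "/A/a2.htm", "/A/a3.htm",
        "/B/b1.htm", "/B/b2.htm", "/B/b3.htm"]
groups = determine_groups(urls)
# {'/A/': ['/A/a1.htm', '/A/a2.htm', '/A/a3.htm'],
#  '/B/': ['/B/b1.htm', '/B/b2.htm', '/B/b3.htm']}
```

Because the grouping depends only on the registered URLs and not on the query, it can be computed once at registration time, as the text notes.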

After that, the document group matching degree calculation processing unit 133 is started; it counts the number of hit documents contained in each document group extracted in step S401 and takes that count as the document group matching degree (M1) of the group (S402). In the processing above, a document in a group is judged to be a hit document by matching its URL against the URLs of the hit documents, but the match may instead be made on document IDs. Matching on IDs is faster, so the search processing takes less time. Although the number of hit documents is used as the document group matching degree (M1) here, the ratio of the number of hit documents to the number of documents in the group may be used instead.
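The two variants of M1 mentioned above, the raw hit count and the hit ratio, can be sketched in one function. The function name, the `as_ratio` flag, and the sample identifiers are assumptions; the identifiers may be URLs or document IDs, as long as both arguments use the same kind.

```python
def group_matching_degree(group_docs, hit_docs, as_ratio=False):
    """S402: M1 for one document group.

    group_docs and hit_docs hold document identifiers (URLs or document IDs).
    """
    hit_set = set(hit_docs)  # set lookup keeps each match test cheap
    count = sum(1 for doc in group_docs if doc in hit_set)
    if as_ratio:
        return count / len(group_docs) if group_docs else 0.0
    return count


# Hypothetical group of four documents, two of which are hits.
group = ["X1", "X2", "X3", "X4"]
hit_ids = ["X2", "X4", "Y9"]
m1_count = group_matching_degree(group, hit_ids)                 # 2
m1_ratio = group_matching_degree(group, hit_ids, as_ratio=True)  # 0.5
```

The ratio variant avoids favoring large groups merely because they contain more documents.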

Next, the document accessibility calculation processing unit 134 is started; for every hit document contained in the target document group, it counts the number of linked documents and takes that count as the document accessibility (M2) (S403).
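A sketch of step S403 follows, reading M2 as a per-hit-document count of linked documents. The optional `only_links_to_hits` flag is an interpretation of the earlier summary, which speaks of links to useful documents; it is not stated in this step. The function name and all sample data are hypothetical.

```python
def document_accessibility(hits, only_links_to_hits=False):
    """S403: M2 for each hit document, i.e. the number of documents it links to."""
    hit_urls = {hit["url"] for hit in hits}
    m2 = {}
    for hit in hits:
        targets = hit["links"]
        if only_links_to_hits:
            # Assumed variant: count only links whose target is itself a hit.
            targets = [t for t in targets if t in hit_urls]
        m2[hit["id"]] = len(targets)
    return m2


# Hypothetical hit documents and link data.
hits = [
    {"id": "H1", "url": "/B/b1.htm",
     "links": ["/B/b2.htm", "/B/b3.htm", "/C/c1.htm"]},
    {"id": "H2", "url": "/B/b2.htm", "links": []},
    {"id": "H3", "url": "/B/b3.htm", "links": ["/B/b1.htm"]},
]
m2 = document_accessibility(hits)                                 # {'H1': 3, 'H2': 0, 'H3': 1}
m2_hits = document_accessibility(hits, only_links_to_hits=True)   # {'H1': 2, 'H2': 0, 'H3': 1}
```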

Then the search result output processing unit 135 is started; it sets the document group matching degree (M1) obtained in step S402 as the first sort key and the document accessibility (M2) obtained in step S403 as the second sort key, sorts the hit documents in descending order of each key, and displays them (S404). This concludes the processing procedure of the search control processing unit 130.
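The two-key descending sort of step S404 can be sketched with a tuple sort key. The sketch assumes that each hit document takes the M1 of the document group it belongs to. The M1 values and the M2 of D002 follow the FIG. 6 example; the pairing of D004 to D006 with the directory B URLs and their M2 values are hypothetical.

```python
def rank_hits(hits, m1_by_group, m2_by_id):
    """S404: sort hits by M1 (first key) then M2 (second key), both descending."""
    def sort_key(hit):
        group = hit["url"].rpartition("/")[0] + "/"
        return (m1_by_group[group], m2_by_id[hit["id"]])
    return sorted(hits, key=sort_key, reverse=True)


hits = [
    {"id": "D002", "url": "/A/a2.htm"},
    {"id": "D004", "url": "/B/b1.htm"},  # ID-to-URL pairing assumed
    {"id": "D005", "url": "/B/b2.htm"},
    {"id": "D006", "url": "/B/b3.htm"},
]
m1 = {"/A/": 1, "/B/": 3}
m2 = {"D002": 0, "D004": 2, "D005": 0, "D006": 1}  # D004-D006 values hypothetical

order = [hit["id"] for hit in rank_hits(hits, m1, m2)]
# ['D004', 'D006', 'D005', 'D002']
```

Documents in the group with more hits come first, and within a group the documents with more outgoing links lead.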

Next, the document search procedure of the document search system shown in FIG. 5 is described concretely with reference to FIG. 6 (see also FIG. 1 as appropriate). FIG. 6 is a diagram illustrating a concrete example of the processing flow when the search condition 600 “car A” is specified for a document search system in which documents D001 to D006 shown in FIG. 4 have been registered.

First, the hit document acquisition processing unit 131 acquires from the registered document management table 140, as the hit document list 601, the document ID, URL, and linked-document URLs of each document that satisfies the specified search condition 600 “car A” (corresponding to S400 in FIG. 5).

In the example of FIG. 6, the first entry in the hit document list 601 is the document with document ID “D002”, URL “/A/a2.htm”, and linked-document URL “-”. The remaining entries are read in the same way; the hit document list 601 shows that four documents were hit in total.

Next, for all documents stored in the registered document management table 140, the document group determination processing unit 132 extracts each set of documents under the same directory (hereinafter, a document group) and obtains the result as the document group list 602 (corresponding to S401 in FIG. 5).

In the example of FIG. 6, the first entry in the document group list 602 shows that the document group “/A/” contains the documents with URLs “/A/a1.htm”, “/A/a2.htm”, and “/A/a3.htm”. Likewise, the second entry shows that the document group “/B/” contains the documents with URLs “/B/b1.htm”, “/B/b2.htm”, and “/B/b3.htm”.

After that, the document group fitness calculation processing unit 133 reads the hit document list 601 and the document group list 602 and calculates, as the document group fitness (M1), the number of hit documents contained in each document group. The document group fitness (M1) values calculated for all document groups are acquired as the document group fitness calculation result 603 (corresponding to S402 in FIG. 5).

In the example shown in FIG. 6, to obtain the first entry of the document group fitness calculation result 603, the document group list 602 is first consulted, and "/A/a1.htm", "/A/a2.htm", and "/A/a3.htm" are acquired as the URLs of the documents in document group "/A/". Next, the hit document list 601 is consulted, "/A/a2.htm" is determined to be the only hit document among those three, and the number of hit documents is counted as "1". The first entry of the document group fitness calculation result 603 shows that this count, "1", was acquired as the document group fitness (M1) of document group "/A/".

Similarly, to obtain the second entry of the document group fitness calculation result 603, the document group list 602 is first consulted, and "/B/b1.htm", "/B/b2.htm", and "/B/b3.htm" are acquired as the URLs of the documents in document group "/B/". Next, the hit document list 601 is consulted, all three documents, "/B/b1.htm", "/B/b2.htm", and "/B/b3.htm", are determined to be hit documents, and the number of hit documents is counted as "3". The second entry of the document group fitness calculation result 603 shows that this count, "3", was acquired as the document group fitness (M1) of document group "/B/".
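
The counting step for the document group fitness (M1) can be sketched as follows. This is a minimal illustration that uses the URLs from the FIG. 6 example; the function name `document_group_fitness` and the dictionary layout are assumptions made for this sketch.

```python
def document_group_fitness(document_groups, hit_urls):
    """Return {group: number of hit documents in that group}, i.e. M1 per group."""
    hits = set(hit_urls)
    return {
        group: sum(1 for url in urls if url in hits)
        for group, urls in document_groups.items()
    }


# Document group list 602 and the hit URLs named in the FIG. 6 example.
document_group_list = {
    "/A/": ["/A/a1.htm", "/A/a2.htm", "/A/a3.htm"],
    "/B/": ["/B/b1.htm", "/B/b2.htm", "/B/b3.htm"],
}
hit_urls = ["/A/a2.htm", "/B/b1.htm", "/B/b2.htm", "/B/b3.htm"]
m1 = document_group_fitness(document_group_list, hit_urls)
```

For this data the sketch yields 1 for "/A/" and 3 for "/B/", matching the values described for the calculation result 603.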

Next, the document accessibility calculation processing unit 134 reads the hit document list 601, acquires the link destination document URLs of each hit document, and calculates, as the document accessibility (M2), the number of distinct link destination URLs. The results calculated for all hit documents are acquired as the document accessibility calculation result 604 (corresponding to S403 in FIG. 5).

In the example shown in FIG. 6, to obtain the first entry of the document accessibility calculation result 604, the hit document list 601 is consulted, "-" is acquired as the link destination document URL of document ID "D002", and the number of distinct link destination URLs is counted as "0". The first entry of the document accessibility calculation result 604 shows that this count, "0", was acquired as the document accessibility (M2) of document ID "D002".

Similarly, to obtain the second entry of the document accessibility calculation result 604, the hit document list 601 is consulted, "-" is acquired as the link destination document URL of document ID "D004", and the number of distinct link destination URLs is counted as "0". The second entry shows that this count, "0", was acquired as the document accessibility (M2) of document ID "D004".

Similarly, to obtain the third entry of the document accessibility calculation result 604, the hit document list 601 is consulted, "/B/b1.htm" and "/B/b3.htm" are acquired as the link destination document URLs of document ID "D005", and the number of distinct link destination URLs is counted as "2". The third entry shows that this count, "2", was acquired as the document accessibility (M2) of document ID "D005".

Likewise, to obtain the fourth entry of the document accessibility calculation result 604, the hit document list 601 is consulted, "/B/b2.htm" is acquired as the link destination document URL of document ID "D006", and the number of distinct link destination URLs is counted as "1". The fourth entry shows that this count, "1", was acquired as the document accessibility (M2) of document ID "D006".
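
The four document accessibility (M2) values walked through above can be reproduced with a short sketch. It assumes that the "-" entry is represented as an empty list of links and that "number of distinct URLs" means duplicates are counted once; the function name `document_accessibility` is hypothetical.

```python
def document_accessibility(hit_document_links):
    """Return {document ID: number of distinct link destination URLs}, i.e. M2."""
    return {
        doc_id: len(set(links))
        for doc_id, links in hit_document_links.items()
    }


# Link destinations from the hit document list 601; [] stands for the "-" entry.
hit_document_links = {
    "D002": [],
    "D004": [],
    "D005": ["/B/b1.htm", "/B/b3.htm"],
    "D006": ["/B/b2.htm"],
}
m2 = document_accessibility(hit_document_links)
```

The resulting values are 0, 0, 2, and 1 for D002, D004, D005, and D006 respectively, as in the calculation result 604.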

Next, the search result output processing unit 135 reads the document group fitness calculation result 603 and the document accessibility calculation result 604, sorts the hit documents in descending order using the document group fitness (M1) as the first sort key and the document accessibility (M2) as the second sort key, and displays the search result 605 (corresponding to S404 in FIG. 5).

In the example shown in FIG. 6, to obtain the first entry of the search result 605, the document group fitness calculation result 603 is consulted, the highest document group fitness (M1) is determined to be "3", and the corresponding document group "/B/" is acquired. Next, the document accessibility calculation result 604 is consulted, the highest document accessibility (M2) within document group "/B/" is determined to be "2", and the corresponding document ID "D005" is acquired. The first entry of the search result 605 shows that document ID "D005" was acquired as rank "1" after sorting. The remaining entries are obtained in the same way, and the search result 605 shows all four hit documents in sorted order. This completes the concrete flow of document search processing in the document search apparatus 10 of the first embodiment of the present invention.
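
The two-key descending sort can be sketched as below. This is a minimal illustration using the M1 and M2 values from the FIG. 6 example; the record layout and the function name `rank_hits` are assumptions, and the group of each hit document is derived from the example (D002 is the only hit under "/A/").

```python
def rank_hits(hits, m1, m2):
    """Sort hits by group fitness (M1), then by accessibility (M2), both descending."""
    return sorted(
        hits,
        key=lambda doc: (m1[doc["group"]], m2[doc["id"]]),
        reverse=True,
    )


hits = [
    {"id": "D002", "group": "/A/"},
    {"id": "D004", "group": "/B/"},
    {"id": "D005", "group": "/B/"},
    {"id": "D006", "group": "/B/"},
]
m1 = {"/A/": 1, "/B/": 3}                              # calculation result 603
m2 = {"D002": 0, "D004": 0, "D005": 2, "D006": 1}      # calculation result 604
ranking = [doc["id"] for doc in rank_hits(hits, m1, m2)]
```

Using a tuple as the sort key keeps M1 dominant and lets M2 break ties only within the same document group fitness, which places D005 first as in the search result 605.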

According to the first embodiment, the search does not merely present information document by document. By using the document group in the directory that contains a hit document and the documents linked from the hit document, it can present highly accurate search results that favor documents containing more of the information the searcher is looking for. As a result, the searcher can obtain the necessary information while referring to fewer documents, which reduces the searcher's burden.

[Calculation of document group fitness considering directories]
In the first embodiment, when the document group fitness (M1) is calculated, step S402 shown in FIG. 5 is executed and the document group fitness calculation processing unit 133 counts the number of hit documents in the document group. A calculation method that also takes into account the subdirectories of the directory containing the document group may be used instead.

In this calculation method, the document group fitness calculation processing unit 133 determines whether each subdirectory in the document group contains a hit document, also counts the number of subdirectories that contain a hit document (hereinafter called the number of hit directories), and calculates the document group fitness (M1) based on at least one of the number of hit documents in the document group and the number of hit directories.

For example, suppose that directory "/A/" is the directory that determines a document group and that, in addition to the documents in the document group, it contains the subdirectories "/A/B/", "/A/C/", "/A/B/D/", "/A/B/E/", and "/A/C/F/", each of which contains a hit document. In this case, all of these subdirectories are hit directories, and the number of hit directories is 5.

The document group fitness may be the number of hit directories itself, or the ratio of the number of hit directories to the number of directories in the document group.
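
The hit directory count and the ratio variant can be sketched as follows. The sketch assumes that a subdirectory counts as a hit directory when a hit document sits directly in it; the text does not say whether hits in deeper subdirectories should also count. The file names h1.htm to h5.htm and the function name `hit_directories` are hypothetical.

```python
def hit_directories(group_dir, hit_urls):
    """Return the subdirectories of group_dir that directly contain a hit document."""
    found = set()
    for url in hit_urls:
        head, _, _ = url.rpartition("/")
        directory = head + "/"
        # Keep strict subdirectories only; hits directly in group_dir are excluded.
        if directory != group_dir and directory.startswith(group_dir):
            found.add(directory)
    return found


# Hypothetical hit documents, one in each subdirectory of the "/A/" example.
hit_urls = [
    "/A/B/h1.htm", "/A/C/h2.htm", "/A/B/D/h3.htm",
    "/A/B/E/h4.htm", "/A/C/F/h5.htm",
]
all_subdirectories = ["/A/B/", "/A/C/", "/A/B/D/", "/A/B/E/", "/A/C/F/"]

hit_directory_count = len(hit_directories("/A/", hit_urls))
hit_directory_ratio = hit_directory_count / len(all_subdirectories)
```

Either `hit_directory_count` or `hit_directory_ratio` could then serve as the document group fitness (M1), depending on which of the two variants is chosen.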

With this calculation method, directories are also evaluated when the document group fitness (M1) is calculated, so the document group fitness (M1) can be calculated with higher accuracy.

[Limiting the targets to hit documents]
In the search procedure described in this embodiment, step S401 shown in FIG. 5 is executed and, when determining document groups, the document group determination processing unit 132 treats the set of all documents under the same directory as a document group, for all documents stored in the registered document management table 140. Instead, the document groups may be extracted only from the hit documents acquired from the hit document list 601. In that case, the number of documents from which document groups are extracted is reduced from the total number of registered documents to the number of hit documents, which shortens the processing time.

Likewise, in the search procedure described in this embodiment, step S403 shown in FIG. 5 is executed and, when calculating the document accessibility, the document accessibility calculation processing unit 134 counts all link destination documents. Instead, the count may be limited to the hit documents acquired from the hit document list 601.

<<Second Embodiment>>
In the first embodiment, a document group is identified by a directory, but in some cases there is no directory suitable for determining the document group. In such cases, the document group can be formed by another method that does not use directories. The second embodiment uses a document group determination method different from that of the first embodiment: it determines the document group by following links rather than by directory. All processing other than document group determination, as well as the apparatus configuration, is the same as in the first embodiment.

FIG. 7 illustrates document group determination by following links in the second embodiment. The processing before document group determination is the same as in the first embodiment, so its description is omitted. First, the link traversal count L, which determines the range of link destination documents to be included in a document group, is acquired (S800). The link traversal count is the number of times a link is followed to reach the next document. For example, if there is a link from document A to document B and a link from document B to document C, obtaining document B by following the link from document A takes one traversal, and obtaining everything from document A through document B to document C takes two traversals. Changing this traversal count controls the extent of the document group.

Next, the processing from step S811 to step S813 below is repeated for each document stored in the registered document management table 140 (S810), and processing ends once the prescribed processing is complete.

In step S811, the URL of the document and the URLs of the link destination documents reachable from the document by following at most L links are acquired. In step S812, the set consisting of the document and its link destination documents acquired in step S811 is determined to be a document group. Next, in step S813, a document group ID is assigned to the document group thus determined. If any documents targeted in step S810 remain, processing returns to step S811 and repeats for the next document; after all documents have been processed, the document group determination processing ends.
The subsequent processing is the same as in the first embodiment, so its description is omitted.
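
Steps S811 to S813 can be sketched as a breadth-first traversal limited to L link traversals. This is a minimal sketch, not the patented implementation: the breadth-first approach, the function names, and the dictionary-based link table are assumptions, and the documents A, B, and C are the ones from the traversal-count example above.

```python
from collections import deque


def reachable_within(start, link_table, max_hops):
    """Return start plus every document reachable in at most max_hops link traversals."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        doc, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not follow links beyond the traversal limit
        for target in link_table.get(doc, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, hops + 1))
    return seen


def determine_document_groups(documents, link_table, max_hops):
    """Build one link-based group per document and label it G001, G002, ..."""
    return {
        f"G{index:03d}": reachable_within(doc, link_table, max_hops)
        for index, doc in enumerate(documents, start=1)
    }


# Document A links to B, and B links to C.
link_table = {"A": ["B"], "B": ["C"], "C": []}
groups_l1 = determine_document_groups(["A", "B", "C"], link_table, max_hops=1)
groups_l2 = determine_document_groups(["A", "B", "C"], link_table, max_hops=2)
```

With L = 1 the group for document A contains A and B; with L = 2 it also contains C. The `seen` set means that a link leading back to an already collected document does not enlarge the group.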

FIG. 8 illustrates a concrete example of document group determination by following links in the second embodiment. First, the document group determination processing unit 132a acquires the link traversal count (corresponding to S800 in FIG. 7); here, the traversal count (L) is assumed to have been set to 1. Then, for each of documents D001 to D006 stored in the registered document management table 140 shown in the upper part of FIG. 8 (corresponding to S810 in FIG. 7), the unit repeatedly acquires the link destination URLs (corresponding to S811 in FIG. 7), determines the document group (corresponding to S812 in FIG. 7), and assigns a document group ID (corresponding to S813 in FIG. 7).

When this document group determination processing is applied to document D001, the link destination document URLs reachable by one link traversal, "/A/a2.htm" and "/A/a3.htm", are included in the document group, and document group G001, shown at the center left of FIG. 8, is determined. Similarly, document group G002 is determined for document D002, G003 for D003, G004 for D004, G005 for D005, and G006 for D006.

These document groups are stored in the document group list 602a shown in the lower part of FIG. 8. Using this document group list 602a, the same processing as in the first embodiment continues, such as calculating the document group fitness (M1) and the document accessibility (M2). In calculating the document accessibility (M2), the value may be the number of documents linked from the hit document that are within the document group to which the hit document belongs, or the number of hit documents linked from the hit document that are within that document group. Alternatively, the document accessibility (M2) may be the ratio of the latter to the former, that is, the number of linked hit documents within the document group divided by the number of linked documents within the document group.

In the example of FIG. 8 the traversal count (L) is 1; in general, increasing the traversal count (L) tends to increase the number of documents in each document group. In the example shown in the center and lower parts of FIG. 8, however, there are few links, so the number of documents in the document groups does not increase. For instance, in document group G005, a link can be followed from document D006, whose URL is "/B/b3.htm", to document D005, whose URL is "/B/b2.htm", but this link merely leads back to the original document, so the number of documents in document group G005 does not increase.

According to the second embodiment, even when the documents hit by a search are not classified into suitable directories, the search, as in the first embodiment, does not merely present information document by document. By using the document group associated through links and the documents linked from each hit document, it can present highly accurate search results that favor documents containing more of the information the searcher is looking for.

<<Other Embodiments>>
Many embodiments of the present invention are possible besides the first and second embodiments. Examples of other embodiments are described below.

[Example embodiment that also processes links leading outside the document group]
In the first and second embodiments, the document accessibility (M2) is calculated with documents inside the document group as targets and documents outside the document group excluded. However, documents outside the document group can also be targeted in addition to those inside it. FIG. 9 illustrates processing that calculates the document accessibility (M2) by applying different weights depending on whether a link destination document is inside or outside the document group. Here, the document group fitness (M1) is the same as in the first embodiment, and the processing other than the calculation of the document accessibility (M2), as well as the apparatus configuration, is also the same as in the first embodiment.

As shown in FIG. 9, in the processing that calculates the document accessibility (M2) with weights that depend on whether a link destination document is inside the document group, the weight W1 for link destination documents inside the document group and the weight W2 for link destination documents outside the document group are first acquired (S700).

Then, the processing from step S720 to step S732 below is repeated for all hit documents in the target document group (S710).

In step S720, the URLs of the link destination documents of the hit document are acquired. It is then determined whether each acquired link destination URL is included in the document group (S730). If the acquired link destination document is included in the document group ("included in the document group" at S730), the number N1 of link destination documents included in the document group is counted (S731). If it is not included ("not included in the document group" at S730), the number N2 of link destination documents not included in the document group is counted (S732). The processing from step S720 to step S732 is repeated until all hit documents targeted in step S710 have been processed.

Finally, the document accessibility (M2) is calculated (S740) from the weight W1 for link destination documents inside the document group, the weight W2 for those outside it, the number N1 of link destination documents included in the document group, and the number N2 of link destination documents not included in it, and the processing ends. For example, the document accessibility (M2) can be obtained by equation (1) below.
M2 = W1 * N1 + W2 * N2   ... Equation (1)
The formula for the document accessibility (M2) is not limited to this example; formulas using various other calculation methods can be defined.
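
Equation (1) can be illustrated with a short sketch. It makes several assumptions that the text leaves open: M2 is computed per hit document, each link destination URL is counted once, and the weights are 1.0 inside the group and 0.5 outside it. The URL "/C/c1.htm" and the function name `weighted_accessibility` are hypothetical.

```python
def weighted_accessibility(link_urls, group_urls, w1, w2):
    """Compute M2 = W1 * N1 + W2 * N2 for one hit document (equation (1))."""
    targets = set(link_urls)               # count each link destination URL once
    n1 = len(targets & set(group_urls))    # N1: link destinations inside the group
    n2 = len(targets) - n1                 # N2: link destinations outside the group
    return w1 * n1 + w2 * n2


group_b = ["/B/b1.htm", "/B/b2.htm", "/B/b3.htm"]
# "/C/c1.htm" is a hypothetical link leading outside document group "/B/".
links = ["/B/b1.htm", "/B/b3.htm", "/C/c1.htm"]
m2 = weighted_accessibility(links, group_b, w1=1.0, w2=0.5)
```

Here N1 = 2 and N2 = 1, so M2 = 1.0 * 2 + 0.5 * 1 = 2.5. Choosing W1 larger than W2 favors documents whose links stay within their own document group.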

When the document accessibility (M2) is calculated in this way, it becomes possible to evaluate whether a link destination document is a hit document, or whether it exists within the document group, so the document accessibility (M2) can be calculated with higher accuracy.

[Example of search result display]
FIG. 10 is an example of a screen showing the results of a search performed by the document search apparatus 10. The screen 1000 is displayed, for example, on the display 100 of FIG. 1 or on a terminal device (not shown) connected via the network 106. In this example, a search was performed using the term "Car A", and the results show two document groups: document group B, indicated by reference numeral 1010, and document group A, indicated by reference numeral 1020.

In this example, document group B has a document group fitness of 3 and document group A has a document group fitness of 1, so document group B, with the higher fitness, is displayed first. The documents in document group B are displayed in descending order of document accessibility. The document indicated by reference numeral 1011 has a document accessibility of 2, the document indicated by 1012 has 1, and the document indicated by 1013 has 0, so they are shown in the order 1011, 1012, 1013. Document group A contains only one document, indicated by reference numeral 1021, but if it contained multiple documents, they would be displayed in descending order of document accessibility, as for document group B.

In this example, the document groups are first arranged in descending order of document group fitness, and the documents in each document group are then arranged in descending order of document accessibility, but the display order is not limited to this. For example, an importance value may be calculated for each document from the document group fitness (M1) and the document accessibility (M2), and the hit documents may be output in descending order of that importance. The importance may also take into account bibliographic information, such as the document's update date and time, in addition to the document group fitness (M1) and the document accessibility (M2). In that case, documents can be evaluated by the document group fitness (M1), the document accessibility (M2), and the additional indicators, so the importance of each hit document can be calculated in finer detail. As a result, the searcher can obtain the desired information from the search results efficiently.
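
The text does not give a formula for the importance value, so the following is purely an illustrative sketch: it assumes a weighted sum of M1 and M2 with hypothetical weights `alpha` and `beta`, applied to the M1 and M2 values of the four hit documents from the FIG. 6 example. A bibliographic indicator such as recency could be added as a further term in the same way.

```python
def importance(m1_value, m2_value, alpha=1.0, beta=1.0):
    """Hypothetical importance score: a weighted sum of M1 and M2."""
    return alpha * m1_value + beta * m2_value


# (M1 of the document's group, M2 of the document) for each hit document.
hit_scores = {
    "D002": (1, 0),
    "D004": (3, 0),
    "D005": (3, 2),
    "D006": (3, 1),
}
ranking = sorted(
    hit_scores,
    key=lambda doc_id: importance(*hit_scores[doc_id]),
    reverse=True,
)
```

Unlike the two-key sort, a single combined score allows a document with a very high M2 in a weaker group to overtake documents in a stronger group, depending on how the weights are chosen.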

In the embodiments described so far, the documents to be registered were described as being written in HTML, but they may be documents created with application software such as mail software or word processors; the format is not limited. Likewise, links are not limited to links specified by URL; for example, a link may be expressed by writing the document ID of the link destination in the document.

As described above, according to the present invention, even when the information a searcher wants consists of multiple topics, each described in a different document, the search first finds document groups that contain many topics related to the desired information and then acquires the documents most closely related to those topics, so that many topics related to the desired information can be obtained. As a result, the document search apparatus of the present invention can reduce the searcher's burden.

The embodiments of the present invention can be modified without departing from the spirit of the invention. For example, the document search apparatus 10 may be configured from multiple computers rather than a single computer, with the processing of the present invention executed cooperatively by those computers. Each embodiment of the present invention is realized by executing a program on the arithmetic unit of a computer, and an operational apparatus can be configured by loading the program into a computer from a storage medium on which it is recorded.

FIG. 1 is a diagram showing the overall configuration of the document search apparatus in the first embodiment.
FIG. 2 is a PAD diagram explaining the processing procedure of the system control processing unit in the first embodiment.
FIG. 3 is a PAD diagram explaining the processing procedure of the registration control processing unit in the first embodiment.
FIG. 4 is a diagram explaining a concrete example of the flow of HTML document registration processing in the first embodiment.
FIG. 5 is a PAD diagram explaining the processing procedure of the search control processing unit in the first embodiment.
FIG. 6 is a diagram explaining a concrete example of the flow of search processing in the first embodiment.
FIG. 7 is a diagram explaining the document group determination processing in the second embodiment.
FIG. 8 is a diagram explaining a concrete example of document group determination in the second embodiment.
FIG. 9 is a diagram explaining processing that calculates the document accessibility (M2) with weights that depend on whether a link destination document is included in the document group.
FIG. 10 is an example of a screen showing the results of a search performed by the document search apparatus.

Explanation of symbols

100 Display
101 Keyboard
102 Central processing unit (CPU)
103 Magnetic disk device
104 Main memory
105 Bus
106 Network
110 System control processing unit
120 Registration control processing unit
121 Registered document analysis processing unit
122 Document information acquisition processing unit
130 Search control processing unit
131 Hit document acquisition processing unit
132 Document group determination processing unit
133 Document group matching degree calculation processing unit
134 Document accessibility calculation processing unit
135 Search result output processing unit
140 Registered document management table
150 Work area

Claims

A document search method executed by a computer comprising: means for searching for hit documents that include a given keyword; means for determining a document group that includes a hit document; means for calculating a document group matching degree, which is a measure of the hit documents included in the document group; means for calculating a document accessibility, which is a measure of the links leading from a hit document to other documents; and means for outputting a search result, the method comprising:
a step in which the means for searching for hit documents searches for documents that include the keyword given at search time and records the documents found in a list of hit documents;
a step in which the means for determining the document group determines the document group that includes the hit document;
a step in which the means for calculating the document group matching degree calculates the document group matching degree based on information on the document group and the hit documents;
a step in which the means for calculating the document accessibility calculates the document accessibility based on the list of hit documents; and
a step in which the means for outputting the search result outputs documents in order of importance based on the document group matching degree and the document accessibility.

The document search method according to claim 1, wherein the computer further comprises means for analyzing a document to be registered and means for registering the document in a database, the method further comprising:
a step in which the means for analyzing the document to be registered acquires the text included in the document; and
a step in which the means for registering the document in the database registers the document to be searched in the database in association with the text.

The document search method according to claim 1, wherein, in the step of determining the document group that includes the hit document,
the means for determining the document group determines the set of documents contained in the same directory as the hit document to be the document group.
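
As an illustration only, the directory-based grouping described in this claim can be sketched in Python as follows. The function name `group_by_directory` and the use of POSIX-style path strings as document identifiers are assumptions made for this sketch; the claim itself does not prescribe a data representation.

```python
from collections import defaultdict
from posixpath import dirname


def group_by_directory(registered_paths, hit_paths):
    """Map each hit document to the registered documents in its own directory.

    Assumption: documents are identified by POSIX-style path strings.
    """
    by_dir = defaultdict(set)
    for path in registered_paths:
        by_dir[dirname(path)].add(path)
    # A hit whose directory holds no registered documents gets an empty group.
    return {hit: by_dir[dirname(hit)] for hit in hit_paths}
```

For example, with registered documents `/a/x.html`, `/a/y.html` and `/b/z.html`, a hit on `/a/x.html` yields the group containing `/a/x.html` and `/a/y.html`.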

The document search method according to claim 1, wherein, in the step of determining the document group that includes the hit document,
the means for determining the document group determines the set of documents that can be reached by following links from the hit document a predetermined number of times to be the document group.
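
A link-based document group of this kind can be sketched as a breadth-first traversal. This is a minimal sketch under stated assumptions: the link graph is a dictionary from a document to the documents it links to, the name `reachable_within` is hypothetical, and "a predetermined number of times" is read here as "at most `max_hops` links", which is one possible interpretation.

```python
from collections import deque


def reachable_within(links, start, max_hops):
    """Return the documents reachable from `start` via at most `max_hops` links.

    Assumption: `links` maps a document id to an iterable of link targets.
    The start document itself is included in the returned set.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        doc, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not follow links beyond the hop limit
        for target in links.get(doc, ()):
            if target not in seen:
                seen.add(target)
                queue.append((target, hops + 1))
    return seen
```

The `seen` set keeps the traversal from looping when documents link to each other.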

The document search method according to claim 3, wherein, in the step of calculating the document group matching degree based on information on the document group and the hit documents,
the means for calculating the document group matching degree obtains the document group matching degree by counting, among the subdirectories of the directory that contains the document group, the subdirectories that contain a hit document.
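
The subdirectory count can be sketched as below. The function name `count_hit_subdirectories` is hypothetical, and the sketch assumes that only immediate child directories of the group directory are counted, with a hit anywhere beneath a child attributed to that child; the claim does not state which depth is intended.

```python
from pathlib import PurePosixPath


def count_hit_subdirectories(group_dir, hit_paths):
    """Count immediate subdirectories of `group_dir` that contain a hit document."""
    root = PurePosixPath(group_dir)
    hit_subdirs = set()
    for hit in hit_paths:
        try:
            rel = PurePosixPath(hit).relative_to(root)
        except ValueError:
            continue  # the hit lies outside the group directory
        if len(rel.parts) >= 2:
            # First part is the child directory; the last part is the file name.
            hit_subdirs.add(rel.parts[0])
    return len(hit_subdirs)
```

Hits stored directly in the group directory are not counted here, because they do not belong to any subdirectory.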

The document search method according to claim 1 or 2, wherein, in the step of calculating the document accessibility based on the list of hit documents,
the means for calculating the document accessibility counts the documents that can be reached by following links from the hit document a predetermined number of times and uses the count as the document accessibility.

The document search method according to claim 1, wherein, in the step of calculating the document accessibility based on the list of hit documents, the means for calculating the document accessibility:
counts, among the documents that can be reached by following links from the hit document a predetermined number of times, the number of documents included in the document group and the number of documents not included in the document group; and
calculates the document accessibility by a predetermined formula from the number of documents included in the document group and the number of documents not included in the document group.
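
The claim leaves the "predetermined formula" open. One possible form, shown purely as an assumption, is a weighted sum of the two counts. In the sketch below the function name, the weights `w_in` and `w_out`, their default values, and the choice to exclude the hit document itself from the counts are all hypothetical.

```python
from collections import deque


def document_accessibility(links, hit, group, max_hops, w_in=1.0, w_out=0.5):
    """Weighted count of documents reachable from `hit` within `max_hops` links.

    Assumptions: `links` maps a document id to its link targets, `group` is the
    set of documents in the hit's document group, and the formula is
    w_in * (reachable documents in the group) + w_out * (reachable documents outside it).
    """
    seen = {hit}
    queue = deque([(hit, 0)])
    n_in = n_out = 0
    while queue:
        doc, hops = queue.popleft()
        if hops == max_hops:
            continue
        for target in links.get(doc, ()):
            if target in seen:
                continue
            seen.add(target)
            if target in group:
                n_in += 1
            else:
                n_out += 1
            queue.append((target, hops + 1))
    return w_in * n_in + w_out * n_out
```

Setting both weights to 1 reduces this to a plain count of reachable documents.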

The document search method according to claim 2, wherein the means for analyzing the document to be registered analyzes the keywords included in the document
and acquires storage location information from the document to be registered, and,
in the step of registering the document to be searched, the means for registering the document in the database
registers the storage location information in the database.

The document search method according to claim 5, wherein, in the step of outputting the search result in order of importance based on the document group matching degree and the document accessibility,
the documents are grouped in descending order of document group matching degree and, within each group, output in descending order of document accessibility.
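
This two-level ordering amounts to a sort on a composite key. The sketch below assumes each result is a tuple of document identifier, document group matching degree, and document accessibility; the tuple layout and the name `rank_results` are hypothetical.

```python
def rank_results(results):
    """Order results by matching degree (descending), then accessibility (descending).

    Assumption: each result is a (doc_id, group_matching_degree, accessibility) tuple
    with numeric scores.
    """
    return sorted(results, key=lambda r: (-r[1], -r[2]))
```

For example, `rank_results([("a", 1, 5.0), ("b", 2, 1.0), ("c", 2, 3.0)])` places `c` first and `b` second, because both have the higher matching degree and `c` has the higher accessibility, and places `a` last.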

A document search apparatus in which a computer comprises:
means for searching for hit documents, which searches for documents that include a keyword given at search time and records the documents found in a list of hit documents;
means for determining a document group, which determines the document group that includes the hit document;
means for calculating a document group matching degree, which calculates the document group matching degree based on information on the document group and the hit documents;
means for calculating a document accessibility, which calculates the document accessibility based on the list of hit documents; and
means for outputting a search result, which outputs documents in order of importance based on the document group matching degree and the document accessibility.

The document search apparatus according to claim 10, wherein the computer further comprises:
means for analyzing a document to be registered, which acquires the text included in the document; and
means for registering the document, which registers the document to be searched in a database in association with the text.

The document search apparatus according to claim 10, wherein the means for determining the document group determines the set of documents contained in the same directory as the hit document to be the document group.

The document search apparatus according to claim 10, wherein the means for determining the document group determines the set of documents that can be reached by following links from the hit document a predetermined number of times to be the document group.

The document search apparatus according to claim 12, wherein the means for calculating the document group matching degree obtains the document group matching degree by counting, among the subdirectories of the directory that contains the document group, the subdirectories that contain a hit document.

The document search apparatus according to claim 10, wherein the means for calculating the document accessibility counts the documents that can be reached by following links from the hit document a predetermined number of times and uses the count as the document accessibility.

The document search apparatus according to claim 10 or 11, wherein the means for calculating the document accessibility counts, among the documents that can be reached by following links from the hit document a predetermined number of times, the number of documents included in the document group and the number of documents not included in the document group,
and calculates the document accessibility by a predetermined formula from the number of documents included in the document group and the number of documents not included in the document group.

The document search apparatus according to claim 11, wherein the means for analyzing the document to be registered acquires storage location information from the document to be registered,
and the means for registering the document in the database registers the storage location information in the database.

The document search apparatus according to claim 14, wherein the means for outputting the search result groups the documents in descending order of document group matching degree and outputs each group of documents in descending order of document accessibility.

A storage medium storing a document search program to be executed by a computer comprising: means for searching for hit documents that include a given keyword; means for determining a document group that includes a hit document; means for calculating a document group matching degree, which is a measure of the hit documents included in the document group; means for calculating a document accessibility, which is a measure of the links leading from a hit document to other documents; and means for outputting a search result, the program causing the computer to execute:
a step in which the means for searching for hit documents searches for documents that include the keyword given at search time and records the documents found in a list of hit documents;
a step in which the means for determining the document group determines the document group that includes the hit document;
a step in which the means for calculating the document group matching degree calculates the document group matching degree based on information on the document group and the hit documents;
a step in which the means for calculating the document accessibility calculates the document accessibility based on the list of hit documents; and
a step in which the means for outputting the search result outputs documents in order of importance based on the document group matching degree and the document accessibility.

The storage medium storing the document search program according to claim 19, wherein the computer further comprises means for analyzing a document to be registered and means for registering the document in a database, the program further causing the computer to execute:
a step in which the means for analyzing the document to be registered acquires the text included in the document; and
a step in which the means for registering the document in the database registers the document to be searched in the database in association with the text.