JP5644558B2

JP5644558B2 - Document relevance calculation device

Info

Publication number: JP5644558B2
Application number: JP2011021831A
Authority: JP
Inventors: 文キ周
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2011-02-03
Filing date: 2011-02-03
Publication date: 2014-12-24
Anticipated expiration: 2031-02-03
Also published as: JP2012164015A

Description

The present invention relates to a document relevance calculation device that calculates a relevance representing the depth of association between a search character string and a document file.

A document relevance calculation device is known that calculates a relevance representing the depth of association between a search character string input by a user and the character strings included in the document represented by a document file (see, for example, Patent Document 1). This device outputs document file identification information for identifying the document files (for example, URIs (Uniform Resource Identifiers)), arranged in order of the depth of association represented by the calculated relevance.

Patent Document 1: JP 2010-61322 A

Document files stored in the same directory are often relatively closely related to one another. Therefore, a document file stored in a directory that contains more document files closely related to the search character string is more likely to contain information that is important to the user.

However, the above document relevance calculation device cannot calculate a relevance that reflects the depth of association between the search character string and a directory. As a result, it cannot calculate a relevance that represents the depth of association between the search character string and a document file with high accuracy.

Accordingly, an object of the present invention is to provide a document relevance calculation device capable of solving the above-mentioned problem, namely that “a relevance representing the depth of association between a search character string and a document file cannot always be calculated with high accuracy.”

To achieve this object, a document relevance calculation device according to one aspect of the present invention includes:
a search character string receiving means for receiving a search character string; and
a document file relevance calculating means for calculating a document file relevance for each of a plurality of document files. Each document file is stored in one of a plurality of directories and represents a document. The document file relevance represents the depth of association between the search character string and the document file. It is calculated from two inputs: an in-document character string relevance, which represents the depth of association between the received search character string and the character strings included in the document represented by the document file; and a directory relevance, which represents the depth of association between the search character string and the directory in which the document file is stored.

A document relevance calculation method according to another aspect of the present invention includes:
receiving a search character string; and
calculating a document file relevance for each of a plurality of document files, each of which is stored in one of a plurality of directories and represents a document. The document file relevance represents the depth of association between the search character string and the document file. It is based on an in-document character string relevance, which represents the depth of association between the received search character string and the character strings included in the document represented by the document file, and on a directory relevance, which represents the depth of association between the search character string and the directory in which the document file is stored.

A document relevance calculation program according to another aspect of the present invention causes an information processing device to execute a process of:
receiving a search character string; and
calculating a document file relevance for each of a plurality of document files, each of which is stored in one of a plurality of directories and represents a document. The document file relevance represents the depth of association between the search character string and the document file. It is based on an in-document character string relevance, which represents the depth of association between the received search character string and the character strings included in the document represented by the document file, and on a directory relevance, which represents the depth of association between the search character string and the directory in which the document file is stored.

With the configuration described above, the present invention can calculate, with high accuracy, a document file relevance that represents the depth of association between a search character string and a document file.

FIG. 1 is a diagram illustrating the schematic configuration of a document search system according to a first embodiment of the present invention.
FIG. 2 is a block diagram outlining the functions of the document search system according to the first embodiment.
FIG. 3 is a flowchart showing a program executed by the document search server device according to the first embodiment.
FIG. 4 is an explanatory diagram conceptually showing the document files stored in the document file server device according to the first embodiment.
FIG. 5 is a table showing the in-document character string relevances and document file relevances calculated by the document search server device according to the first embodiment.
FIG. 6 is a table showing the in-document character string relevances and document file relevances calculated by the document search server device according to a second embodiment of the present invention.
FIG. 7 is a block diagram outlining the functions of a document relevance calculation device according to a third embodiment of the present invention.

Embodiments of the document relevance calculation device, the document relevance calculation method, and the document relevance calculation program according to the present invention are described below with reference to FIGS. 1 to 7.

<First Embodiment>
(Configuration)
As shown in FIG. 1, the document search system 1 according to the first embodiment includes a client device 10, a document file server device 20, and a document search server device (document relevance calculation device) 30. The document search system 1 may include a plurality of document file server devices 20. The client device 10, the document file server device 20, and the document search server device 30 are communicably connected to one another via a communication line NW (in this example, a communication line constituting an IP (Internet Protocol) network).

The client device 10 is an information processing device (a personal computer in this example). Alternatively, the client device 10 may be a mobile phone terminal, a PHS (Personal Handyphone System) terminal, a PDA (Personal Data Assistance, Personal Digital Assistant), a smartphone, a car navigation terminal, a game terminal, or the like.

The client device 10 includes the following components (not shown): a central processing unit (CPU), a storage device (memory and a hard disk drive (HDD)), an input device (a keyboard and a mouse in this example), and an output device (a display in this example). The functions described later are implemented by the CPU executing a program stored in the storage device.

Each of the document file server device 20 and the document search server device 30 is an information processing device. Like the client device 10, each includes a CPU and a storage device (not shown), and each implements the functions described later by having its CPU execute a program stored in its storage device.

The document file server device 20 stores a plurality of document files. Each document file is information representing a document that contains character strings. In the file system, each document file is stored in one of a plurality of directories.

(Function)
FIG. 2 is a block diagram showing the functions of the document search system 1 configured as described above.
The functions of the client device 10 include a search character string transmission unit 11 and a search result output unit 12.
The search character string transmission unit 11 accepts a search character string entered by the user via the input device. It then transmits the accepted search character string to the document search server device 30.

The search result output unit 12 receives a search result (information representing the search result) from the document search server device 30. It outputs the received search result via the output device (in this example, by showing it on the display).

The functions of the document search server device 30 include a search character string receiving unit (search character string receiving means) 31, a document file relevance calculation unit (document file relevance calculating means) 32, and a search result transmission unit 33.

The search character string receiving unit 31 receives (accepts) the search character string transmitted by the client device 10.

The document file relevance calculation unit 32 includes four sub-units: an in-document character string relevance calculation processing unit 32a, a directory structure information acquisition unit 32b, a directory relevance calculation processing unit 32c, and a document file relevance calculation processing unit 32d.

The in-document character string relevance calculation processing unit 32a calculates an in-document character string relevance for every document file stored in the document file server device 20. This value represents the depth of association between the search character string received by the search character string receiving unit 31 and the character strings included in the document represented by the document file.

In this example, the in-document character string relevance calculation processing unit 32a calculates the in-document character string relevance according to tf-idf (Term Frequency-Inverse Document Frequency). The unit 32a may instead use another algorithm, for example one based on the frequency with which the search character string appears in the document and/or on the structure of the document.
In this example, the in-document character string relevance increases as the association between the search character string and the character strings in the document represented by the document file becomes deeper.

The unit 32a also extracts the document files whose in-document character string relevance represents a depth of association deeper than a preset threshold depth. In this example, it extracts the document files whose calculated in-document character string relevance is greater than a preset threshold (for example, 0).

The directory structure information acquisition unit 32b acquires directory structure information for each document file extracted by the unit 32a. The directory structure information associates that document file with all the document files stored in the same directory.
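Steps S103 and S104 can be sketched as follows. The sketch assumes, hypothetically, that document files are identified by POSIX-style paths and that a file's directory is its parent path. Files below the threshold are dropped from the candidate set, but they still appear in the directory structure information, because the directory relevance is computed over all files in the directory.

```python
import posixpath
from collections import defaultdict


def extract_above_threshold(scores, threshold=0.0):
    """Step S103: keep files whose in-document relevance exceeds the threshold."""
    return {path: s for path, s in scores.items() if s > threshold}


def directory_structure(extracted, all_paths):
    """Step S104 (sketch): map each extracted file to every file in its directory.

    all_paths lists every document file on the server; unextracted siblings
    are kept because Equation 1 averages over all files in the directory.
    """
    by_dir = defaultdict(list)
    for path in all_paths:
        by_dir[posixpath.dirname(path)].append(path)
    return {path: sorted(by_dir[posixpath.dirname(path)]) for path in extracted}
```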

The directory relevance calculation processing unit 32c calculates a directory relevance for each directory that stores at least one document file extracted by the unit 32a. The directory relevance represents the depth of association between the search character string received by the search character string receiving unit 31 and the directory.
In this example, the directory relevance increases as the association between the search character string and the directory becomes deeper.

The unit 32c calculates the directory relevance from two inputs: the directory structure information acquired by the directory structure information acquisition unit 32b, and the in-document character string relevances calculated by the unit 32a.

Specifically, the directory relevance of a directory is the average of the in-document character string relevances calculated for all the document files stored in that directory. The unit 32c calculates the directory relevance D_i of the i-th directory according to Equation 1, where:

- m is the number of document files stored in the directory;
- S_ij is the in-document character string relevance calculated for the j-th document file stored in the directory.

$$D_i = \frac{1}{m} \sum_{j=1}^{m} S_{ij} \qquad \text{(Equation 1)}$$
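Equation 1 is a plain arithmetic mean over every file in the directory. A minimal sketch:

```python
def directory_relevance(in_doc_scores):
    """Equation 1: D_i = (1/m) * sum of S_ij over the m files in directory i."""
    m = len(in_doc_scores)
    if m == 0:
        raise ValueError("directory holds no document files")
    return sum(in_doc_scores) / m
```

For directory #1 in the worked example (relevances 0, 80 and 78), this gives 158/3, which the text rounds to 52.7.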

In other words, the unit 32c bases the directory relevance of a directory on the in-document character string relevances of all the document files stored in it.

For each document file extracted by the unit 32a, the document file relevance calculation processing unit 32d calculates a document file relevance. It uses the directory relevance calculated by the unit 32c and the in-document character string relevance calculated by the unit 32a.

The document file relevance represents the depth of association between the search character string received by the search character string receiving unit 31 and the document file. In this example, it increases as that association becomes deeper.

Specifically, the unit 32d combines the in-document character string relevance of the document file with the directory relevance of the directory that stores it. It calculates the document file relevance R_ij of the j-th document file stored in the i-th directory according to Equation 2. Here, log is the common (base-10) logarithm; this reproduces the values given for step S106.

$$R_{ij} = S_{ij} \left( \log \frac{D_i}{S_H} + 1 \right) \qquad \text{(Equation 2)}$$

Here, S_H is the maximum in-document character string relevance that the unit 32a can calculate (100 in this example).
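Equation 2 can be transcribed directly. The sketch assumes the base-10 logarithm, since that reproduces the values given for step S106:

```python
import math

S_H = 100.0  # maximum in-document relevance (100 in the text's example)


def document_file_relevance(s_ij, d_i, s_h=S_H):
    """Equation 2: R_ij = S_ij * (log10(D_i / S_H) + 1)."""
    if d_i <= 0:
        raise ValueError("directory relevance must be positive to take its logarithm")
    return s_ij * (math.log10(d_i / s_h) + 1.0)
```

The guard is an added assumption. Equation 2 is undefined for D_i ≤ 0, but that cannot happen for a directory holding at least one extracted file with S_ij > 0, provided all relevances are non-negative.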

The document file relevance calculation unit 32 generates a search result from the document file relevances calculated by the unit 32d. The search result lists document file identification information in order of the depth of association represented by the document file relevance. Document file identification information identifies a document file; in this example, it is a URI (Uniform Resource Identifier).

Alternatively, the unit 32 may select a subset of the document files based on their document file relevances and generate a search result containing only the identification information of the selected files. In this case, the unit 32 preferably gives priority to document files with higher document file relevance. The unit 32 may also be configured to select a preset number of document files.
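Generating the search result amounts to a descending sort, optionally truncated to a preset extraction count. In this sketch, the keys stand in for the URIs that the text uses as document file identification information. The function name and the `limit` parameter are illustrative.

```python
def build_search_result(relevance_by_id, limit=None):
    """Order document file identifiers by descending document file relevance.

    limit is an optional preset extraction count; None keeps every entry.
    """
    ranked = sorted(relevance_by_id.items(), key=lambda kv: kv[1], reverse=True)
    if limit is not None:
        ranked = ranked[:limit]
    return [doc_id for doc_id, _ in ranked]
```

With the document file relevances from the worked example (#2: 57.74, #3: 56.30, #4: 71.68, #5: 66.30, #6: 73.47), the order is #6, #4, #5, #2, #3.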

The search result transmission unit 33 transmits the search result generated by the document file relevance calculation unit 32 to the client device 10.

(Operation)
Next, the operation of the document search system 1 described above is explained.
First, the client device 10 accepts a search character string entered by the user via the input device. It then transmits the accepted search character string to the document search server device 30.

Meanwhile, the document search server device 30 executes the program shown in the flowchart of FIG. 3.
Specifically, when the document search server device 30 starts the program, it waits until a search character string is received (step S101).

Upon receiving a search character string, the document search server device 30 proceeds to step S102. It calculates the in-document character string relevance for every document file stored in the document file server device 20.

Assume that, as shown in FIG. 4, the document file server device 20 stores document files #1 to #6. Directory #1 stores document files #1 to #3, and directory #2 stores document files #4 to #6.

Further assume that the document search server device 30 calculates these in-document character string relevances for directory #1:

- S_11 = 0 for document file #1;
- S_12 = 80 for document file #2;
- S_13 = 78 for document file #3.

Likewise, assume that it calculates these values for directory #2:

- S_21 = 80 for document file #4;
- S_22 = 74 for document file #5;
- S_23 = 82 for document file #6.

Next, the document search server device 30 extracts the document files whose calculated in-document character string relevance is greater than the preset threshold (0 in this example) (step S103). Under the above assumptions, it extracts document files #2 to #6. It then acquires directory structure information for each extracted document file (step S104).

Next, the document search server device 30 calculates the directory relevance of each directory that stores an extracted document file. It uses the acquired directory structure information and the calculated in-document character string relevances (step S105).

Under the above assumptions, the document search server device 30 calculates the directory relevance of directory #1 as D_1 = (0 + 80 + 78) / 3 ≈ 52.7. Similarly, it calculates the directory relevance of directory #2 as D_2 = (80 + 74 + 82) / 3 ≈ 78.7.

Then, for each extracted document file, the document search server device 30 calculates the document file relevance from the calculated directory relevance and in-document character string relevance (step S106).

Under the above assumptions, the document file relevances for directory #1 are:

- document file #2: 57.74 (= 80 × (log(52.7 / 100) + 1));
- document file #3: 56.30 (= 78 × (log(52.7 / 100) + 1)).

The document file relevances for directory #2 are:

- document file #4: 71.68 (= 80 × (log(78.7 / 100) + 1));
- document file #5: 66.30 (= 74 × (log(78.7 / 100) + 1));
- document file #6: 73.47 (= 82 × (log(78.7 / 100) + 1)).

As shown in FIG. 5, document files #2 and #4 have the same in-document character string relevance (80) but receive different document file relevances (57.74 and 71.68).

Next, the document search server device 30 generates a search result from the calculated document file relevances (step S107) and transmits it to the client device 10 (step S108). It then returns to step S101 and repeats steps S101 to S108.
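The sketch below reproduces the worked example of steps S103 to S107 end to end by combining Equations 1 and 2. The `(directory, file)` keys are hypothetical stand-ins for document file identification information.

The sketch does not round D_i to one decimal place before applying Equation 2. Its results therefore differ slightly from the text (for example, about 57.72 instead of 57.74 for document file #2). The ranking is unchanged: #6, #4, #5, #2, #3.

```python
import math
from collections import defaultdict

S_H = 100.0

# Worked example (FIG. 4): dir1 holds file1-file3, dir2 holds file4-file6.
S = {
    ("dir1", "file1"): 0, ("dir1", "file2"): 80, ("dir1", "file3"): 78,
    ("dir2", "file4"): 80, ("dir2", "file5"): 74, ("dir2", "file6"): 82,
}


def rank(s, threshold=0.0, s_h=S_H):
    """Steps S103 to S107 for in-document relevances keyed by (directory, file)."""
    # S103: extract files whose in-document relevance exceeds the threshold.
    extracted = {key: v for key, v in s.items() if v > threshold}
    # S104: group all files (extracted or not) by directory.
    per_dir = defaultdict(list)
    for (d, _), v in s.items():
        per_dir[d].append(v)
    # S105: Equation 1 (mean) for each directory holding an extracted file.
    D = {d: sum(per_dir[d]) / len(per_dir[d]) for d, _ in extracted}
    # S106: Equation 2 for each extracted file.
    R = {key: v * (math.log10(D[key[0]] / s_h) + 1.0) for key, v in extracted.items()}
    # S107: order by descending document file relevance.
    return sorted(R.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for (d, f), r in rank(S):
        print(d, f, f"{r:.2f}")
```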

The client device 10 then receives the search result transmitted by the document search server device 30 and outputs it via the output device.

As described above, the document search server device 30 according to the first embodiment of the present invention can calculate, with high accuracy, a relevance (the document file relevance) that represents the depth of association between a search character string and a document file. As a result, the user can easily and quickly identify the document files that contain information important to them.

In addition, the document search server device 30 according to the first embodiment first extracts the document files whose in-document character string relevance represents a depth of association deeper than a preset threshold depth. It then calculates the directory relevance and document file relevance only for the extracted files.

This reduces the processing load on the document search server device 30 compared with calculating the directory relevance and document file relevance for every document file regardless of its in-document character string relevance.

<Second Embodiment>
Next, a document search system according to the second embodiment of the present invention is described. It differs from the first embodiment in that the directory relevance is also based on the name of the directory. The following description focuses on this difference.

(Function)
In the second embodiment, the directory relevance calculation processing unit 32c calculates the directory relevance using a directory name relevance. The directory name relevance represents the depth of association between the search character string and the name of the directory.

In this example, when the name of a directory includes the search character string, the directory relevance calculation processing unit 32c sets the directory name relevance N to a preset first value (100 in this example). When the name does not include the search character string, it sets N to a second value that is smaller than the first value (0 in this example). That is, in this example, the directory name relevance takes a larger value as the association between the search character string and the directory name becomes deeper.
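
A minimal Python sketch of this rule follows. It assumes that "includes" means plain, case-sensitive substring containment, because the source does not say how matching is done. The function name and the directory names are hypothetical.

```python
FIRST_VALUE = 100.0   # first value used in the example
SECOND_VALUE = 0.0    # second value used in the example (smaller than the first)


def directory_name_relevance(directory_name: str, search_string: str) -> float:
    """Return N: the first value if the directory name contains the search string, else the second."""
    # Plain substring containment; case-sensitive matching is an assumption.
    return FIRST_VALUE if search_string in directory_name else SECOND_VALUE


if __name__ == "__main__":
    print(directory_name_relevance("patent_search_notes", "search"))
    print(directory_name_relevance("misc", "search"))
```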

The directory relevance calculation processing unit 32c first adds the directory name relevance N to the sum of the in-document character string relevances calculated for all document files stored in a given directory. It then divides this total by the number m of document files stored in that directory plus 1, and uses the quotient as the directory relevance for that directory. That is, the directory relevance calculation processing unit 32c calculates the directory relevance D_i for the i-th directory based on Equation 3:

D_i = (sum of the in-document character string relevances of the m document files in the i-th directory + N) / (m + 1)
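
Equation 3 can be written directly as a function. This is a sketch that follows the description above. Passing every file stored in the directory, and not only the extracted ones, follows Appendix 2. The function and parameter names are hypothetical.

```python
from typing import Sequence


def directory_relevance(in_doc_relevances: Sequence[float], name_relevance: float) -> float:
    """D_i = (sum of the in-document relevances of the m files in directory i + N) / (m + 1)."""
    m = len(in_doc_relevances)  # number of document files stored in the directory
    # The name relevance N is added to the file scores and the divisor is m + 1.
    return (sum(in_doc_relevances) + name_relevance) / (m + 1)
```

One way to read the "+ 1" in the divisor is that the directory name is counted as one extra member of the directory, with score N. With N = 0 the result is simply slightly below the plain mean of the file scores.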

In this way, the directory relevance calculation processing unit 32c can be said to calculate a directory relevance representing an association that becomes deeper as the association represented by the directory name relevance becomes deeper.

(Operation)
Next, the operation of the document search system 1 described above will be described. As in the first embodiment, assume the situation shown in FIG. 4. The document file server device 20 stores document files #1 to #6, and the document search server device 30 has calculated the in-document character string relevance for each of document files #1 to #6.

Further, assume that the name of directory #1 includes the search character string and the name of directory #2 does not.

Under these assumptions, the document search server device 30 calculates 64.5 (= (80 + 78 + 100) / 4) as the directory relevance D_1 for directory #1. Similarly, it calculates 59.0 (= (80 + 74 + 82) / 4) as the directory relevance D_2 for directory #2 (step S105).
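
The two values can be checked against Equation 3. The divisor 4 corresponds to m + 1 with m = 3 files per directory. Only two document-file terms (80 and 78) appear for directory #1, so this sketch assumes that the third file in that directory contributes an in-document relevance of 0. That file is presumably document file #1, whose value is given in FIG. 4, which is not reproduced here. The 0 is an inference from the arithmetic, not a value stated in the text.

```python
def worked_example() -> dict:
    # Assumed in-document relevances per directory; the 0.0 is inferred, see above.
    directories = {
        "D_1": ([0.0, 80.0, 78.0], 100.0),  # directory #1: name contains the search string
        "D_2": ([80.0, 74.0, 82.0], 0.0),   # directory #2: name does not
    }
    result = {}
    for label, (scores, n) in directories.items():
        m = len(scores)  # m = 3 in both directories
        result[label] = (sum(scores) + n) / (m + 1)
    return result


print(worked_example())
```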

Under these assumptions, the document search server device 30 calculates 64.76 (= 80 × (log(64.5 / 100) + 1)) as the document file relevance for document file #2. Similarly, it calculates 63.15 (= 78 × (log(64.5 / 100) + 1)) as the document file relevance for document file #3.

Further, the document search server device 30 calculates the following document file relevances:

- document file #4: 61.67 (= 80 × (log(59.0 / 100) + 1))
- document file #5: 57.04 (= 74 × (log(59.0 / 100) + 1))
- document file #6: 63.21 (= 82 × (log(59.0 / 100) + 1))
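
The five values above are consistent with the formula S = R × (log10(D / 100) + 1). Here R is the file's in-document character string relevance, D is the relevance of its directory, and log is the common (base-10) logarithm. A natural logarithm would give about 44.9 for document file #2 instead of 64.76. The symbols S and R and the function name are introduced here for illustration. The sketch assumes relevances on a 0 to 100 scale and D > 0.

```python
import math


def document_file_relevance(in_doc_relevance: float, dir_relevance: float) -> float:
    """S = R * (log10(D / 100) + 1); assumes a 0-100 scale and D > 0."""
    # For D <= 100, log10(D / 100) <= 0, so the directory term scales R by a factor <= 1.
    return in_doc_relevance * (math.log10(dir_relevance / 100.0) + 1.0)


if __name__ == "__main__":
    examples = {"#2": (80, 64.5), "#3": (78, 64.5), "#4": (80, 59.0),
                "#5": (74, 59.0), "#6": (82, 59.0)}
    for name, (r, d) in examples.items():
        print(name, round(document_file_relevance(r, d), 2))
```

Files #2 and #4 share R = 80, yet they score differently because their directories have different relevances.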

As shown in FIG. 6, the two document files that had the same in-document character string relevance (document files #2 and #4) receive different document file relevances.

The name of a directory is often chosen to be closely related to the document files stored in it. The document search server device 30 according to the second embodiment therefore provides the same effects as the first embodiment. In addition, it can calculate a relevance (directory relevance) that represents, with high accuracy, the depth of association between the search character string and the directory.

The document search server device 30 may also be configured to calculate a similarity that represents how closely the search character string and the directory name resemble each other, and to use that similarity as the directory name relevance. The similarity may be calculated from a dictionary stored in advance or according to a predetermined algorithm.
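
The source does not name the dictionary or the algorithm. As one possible "predetermined algorithm", the sketch below swaps in `difflib.SequenceMatcher` from the Python standard library. It scales the matcher's `ratio()`, a value in [0, 1], to the 0 to 100 range used for N in the example above. This choice is an illustration, not the patent's method. The function name is hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity_relevance(directory_name: str, search_string: str) -> float:
    """Directory name relevance as a 0-100 similarity score (illustrative choice)."""
    # Lower-casing both sides is an assumption, so that "Search" and "search" match.
    matcher = SequenceMatcher(None, search_string.lower(), directory_name.lower())
    return 100.0 * matcher.ratio()
```

The 100/0 rule gives no credit to a name that merely resembles the search string. This variant gives such names partial credit. For example, a directory named `search_notes` scores about 66.7 for the query `search`.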

<Third Embodiment>
Next, a document relevance calculation device according to the third embodiment of the present invention will be described with reference to FIG. 7.
The document relevance calculation device 100 according to the third embodiment includes:
a search character string receiving unit (search character string receiving means) 101 that receives a search character string; and
a document file relevance calculation unit (document file relevance calculating means) 102 that calculates a document file relevance for each of a plurality of document files, each of which is stored in one of a plurality of directories and represents a document. The document file relevance represents the depth of association between the received search character string and the document file. It is based on two quantities: an in-document character string relevance, representing the depth of association between the search character string and a character string included in the document represented by the document file; and a directory relevance, representing the depth of association between the search character string and the directory in which the document file is stored.
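
The two units of device 100 can be pictured as a small pipeline. The sketch below is hypothetical throughout, and the class and method names are invented. The in-document character string relevance is supplied as a precomputed mapping, because the text here does not say how unit 102 derives it. The directory relevance and the document file relevance use the second-embodiment rules as read from the worked example: substring-based N, (sum + N) / (m + 1), and R × (log10(D / 100) + 1). The third embodiment itself does not fix these formulas.

```python
import math
from typing import Dict, List, Tuple


class DocumentRelevanceCalculator:
    """Hypothetical stand-in for device 100: unit 101 (receive) and unit 102 (rank)."""

    def __init__(self, tree: Dict[str, Dict[str, float]]) -> None:
        # tree: directory name -> {file id: precomputed in-document relevance, 0 to 100}
        self.tree = tree
        self.query = ""

    def receive(self, search_string: str) -> None:
        """Unit 101: accept the search character string."""
        self.query = search_string

    def _directory_relevance(self, name: str, scores: Dict[str, float]) -> float:
        # Directory name relevance N: 100 if the name contains the query, else 0.
        n = 100.0 if self.query and self.query in name else 0.0
        # (sum of in-document relevances + N) / (m + 1)
        return (sum(scores.values()) + n) / (len(scores) + 1)

    def rank(self) -> List[Tuple[str, float]]:
        """Unit 102: document file relevance for every file, deepest association first."""
        results: List[Tuple[str, float]] = []
        for name, scores in self.tree.items():
            d = self._directory_relevance(name, scores)
            # Guard against log10(0) when a directory scores 0 overall.
            factor = math.log10(d / 100.0) + 1.0 if d > 0 else 0.0
            for file_id, r in scores.items():
                results.append((file_id, r * factor))
        return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    calc = DocumentRelevanceCalculator({
        "search_docs": {"#1": 0.0, "#2": 80.0, "#3": 78.0},  # name contains "search"
        "misc": {"#4": 80.0, "#5": 74.0, "#6": 82.0},
    })
    calc.receive("search")
    for file_id, score in calc.rank():
        print(file_id, round(score, 2))
```

The input reuses the relevances from the second-embodiment example, with the third file in the first directory assumed to score 0. Directory names are hypothetical.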

This makes it possible to calculate a relevance (document file relevance) that represents, with high accuracy, the depth of association between the search character string and the document file. As a result, for example, the user can easily and quickly identify document files that contain information important to the user.

Although the present invention has been described with reference to the above embodiments, it is not limited to those embodiments. Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope.

In each of the above embodiments, each function of the document relevance calculation device is realized by a CPU executing a program (software). These functions may instead be realized by hardware such as circuitry.

Also, in each of the above embodiments the program is stored in a storage device, but it may instead be stored on a computer-readable recording medium. The recording medium is, for example, a portable medium such as a flexible disk, an optical disc, a magneto-optical disc, or a semiconductor memory.

As another modification, any combination of the above embodiments and modifications may be adopted.

<Appendix>
Part or all of the above embodiments can also be described as in the following supplementary notes, but they are not limited to these notes.

(Appendix 1)
A document relevance calculation device comprising:
a search character string receiving means for receiving a search character string; and
a document file relevance calculating means for calculating, for each of a plurality of document files, each stored in one of a plurality of directories and representing a document, a document file relevance representing the depth of association between the search character string and the document file, based on an in-document character string relevance representing the depth of association between the received search character string and a character string included in the document represented by the document file, and a directory relevance representing the depth of association between the search character string and the directory in which the document file is stored.

(Appendix 2)
The document relevance calculation device according to Appendix 1, wherein
the document file relevance calculating means is configured to calculate the in-document character string relevance for every document file stored in the directory that stores the document file whose document file relevance is to be calculated, and to calculate the directory relevance for that directory based on the calculated in-document character string relevances.

(Appendix 3)
The document relevance calculation device according to Appendix 1 or Appendix 2, wherein
the document file relevance calculating means is configured to calculate a document file relevance representing an association depth that becomes deeper as the association depth represented by the in-document character string relevance of the document file whose document file relevance is to be calculated becomes deeper, and that also becomes deeper as the association depth represented by the directory relevance of the directory storing that document file becomes deeper.

(Appendix 4)
The document relevance calculation device according to any one of Appendices 1 to 3, wherein
the document file relevance calculating means is configured to calculate the directory relevance based on a directory name relevance representing the depth of association between the search character string and the name of the directory in which the document file whose document file relevance is to be calculated is stored.

The name of a directory is often chosen to be closely related to the document files stored in it. A document relevance calculation device configured as above can therefore calculate a relevance (directory relevance) that represents, with high accuracy, the depth of association between the search character string and the directory.

(Appendix 5)
The document relevance calculation device according to Appendix 4, wherein
the document file relevance calculating means is configured to calculate a directory relevance representing an association depth that becomes deeper as the association depth represented by the directory name relevance becomes deeper.

(Appendix 6)
The document relevance calculation device according to any one of Appendices 1 to 5, wherein
the document file relevance calculating means is configured to:
calculate the in-document character string relevance for each of a plurality of document files;
extract the document files whose calculated in-document character string relevance represents an association depth deeper than a preset threshold depth;
calculate the directory relevance for each directory in which an extracted document file is stored; and
calculate, for each extracted document file, the document file relevance based on the calculated directory relevance and the calculated in-document character string relevance.

This reduces the processing load on the document relevance calculation device compared with calculating the directory relevance and the document file relevance regardless of the calculated in-document character string relevance.

(Appendix 7)
The document relevance calculation device according to any one of Appendices 1 to 6, configured to output document file identification information for identifying the document files, arranged in order of the association depth represented by the calculated document file relevance.

(Appendix 8)
The document relevance calculation device according to any one of Appendices 1 to 7, configured to extract document files from the plurality of document files based on the calculated document file relevance, and to output document file identification information for identifying the extracted document files.
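
Appendices 7 and 8 leave open how the output is ordered and how files are extracted. Here is a minimal sketch that assumes a hypothetical rule for Appendix 8: keep at most `top_k` files whose document file relevance exceeds `min_score`. It also assumes the identification information is a URI string, which the background of this document gives as an example identifier. The function name, parameters and URIs are hypothetical.

```python
from typing import Dict, List


def extract_ids(relevance: Dict[str, float], top_k: int = 10,
                min_score: float = 0.0) -> List[str]:
    """Identifiers of extracted files, deepest association first (Appendices 7 and 8)."""
    # Appendix 7: order by document file relevance, largest first.
    ranked = sorted(relevance.items(), key=lambda item: item[1], reverse=True)
    # Appendix 8 (assumed rule): keep files above min_score, at most top_k of them.
    return [doc_id for doc_id, score in ranked if score > min_score][:top_k]


if __name__ == "__main__":
    scores = {"file:///docs/b.txt": 64.76, "file:///docs/c.txt": 63.15,
              "file:///misc/f.txt": 63.21, "file:///misc/e.txt": 57.04}
    print(extract_ids(scores, top_k=2))
```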

(Appendix 9)
A document relevance calculation method comprising:
receiving a search character string; and
calculating, for each of a plurality of document files, each stored in one of a plurality of directories and representing a document, a document file relevance representing the depth of association between the search character string and the document file, based on an in-document character string relevance representing the depth of association between the received search character string and a character string included in the document represented by the document file, and a directory relevance representing the depth of association between the search character string and the directory in which the document file is stored.

(Appendix 10)
The document relevance calculation method according to Appendix 9, comprising
calculating the in-document character string relevance for every document file stored in the directory that stores the document file whose document file relevance is to be calculated, and calculating the directory relevance for that directory based on the calculated in-document character string relevances.

(Appendix 11)
The document relevance calculation method according to Appendix 9 or Appendix 10, comprising
calculating the directory relevance based on a directory name relevance representing the depth of association between the search character string and the name of the directory in which the document file whose document file relevance is to be calculated is stored.

(Appendix 12)
A document relevance calculation program for causing an information processing device to execute processing of:
receiving a search character string; and
calculating, for each of a plurality of document files, each stored in one of a plurality of directories and representing a document, a document file relevance representing the depth of association between the search character string and the document file, based on an in-document character string relevance representing the depth of association between the received search character string and a character string included in the document represented by the document file, and a directory relevance representing the depth of association between the search character string and the directory in which the document file is stored.

(Appendix 13)
The document relevance calculation program according to Appendix 12, wherein the processing further comprises
calculating the in-document character string relevance for every document file stored in the directory that stores the document file whose document file relevance is to be calculated, and calculating the directory relevance for that directory based on the calculated in-document character string relevances.

(Appendix 14)
The document relevance calculation program according to Appendix 12 or Appendix 13, wherein the processing further comprises
calculating the directory relevance based on a directory name relevance representing the depth of association between the search character string and the name of the directory in which the document file whose document file relevance is to be calculated is stored.

The present invention is applicable to, for example, a document relevance calculation device that calculates a relevance representing the depth of association between a search character string and a document file. It is also applicable to a document search device that searches for documents closely related to a search character string.

DESCRIPTION OF SYMBOLS
1 Document search system
10 Client device
11 Search character string transmission unit
12 Search result output unit
20 Document file server device
30 Document search server device (document relevance calculation device)
31 Search character string reception unit
32 Document file relevance calculation unit
32a In-document character string relevance calculation processing unit
32b Directory structure information acquisition unit
32c Directory relevance calculation processing unit
32d Document file relevance calculation processing unit
33 Search result transmission unit
100 Document relevance calculation device
101 Search character string receiving unit
102 Document file relevance calculation unit

Claims

1. A document relevance calculation device comprising:
a search character string receiving means for receiving a search character string; and
a document file relevance calculating means for calculating, for each of a plurality of document files, each stored in one of a plurality of directories and representing a document, a document file relevance representing the depth of association between the search character string and the document file, based on an in-document character string relevance representing the depth of association between the received search character string and a character string included in the document represented by the document file, and a directory relevance representing the depth of association between the search character string and the directory in which the document file is stored.

2. The document relevance calculation device according to claim 1, wherein
the document file relevance calculating means is configured to calculate the in-document character string relevance for every document file stored in the directory that stores the document file whose document file relevance is to be calculated, and to calculate the directory relevance for that directory based on the calculated in-document character string relevances.

3. The document relevance calculation device according to claim 1 or 2, wherein
the document file relevance calculating means is configured to calculate a document file relevance representing an association depth that becomes deeper as the association depth represented by the in-document character string relevance of the document file whose document file relevance is to be calculated becomes deeper, and that also becomes deeper as the association depth represented by the directory relevance of the directory storing that document file becomes deeper.

4. The document relevance calculation device according to any one of claims 1 to 3, wherein
the document file relevance calculating means is configured to calculate the directory relevance based on a directory name relevance representing the depth of association between the search character string and the name of the directory in which the document file whose document file relevance is to be calculated is stored.

5. The document relevance calculation device according to claim 4, wherein
the document file relevance calculating means is configured to calculate a directory relevance representing an association depth that becomes deeper as the association depth represented by the directory name relevance becomes deeper.

6. The document relevance calculation device according to any one of claims 1 to 5, wherein
the document file relevance calculating means is configured to: calculate the in-document character string relevance for each of a plurality of document files; extract the document files whose calculated in-document character string relevance represents an association depth deeper than a preset threshold depth; calculate the directory relevance for each directory in which an extracted document file is stored; and calculate, for each extracted document file, the document file relevance based on the calculated directory relevance and the calculated in-document character string relevance.

7. The document relevance calculation device according to any one of claims 1 to 6,
configured to output document file identification information for identifying the document files, arranged in order of the association depth represented by the calculated document file relevance.

8. The document relevance calculation device according to any one of claims 1 to 7,
configured to extract document files from the plurality of document files based on the calculated document file relevance, and to output document file identification information for identifying the extracted document files.

9. A document relevance calculation method in which an information processing device performs:
receiving a search character string; and
calculating, for each of a plurality of document files, each stored in one of a plurality of directories and representing a document, a document file relevance representing the depth of association between the search character string and the document file, based on an in-document character string relevance representing the depth of association between the received search character string and a character string included in the document represented by the document file, and a directory relevance representing the depth of association between the search character string and the directory in which the document file is stored.

10. A document relevance calculation program for causing an information processing device to execute processing of:
receiving a search character string; and
calculating, for each of a plurality of document files, each stored in one of a plurality of directories and representing a document, a document file relevance representing the depth of association between the search character string and the document file, based on an in-document character string relevance representing the depth of association between the received search character string and a character string included in the document represented by the document file, and a directory relevance representing the depth of association between the search character string and the directory in which the document file is stored.