JP2009176044A

JP2009176044A - Document retrieving method, device, program, and computer readable recording medium

Info

Publication number: JP2009176044A
Application number: JP2008014006A
Authority: JP
Inventors: Masakazu Hasegawa (長谷川 雅一); Mitsuaki Tsunakawa (綱川 光明); Masaki Hisada (久田 正樹)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-01-24
Filing date: 2008-01-24
Publication date: 2009-08-06

Abstract

PROBLEM TO BE SOLVED: To enable time-axis (tense) search of documents stored on a general-purpose file server, including documents that have changed through new creation, overwrite saving, and deletion, without replacing the existing file system.

SOLUTION: A document retrieving device receives, from a shared document server, document operation notifications (new document creation, update, or deletion) together with the documents, pairs each word of a document with the document identifier, and stores the pairs in a tense inverted index storage means. A document list containing the document identifier, information on the history of the document (operation dates and times), and the document itself is stored in a document list storage means. The tense inverted index storage means and the document list storage means are searched according to the search conditions (keyword, designated date and time, document name) of an input search request. When the retrieval result contains a plurality of document identifiers, the documents with older operation dates are deleted from the retrieval result before it is output.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a document search method, apparatus, program, and computer-readable recording medium, and more particularly to a document search method, apparatus, program, and computer-readable recording medium for searching documents stored on a general-purpose file server without replacing the existing file system.

Among file management systems there is Nilfs, which adopts a log-structured scheme that uses the history of documents stored in a Linux local file system as an inverted index. This eliminates overwriting of disk blocks and minimizes damage when a failure occurs. To make Nilfs user-friendly, there is a technique for searching its snapshots by keyword (see, for example, Non-Patent Document 1).

Among document search systems, there are the search engines of full-text search systems for searching enormous numbers of documents, and Namazu, which is used for small-scale searches such as personal file search (see, for example, Non-Patent Document 3).

Amagai et al., "Design and Implementation of Log Structured File System nilfs for Linux", IPSJ SIG Technical Report 2005-OS-99 (9), 2005/5/26, pp. 61-68.
Namazu (http://www.namazu.org/)

However, attempting a tense search with the above Namazu has the problem that creating the inverted index takes considerable effort. Here, "tense" denotes the temporal ordering (present, past, future, and so on) relative to a point on the time axis, and "tense search" means searching for targets on the basis of the time axis.

Nilfs, for its part, uses the history of documents stored in the local file system as its inverted index. To use the history of documents stored on a general-purpose file server (remote file system) as an inverted index, the documents must therefore be copied to the local file system in advance.

In addition, to implement a file server with Nilfs, the file system must be changed to Nilfs in advance.

Furthermore, simply combining Nilfs and Namazu does not solve the above problems: the history of Namazu's inverted index is stored in Nilfs, but the search is still performed against the latest Namazu inverted index.

The present invention has been made in view of the above points. Its object is to provide a document search method, apparatus, program, and computer-readable recording medium that enable tense search of documents (files) stored on a general-purpose file server without replacing the existing file system, including documents that have changed through new creation, overwrite saving, and deletion.

FIG. 1 is a diagram for explaining the principle of the present invention.

The present invention (claim 1) is a document search method in a search device that is connected to a shared document server and searches for documents, the method performing:
a document collection step in which a document collection means receives, from the shared document server, a document operation notification of new document creation, update, or deletion together with the document, duplicates the document, assigns a document identifier to each document, and stores it in a duplicate document storage means (step 1);
an indexing step in which an indexing means acquires the document from the duplicate document storage means, pairs each word of the document with the document identifier, and stores the pairs in a tense inverted index storage means (step 2), and also stores in a document list storage means a document list containing the document identifier, information on the history of the document including the operation date and time, and the document (step 3);
a search request analysis step in which a search request analysis means analyzes an input search request and extracts search attributes and search conditions (step 4);
a search step in which a search execution means searches the tense inverted index storage means for a word based on the search conditions, acquires the document identifiers (inverted index) corresponding to the word (step 5), and searches the document list storage means based on that inverted index to obtain the documents of the search result (step 6); and
a search result combining step in which a search result combining means, when the documents acquired in the search step have a plurality of document identifiers, deletes the documents with older operation dates and times from the search result and outputs the result (step 7).

Further, in the present invention (claim 2), in the search step,
when a date and time of creation, update, or deletion of a document is specified as the search condition corresponding to a search attribute, the document identifier matching that date and time is acquired from the document list storage means, and the document corresponding to that document identifier is acquired.

Further, in the present invention (claim 3), in the search step,
when a document name is specified as the search condition corresponding to a search attribute, the document identifier matching that document name is acquired from the document list storage means, and the document corresponding to that document identifier is acquired.

FIG. 2 is a diagram showing the principle configuration of the present invention.

The present invention (claim 4) is a document search apparatus that is connected to a shared document server and searches for documents, the apparatus comprising:
a document collection means 175 that receives, from the shared document server, a document operation notification of new document creation, update, or deletion together with the document, duplicates the document, assigns a document identifier to each document, and stores it in a duplicate document storage means 170;
an indexing means 160 that acquires the document from the duplicate document storage means 170, pairs each word of the document with the document identifier, and stores the pairs in a tense inverted index storage means 155, and also stores in a document list storage means 150 a document list containing the document identifier, information on the history of the document including the operation date and time, and the document;
a search request analysis means 130 that analyzes an input search request and extracts search attributes and search conditions;
a search execution means 140 that searches the tense inverted index storage means 155 for a word based on the search conditions, acquires the document identifiers (inverted index) corresponding to the word, and searches the document list storage means 150 based on that inverted index to obtain the documents of the search result; and
a search result combining means 135 that, when the documents acquired by the search execution means 140 have a plurality of document identifiers, deletes the documents with older operation dates and times from the search result and outputs the result.

Further, in the present invention (claim 5), the search execution means 140
includes means for acquiring, when a date and time of creation, update, or deletion of a document is specified as the search condition corresponding to a search attribute, the document identifier matching that date and time from the document list storage means, and acquiring the document corresponding to that document identifier.

Further, in the present invention (claim 6), the search execution means 140
includes means for acquiring, when a document name is specified as the search condition corresponding to a search attribute, the document identifier matching that document name from the document list storage means, and acquiring the document corresponding to that document identifier.

The present invention (claim 7) is a document search program that causes a computer to function as each of the means constituting the document search apparatus according to any one of claims 4 to 6.

The present invention (claim 8) is a computer-readable recording medium storing the document search program according to claim 7.

According to the present invention, the date and time information indicating the history of each document stored on a general-purpose shared document server (new creation, update, deletion, and so on) is acquired and stored together with the document. This enables tense search, that is, search based on search conditions that include a date and time designation, without replacing the existing file system managed by that server.

Furthermore, because the deletion date and time is also acquired from the shared document server and retained, a deletion date and time can also be specified as a search key.

Embodiments of the present invention will now be described with reference to the drawings.

FIG. 3 shows the configuration of a tense search apparatus according to an embodiment of the present invention.

The tense search apparatus shown in the figure comprises a user interface unit 125, a search request analysis unit 130, a search result combining unit 135, a search execution unit 140, a document list unit 150, a tense inverted index storage unit 155, an indexing unit 160, a morphological analysis unit 165, a duplicate document unit 170, and a document collection interface unit 175.

The user interface unit 125 is connected to a user terminal 110 via a LAN (Local Area Network), receives a search syntax 120 from the user terminal 110, and returns a search result 115 to it. The document collection interface unit 175 is connected to a shared document (file) server 180 via the LAN.

The document collection interface unit 175 acquires, from the shared document (file) server 180, each document together with its operation type (new document creation, update (overwrite), deletion, and so on) and operation time. It issues a Document ID for each document, duplicates the acquired document, and stores the copy in the duplicate document unit 170 together with the Document ID.

The indexing unit 160 passes the acquired document to the morphological analysis unit 165, obtains the words that result from the morphological analysis, pairs each word with the Document ID, and stores the pairs in the tense inverted index storage unit 155. It also appends the information acquired from the shared document (file) server 180, the Document ID, the operation time, and so on to the document list unit 150.

The search request analysis unit 130 analyzes the search syntax input from the user terminal 110, extracts the combinations of a search attribute and the search condition corresponding to that attribute, and outputs them to the search execution unit 140.

Based on the search condition (keyword) corresponding to a search attribute, the search execution unit 140 acquires from the tense inverted index storage unit 155, as the tense inverted index, the Document IDs, which are the document identifiers that identify documents. Based on those Document IDs, it acquires from the document list unit 150 the new-creation date and time, overwrite date and time, or deletion date and time needed for the tense search. It also searches the documents in the document list unit 150 based on the remaining search conditions (time-axis designation and file name (document name)), and outputs the search result (a list of documents) and the tense inverted index to the search result combining unit 135.

The search result combining unit 135 eliminates duplicate entries for the same document with the same operation date and time, and outputs the result to the user terminal 110 via the user interface unit 125.

First, the processing performed before any search, in which the document collection interface unit 175 and the indexing unit 160 create the tense inverted index and the document list, is described.

FIG. 4 is a flowchart of the inverted index creation and document list creation processing according to an embodiment of the present invention.

When the document collection interface unit 175 receives a document operation notification from the shared document server 180 (step 101), it determines whether the message type of the notification is new creation or update (overwrite save). If it is neither (step 102, No), the message type is judged to be document deletion: the operation date and time in the notification message is set in the file deletion date and time field of the document list storage unit 150, the deletion flag of the file (document) is set, and the document list storage unit 150 is updated (step 107). If it is either of them (step 102, Yes), the document collection interface unit 175 duplicates the document (file) acquired from the shared document server 180, creates a Document ID for the copy, and stores the Document ID and the copy in the duplicate document unit 170 (step 103).

Next, the indexing unit 160 reads the document from the duplicate document unit 170 (step 104), and the morphological analysis unit 165 morphologically analyzes the document and passes the resulting words and the Document ID to the indexing unit 160 (step 105). The indexing unit 160 then stores them in the tense inverted index storage unit 155 in the structure shown in FIG. 5 (step 106).

As shown in FIG. 6, the indexing unit 160 also stores the following in the document list unit 150, based on the information read from the duplicate document unit 170 (step 107): the file identifier (a number that groups Document IDs into one logical file), the history (version) number of the file, the document identifier (Document ID, a number that identifies the duplicated document), the new-creation or overwrite date and time of the file, the body (text) of the file, the shared document server name, the path name (the name of the folder in which the file is saved), and the file name. This processing is performed every time a document operation notification is received.
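The index and document list creation flow of steps 101 to 107 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the class and field names (`TenseIndex`, `DocRecord`, `saved_at`), the regular-expression tokenizer that stands in for the morphological analysis unit 165, and the choice to record a deletion on the latest version of the file only.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DocRecord:
    """One row of the document list (fields follow the list given for FIG. 6)."""
    file_id: int          # groups Document IDs into one logical file
    version: int          # history (version) number of the file
    doc_id: int           # Document ID of this duplicated copy
    saved_at: datetime    # new-creation or overwrite date and time
    body: str
    server: str
    path: str
    name: str
    deleted_at: Optional[datetime] = None
    deleted: bool = False


class TenseIndex:
    def __init__(self):
        self.postings = {}    # word -> set of Document IDs (tense inverted index)
        self.doc_list = {}    # Document ID -> DocRecord (document list)
        self.file_ids = {}    # (server, path, name) -> file_id
        self.next_doc_id = 1

    def tokenize(self, text):
        # Assumption: a plain word split stands in for morphological analysis.
        return re.findall(r"\w+", text.lower())

    def on_notification(self, kind, server, path, name, when, body=""):
        key = (server, path, name)
        versions = [r for r in self.doc_list.values()
                    if (r.server, r.path, r.name) == key]
        if kind == "delete":
            # Step 102 "No" branch: set the deletion time and the deletion flag.
            if versions:
                latest = max(versions, key=lambda r: r.version)
                latest.deleted_at, latest.deleted = when, True
            return None
        # Step 103: duplicate the document under a freshly issued Document ID.
        doc_id = self.next_doc_id
        self.next_doc_id += 1
        file_id = self.file_ids.setdefault(key, len(self.file_ids) + 1)
        # Step 107: append a row to the document list.
        self.doc_list[doc_id] = DocRecord(file_id, len(versions) + 1, doc_id,
                                          when, body, server, path, name)
        # Steps 104-106: pair every word with the Document ID.
        for word in self.tokenize(body):
            self.postings.setdefault(word, set()).add(doc_id)
        return doc_id
```

Because every overwrite receives a new Document ID, older versions stay searchable: the postings for a word simply accumulate the IDs of every version that contained it.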

Next, the processing at search time is described.

FIG. 7 is a flowchart of the search processing according to an embodiment of the present invention.

Step 210) The user interface unit 125 accepts a search syntax, which is the search request, from the user terminal 110 via the LAN.

Step 220) The search request analysis unit 130 analyzes the search syntax and extracts the combinations of search attribute and search condition. Details of this processing are described later.

Step 230) The search execution unit 140 refers to the tense inverted index storage unit 155 and acquires the documents that match the search conditions from the document list unit 150. Details of this processing are described later.

Step 240) The search result combining unit 135 eliminates duplicate entries in the search result for the same document with the same operation date and time, and outputs the result to the user interface unit 125. Details of this processing are described later.

Step 260) The user interface unit 125 returns the search result 115 to the user terminal 110.

Next, the processing of the search request analysis unit 130 in step 220 of FIG. 7 is described in detail.

FIG. 8 is a flowchart of the search syntax analysis processing according to an embodiment of the present invention.

Step 221) The input search syntax is decomposed into pairs of search attribute and search condition.

Step 222) It is determined whether the number of remaining pairs of search attribute and search condition is 0. If it is 0, the process proceeds to step 225; otherwise, it proceeds to step 223.

Step 223) A search attribute is taken out.

Step 224) The search condition is taken out and stored in a memory (not shown) together with the search attribute taken out in step 223, and the process returns to step 222.

Step 225) When the number of pairs has reached 0 in step 222, it is determined whether the search attributes include one that specifies full-text search. If so, the process proceeds to step 230 described above; if not, the processing ends.

FIG. 9 shows an example of the search attributes and search conditions extracted by the above processing.

The extracted search attributes and search conditions are keyword, date and time, and file name. In the example of FIG. 9, the specified search keywords are to be combined with an AND condition. The date and time is specified by year, month, day, and time of day, and restricts the search to files that existed at the specified date and time, or to files that existed on or before the specified date and time, on or after it, or during a specified period. By specifying part of a file name, the search can be restricted to documents whose file name contains the specified string.
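The loop of steps 221 to 225 can be sketched in Python as follows. The text does not give the concrete form of the search syntax, so the `attribute:condition` token format, the attribute names `keyword`, `date`, and `name`, and the function name are all hypothetical choices made for this illustration.

```python
def parse_search_syntax(syntax):
    """Split a query such as 'keyword:a keyword:b date:2008-01-05' into pairs.

    Assumption: whitespace-separated 'attribute:condition' tokens; the
    patent text does not define the concrete syntax.
    """
    # Step 221: decompose the syntax into (attribute, condition) pairs.
    pending = [tok.split(":", 1) for tok in syntax.split() if ":" in tok]
    parsed = {}
    # Steps 222-224: take out one pair at a time until none remain.
    while pending:
        attribute, condition = pending.pop(0)
        parsed.setdefault(attribute, []).append(condition)
    # Step 225: continue to search execution only if a full-text
    # (keyword) attribute was specified; otherwise processing ends.
    return parsed if "keyword" in parsed else None
```

Returning `None` when no keyword is present mirrors the "processing ends" branch of step 225; a caller would skip search execution in that case.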

Next, the search execution processing of the search execution unit 140 in step 230 of FIG. 7 is described.

FIG. 10 is a flowchart of the search execution processing according to an embodiment of the present invention.

Step 231) Based on the keyword given as the search condition of the acquired search attribute, the search execution unit 140 acquires from the tense inverted index storage unit 155 the Document IDs paired with the matching word, and stores them in a memory (not shown). If a plurality of keywords are specified, they are combined with an AND condition.

Step 232) The entire tense inverted index storage unit 155 is searched for the words corresponding to the above keywords, and the corresponding Document IDs are stored in a memory (not shown).
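The AND-combined keyword lookup of steps 231 and 232 amounts to intersecting posting sets. The sketch below assumes the tense inverted index is held as a Python dictionary mapping each word to a set of Document IDs; that representation and the function name `lookup_keywords` are illustrative assumptions, not the structure of FIG. 5.

```python
def lookup_keywords(postings, keywords):
    """Return the Document IDs whose documents contain every keyword (AND)."""
    result = None
    for word in keywords:
        ids = postings.get(word, set())
        # The first keyword seeds the result; later keywords narrow it.
        result = set(ids) if result is None else result & ids
    return result or set()


# Usage: Document IDs 2 and 3 contain both "budget" and "plan".
postings = {"budget": {1, 2, 3}, "plan": {2, 3}, "final": {3}}
hits = lookup_keywords(postings, ["budget", "plan"])
```

Since each saved version of a file has its own Document ID, the returned set can contain several versions of the same file; narrowing by date and removing older versions happens in the later steps.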

Step 233) It is determined whether the search condition includes a date and time designation. As shown in FIG. 9, a date and time can be designated as:
・a specified date and time;
・on or before a specified date and time;
・on or after a specified date and time;
・a specified period;
and so on. If there is a date and time designation, the process proceeds to step 234; otherwise, it proceeds to step 235.

Step 234) Based on the date and time designation acquired in step 232, the document list storage unit 150 is searched, and the Document IDs corresponding to that date and time designation are acquired and stored in a memory (not shown).

Step 235) It is determined whether a document name (file name) is specified in the search attributes. If so, the process proceeds to step 236; otherwise, it proceeds to step 240.

Step 236) The document list unit 150 is searched based on the document name (file name), and the Document IDs corresponding to that document name (file name) are acquired and stored in a memory (not shown).
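The date and file name filters of steps 233 to 236 might look like the following Python sketch. It rests on an interpretation that the text does not spell out: a version is treated as "existing" from its save time until it is overwritten or deleted, stored here in a hypothetical `ended_at` field. The mode names `at`, `before`, `after`, and `period`, and the dictionary layout of a document list row, are likewise assumptions.

```python
from datetime import datetime


def existed(rec, mode, start, end=None):
    """Check one document-list row against a date and time designation."""
    born = rec["saved_at"]
    gone = rec["ended_at"] or datetime.max   # None means still current
    if mode == "at":        # existed at the specified date and time
        return born <= start < gone
    if mode == "before":    # existed on or before the specified date and time
        return born <= start
    if mode == "after":     # existed on or after the specified date and time
        return gone > start
    if mode == "period":    # lifetime overlaps the specified period
        return born <= end and gone > start
    raise ValueError(mode)


def filter_doc_list(doc_list, date_spec=None, name_part=None):
    """Return (IDs matching the date designation, IDs matching the name)."""
    # Step 234: Document IDs that satisfy the date and time designation.
    by_date = {r["doc_id"] for r in doc_list
               if date_spec and existed(r, *date_spec)}
    # Step 236: Document IDs whose file name contains the given string.
    by_name = {r["doc_id"] for r in doc_list
               if name_part and name_part in r["name"]}
    return by_date, by_name
```

A date designation is passed as a tuple such as `("at", datetime(2008, 1, 2))` or `("period", start, end)`. The two sets are returned separately because steps 234 and 236 each store their Document IDs in memory for the later combining step.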

Next, the processing of the search result combining unit 135 in step 240 of FIG. 7 is described in detail.

FIG. 11 is a flowchart of the search result combining processing according to an embodiment of the present invention.

Step 241) The Document IDs acquired by the search execution unit 140 in step 232 and stored in a memory (not shown) are used as the first key, the Document IDs acquired in steps 234 and 236 and stored in a memory (not shown) are used as the second key, and the entries are arranged in descending order in a memory (not shown).

Step 242) It is determined whether there are a plurality of second keys for the first key. If there are, the process proceeds to step 243; otherwise, it proceeds to step 249.

Step 243) The operation date and time for the Document ID corresponding to the second key is acquired from the document list unit 150.

Step 244) It is determined whether there is an operation date and time to compare against. If there is, the process proceeds to step 245; if not, it proceeds to step 248.

Step 245) If the same operation date and time exists for one Document ID, the process proceeds to step 246; if not, it proceeds to step 247.

Step 246) If the same operation date and time was found in step 245, the entries are duplicates, so the Document ID with the older operation date and time is deleted from the search result, and the process returns to step 244.

Step 247) If no duplicate was found in step 245, the next operation date and time is set, and the process returns to step 244.

Step 248) If there is no operation date and time left to compare in step 244, the next document to be compared is set as the comparison source, and the process returns to step 243.

Step 249) If there are not a plurality of Document IDs as the second key in step 242, the ranking used to sort the search result list is calculated. For example, TF-IDF (Term Frequency / Inverse Document Frequency) is used for the ranking calculation.

Step 250) The search result list is sorted based on the ranking values.
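As a rough illustration of the combining and ranking stage, the Python sketch below keeps only the hit with the newest operation date and time for each logical file and then sorts the survivors by a TF-IDF score. This is a simplified reading of steps 241 to 250 rather than a transcription of the flowchart, and the hit layout (`file_id`, `operated_at`, `body`), the whitespace tokenization, and the particular TF-IDF variant are assumptions made for the example.

```python
import math


def combine_and_rank(hits, keywords, total_docs, doc_freq):
    """hits: dicts with doc_id, file_id, operated_at, body (assumed layout)."""
    # Drop hits with older operation dates: keep the newest per logical file.
    newest = {}
    for hit in hits:
        kept = newest.get(hit["file_id"])
        if kept is None or hit["operated_at"] > kept["operated_at"]:
            newest[hit["file_id"]] = hit

    def score(hit):
        # TF-IDF: term frequency in the document times log inverse
        # document frequency, summed over the query keywords.
        words = hit["body"].lower().split()
        total = 0.0
        for kw in keywords:
            tf = words.count(kw) / len(words) if words else 0.0
            idf = math.log(total_docs / doc_freq[kw]) if doc_freq.get(kw) else 0.0
            total += tf * idf
        return total

    # Sort the remaining hits by descending ranking value.
    return sorted(newest.values(), key=score, reverse=True)
```

Here `total_docs` is the number of indexed documents and `doc_freq` maps each keyword to the number of documents containing it; both would come from the index in a real system and are passed in directly to keep the sketch self-contained.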

The operations of the components of the above tense search apparatus can be implemented as a program, which can be installed and executed on a computer used as the tense search apparatus, or distributed via a network.

The program can also be stored on a hard disk or on a portable storage medium such as a flexible disk or CD-ROM, and installed on a computer or distributed.

The present invention is not limited to the above embodiment, and various modifications and applications are possible within the scope of the claims.

The present invention is applicable to searching the file system of a general-purpose file server.

FIG. 1 is a diagram for explaining the principle of the present invention.
FIG. 2 is a diagram showing the principle configuration of the present invention.
FIG. 3 is a configuration diagram of the tense search apparatus according to an embodiment of the present invention.
FIG. 4 is a flowchart of the inverted index creation processing according to an embodiment of the present invention.
FIG. 5 is an example of the structure of the tense inverted index storage unit according to an embodiment of the present invention.
FIG. 6 is an example of the structure of the document list unit according to an embodiment of the present invention.
FIG. 7 is a flowchart of the search processing according to an embodiment of the present invention.
FIG. 8 is a flowchart of the search syntax analysis processing according to an embodiment of the present invention.
FIG. 9 is an example of the extracted search attributes and search conditions according to an embodiment of the present invention.
FIG. 10 is a flowchart of the search execution processing according to an embodiment of the present invention.
FIG. 11 is a flowchart of the search result combining processing according to an embodiment of the present invention.

Explanation of symbols

110 User terminal
115 Search result
120 Search syntax
125 User interface unit
130 Search request analysis means, search request analysis unit
135 Search result combining means, search result combining unit
140 Search execution means, search execution unit
150 Document list storage means, document list unit
155 Temporal transposed index storage means, temporal transposed index storage unit
160 Indexing means, indexing unit
165 Morphological analysis unit
170 Duplicate document storage means, duplicate document unit
175 Document collection means, document collection interface unit
180 Shared document (file) server

Claims

1. A document search method in a search device that is connected to a shared document server and searches for documents, the method comprising:
a document collection step in which document collection means receives, from the shared document server, a document operation notification of creation, update, or deletion of a document together with the document, duplicates the document, assigns a document identifier to each document, and stores the document in duplicate document storage means;
an indexing step in which indexing means acquires the document from the duplicate document storage means, assigns the document identifier to each word of the document and stores it in temporal transposed index storage means, and stores, in document list storage means, a document list including the document and information on the transition of the document, the information including the document identifier and the operation date and time;
a search request analysis step in which search request analysis means analyzes an input search request and extracts a search attribute and a search condition;
a search step in which search execution means searches the temporal transposed index storage means for a word based on the search condition, acquires the document identifier (transposed index) corresponding to the word, and searches the document list storage means based on the transposed index to acquire the search result documents; and
a search result combining step in which search result combining means, when a plurality of document identifiers exist for the documents acquired in the search step, deletes the documents with older operation dates and times from the search result and outputs the search result.
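
The steps recited in the method claim above can be illustrated with a minimal Python sketch. It is not the implementation described in the specification: the class and field names (`TemporalIndex`, `DocumentEntry`, `collect`, `combine`) are hypothetical, whitespace splitting stands in for morphological analysis, each notification is assumed to receive a fresh document identifier, and entries sharing a document name are assumed to be versions of the same document when older entries are dropped.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DocumentEntry:
    """One row of the document list: a document snapshot after one operation."""
    doc_id: int
    name: str
    operation: str          # "create", "update", or "delete"
    operated_at: datetime
    text: str


class TemporalIndex:
    def __init__(self):
        self.postings = defaultdict(set)   # word -> set of document identifiers
        self.document_list = {}            # document identifier -> DocumentEntry
        self._next_id = 1

    def collect(self, name, operation, operated_at, text=""):
        """Collection and indexing: assumes one new identifier per notification."""
        doc_id = self._next_id
        self._next_id += 1
        self.document_list[doc_id] = DocumentEntry(
            doc_id, name, operation, operated_at, text)
        for word in text.split():          # stand-in for morphological analysis
            self.postings[word].add(doc_id)
        return doc_id

    def search(self, keyword):
        """Search step: word -> identifiers -> document list entries."""
        ids = sorted(self.postings.get(keyword, ()))
        return [self.document_list[i] for i in ids]

    @staticmethod
    def combine(results):
        """Combining step: keep only the newest entry per document name (assumption)."""
        newest = {}
        for entry in results:
            kept = newest.get(entry.name)
            if kept is None or entry.operated_at > kept.operated_at:
                newest[entry.name] = entry
        return sorted(newest.values(), key=lambda e: e.doc_id)
```

For example, if `plan.txt` is created and later updated and both versions contain the word "budget", `search("budget")` returns both entries and `combine` keeps only the updated one.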

2. The document search method according to claim 1, wherein, in the search step, when a date and time of creation, update, or deletion of a document is specified as the search condition corresponding to the search attribute, a document identifier matching the date and time is acquired from the document list storage means, and the document corresponding to that document identifier is acquired.

3. The document search method according to claim 1, wherein, in the search step, when a document name is specified as the search condition corresponding to the search attribute, a document identifier matching the document name is acquired from the document list storage means, and the document corresponding to that document identifier is acquired.
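
The date and document-name conditions in the two dependent claims above amount to lookups against the document list rather than the word index. The following Python sketch shows one possible reading; `ListEntry`, `ids_by_date`, and `ids_by_name` are hypothetical names, and treating "matching the date" as an exact calendar-day match is an assumption rather than something the claims specify.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class ListEntry:
    """Hypothetical document list row: identifier, name, operation, timestamp."""
    doc_id: int
    name: str
    operation: str          # "create", "update", or "delete"
    operated_at: datetime


def ids_by_date(document_list, day, operation=None):
    """Identifiers whose operation fell on `day` (exact-day match is an assumption)."""
    return [e.doc_id for e in document_list
            if e.operated_at.date() == day
            and (operation is None or e.operation == operation)]


def ids_by_name(document_list, name):
    """Identifiers of every document list entry with the given document name."""
    return [e.doc_id for e in document_list if e.name == name]
```

The returned identifiers would then be used to fetch the corresponding documents, in the same way as identifiers obtained from a keyword search.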

4. A document search apparatus that is connected to a shared document server and searches for documents, the apparatus comprising:
document collection means for receiving, from the shared document server, a document operation notification of creation, update, or deletion of a document together with the document, duplicating the document, assigning a document identifier to each document, and storing the document in duplicate document storage means;
indexing means for acquiring the document from the duplicate document storage means, assigning the document identifier to each word of the document and storing it in temporal transposed index storage means, and storing, in document list storage means, a document list including the document and information on the transition of the document, the information including the document identifier and the operation date and time;
search request analysis means for analyzing an input search request and extracting a search attribute and a search condition;
search execution means for searching the temporal transposed index storage means for a word based on the search condition, acquiring the document identifier (transposed index) corresponding to the word, and searching the document list storage means based on the transposed index to acquire the search result documents; and
search result combining means for, when a plurality of document identifiers exist for the documents acquired by the search execution means, deleting the documents with older operation dates and times from the search result and outputting the search result.

5. The document search apparatus according to claim 4, wherein the search execution means includes means for, when a date and time of creation, update, or deletion of a document is specified as the search condition corresponding to the search attribute, acquiring a document identifier matching the date and time from the document list storage means and acquiring the document corresponding to that document identifier.

6. The document search apparatus according to claim 4, wherein the search execution means includes means for, when a document name is specified as the search condition corresponding to the search attribute, acquiring a document identifier matching the document name from the document list storage means and acquiring the document corresponding to that document identifier.

7. A document search program for causing a computer to function as each means constituting the document search apparatus according to claim 4.

8. A computer-readable recording medium storing the document search program according to claim 7.