JP2011081642A - Retrieval server, information retrieval method, program and storage medium

Info

Publication number: JP2011081642A
Application number: JP2009233981A
Authority: JP
Inventors: Yorifumi Kinoshita; 順史木下; Yasuhiro Fujii; 康広藤井; Yoshinori Honda; 義則本多; Susumu Serita; 進芹田
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2009-10-08
Filing date: 2009-10-08
Publication date: 2011-04-21

Abstract

PROBLEM TO BE SOLVED: To conceal the relationship between a document and the confidential information or the like contained in it, and to generate appropriate retrieval results.

SOLUTION: The retrieval server includes a management part and a control part. The management part receives, from a terminal via a communication network, the confidential information of a document and the access right information for that confidential information, associates the received information with an index word of the document or with a position of the index word, and manages it. On receiving a document retrieval request from a terminal via the communication network, the control part decides, based on the management part and on the words in the retrieval sentence included in the request, whether an access right exists for each index word or each index word position. When the access right exists, it creates a document list enumerating information about the documents that include the index word, calculates a degree of relevance for each document, reorders the documents in the list based on the calculation result, and transmits the list via the communication network to the requesting terminal as a retrieval result candidate.

COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to a technique for retrieving information such as documents.

In recent years, it has become common for companies and organizations to introduce information retrieval systems so that information can be used across organizational boundaries. However, the information held within a company or organization may include highly confidential or privacy-sensitive material, so access control is needed that discloses appropriate information only to users with appropriate authority. To meet this demand, the enterprise search products currently on the market widely perform access control per document or per set of documents. For example, Patent Document 1 discloses a technique for performing access control in a search system using the access rights assigned to files. Likewise, the enterprise search products of Non-Patent Documents 1 to 4 decide, for a file or a set of files, whether it can be searched and whether it can be displayed in the search results according to the user's authority.

Another conventional technique is an access control method that restricts the search keywords a user may use. For example, Patent Document 2 discloses a technique that requires the user to enter a password together with the search keyword when performing a search.

There are also conventional techniques that perform access control at a finer granularity, for example by withholding part of a document through redaction (blacking out). Patent Documents 3 and 4 disclose techniques for withholding specific portions of a document when the document is displayed. Non-Patent Document 5 discloses a technique for encrypting part of a document.

JP-A-10-207775
JP-A-10-187542
JP-A-2001-306558
JP-A-2008-159001

"Autonomy White Paper: Enterprise Search - Addressing Security and Entitlement Issues", [retrieved July 20, 2009], Internet <URL: http://publications.autonomy.com/docs/Autonomy%20Security%20White%20Paper>
"Search Best Practices, secure search", [retrieved July 20, 2009], Internet <URL: http://www.microsoft.com/enterprisesearch/en/us/FAST-technical.aspx>
"OmniFind Enterprise Edition v8.5, Administering Guide", [retrieved July 20, 2009], Internet <URL: www.ibm.com/software/data/enterprise-search/omnifind-enterprise/>
"OmniFind Enterprise Edition v8.5, Administering Guide", [retrieved July 20, 2009], Internet <URL: www.ibm.com/software/data/enterprise-search/omnifind-enterprise/>
"Documentation for the Google Search Appliance (software version 5.2), Managing Search for Controlled-Access Content", [retrieved July 20, 2009], Internet <URL: www.google.com/enterprise/gsa/>

If document-level access control as disclosed in Patent Document 1 or Non-Patent Documents 1 to 4 is used to exclude documents containing confidential or private information from the search results, those documents no longer hit in any search. Although it is desirable to share as much information as possible within a company or organization, the non-confidential and non-private information in those documents then goes unshared as well. Alternatively, document-level access control can list documents containing confidential or private information in the search results while restricting viewing of their contents. In that case, however, because the document hits on a keyword in the search text the user entered, the user learns that the document contains that keyword. By running further searches with other keywords, or AND and OR searches over multiple keywords, and observing whether the document hits and how its rank changes, the user can infer the document's contents. This can indirectly lead to leakage of confidential or private information.

If search keywords are restricted using a technique such as that disclosed in Patent Document 2, searches with those keywords are restricted uniformly. A word being confidential in one document does not mean the same word is confidential in another document, so restricting search keywords hinders the sharing of non-confidential and non-private information.

If confidential or private information in a document is concealed using partial non-disclosure techniques such as those disclosed in Patent Documents 3 and 4 and Non-Patent Document 5, and is not used as index words, users with appropriate authority, such as the document's owner, can no longer find the document by search. Alternatively, the confidential or private information can be used as index words while being concealed with those techniques when the document is viewed. In that case, however, the information in the concealed portions can be inferred from the search text the user entered, the document's hit rank, and the information in the unconcealed portions. For example, if a search for a person's name hits a document and, on opening it, the name does not appear in the unconcealed portions, it is obvious that the name appears in a concealed portion. Similarly, if an OR search for person name A and person name B ranks a document highly and only name B appears in its unconcealed portions, name A is very likely contained in a concealed portion even though the search was an OR search, and partial non-disclosure loses its effectiveness.
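The single-name inference described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical and for illustration only: the sample texts, the regex tokenizer, and the function names are assumptions, not part of the disclosed system. It models a naive engine that indexes redacted words and shows that a hit on a word absent from the visible text reveals the word's presence in the concealed portion.

```python
import re

# Hypothetical document: the second sentence is blacked out when displayed.
VISIBLE_TEXT = "Quarterly review meeting notes."
HIDDEN_TEXT = "Disciplinary action against Tanaka."


def tokens(text: str) -> set:
    """Lowercased word set; a stand-in for real index-word extraction."""
    return set(re.findall(r"\w+", text.lower()))


def naive_hit(query: str) -> bool:
    """A naive engine indexes every word, including the concealed ones."""
    return query.lower() in tokens(VISIBLE_TEXT) | tokens(HIDDEN_TEXT)


def leaks(query: str) -> bool:
    """True when a hit can only be explained by the concealed portion."""
    return naive_hit(query) and query.lower() not in tokens(VISIBLE_TEXT)


print(leaks("Tanaka"))  # the hit exposes a word the user cannot see
print(leaks("review"))  # the word is visible anyway, so nothing is exposed
```

A user who sees the document hit for "Tanaka" but cannot find the name in the displayed text can conclude that the concealed portion mentions it.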

For these reasons, conventional information retrieval systems that handle documents containing confidential or private information have had the problem that they cannot generate appropriate search results.

An object of the present invention is to provide a search server, an information retrieval method, a program, and a storage medium that generate search results from which it is difficult for a user to infer the confidential or private information contained in a document, whether from the search keywords and logical condition expressions entered as search text, from whether a document hits the search text and at what rank, or from the information contained in the portions of the document that are not concealed.

To solve the problems described above, in the present invention a document owner or administrator registers in advance the access right information for the confidential or private information in a document. The registered content is associated with index words, and different access rights are managed for each index word, or each index word position, within the same document. The search engine then checks, for each index word that matches a search keyword, the access rights of the user who entered the keyword, creates a list of matching documents, and scores them.
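As a minimal sketch of managing access rights per index word position, the Python below attaches an optional set of permitted groups to each posting and filters postings against the searching user's groups. The `Posting` class, the `allowed_groups` field, the group names, and `matching_docs` are all hypothetical names chosen for illustration. The text above does not specify a concrete data layout.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Posting:
    """One occurrence of an index word (hypothetical layout)."""
    doc_id: str
    position: int
    # None means no confidentiality was registered for this occurrence.
    allowed_groups: Optional[FrozenSet[str]] = None


def accessible(posting: Posting, user_groups: Set[str]) -> bool:
    """An occurrence is usable if unrestricted or shared with a user group."""
    return posting.allowed_groups is None or bool(posting.allowed_groups & user_groups)


def matching_docs(index: Dict[str, List[Posting]], term: str, user_groups: Set[str]) -> Set[str]:
    """Documents that hit `term` through occurrences the user may access."""
    return {p.doc_id for p in index.get(term, []) if accessible(p, user_groups)}


index = {
    "merger": [
        Posting("doc1", 5, frozenset({"executives"})),  # confidential in doc1
        Posting("doc2", 12),                            # ordinary word in doc2
    ]
}
print(matching_docs(index, "merger", {"sales"}))
print(matching_docs(index, "merger", {"executives"}))
```

Because the restriction is attached to the occurrence rather than to the word or the whole document, the same word can stay searchable in a document where it is not confidential.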

In another embodiment, a document owner or administrator registers in advance the access right information for the confidential or private information in a document. Based on the registered content and on the number of search keywords and the logical condition expression in the search text the user entered, a program at a level above the search engine deletes documents from, and reorders documents in, the search engine's search and scoring results to generate the final search results.

In yet another embodiment, a document owner or administrator registers in advance the access right information for the confidential or private information in a document. The program above the search engine internally generates a separate document in which that confidential or private information is concealed, manages it in association with the original document, and creates an index for both. The higher-level program then deletes documents from the search engine's search and scoring results, based on the access right information and the association management information, to generate the final search results.
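One way to picture this embodiment is the Python sketch below. It is an illustration under stated assumptions rather than the disclosed procedure: the `#redacted` identifier suffix, the character-span representation of confidential parts, the `*` mask, and the rule for choosing between original and copy are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def make_redacted_copy(doc_id: str, text: str,
                       confidential_spans: List[Tuple[int, int]],
                       mask: str = "*") -> Tuple[str, str]:
    """Return (copy_id, copy_text) with each [start, end) span masked."""
    chars = list(text)
    for start, end in confidential_spans:
        for i in range(start, min(end, len(chars))):
            chars[i] = mask
    return f"{doc_id}#redacted", "".join(chars)


def filter_hits(hits: List[str], links: Dict[str, str],
                authorized_originals: Set[str]) -> List[str]:
    """Keep the original for authorized users, the redacted copy otherwise.

    `links` maps a redacted copy id to its original document id.
    """
    originals_with_copy = set(links.values())
    result = []
    for doc_id in hits:
        if doc_id in links:  # a redacted copy
            if links[doc_id] not in authorized_originals:
                result.append(doc_id)
        elif doc_id in originals_with_copy:  # an original that has a copy
            if doc_id in authorized_originals:
                result.append(doc_id)
        else:  # a document with no registered confidential information
            result.append(doc_id)
    return result


copy_id, copy_text = make_redacted_copy("doc1", "Merger with Acme approved", [(12, 16)])
links = {copy_id: "doc1"}
print(copy_text)
print(filter_hits(["doc1", copy_id, "doc2"], links, authorized_originals=set()))
```

Since both versions are indexed separately, an unauthorized user only ever gets hits computed from the redacted text, so a hit cannot reveal a concealed word.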

According to the present invention, it is possible to provide a search server, an information retrieval method, a program, and a storage medium that generate search results from which it is difficult for a user to infer the confidential or private information contained in a document, whether from the search keywords and logical condition expressions entered as search text, from whether a document hits the search text and at what rank, or from the information contained in the portions of the document that are not concealed. As a result, appropriate search results can be generated while concealing the relationship between a document and the confidential or private information it contains, and information sharing can be balanced with confidentiality and privacy protection.

A diagram showing a configuration example of the information retrieval system according to Embodiment 1 of the present invention.
A diagram showing a hardware and software configuration example of the client computer 2 according to Embodiment 1 of the present invention.
A diagram showing a hardware and software configuration example of the document sharing server 3 according to Embodiment 1 of the present invention.
A diagram showing a hardware and software configuration example of the search server 1 according to Embodiment 1 of the present invention.
A diagram showing a configuration example of the confidential information management table 115 according to Embodiment 1 of the present invention.
A diagram showing a configuration example of the index 113 according to Embodiment 1 of the present invention.
A diagram showing a configuration example of the user group table 114 according to Embodiment 1 of the present invention.
A diagram showing an example of the interface (confidential information registration screen) according to Embodiment 1 of the present invention.
A diagram showing an example of the confidential information registration procedure according to Embodiment 1 of the present invention.
A diagram showing an example of the overall search procedure according to Embodiment 1 of the present invention.
A diagram showing an example of the search procedure performed by the search engine program 112 according to Embodiment 1 of the present invention.
A diagram showing an example of the index 113 according to Embodiment 2 of the present invention.
A flowchart showing an example of the search procedure performed by the search service program 110 according to Embodiment 2 of the present invention.
A diagram showing the hardware and software configuration of the search server 1 according to Embodiment 3 of the present invention.
A diagram showing an example of the document relation table 116 according to Embodiment 3 of the present invention.
A diagram showing an example of the concept of generating a copy of a document according to Embodiment 3 of the present invention.
A diagram showing an example of the confidential information registration procedure according to Embodiment 3 of the present invention.
A diagram showing an example of the search procedure according to Embodiment 3 of the present invention.
A diagram showing an example of the confidential information registration procedure according to Embodiment 4 of the present invention.
A diagram showing an example of the search procedure according to Embodiment 4 of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a diagram showing a configuration example of the information retrieval system according to Embodiment 1 of the present invention. The information retrieval system consists of a search server 1, a plurality of client computers (terminals) 2, a document sharing server 3, and a communication network 4, and the computers are connected to one another via the communication network 4. A user of the information retrieval system connects to the search server 1 from a client computer 2 and searches for information stored on the document sharing server 3, the client computers 2, the search server 1, or other in-house systems such as a Web server (not shown). The communication network 4 can be implemented as a wired network such as a public network, the Internet, ISDN, a leased line, or a LAN, or as a wireless network using mobile communication base stations, communication satellites, or the like. On the communication network 4, each computer is identified by identification information assigned to it in advance, and each computer uses this identification information to connect to and communicate with the other computers.

FIG. 2 is a diagram showing a hardware and software configuration example of the client computer 2 according to Embodiment 1 of the present invention. The client computer 2 can be composed of a control unit 20 such as a CPU, a storage unit 21, a network interface unit 22 for connecting to the communication network 4, a display unit 23, an input unit 24, and a data bus 25 connecting them. The storage unit 21 can be implemented with a volatile storage device such as semiconductor memory (RAM), a readable and writable nonvolatile storage device such as a hard disk or SSD, a read-only nonvolatile storage device such as magneto-optical media, or the like. The display unit 23 can be implemented with a CRT display, a liquid crystal display, or the like, and the input unit 24 with a keyboard, a mouse, or the like. In the client computer 2, the control unit 20 executes, for example, the processing involved in searching for information and documents. The search client program 210 executed by the control unit 20, and the data it uses, may be stored in the storage unit 21 in advance, installed from another computer via the communication network 4, or installed from a storage medium such as a CD-ROM. The functions of the search client program 210 may also be implemented in hardware such as an LSI.

The search client program 210 provides users of the information retrieval system with a user interface for searching for information, and can take the form of, for example, a web browser or a dedicated search application. The search client program 210 sends the search text that the user entered with the input unit 24 to the search server 1, receives the search results from the search server 1, and displays them on the display unit 23 for the user. The search client program 210 also provides users of the information retrieval system with an interface for registering confidential information and private information in documents (hereinafter simply referred to as confidential information).

FIG. 3 is a diagram showing a hardware and software configuration example of the document sharing server 3 according to Embodiment 1 of the present invention. Like the client computer 2, the document sharing server 3 can be composed of a control unit 30 such as a CPU, a storage unit 31, a network interface unit 32 for connecting to the communication network 4, a display unit 33, an input unit 34, and a data bus 35 connecting them. The storage unit 31 can be implemented with a volatile storage device such as semiconductor memory (RAM), a readable and writable nonvolatile storage device such as a hard disk or SSD, a read-only nonvolatile storage device such as magneto-optical media, or the like. The display unit 33 can be implemented with a CRT display, a liquid crystal display, or the like, and the input unit 34 with a keyboard, a mouse, or the like. The document sharing server 3 may also be configured without the display unit 33 and the input unit 34. The document sharing service program 310 executed by the control unit 30, and the data it uses, may be stored in the storage unit 31 in advance, installed from another computer via the communication network 4, or installed from a storage medium such as a CD-ROM. The functions of the document sharing service program 310 may also be implemented in hardware such as an LSI.

The document sharing service program 310 enables users of the information retrieval system to share documents by providing an interface for storing and reading digitized documents and files 311 (hereinafter simply referred to as documents). The document sharing service program 310 can take the form of, for example, a file sharing service program using NFS, CIFS, or the like, a proprietary document management service program, or a database program that stores structured data. A user of the information retrieval system can store a document 311 on the document sharing server 3 from a client computer 2 via the search client program 210 or the document sharing service program 310, and can refer to the document 311 via the document sharing service program 310 or the search client program 210.

FIG. 4 is a diagram showing a hardware and software configuration example of the search server 1 according to Embodiment 1 of the present invention. Like the client computer 2 and the document sharing server 3, the search server 1 can be composed of a control unit 10 such as a CPU, a storage unit 11, a network interface unit 12 for connecting to the communication network 4, a display unit 13, an input unit 14, and a data bus 15 connecting them. The storage unit 11 can be implemented with a volatile storage device such as semiconductor memory (RAM), a readable and writable nonvolatile storage device such as a hard disk or SSD, a read-only nonvolatile storage device such as magneto-optical media, or the like. The display unit 13 can be implemented with a CRT display, a liquid crystal display, or the like, and the input unit 14 with a keyboard, a mouse, or the like. The search server 1 may also be configured without the display unit 13 and the input unit 14. The search service program 110, the crawler program 111, and the search engine program 112 executed by the control unit 10, and the data these programs use, may be stored in the storage unit 11 in advance, installed from another computer via the communication network 4, or installed from a storage medium such as a CD-ROM. The functions of the search service program 110, the crawler program 111, and the search engine program 112 may also be implemented in hardware such as an LSI.

The search service program 110 provides the search client program 210 and other programs with an interface for searching for information. The search service program 110 receives, via the search client program 210, the search text that the user entered with the input unit 24 of the client computer 2, generates a search query (a document search request) from that search text, and issues it to the search engine program 112. The search service program 110 then receives from the search engine program 112 a document list as the full-text search result for the query, and from that list generates the final search results to return to the search client program 210. In generating the final search results, the search service program 110 performs access control based on the user's access authority over the confidential information that may be contained in documents, document attributes, document names, and so on, generates appropriate search results, and presents them to the user via the search client program 210. The search service program 110 also provides users of the information retrieval system, via the search client program 210, with an interface for registering confidential information in documents.

The crawler program 111 acquires the documents stored on the document sharing server 3 together with their accompanying information, and generates the information that the search engine program 112 needs to perform full-text searches of the documents 311. The crawler program 111 accesses the document sharing server 3 periodically and acquires the documents 311 on it via the document sharing service program 310. The crawler program 111 also creates the index 113 of the documents 311 that the search engine program 112 needs in order to perform full-text searches on them. In addition, the crawler program 111 acquires from the document sharing server 3, along with each document 311, security attribute information such as the access rights assigned to the document, and stores it in the storage unit 11 of the search server 1 in a form that the search service program 110 and the search engine program 112 can use to perform document-level access control. The crawler program 111 can also extract feature information from the acquired documents using known techniques such as dictionary-based morphological analysis and named entity extraction by machine learning.

The search engine program 112 performs full-text searches of documents using the search text that the user entered with the input unit 24 of the client computer 2. Using the index 113 created by the crawler program 111, the search engine program 112 finds the documents that match the search text. Specifically, the search engine program 112 receives from the search service program 110 the search query generated from the search text and performs a full-text search. For each index word, or each index word position, that matches a word in the search text, it then calculates a value based on the word's frequency of occurrence within the document and its rarity. Using these per-index-word values, it calculates each document's degree of relevance to the search text as a score, sorts the document list by score, and returns it to the search service program 110. As described earlier, if the search engine program 112 performs document-level access control, listing documents 311 that contain confidential information in the search results while controlling viewing of their contents, a document 311 still hits on the keywords in the user's search text. The user can therefore learn that the document 311 contains those keywords, which creates a risk of confidential information leakage. In Embodiment 1, access rights can therefore be set according to the confidentiality of the information within a document 311, and the search engine program 112 uses this access right information to extract and score the appropriate documents 311 and return the document list to the search service program 110.

The index 113 generally refers to a catalog of the locations at which the words, characters, and character strings contained in the documents 311 appear. In the field of information retrieval in particular, an inverted index, which records for each word, character, or character string the documents in which it appears and its positions within those documents, is used to improve search performance. Morphological analysis, N-grams, or the like are used to analyze the words, characters, and character strings in a document 311. Some conventional implementations of the access control in document units described above hold the access right information of each document 311 in association with the index.
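As an illustration of the N-gram analysis mentioned above, the following minimal sketch splits a string into character N-grams and keeps the position of each one, which is the information an inverted index needs. The function name `char_ngrams` and the default of bigrams are assumptions made for this example, not details taken from the specification.

```python
def char_ngrams(text, n=2):
    """Return (position, n-gram) pairs for every character n-gram in text.

    Hypothetical helper: the specification only says that N-grams or
    morphological analysis may be used, not how they are implemented.
    """
    return [(i, text[i:i + n]) for i in range(len(text) - n + 1)]


if __name__ == "__main__":
    for position, gram in char_ngrams("retrieval", n=3):
        print(position, gram)
```

Character N-grams need no dictionary, which is one reason they are a common alternative to morphological analysis for Japanese text.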

The user group table 114 stores information indicating the relationship between the information that identifies a user and the groups to which that user belongs. In the first embodiment, the search service program 110 identifies the user from the user identification information, uses the user group table 114 to determine the group to which the user belongs, and passes that group information to the search engine program 112 together with the search query. The search engine program 112 can then use this information to extract the appropriate documents. In the first embodiment, when generating the final search result, the search service program 110 also refers to the user group table 114 and the confidential information management table 115 described later, and performs access control in units of groups on confidential information contained in document names and the like. This spares the system administrator the effort of setting and managing access rights for each individual user. The user group table 114 may be stored in an external system (not shown) such as an authentication server or a directory server.
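The group lookup described above can be sketched as follows. The table is modeled as a list of (group, user) rows; the sample rows and the name `groups_of` are hypothetical and only illustrate the idea of resolving a user to the set of groups passed on with the query.

```python
# Hypothetical contents of the user group table 114: (group, user) rows.
USER_GROUP_ROWS = [
    ("group_a", "user1"),
    ("group_a", "user2"),
    ("group_b", "user2"),
]


def groups_of(user_id, rows=USER_GROUP_ROWS):
    """Return the set of groups to which the given user belongs."""
    return {group for group, user in rows if user == user_id}


if __name__ == "__main__":
    print(sorted(groups_of("user2")))
```

Returning a set makes the later access check a simple set intersection with the groups permitted for an index word.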

The confidential information management table 115 holds the confidential information in each document and information on the access rights to that confidential information. The confidential information and access right information that a user enters for a document are registered in the confidential information management table 115 via the search client program 210 and the search service program 110. The search service program 110 sets the confidential information and access rights not only in the confidential information management table 115 but also in the index 113. By using this index 113, the search engine program 112 can extract the documents appropriate to the user's access rights. When generating the final search result, the search service program 110 refers to the confidential information management table 115 and the user group table 114 and performs access control in units of groups on confidential information contained in document names and the like.

なお、本実施例１においては、検索サービスプログラム１１０やクローラープログラム１１１、検索エンジンプログラム１１２が同一の検索サーバ１内で動作するものとして以降の説明を行うが、本構成に限定するものではない。これらは異なる計算機上で動作して相互に通信ネットワークを介して協調動作してもよい。 In the first embodiment, the following description will be made assuming that the search service program 110, the crawler program 111, and the search engine program 112 operate in the same search server 1. However, the present invention is not limited to this configuration. These may operate on different computers and cooperate with each other via a communication network.

FIG. 5 is a diagram showing a configuration example of the confidential information management table 115 according to the first embodiment of the present invention. The document ID 401 is identification information that uniquely identifies each document in the information search system. The field 402 is identification information that identifies a part of the document structure, such as “title”, “text”, or “attribute”. The confidential information 403 is confidential information within the document registered by a document holder, an administrator, or the like; examples are a specific person name, organization name, or place name, or a sentence stating a certain fact. When the same document or field contains multiple pieces of confidential information, multiple entries of confidential information 403 correspond to one document ID 401 or field 402. The position 404 indicates the position of the confidential information 403 within the document; when the same confidential information appears more than once in the document 311, multiple positions 404 correspond to one entry of confidential information 403. The access right 405 is the access right information for each individual piece of confidential information in the document 311. In the first embodiment, the access right 405 is the group information of the users who are allowed to search for the confidential information. The access right 405 may instead be user identification information, or information indicating whether the confidential information may be searched, whether the document may be viewed, and so on.
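One row of the table in FIG. 5 could be represented as below. This is only a sketch: the class name `ConfidentialEntry`, the attribute names, and the sample values are assumptions, and only the five columns (401 to 405) come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfidentialEntry:
    doc_id: str                # document ID 401
    field: str                 # field 402, e.g. "title" or "text"
    text: str                  # confidential information 403
    positions: tuple           # positions 404 within the document
    allowed_groups: frozenset  # access right 405 (groups allowed to search)


def may_search(entry, user_groups):
    """True if at least one of the user's groups is allowed for this entry."""
    return bool(entry.allowed_groups & set(user_groups))


if __name__ == "__main__":
    row = ConfidentialEntry("X", "text", "person_x", (3, 17), frozenset({"hr"}))
    print(may_search(row, {"hr", "sales"}), may_search(row, {"sales"}))
```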

FIG. 6 is a diagram showing a configuration example of the index 113 according to the first embodiment of the present invention; specifically, it shows a conceptual example of an inverted index whose index words are the words contained in the body of the documents 311. Indexes also exist for parts other than the body, such as the document name, but they have the same structure and are not illustrated. The body, document name, and other parts of a document may also be combined into a single index. Such an inverted index generally holds, for each word (index word) 501, the document information 502 of the documents containing the word and the position information 503 within the document 311. Although FIG. 6 shows a combination of list structures, the index may take the form of a table or a combination of tables. In the first embodiment, to realize access control for the individual words in a document 311, access right information (access permission group information) is associated with each position of each word in the document 311. For example, in document X shown in FIG. 6, access permission group information 504 and 505 is associated with the individual words in document X (here, person name x and word y), and its content differs from word to word.
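The structure in FIG. 6 can be approximated with nested dictionaries: index word, then document, then position, then the set of permitted groups. In this sketch the names `build_index` and `restrict` are hypothetical, and treating `None` as "no restriction registered" is an assumption, since the description does not say how unrestricted positions are represented.

```python
from collections import defaultdict


def build_index(docs):
    """Build {word: {doc_id: {position: allowed_groups}}} from tokenized docs.

    docs maps a document ID to its list of words. Every position starts
    with None, which this sketch treats as "searchable by anyone".
    """
    index = defaultdict(dict)
    for doc_id, words in docs.items():
        for position, word in enumerate(words):
            index[word].setdefault(doc_id, {})[position] = None
    return index


def restrict(index, word, doc_id, position, groups):
    """Attach access permission group information to one word position."""
    index[word][doc_id][position] = set(groups)


if __name__ == "__main__":
    idx = build_index({"X": ["person_x", "met", "person_y", "person_x"]})
    restrict(idx, "person_x", "X", 0, {"hr"})
    print(idx["person_x"])
```

Because the groups hang off the position rather than the document, two occurrences of the same word in one document can carry different access rights.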

図７は、本発明の実施例１に係るユーザーグループテーブル１１４の一例を示す図である。グループ６０１は、情報検索システムにおいて、ユーザーが所属するグループを一意に識別するための情報（以下、グループ情報と記す）である。また、ユーザー６０２は、情報検索システムにおいて、ユーザーを一意に識別するための情報（以下、ユーザー識別情報と記す）である。 FIG. 7 is a diagram showing an example of the user group table 114 according to the first embodiment of the present invention. The group 601 is information (hereinafter referred to as group information) for uniquely identifying the group to which the user belongs in the information search system. The user 602 is information (hereinafter referred to as user identification information) for uniquely identifying the user in the information search system.

図８は、本発明の実施例１に係る文書３１１内の機密情報を登録するインタフェース（機密情報登録画面）の一例を示す図である。画面７０１は、文書名の表示部位７０２、本文の表示部位７０３、ユーザーあるいはプログラムが指定した機密情報一覧の表示部位７０４、アクセス許可グループの選択部位７０５を含む。 FIG. 8 is a diagram illustrating an example of an interface (confidential information registration screen) for registering confidential information in the document 311 according to the first embodiment of the present invention. The screen 701 includes a document name display part 702, a text display part 703, a confidential information list display part 704 specified by the user or program, and an access permission group selection part 705.

図９は、本発明の実施例１に係る文書内の機密情報登録手順の一例を示すフローチャートである。 FIG. 9 is a flowchart showing an example of the confidential information registration procedure in the document according to the first embodiment of the present invention.

First, the user sets an access right for confidential information in the document 311 via an interface such as the one shown in FIG. 8 (step 1001). The user may designate the confidential information with the input unit 24 in the document name display part 702 or the text display part 703, or a program such as the crawler program 111 may extract and set it in advance using a dictionary or the like. Alternatively, candidates may be extracted from the document on the basis of previously learned information and presented to the user as a list of confidential information candidates 704 via the search client program 210, from which the user designates the confidential information with the input unit 24. After designating the confidential information, the user sets the access right 705 for it with the input unit 24 and presses the OK button 706. The search service program 110 then acquires the confidential information in the document 311 and its access right information (step 1001) and registers them in the confidential information management table 115 (step 1002).

つぎに検索サービスプログラム１１０は、ユーザーがステップ１００１において設定した内容を基に、図６で一例を示したような形態で、インデックス１１３にアクセス権情報を登録する（ステップ１００３）。 Next, the search service program 110 registers access right information in the index 113 in the form as shown in FIG. 6 based on the contents set by the user in step 1001 (step 1003).
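Steps 1002 and 1003 amount to writing the same setting to two places. The sketch below assumes the table is a list of dictionaries and the index is the nested mapping word, document, position, groups; the function name `register_confidential` and all key names are hypothetical.

```python
def register_confidential(table, index, doc_id, field, word, positions, groups):
    """Register one confidential word and its access right.

    table: list standing in for the confidential information management table.
    index: {word: {doc_id: {position: allowed_groups or None}}}.
    """
    # Step 1002: record the setting in the management table.
    table.append({
        "doc_id": doc_id,
        "field": field,
        "confidential": word,
        "positions": list(positions),
        "access": set(groups),
    })
    # Step 1003: attach the same groups to each registered position in the index.
    postings = index.setdefault(word, {}).setdefault(doc_id, {})
    for position in positions:
        postings[position] = set(groups)


if __name__ == "__main__":
    table, index = [], {"person_x": {"X": {0: None, 3: None}}}
    register_confidential(table, index, "X", "text", "person_x", [0], {"hr"})
    print(table, index)
```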

図１０は、本発明の実施例１に係る検索の全体処理手順の一例を示すフローチャートである。 FIG. 10 is a flowchart illustrating an example of the entire search processing procedure according to the first embodiment of the invention.

まず検索サービスプログラム１１０は、ユーザーがクライアントコンピュータ２の入力部２４を用いて入力した検索文およびユーザー識別情報を検索クライアントプログラム２１０を介して受け取る（ステップ１１０１）。なお、ユーザー識別情報は、検索文の入力前に入力するようにしてもよい。 First, the search service program 110 receives a search sentence and user identification information input by the user using the input unit 24 of the client computer 2 via the search client program 210 (step 1101). Note that the user identification information may be input before inputting the search text.

つぎに検索サービスプログラム１１０は、ユーザー識別情報を基にユーザーグループテーブル１１４を参照し、該当グループ情報を取得し、ステップ１１０１において受け取った検索文と、取得したユーザーのグループ情報を検索エンジンプログラム１１２に渡し、検索エンジンプログラムは図１１で示す処理手順に従って検索およびアクセス制御を実施する（ステップ１１０２）。 Next, the search service program 110 refers to the user group table 114 based on the user identification information, acquires the corresponding group information, and sends the search statement received in step 1101 and the acquired user group information to the search engine program 112. Then, the search engine program performs search and access control according to the processing procedure shown in FIG. 11 (step 1102).

つぎに検索サービスプログラム１１０は、検索エンジンプログラム１１２から全文検索結果である文書リストを受け取る（ステップ１１０３）。ここで、この文書リストに記載される文書名には、機密情報が残っている可能性があるため、これをステップ１１０４以降で対処する。 Next, the search service program 110 receives a document list as a full-text search result from the search engine program 112 (step 1103). Here, since there is a possibility that confidential information remains in the document names described in the document list, this is dealt with in step 1104 and the subsequent steps.

つぎに検索サービスプログラム１１０は、機密情報管理テーブル１１５およびユーザーグループテーブル１１４を参照し（ステップ１１０４）、検索文に含まれる単語（キーワード）が、文書リストに列挙される文書名に含まれており、且つ当該ユーザーが所属するユーザーグループにアクセス許可が与えられているかどうかを、文書リストに含まれる文書毎に確認する（ステップ１１０５）。 Next, the search service program 110 refers to the confidential information management table 115 and the user group table 114 (step 1104), and the words (keywords) included in the search sentence are included in the document names listed in the document list. In addition, it is confirmed for each document included in the document list whether or not the access permission is given to the user group to which the user belongs (step 1105).

If it is determined in step 1105 that access permission has not been granted, the search service program 110 deletes the confidential information contained in the document name or replaces it with masking characters or the like (step 1106).
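The check in step 1105 and the masking in step 1106 might look like the following sketch. The function name `mask_title`, the entry layout, and the choice of an asterisk as the masking character are assumptions; the description only requires that the confidential part be deleted or masked.

```python
def mask_title(title, query_words, entries, user_groups, mask="*"):
    """Mask queried confidential words in a document name.

    entries: list of {"word": str, "groups": set} registered for this
    document name. A word is masked only if it was searched for, occurs
    in the title, and none of the user's groups is permitted.
    """
    for entry in entries:
        word = entry["word"]
        permitted = bool(entry["groups"] & user_groups)
        if word in query_words and word in title and not permitted:
            title = title.replace(word, mask * len(word))
    return title


if __name__ == "__main__":
    entries = [{"word": "Alice", "groups": {"hr"}}]
    print(mask_title("Alice transfer memo", {"Alice"}, entries, {"sales"}))
```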

つぎに検索サービスプログラム１１０は、検索結果を検索クライアントプログラム２１０を介してユーザーに提示し（ステップ１１０７）、一連の検索処理を終了する。 Next, the search service program 110 presents the search results to the user via the search client program 210 (step 1107), and the series of search processing ends.

つぎにユーザーが検索結果中の文書を選択した場合（ステップ１１０８）、検索サービスプログラム１１０は、文書単位でのアクセス制御を実施する（ステップ１１０９）。ユーザーは、その結果に基づいて、文書内容の閲覧を許可あるいは禁止される。 Next, when the user selects a document in the search result (step 1108), the search service program 110 performs access control in units of documents (step 1109). Based on the result, the user is permitted or prohibited to view the document contents.

また、ステップ１１０５においてアクセス許可が与えられていると判定した場合は、検索サービスプログラム１１０は、ステップ１１０７に進む。 If it is determined in step 1105 that access permission has been granted, the search service program 110 proceeds to step 1107.

図１１は、本発明の実施例１に係る検索エンジンプログラム１１２による検索の処理手順の一例を示すフローチャートである。 FIG. 11 is a flowchart illustrating an example of a search processing procedure performed by the search engine program 112 according to the first embodiment of the present invention.

まず検索エンジンプログラム１１２は、検索サービスプログラム１１０から、ユーザーが入力した検索文および当該ユーザが所属するグループ情報を受け取る（ステップ１２０１）。 First, the search engine program 112 receives from the search service program 110 the search text entered by the user and the group information to which the user belongs (step 1201).

つぎに検索エンジンプログラム１１２は、検索文に含まれる単語を抽出する（ステップ１２０２）。単語の抽出については公知の技術を用いる。例えば日本語であれば形態素解析等を用いることができる。また、検索文を空白等の区切りで分割して単語を抽出してもよい。 Next, the search engine program 112 extracts words included in the search sentence (step 1202). A known technique is used for word extraction. For example, morphological analysis can be used for Japanese. In addition, the search sentence may be divided by spaces or the like to extract words.

つぎに検索エンジンプログラム１１２は、インデックス１１３を参照し、ステップ１２０２において抽出した単語に合致する索引語５０１を含む文書３１１を一件選択し（ステップ１２０３）、その選択した文書中の当該単語の個々の位置毎にアクセス許可グループ情報５０４を確認する（ステップ１２０４）。 Next, the search engine program 112 refers to the index 113, selects one document 311 including the index word 501 that matches the word extracted in step 1202 (step 1203), and selects each of the words in the selected document. The access permission group information 504 is confirmed for each position (step 1204).

Next, the search engine program 112 compares the access permission group information checked in step 1204 with the user's group information received in step 1201, and thereby determines whether the user has an access right to the index word contained in the document 311 selected in step 1203 (step 1205). As shown in FIG. 6 and FIG. 8, the index 113 manages the positions of index words. When the same index word occurs at multiple positions in a document, the search engine program 112 determines whether the user has an access right by using the access permission group information 504 associated with each position of the index word in the document 311.

ステップ１２０５において、ユーザーが当該単語に合致する文書内の索引語の何れかに対してアクセス権があると判定した場合、つぎに検索エンジンプログラム１１２は、当該文書を検索サービスプログラム１１０に渡すための文書リストに追加する（ステップ１２０６）。 If it is determined in step 1205 that the user has access rights to any of the index words in the document that match the word, the search engine program 112 then passes the document to the search service program 110. It adds to the document list (step 1206).
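The decision in steps 1204 to 1206 can be sketched as a per-position filter: a document is listed for a word if the user may access at least one occurrence. The name `accessible_positions` and the convention that `None` means "no restriction" are assumptions made for this example.

```python
def accessible_positions(postings, user_groups):
    """Return the positions of a word in one document that the user may search.

    postings: {position: set of allowed groups, or None if unrestricted}.
    """
    return [
        position
        for position, allowed in sorted(postings.items())
        if allowed is None or allowed & user_groups
    ]


if __name__ == "__main__":
    postings = {0: {"hr"}, 3: None, 9: {"legal"}}
    hits = accessible_positions(postings, {"sales"})
    # The document goes on the list only if at least one position is accessible.
    print(hits, bool(hits))
```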

Next, the search engine program 112 calculates the score of the document for the word (step 1207). The score may be calculated with a known or a proprietary formula, and the score calculation shown in step 1211 may instead be performed for the search sentence as a whole rather than word by word. Various scoring methods exist; for example, the method described in the known non-patent document “Apache Lucene - Scoring”, Internet URL: http://lucene.apache.org/java/2_4_0/scoring.html, can be used. In this method, for each index word that matches a word in the search sentence entered by the user, a value based on the frequency of occurrence and the rarity of the word in the document is calculated, and the score of each document for the search sentence is calculated from these values. When the same index word occurs more than once in a document, the score is calculated from only those occurrences for which the user is determined to have an access right. Thus, in the first embodiment, the score of a document for a search sentence is calculated using only those index words matching words in the search sentence to which the user has an access right, and a document with a higher score is judged to be more suitable for the search sentence.
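A simplified frequency-and-rarity score that counts only accessible occurrences could be written as below. This is a generic TF-IDF style sketch, not the full Lucene formula referenced in the text; the function name `word_score` and the exact damping terms are assumptions.

```python
import math


def word_score(accessible_count, docs_with_word, total_docs):
    """Score one word for one document from its accessible occurrences only.

    accessible_count: occurrences in the document the user may search.
    docs_with_word:   number of documents containing the word (rarity).
    total_docs:       number of documents in the index.
    """
    if accessible_count == 0:
        return 0.0
    tf = math.sqrt(accessible_count)                          # frequency term
    idf = 1.0 + math.log(total_docs / (docs_with_word + 1))   # rarity term
    return tf * idf


if __name__ == "__main__":
    # Four occurrences exist, but the user may search only one of them.
    print(word_score(1, 9, 10), word_score(4, 9, 10))
```

Because inaccessible occurrences never enter the frequency term, two users with different rights can see different scores for the same document and query.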

つぎに検索エンジンプログラム１１２は、インデックス１１３を参照し、当該単語に合致する索引語を含む文書が他にまだあるかどうかを確認し（ステップ１２０８）、ある場合はステップ１２０３からステップ１２０７の処理を次の文書に対して行う。 Next, the search engine program 112 refers to the index 113 to check whether there are any other documents that contain the index word that matches the word (step 1208). If there are, then the processing from step 1203 to step 1207 is performed. Do this for the next document.

On the other hand, if the document is the last one in step 1208, the search engine program 112 refers to the search sentence and checks whether it contains any further words (step 1209); if so, it performs steps 1202 through 1208 for the next word.

一方、ステップ１２０９において、最後の単語である場合、検索エンジンプログラム１１２は、検索文中の個々の単語に対して得られた文書リストを、ユーザーにより指定された検索論理式（検索条件）に応じてマージする（ステップ１２１０）。検索エンジンプログラム１１２は、例えば、ＡＮＤ検索の場合は個々のリストの共通文書リストを、ＯＲ検索の場合は個々のリストの論理和を生成する。 On the other hand, if it is the last word in step 1209, the search engine program 112 selects the document list obtained for each word in the search sentence according to the search logical expression (search condition) specified by the user. Merge (step 1210). For example, the search engine program 112 generates a common document list of individual lists in the case of AND search, and generates a logical sum of the individual lists in the case of OR search.
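The merge in step 1210 is a set intersection for an AND search and a set union for an OR search. A minimal sketch follows, with the function name `merge_lists` and the string operators "AND" and "OR" chosen for illustration.

```python
def merge_lists(per_word_docs, operator):
    """Merge the per-word document lists according to the search condition.

    per_word_docs: one iterable of document IDs per search word.
    operator: "AND" for the intersection, "OR" for the union.
    """
    sets = [set(docs) for docs in per_word_docs]
    if not sets:
        return set()
    if operator == "AND":
        return set.intersection(*sets)
    if operator == "OR":
        return set.union(*sets)
    raise ValueError(f"unsupported operator: {operator}")


if __name__ == "__main__":
    lists = [["X", "Y"], ["Y", "Z"]]
    print(sorted(merge_lists(lists, "AND")), sorted(merge_lists(lists, "OR")))
```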

Next, for each document in the document list, the search engine program 112 calculates the score for the search sentence, sorts the documents in descending order of score (descending order of suitability), and returns the resulting document list to the search service program 110 (step 1211).

また、ステップ１２０５において、検索エンジンプログラム１１２は、ユーザーが当該単語に合致する文書内の索引語の何れかに対してアクセス権がないと判定した場合は、ステップ１２０８に進む。 In step 1205, if the search engine program 112 determines that the user does not have access rights to any of the index words in the document that match the word, the process proceeds to step 1208.

The first embodiment of the present invention has been described above. According to the first embodiment, a document holder or administrator registers in advance the access right information for confidential or privacy information in a document, this information is associated with index words, and different access rights are managed for each index word, or each index word position, within the same document. The search engine program 112 then checks the access rights of the user who entered the search sentence against the index words matching the words in the search sentence, creates the list of matching documents, and scores them. As a result, it becomes difficult for a user to infer the relationship between a document and the confidential information in it from, for example, changes in result ranking when the search sentence or logical conditional expression is varied, and appropriate search results that take the confidential information in documents into account can be generated.

本発明を適用する情報検索システムにおいて、他の実施例について説明する。以下、特に説明の無い箇所は実施例１と同じものとする。 Another embodiment of the information search system to which the present invention is applied will be described. In the following, parts not specifically described are the same as those in the first embodiment.

本実施例２は、検索サービスプログラム１１０が、検索エンジンプログラム１１２より受け取った文書リスト中の各文書に対して、ユーザーの権限と当該文書に含まれる機密情報に応じて、当該文書の削除あるいは順位変更を行って最終的な検索結果を生成するという点で実施例１とは異なる。すなわち、実施例２では、索引語単位でのアクセス権チェックを必要としない。 In the second embodiment, the search service program 110 deletes or ranks each document in the document list received from the search engine program 112 according to the authority of the user and the confidential information included in the document. This is different from the first embodiment in that a final search result is generated by making a change. That is, in the second embodiment, an access right check for each index word is not required.

FIG. 12 is a diagram showing a configuration example of the index 113 according to the second embodiment of the present invention. As shown in FIG. 12, only one piece of access permission group information 506 is associated with document X, for example, regardless of the index words contained in the document. In the second embodiment, therefore, the search engine program 112 performs only access control in document units.

図１３は、本発明の実施例２に係る検索サービスプログラム１１０による検索処理手順の一例を示すフローチャートである。 FIG. 13 is a flowchart illustrating an example of a search processing procedure by the search service program 110 according to the second embodiment of the present invention.

まず検索サービスプログラム１１０は、ユーザーがクライアントコンピュータ２の入力部２４を用いて入力した検索文およびユーザー識別情報を検索クライアントプログラム２１０を介して受け取る（ステップ１３０１）。 First, the search service program 110 receives a search sentence and user identification information input by the user using the input unit 24 of the client computer 2 via the search client program 210 (step 1301).

つぎに検索サービスプログラム１１０は、検索文およびグループ情報を検索エンジンプログラム１１２に渡し、検索エンジンプログラム１１２は、前述した方法および手順で検索およびスコアリングを行い、文書リストを検索サービスプログラム１１０に返す（ステップ１３０２）。 Next, the search service program 110 passes the search text and group information to the search engine program 112, and the search engine program 112 performs search and scoring by the method and procedure described above, and returns a document list to the search service program 110 ( Step 1302).

つぎに検索サービスプログラム１１０は、検索エンジンプログラム１１２より受け取った文書リストから一件の文書を選択する（ステップ１３０３）。 Next, the search service program 110 selects one document from the document list received from the search engine program 112 (step 1303).

つぎに検索サービスプログラム１１０は、検索文に含まれる単語が一つかどうかを判定し（ステップ１３０４）、一つであればステップ１３０５に、複数であればステップ１３０９に進む。 Next, the search service program 110 determines whether or not there is one word included in the search sentence (step 1304), and if there is one, the process proceeds to step 1305, and if there is a plurality, the process proceeds to step 1309.

If it is determined in step 1304 that the search sentence contains one word, the search service program 110 refers to the confidential information management table 115 and the user group table 114 and, when the search word matches confidential information in the document, determines whether the user has an access right to that confidential information, or, if the confidential information occurs at multiple positions, to any of those occurrences (step 1305). If the user has an access right, the search service program 110 checks whether the document list contains further documents (step 1306); if not, it deletes the document being processed from the document list (step 1308).

ステップ１３０６において、検索サービスプログラム１１０は、文書リストに他の文書が無ければステップ１３０７に進み、他の文書が有ればステップ１３０３からステップ１３０６までの各処理を次の文書について実施する。 In step 1306, if there is no other document in the document list, the search service program 110 proceeds to step 1307, and if there is another document, performs the processing from step 1303 to step 1306 for the next document.

また、ステップ１３０６において文書リストに他の文書が無いと判定した場合、検索サービスプログラム１１０は、ステップ１３０３からステップ１３０６において作成した検索結果を検索クライアントプログラム２１０を介してユーザーに提示する（ステップ１３０７）。 If it is determined in step 1306 that there are no other documents in the document list, the search service program 110 presents the search results created in steps 1303 to 1306 to the user via the search client program 210 (step 1307). .

If it is determined in step 1304 that the search sentence contains multiple words, the search service program 110 determines from the search logical expression specified by the user whether the user performed an AND search or an OR search on the search words (step 1309). It proceeds to step 1310 for an AND search and to step 1311 for an OR search.

If it is determined in step 1309 that an AND search was performed, the search service program 110 refers to the confidential information management table 115 and the user group table 114 and, when any of the search words in the search sentence matches confidential information in the document, determines whether the user has an access right to that confidential information, or, if it occurs at multiple positions, to any of those occurrences (step 1310). If the user has an access right, the search service program 110 proceeds to step 1306; if not, it deletes the document being processed from the document list (step 1308).

If it is determined in step 1309 that an OR search was performed, the search service program 110 refers to the confidential information management table 115 and the user group table 114 and determines whether any of the search words in the search sentence matches confidential information in the document (step 1311). If one matches, the search service program 110 lowers the rank of the document being processed within the document list (step 1312); if none of the search words matches information designated as confidential in the document, it proceeds to step 1306.

In step 1312, the amount by which the document is lowered is determined as follows: the search service program 110 recalculates the score using the score calculation method described above and moves the document down to the position in the document list that corresponds to the recalculated score. The recalculation takes into account only the information designated as confidential in the document to which the user has an access right. Alternatively, the document owner or administrator may set the amount of lowering in advance as a constant.

After step 1312, the search service program 110 determines whether any of the search words in the search sentence matches information registered as confidential in the name of the document being processed. If there is a match, it deletes the matching portion of the title or replaces it with masking characters, or performs similar processing (step 1313).
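The title processing of step 1313 might look like the following sketch, which replaces matching terms with masking characters. The function name `mask_title`, its parameters, and the choice of `*` as the masking character are assumptions made for illustration.

```python
from typing import Iterable


def mask_title(title: str, query_words: Iterable[str],
               confidential_title_terms: Iterable[str], mask_char: str = "*") -> str:
    """Mask each search word that is registered as confidential in the title."""
    confidential = set(confidential_title_terms)
    for word in query_words:
        if word and word in confidential and word in title:
            # Replace the term with masking characters of the same length.
            title = title.replace(word, mask_char * len(word))
    return title
```

Deleting the term instead of masking it would correspond to passing an empty string as `mask_char`.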

This concludes the description of Embodiment 2. In Embodiment 2, the document owner or administrator registers in advance the access right information for confidential information in a document. Based on the registered content and on the number of search keywords and the logical condition expression in the search sentence entered by the user, the search service program deletes documents from, or changes the rank of documents in, the search and scoring results produced by the search engine. This makes it possible to generate appropriate search results that take into account the confidential information and privacy information in the documents.

Other embodiments of the information retrieval system to which the present invention is applied are described below. Anything not specifically described is the same as in Embodiment 1 or Embodiment 2.

Embodiment 3 differs from Embodiments 1 and 2 in the following respect. For a document in which the document owner or administrator has registered confidential information, the search service program 110 internally generates a duplicate of the original document, manages the duplicate in association with the original document, and registers both documents in the index after excluding the confidentially registered portions of the duplicate from indexing.

FIG. 14 shows an example of the hardware and software configuration of the search server 1 according to Embodiment 3 of the present invention.

The document relation table 116 contains information on the relationships between documents. In Embodiment 3, when a user registers information in a document as confidential, the search service program 110 internally creates a duplicate of that document and assigns identification information to the duplicate. The document relation table 116 stores the identification information of the newly created duplicate document in association with the identification information of the original document.

FIG. 15 shows an example of the document relation table according to Embodiment 3 of the present invention. Document 801 is information that uniquely identifies a document in the information retrieval system. When document 801 is a duplicate that was newly generated from an original document at the time a user designated confidential information, original document 802 stores information that uniquely identifies the original document of that duplicate.
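As an illustration, the document relation table can be modeled as a simple mapping. The sketch below is an assumption about one possible in-memory form; the identifiers "Z", "Z-1", and "Z-2" and the helper names are hypothetical and do not come from FIG. 15.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory form of the document relation table 116:
# document (field 801) -> original document (field 802), or None if not a duplicate.
doc_relation: Dict[str, Optional[str]] = {
    "Z": None,     # an original document
    "Z-1": "Z",    # duplicate generated from Z
    "Z-2": "Z",    # another duplicate generated from Z
}


def original_of(doc_id: str) -> Optional[str]:
    """Return the original document of a duplicate, or None."""
    return doc_relation.get(doc_id)


def duplicates_of(original_id: str) -> List[str]:
    """Return every duplicate registered for the given original document."""
    return [d for d, o in doc_relation.items() if o == original_id]
```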

FIG. 16 shows an example of the concept of generating duplicates of a document according to Embodiment 3 of the present invention. The document owner or administrator can set an access permission group individually for each of several pieces of confidential information in the same document. For example, as shown in FIG. 16, when access permission groups are set for the individual pieces of confidential information in document Z, different disclosure ranges are set for user group p, user group q, and user group r, as shown at 901 in the confidential information management table 115. In this case, users belonging to user group q and users belonging to user group r are subject to different partial disclosure restrictions. The search service program 110 therefore generates at least two duplicate documents, one for each partial disclosure range, and manages them in association with one another in the document relation table 116.
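The number of duplicates needed can be derived from the distinct partial disclosure ranges. The following sketch assumes that each user group is mapped to the set of confidential items it may see and that a group seeing every item uses the original document; the item labels "a" and "b" and the function name `partial_views` are hypothetical, since the contents of FIG. 16 are not reproduced here.

```python
from typing import Dict, FrozenSet, Set


def partial_views(access: Dict[str, Set[str]],
                  all_items: Set[str]) -> Set[FrozenSet[str]]:
    """Return the distinct sets of items to hide; one duplicate per set."""
    hidden_sets: Set[FrozenSet[str]] = set()
    for visible in access.values():
        hidden = frozenset(all_items - visible)
        if hidden:  # a group with full access needs no duplicate
            hidden_sets.add(hidden)
    return hidden_sets
```

With group p seeing both items, group q seeing only "a", and group r seeing only "b", the function returns two hidden sets, which matches the minimum of two duplicates described above.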

FIG. 17 is a flowchart showing an example of the procedure for registering confidential information in a document according to Embodiment 3 of the present invention.

First, the document owner or administrator sets an access permission group for confidential information in a document through an interface such as the example shown in FIG. 8 (step 1401). The confidential information may be designated directly by the document owner or administrator, or the user may review candidates presented by a program and designate the confidential information from among them.

Next, for the document in which confidential information has been registered, the search service program 110 internally generates a duplicate document separate from the original document (step 1402). In the newly generated duplicate document, the confidential information is concealed by deleting the portions designated as confidential, replacing them with masking characters, encrypting them, or the like. The confidential information in the duplicate document is thereby excluded from indexing.

Next, based on the settings that the user made in step 1401, the search service program 110 registers the original document in the confidential information management table 115 (step 1403).

Next, the search service program 110 registers both the original document and the duplicate document generated in step 1402 in the index 113 (step 1404). As described above, the confidential information in the duplicate document is not indexed; only its non-confidential portions are indexed.
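Steps 1402 and 1404 can be sketched as follows. This is a minimal illustration that conceals confidential terms by masking and builds a toy inverted index; the function names, the masking character, and the word-splitting rule are assumptions and do not describe the index 113 itself.

```python
import re
from typing import Dict, Iterable, Set


def make_duplicate(text: str, confidential_terms: Iterable[str],
                   mask_char: str = "*") -> str:
    """Step 1402: copy the text with every confidential term masked."""
    for term in confidential_terms:
        text = text.replace(term, mask_char * len(term))
    return text


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Step 1404: toy inverted index, word -> ids of documents containing it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        # Masked spans contain no word characters, so they are never indexed.
        for word in re.findall(r"\w+", text.lower()):
            index.setdefault(word, set()).add(doc_id)
    return index
```

Indexing an original "Z" and its masked duplicate "Z-1" this way leaves the confidential term reachable only through "Z", while non-confidential words point to both documents.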

FIG. 18 is a flowchart showing an example of the search processing procedure performed by the search service program 110 according to Embodiment 3 of the present invention.

First, the search service program 110 receives, via the search client program 210, the search sentence and the user identification information that the user entered using the input unit 24 of the client computer 2 (step 1501).

Next, the search service program 110 passes the search sentence to the search engine program 112. The search engine program 112 performs the search and scoring using the score calculation method and procedure described above and returns a document list to the search service program 110 (step 1502).

Next, the search service program 110 selects one document from the document list received from the search engine program 112 and performs steps 1504 to 1508 for each document (step 1503).

Next, the search service program 110 refers to the document relation table 116 (step 1504).

Next, the search service program 110 determines whether the document being processed has an original document (step 1505). If an original document exists, the search service program 110 proceeds to step 1506; if not, it proceeds to step 1508.

If it determines in step 1505 that the document being processed has an original document, the search service program 110 refers to the confidential information management table 115 and determines whether the user is permitted to access all of the information in the original document that matches one or more of the words in the search sentence (step 1506). If the user has the access right, the search service program 110 proceeds to step 1507; if not, it proceeds to step 1510.

If it determines in step 1506 that the user has the access right, the search service program 110 associates the document being processed with its original document (step 1507). One way to do this is, for example, to place the document being processed directly below the original document regardless of the score of the document being processed.

Next, the search service program 110 determines whether any other documents remain in the document list (step 1508). If so, it performs step 1503 and the subsequent steps on the next document; if not, it presents the document list to the user as the search result via the search client program 210 (step 1508).

If it determines in step 1506 that the user does not have the access right, the search service program 110 deletes the original document of the document being processed from the document list. If any duplicate of that original document other than the document being processed is in the document list, it deletes that duplicate from the list as well (step 1510). It then proceeds to step 1508.
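The per-document loop of steps 1503 to 1510 can be sketched as follows. The sketch makes several assumptions: the document list is a list of document identifiers in rank order, the document relation table 116 is a mapping from a duplicate to its original, and the access check of step 1506 is supplied as a callable. It also assumes that the duplicate being processed stays in the list when step 1510 removes its original. The name `postprocess` is hypothetical.

```python
from typing import Callable, Dict, List, Optional


def postprocess(doc_list: List[str], relation: Dict[str, Optional[str]],
                can_access_original: Callable[[str], bool]) -> List[str]:
    """Reorder or prune the document list according to steps 1503 to 1510."""
    result = list(doc_list)
    for doc in doc_list:                         # step 1503
        if doc not in result:
            continue                             # removed during an earlier pass
        original = relation.get(doc)             # steps 1504 and 1505
        if original is None:
            continue                             # no original: go to step 1508
        if can_access_original(original):        # step 1506
            if original in result:               # step 1507: place below original
                result.remove(doc)
                result.insert(result.index(original) + 1, doc)
        else:                                    # step 1510
            result = [d for d in result
                      if d == doc
                      or (d != original and relation.get(d) != original)]
    return result
```

When the user lacks the access right, only the duplicate being processed survives from its document family; when the user has it, each duplicate is moved directly below the original regardless of its own score.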

Embodiment 3 described an example of the processing procedure in which the search service program 110 generates the search results for each document in the document list received from the search engine program 112, and for its original document, according to the user's authority and the confidential information contained in the document. Embodiment 4 describes an example of the processing procedure in which the search engine program 112 generates the document list for duplicate documents and their original documents according to the user's authority and the access rights to those documents, and the search service program 110 then generates the final search results.

FIG. 19 is a flowchart showing an example of the procedure for registering confidential information in a document according to Embodiment 4 of the present invention.

Steps 1401 to 1404 are the same as in FIG. 17.

After step 1404, based on the settings that the user made in step 1403, the search service program 110 registers access permission group information in the index 113 on a per-document basis, in a form such as the example shown in FIG. 12 (step 1605).

FIG. 20 is a flowchart showing another example of the search processing procedure performed by the search service program 110, according to Embodiment 4 of the present invention.

First, the search service program 110 receives the search sentence and the user identification information that the user entered via the search client program 210 (step 1501).

Next, the search service program 110 passes the search sentence to the search engine program 112. The search engine program 112 performs the search, per-document access control, and scoring using the score calculation method and procedure described above and returns a document list to the search service program 110. The access control uses the per-document access right information set in step 1605.

Steps 1503 to 1509 are the same as in FIG. 18. In Embodiment 4, access control is performed in step 1702 using the per-document access right information set beforehand in step 1605. At that point, only those of the original and duplicate documents that the user is authorized to access are included in the document list passed to the search service program 110, so step 1510 shown in FIG. 18 can be omitted.
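The per-document access control of step 1702 amounts to filtering the candidate documents by the access permission groups registered in step 1605. The sketch below is an assumption about how such a filter could look; the name `engine_filter`, the mapping from document to permitted groups, and the rule that a document without an entry is unrestricted are all hypothetical.

```python
from typing import Dict, List, Set


def engine_filter(candidates: List[str], doc_acl: Dict[str, Set[str]],
                  user_groups: Set[str]) -> List[str]:
    """Keep only the documents that the user's groups may access."""
    return [d for d in candidates
            # Assumption: a document with no registered groups is unrestricted.
            if d not in doc_acl or user_groups & doc_acl[d]]
```

Because originals and duplicates that the user may not access never reach the search service program 110, the deletion of step 1510 has nothing left to remove.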

This concludes the description of Embodiments 3 and 4. In Embodiments 3 and 4, the document owner or administrator registers in advance the access right information for confidential information and privacy information in a document. A program at a higher level than the search engine program 112 internally generates a separate document in which the confidential information of the document is concealed, manages the generated document in association with the original document, and creates an index for both. Based on the access right information and the association information, the search service program deletes documents from the search and scoring results produced by the search engine. This makes it possible to generate appropriate search results that take into account the confidential information and privacy information in the documents.

The embodiments of the present invention have been described above in detail with reference to several examples. The present invention is not limited to these examples, however, and can be modified in various ways without departing from its gist.

Description of symbols: 1: search server; 2: client computer (terminal); 3: document sharing server; 4: communication network; 10, 20, 30: control unit; 11, 21, 31: storage unit; 12, 22, 32: network interface unit; 13, 23, 33: display unit; 14, 24, 34: input unit; 15, 25, 35: data bus.

Claims

A search server connected to a plurality of terminals via a communication network, the search server comprising:
a management unit that receives confidential information of a document and access right information for the confidential information from a terminal via the communication network, and manages the received information in association with an index word of the document or a position of the index word; and
a control unit that, when a document search request is received from a terminal via the communication network, determines, based on the words in the search sentence included in the search request and on the management unit, whether there is an access right for each index word or each position of the index word, creates, if the determination finds an access right, a document list that enumerates information on the documents including the index word, calculates a degree of conformity of each document, rearranges the documents in the document list based on the calculation result, and transmits the list as search result candidates via the communication network to the terminal that issued the search request.

The search server according to claim 1, wherein
the management unit manages the confidential information of the document and the access right information for the confidential information in association with the structure of the document, and
the control unit, for each document in the search result candidates, conceals confidential information included in the document name of the document based on the access right information associated with the structure of the document, and transmits the result as a final search result via the communication network to the terminal that issued the search request.

A search server connected to a plurality of terminals via a communication network, the search server comprising:
a management unit that receives confidential information of a document and access right information for the confidential information from a terminal via the communication network, and manages the received information in association with the structure and words of the document; and
a control unit that receives a document search request from a terminal via the communication network, searches, based on the words in the search sentence included in the search request and on the management unit, for documents that match the search request and takes them as a full-text search result, calculates a degree of conformity of each document, rearranges the full-text search result based on the calculation result, and transmits the rearranged result as search result candidates via the communication network to the terminal that issued the search request.

The search server according to claim 3, wherein
the control unit deletes from the search result candidates any document containing a word that the user cannot access, and transmits the result as a final search result via the communication network to the terminal that issued the search request.

The search server according to claim 4, wherein
the control unit lowers, in the search result candidates, the output rank of a document that includes a word accessible by the user, and transmits the result as a final search result via the communication network to the terminal that issued the search request.

The search server according to claim 3, wherein
the control unit duplicates the document based on the confidential information of the document and the user's access right information for the confidential information, conceals the confidential information in the duplicate document, stores the duplicate document in the management unit in association with the original document from which it was created, and indexes each of the documents.

The search server according to claim 6, wherein
the control unit determines, based on the information in the management unit, whether the user has an access right to the original document of a duplicate document, rearranges, if the determination finds the access right, the duplicate document in the search result candidates in association with the original document, and transmits the result as a final search result via the communication network to the terminal that issued the search request.

The search server according to claim 7, wherein
the control unit determines, based on the information in the management unit, whether the user has an access right to the original document of a duplicate document, deletes, if the determination finds no access right, the original document from the search result candidates, and transmits the result as a final search result via the communication network to the terminal that issued the search request.

An information retrieval method in a computer, the method comprising, by a control unit provided in the computer:
acquiring confidential information of a document and access right information for the confidential information;
associating the acquired information with an index word of the document or a position of the index word;
acquiring a document search request;
determining, based on the words in the search sentence included in the search request and on the association information, whether there is an access right for each index word or each position of the index word;
creating, if the determination finds an access right, a document list that enumerates information on the documents including the index word for which the access right exists;
calculating a degree of conformity of each document that includes the index word; and
rearranging the documents in the document list based on the calculation result and outputting the list as search result candidates.

A program that causes a computer to execute processing comprising:
acquiring confidential information of a document and access right information for the confidential information;
associating the acquired information with an index word of the document or a position of the index word;
acquiring a document search request;
determining, based on the words in the search sentence included in the search request and on the association information, whether there is an access right for each index word or each position of the index word;
creating, if the determination finds an access right, a document list that enumerates information on the documents including the index word for which the access right exists;
calculating a degree of conformity of each document that includes the index word; and
rearranging the documents in the document list based on the calculation result and outputting the list as search result candidates.

A computer-readable storage medium storing the program according to claim 10.