JP2011018267A

JP2011018267A - Security management system, server device, security management method, program and recording medium

Info

Publication number: JP2011018267A
Application number: JP2009163452A
Authority: JP
Inventors: Kyoichi Ake; 匡一朱
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2009-07-10
Filing date: 2009-07-10
Publication date: 2011-01-27

Abstract

PROBLEM TO BE SOLVED: To efficiently set the security policy to be applied to an electronic document. SOLUTION: The server device 110 includes: a means 210 for receiving parameters that characterize one or more document classes; an acquisition means 210 for acquiring, from a resource 160 on a network, a plurality of electronic documents for which policies are to be set; an analysis means 214 for extracting text from each acquired electronic document, applying natural language analysis, grouping documents with similar content according to the received parameters, and assigning a document class to each electronic document; a means 218 for receiving the security policy to be applied to the electronic documents of each document class; and a setting means 218 for generating setting data that determines access rights for each classified electronic document according to the security policy corresponding to its assigned document class.

Description

The present invention relates to information security technology. More particularly, it relates to a security management system, a server device, a security management method, a program, and a recording medium for efficiently setting the security policy applied to electronic documents.

In recent years, organizations such as government offices, companies, and educational institutions have been going paperless. Documents and images used in business are handled as electronic documents rather than on paper. Electronic documents can be distributed over the Internet or an intranet to many people or to people at remote locations. They can also be placed on a file server or in a database and accessed whenever needed, which makes work more efficient. On the other hand, electronic documents are easy to copy and are therefore prone to information leakage. Such organizations need to maintain the trust placed in them. To do so, they must properly manage the authority to use electronic documents that contain confidential information, such as trade secrets and personal information, and take measures to keep that information from leaking.

Known techniques manage the use of electronic documents in two ways. When a document file is opened, the user must authenticate, and only legitimate users may view the file. When the document is printed, the user's printing authority is checked, and only users with that authority may print (for example, Patent Document 1: Japanese Patent Application Laid-Open No. 2004-152261).

Japanese Patent Application Laid-Open No. 2007-4616 (Patent Document 2) aims to provide document files whose security policy changes over time. It discloses a security management database for managing the security of document files that users view. The database stores security management files. Each file lists several security policies that define usage conditions, such as who may view the document file. Each file also records an effective policy number that identifies which of those policies currently applies to the document file.

In addition, Japanese Patent Application Laid-Open No. 2008-40659 (Patent Document 3) aims to prevent the leakage of information contained in electronic documents more reliably. It discloses a policy management server that sets a policy on an electronic document in response to a setting request from a client. The server keeps the policy in a policy information holding unit, and it encrypts and stores the electronic document.

With the conventional techniques above, a security policy that includes viewing and printing authority can be set on an electronic document that needs protection. The document can then be used appropriately under that policy. However, these techniques require the security policy to be set one electronic document at a time, which places a burden on the operator. Once a security management system is in operation, the person who sets the policy on a document is usually its creator, so this burden goes largely unnoticed. When the system is first introduced, however, an administrator must process a large number of existing electronic documents. The resulting workload is a barrier to introducing a security management system. In addition, where a document management system or similar is already in place, administrators of its document repository want to set security policies on documents from their own perspective.

The present invention was made in view of these problems in the prior art. It rests on the idea that electronic documents with similar content can share the same security policy. Its object is to provide a security management system, a server device, a security management method, a program, and a recording medium that allow security policies to be set in one batch on the documents of a given group, once those documents are classified by content similarity. Security policies for electronic documents can thus be set efficiently, and the operator's workload is reduced.

To solve the above problems, the present invention provides a server device and a security management system that includes it. The server device is connected to a network and sets, on electronic documents, a security policy that defines usage authority. The server device of the present invention works as follows:
1. It receives parameters that characterize one or more document classes.
2. It acquires, from resources on the network, the electronic documents that are the targets of policy setting.
3. It extracts text from each acquired document and applies natural language analysis.
4. It groups documents with similar content according to the received parameters and assigns a document class to each document.
5. It receives the security policy to be applied to the documents of each document class.
6. For each classified document, it generates setting data that defines usage authority under the security policy of the document's assigned class.
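The processing flow above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation. It assumes the documents have already been fetched as URI-to-text pairs. The hypothetical `classify` function stands in for the natural language analysis with a simple keyword count. The names `Policy`, `SettingData`, and `generate_settings`, and the `repo://` URIs, are also illustrative.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical security policy: user IDs allowed to view and to print."""
    viewers: set
    printers: set


@dataclass
class SettingData:
    """Hypothetical per-document setting data derived from a class policy."""
    uri: str
    document_class: str
    viewers: set
    printers: set


def classify(text, class_keywords):
    """Stand-in for natural language analysis: choose the class whose keywords
    occur most often in the text (ties go to the class listed first)."""
    words = text.lower().split()

    def score(name):
        return sum(words.count(k.lower()) for k in class_keywords[name])

    return max(class_keywords, key=score)


def generate_settings(documents, class_keywords, policies):
    """Classify every document and copy usage rights from its class policy."""
    settings = []
    for uri, text in documents.items():
        cls = classify(text, class_keywords)
        policy = policies[cls]
        settings.append(SettingData(uri, cls, set(policy.viewers), set(policy.printers)))
    return settings


docs = {
    "repo://a/1.doc": "quarterly sales forecast and revenue plan",
    "repo://a/2.doc": "employee salary and personnel record",
}
class_keywords = {"Sales": ["sales", "revenue"], "HR": ["salary", "personnel"]}
policies = {
    "Sales": Policy(viewers={"sales-team"}, printers={"sales-manager"}),
    "HR": Policy(viewers={"hr-staff"}, printers=set()),
}
for s in generate_settings(docs, class_keywords, policies):
    print(s.uri, s.document_class, sorted(s.viewers), sorted(s.printers))
```

The policy is entered once per class and then copied to every document in that class. This is the batch setting the text describes.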

Consider the group of electronic documents designated as policy-setting targets. With this configuration, the documents in that group are classified by the similarity of their text content. Usage authority can then be set on them in one batch, using the security policy associated with each assigned document class. Security policies can therefore be set on electronic documents efficiently, and the operator's workload is reduced.

Furthermore, when assigning document classes, the present invention can classify electronic documents so that each document may belong to more than one class. In that case, the authority information in the security policies of those classes can be combined according to a designated precedence rule. This allows usage authority for each electronic document to be controlled in a more sophisticated, flexible, and efficient way, based on content similarity.

In the present invention, the words in each acquired electronic document can be counted to produce a data set of how often each word occurs in each document. Documents with similar content are then classified with a clustering algorithm. The data set can take either of two forms. It may be a set of per-document vectors whose elements are the occurrence frequencies of that document's words. It may also be a table of word occurrence frequencies tallied per document and per word. The occurrence frequency may be the raw occurrence count or a normalized value. Suitable clustering algorithms include:
- the constrained K-means method;
- a method that combines the naive Bayes method with the EM (Expectation Maximization) method;
- algorithms based on graph theory.
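The sketch below illustrates the data set and clustering described above. It builds a length-normalized word-frequency vector for each document. It then runs K-means with cosine similarity, starting each cluster centroid at the vector of one document class's keywords. This is a simplification built on assumptions. It is keyword-seeded K-means, not the constrained K-means named above, which would also keep labeled seed documents fixed to their classes. Whitespace tokenization stands in for real morphological analysis of Japanese text.

```python
import math
from collections import Counter


def tf_vector(text, vocab):
    """Word-frequency vector over a fixed vocabulary, normalized by document length."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) or 1
    return [counts[w] / total for w in vocab]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def seeded_kmeans(texts, class_keywords, max_iter=20):
    """Assign each text to a document class; centroids start at keyword vectors."""
    names = list(class_keywords)
    vocab = sorted({w for t in texts for w in t.lower().split()}
                   | {k.lower() for ks in class_keywords.values() for k in ks})
    docs = [tf_vector(t, vocab) for t in texts]
    centroids = [tf_vector(" ".join(class_keywords[n]), vocab) for n in names]
    labels = None
    for _ in range(max_iter):
        # Assignment step: the most similar centroid by cosine similarity.
        new_labels = [max(range(len(names)), key=lambda j: cosine(d, centroids[j]))
                      for d in docs]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its member vectors.
        for j in range(len(names)):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return [names[j] for j in labels]


texts = ["sales revenue forecast sales", "salary personnel review",
         "revenue growth sales target", "personnel salary payroll"]
print(seeded_kmeans(texts, {"Sales": ["sales", "revenue"], "HR": ["salary", "personnel"]}))
```

Seeding with keywords ties each cluster to a named class from the classification parameters. Plain K-means would produce unnamed clusters that an administrator would have to label afterwards.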

In the present invention, the authority to use an electronic document may include:
- an identification value that specifies the users with viewing authority;
- an identification value that specifies the users with printing authority;
- additional conditions attached to viewing authority, printing authority, or both.

The additional conditions may include an additional requirement at viewing time and an additional requirement at printing time. The viewing-time requirement may include a designation to display a warning message. The printing-time requirement may include designations for displaying a warning message, confidential printing, tint-block pattern printing, stamp printing, and printing a warning imprint.
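The usage authority described above can be modeled as a small data structure. The sketch below is hypothetical. The class, field, and enum member names are illustrative, and the identification values are modeled as sets of user ID strings.

```python
from dataclasses import dataclass, field
from enum import Enum


class ViewRequirement(Enum):
    WARNING_MESSAGE = "warning_message"


class PrintRequirement(Enum):
    WARNING_MESSAGE = "warning_message"
    CONFIDENTIAL_PRINT = "confidential_print"
    TINT_BLOCK = "tint_block"        # tint-block (background pattern) printing
    STAMP = "stamp"
    WARNING_IMPRINT = "warning_imprint"


@dataclass
class UsageRights:
    viewer_ids: set = field(default_factory=set)           # users with viewing authority
    printer_ids: set = field(default_factory=set)          # users with printing authority
    view_requirements: set = field(default_factory=set)    # ViewRequirement members
    print_requirements: set = field(default_factory=set)   # PrintRequirement members

    def can_view(self, user_id):
        return user_id in self.viewer_ids

    def can_print(self, user_id):
        return user_id in self.printer_ids


rights = UsageRights(
    viewer_ids={"alice", "bob"},
    printer_ids={"alice"},
    view_requirements={ViewRequirement.WARNING_MESSAGE},
    print_requirements={PrintRequirement.TINT_BLOCK, PrintRequirement.STAMP},
)
print(rights.can_view("bob"), rights.can_print("bob"))
```

Viewing and printing authority are kept as separate sets, matching the separate identification values in the text. This sketch does not decide whether printing should also require viewing authority.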

The present invention further provides a security management method executed by the server device described above, a program for implementing that server device on a computer, and a recording medium that stores the program.

FIG. 1 is a diagram outlining the document security management system of the first embodiment.
FIG. 2 is a functional block diagram of the parts of the document security management system of the first embodiment involved in the secure file conversion service.
FIG. 3 is a detailed functional block diagram of the parts of the document security management system of the first embodiment involved in the secure file conversion service.
FIG. 4 is a sequence diagram showing the operation of the secure file conversion service in the document security management system of the first embodiment.
FIG. 5 is a flowchart of the analysis process executed by the secure document management server of the first embodiment when "exclusive" is selected.
FIG. 6 is a flowchart of the analysis process executed by the secure document management server of the first embodiment when "coexistence" is selected.
FIG. 7 shows the data structure of the data generated in the file analysis unit of the first embodiment.
FIG. 8 shows GUIs displayed on the display device of the management terminal of the first embodiment.
FIG. 9 shows the classification parameter setting screen displayed on the display device of the management terminal of the present embodiment.
FIG. 10 shows other GUIs displayed on the display device of the management terminal of the first embodiment.
FIG. 11 shows the policy setting input screen displayed on the display screen of the management terminal of the first embodiment.
FIG. 12 shows the data structures used in the document security management system of the first embodiment.
FIG. 13 is a sequence diagram (1/2) of secure file use in the document security management system of the first embodiment.
FIG. 14 is a sequence diagram (2/2) of secure file use in the document security management system of the first embodiment.
FIG. 15 is a flowchart of the analysis process executed by the secure document management server of the second embodiment when "exclusive" is selected.
FIG. 16 is a flowchart of the analysis process executed by the secure document management server of the second embodiment when "coexistence" is selected.
FIG. 17 is a flowchart of the analysis process executed by the secure document management server of the third embodiment.
FIG. 18 is a conceptual diagram of clustering in the third embodiment.

Embodiments of the present invention are described below, but the invention is not limited to them. The embodiments use two examples. The example server device is a secure document management server that sets security policies on electronic documents and provides security-protected documents to users. The example security management system is a document security management system that includes that server.

[First Embodiment]
FIG. 1 outlines a document security management system 100 according to the first embodiment. The system includes a secure document management server 110, a management terminal 130, user terminals 140a and 140b, and a security server 150, all connected to one another via a network 102. Document repositories 160a, 160b, and 160c are also connected to the secure document management server 110, via a firewall 120 and another network.

The network 102 can be configured in several ways:
- as a LAN (Local Area Network) using protocols such as Ethernet (registered trademark) and TCP/IP (Transmission Control Protocol/Internet Protocol);
- as a WAN (Wide Area Network) connected through a VPN (Virtual Private Network) or a dedicated line.

The configuration of the network 102 is not limited to these. It may include the Internet, reached through a router (not shown), and it may be wired, wireless, or a mix of both.

The secure document management server 110 provides a secure file conversion service. The service applies a predetermined security policy to a group of electronic document files stored in network resources such as the document repositories 160, and converts them into security-protected files. It does this by applying security protection, such as encryption, to each original electronic document. The resulting protected document file is hereinafter called a secure file.

An administrator or other operator uses the management terminal 130 to access the secure document management server 110 and instruct it to run the secure file conversion service. The management terminal 130 may be a dedicated terminal with a management tool installed for communicating with the server and managing its settings. Alternatively, if the secure document management server 110 is configured as a web server, the management terminal 130 may be any terminal whose web browser accesses the web-based user interface the server provides.

The security server 150 performs the actual secure file conversion, applying security protection to electronic documents. On a request from the secure document management server 110, it converts the requested original file and returns the resulting secure file. The secure document management server 110 stores each secure file it receives in its internal database, where the user terminals 140a, 140b, and so on can access it.

To manage the use of secure files, the security server 150 also holds two kinds of data for each secure file:
- the key that removes the file's encryption so that the file can be used;
- security management data used to determine the authority of identified user terminals and users.

A secure file is used as follows:
1. The user terminal 140 obtains the desired secure file from the secure document management server 110.
2. The user terminal 140 sends a usage request for the file to the security server 150.
3. The security server 150 authenticates the requester, such as the user terminal or the user.
4. The security server 150 reads the security policy for the requested secure file and determines the requester's viewing authority, printing authority, and any additional conditions for that file.
5. If the requester has viewing authority, the security server 150 sends the user terminal 140 the information needed to use the secure file, such as the key.

As a result, the user terminal 140 can display the secure file only when the requester has viewing authority.
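The security server's handling of a usage request can be sketched as follows. Everything here is a hypothetical simplification. Authentication is a plain user ID and password lookup; a real system would store password hashes. The policy record is reduced to a key plus viewer and printer ID sets. The sketch shows one thing: the decryption key is released only when the authenticated requester has viewing authority.

```python
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SecurityRecord:
    key: bytes              # key that removes the secure file's encryption
    viewer_ids: frozenset
    printer_ids: frozenset


@dataclass(frozen=True)
class Decision:
    can_view: bool
    can_print: bool
    key: Optional[bytes]    # sent to the user terminal only if viewing is allowed


DENIED = Decision(False, False, None)


class SecurityServer:
    def __init__(self, credentials):
        self._credentials = credentials  # hypothetical user_id -> password store
        self._records = {}               # secure file ID -> SecurityRecord

    def register(self, file_id, record):
        self._records[file_id] = record

    def request_use(self, file_id, user_id, password):
        stored = self._credentials.get(user_id)
        # 1. Authenticate the requester (constant-time comparison).
        if stored is None or not hmac.compare_digest(stored.encode(), password.encode()):
            return DENIED
        # 2. Look up the security data held for the requested secure file.
        record = self._records.get(file_id)
        if record is None:
            return DENIED
        # 3. Judge viewing and printing authority; release the key only to viewers.
        can_view = user_id in record.viewer_ids
        can_print = user_id in record.printer_ids
        return Decision(can_view, can_print, record.key if can_view else None)


server = SecurityServer({"alice": "pw-a", "bob": "pw-b"})
server.register("doc-1", SecurityRecord(b"secret-key",
                                        frozenset({"alice", "bob"}),
                                        frozenset({"alice"})))
print(server.request_use("doc-1", "bob", "pw-b"))
```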

The document repositories 160a, 160b, and 160c hold the original electronic documents that the secure file conversion service converts into secure files. Each document in a repository 160 is identified by a URI (Uniform Resource Identifier), such as a URL (Uniform Resource Locator) or URN (Uniform Resource Name).

Many kinds of electronic document can be converted into secure files, including documents, presentation slides, and spreadsheets created with office software, and PDF (Portable Document Format) files. In the first embodiment, a document may be in any data format, provided text can be extracted from it to judge similarity. The document may contain text data directly. It may instead hold its characters as a pixel image, such as TIFF, bitmap, PNG, GIF, JPEG, or JPEG 2000, provided the text can be recovered by character recognition or similar processing.

The secure file may use any known format that supports security protection such as encryption. A security-protected PDF file (secure PDF) is preferred.

The secure document management server 110 is implemented on a computer such as a personal computer, workstation, or enterprise server. Its hardware elements include:
- a central processing unit (CPU);
- a ROM (Read Only Memory) storing the BIOS (Basic Input Output System);
- a RAM (Random Access Memory) providing the working memory in which the CPU runs programs;
- a hard disk drive (HDD) storing a database (hereinafter sometimes called the DB) that keeps secure files and other data accessible;
- a network interface card (NIC) for connecting to the network 102 and other networks.

These elements are connected by an internal bus and together make up the secure document management server 110 of this embodiment.

The secure document management server 110 reads programs from a storage device such as the ROM, the HDD, NV-RAM, or an SD card, and loads them into RAM. Running under a suitable operating system (OS), these programs implement the functional units and processes described below. The security server 150 can be configured in the same way. The management terminal 130 and the user terminals 140 can share this hardware configuration too. In addition, they have a user interface made up of input devices, such as a mouse and keyboard, and a display device.

The rest of this section describes in detail how the secure file conversion service operates in the document security management system 100 of this embodiment. FIG. 2 is a functional block diagram of the parts of the system of the first embodiment involved in that service. Each functional unit in FIG. 2 runs on its own device: the secure document management server 110, the management terminal 130, the user terminal 140, the security server 150, or the document repository 160. On each device, the CPU implements the unit by controlling the hardware elements connected to the internal bus.

The management terminal 130 includes a policy setting start command unit 230. This unit displays the start screen of this embodiment's secure document conversion service on a display device. It then waits for the administrator operating the management terminal 130 to start the security policy setting process. FIG. 8 shows graphical user interfaces (hereinafter, GUIs) displayed on the display device of the management terminal 130 of this embodiment.

FIG. 8(A) shows the start screen 400 of the secure document conversion service, which has four buttons:
- a policy setting start button 402, which accepts the instruction to start the setting process;
- a converted document list button 404, which opens a list of electronic documents already converted into secure files;
- a manual button 406, which displays the operation manual for the secure file conversion service;
- a cancel button 408, which cancels the security policy setting process.

When the service starts, the policy setting start command unit 230 displays the service start screen 400 shown in FIG. 8(A). It then waits for the administrator's start instruction, for example by detecting a click on the policy setting start button 402. On receiving the instruction, it starts the security policy setting process and displays the storage location designation screen shown in FIG. 8(B).

FIG. 8(B) shows the storage location designation screen 410. On this screen the administrator specifies where the electronic documents that are to be policy-setting targets are stored. The screen includes a list box 412 showing the currently designated storage locations and an add button 414 for designating another location. The URL of the selected location is highlighted (418) in the list box 412. Clicking the delete button 416 in this state removes the selected location from the designation.

A storage location is given as a path, such as a URL. Designating it makes the electronic documents in that folder, or anywhere in the hierarchy below it, policy-setting targets.
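A minimal sketch of expanding the designated storage locations into a set of target documents follows. It assumes, hypothetically, that each location has already been resolved to a local or mounted filesystem path. The text designates locations by URL, and fetching from a remote repository is not shown. The function name `collect_documents` is illustrative.

```python
from pathlib import Path


def collect_documents(locations, recursive=True):
    """Return the files in each designated folder (and, if recursive, in all of
    its subfolders), de-duplicated and sorted."""
    found = set()
    for loc in locations:
        root = Path(loc)
        candidates = root.rglob("*") if recursive else root.glob("*")
        found.update(p for p in candidates if p.is_file())
    return sorted(found)
```

Collecting into a set means that overlapping designations do not queue a document twice, for example when both a folder and one of its subfolders are listed.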

After one or more storage locations are designated on the storage location designation screen 410 and the "Next" button 420 is clicked, the policy setting start command unit 230 displays the classification parameter setting screen shown in FIG. 9. FIG. 9 shows this screen as it appears on the display device of the management terminal of the first embodiment.

The classification parameter setting screen 430 sets the parameters that characterize each document class when the policy-target documents are classified. It includes:
- a table 432 showing the parameters;
- an add button 434 for adding a document class;
- an edit button 436 and a delete button 438 to the right of each document class.

The table 432 lists each designated document class name 432a alongside the keywords 432b designated for that class. Clicking the edit button 436 next to a class opens a GUI for editing that class's keywords. Clicking the delete button 438 next to a class removes that class.

In the embodiment described here, keywords are entered as arbitrary character strings. In other embodiments, they may instead be chosen from a set of candidates, such as keywords extracted in past analyses or a preset list.

The classification parameter setting screen 430 further includes a radio button 442 for selecting the relationship between document classifications, either “coexistence” or “exclusive”. A “coexistence” relationship is a soft relationship in which an electronic document can belong to several document classifications at the same time. An “exclusive” relationship is a hard relationship in which an electronic document can belong to only one document classification. The screen 430 also shows a radio button 444 for designating the policy combination method used when “coexistence” is selected; the radio button 444 becomes operable only when “coexistence” is selected with the radio button 442. The radio button 444 specifies how the policies corresponding to the individual document classifications are combined when several document classifications are assigned to one electronic document. In the example shown in FIG. 9, three options are shown: “reject priority”, “permission priority”, and “classification priority”.

“Reject priority” is a combination method that gives precedence to denial: an item is permitted in the combined policy only if it is permitted in every policy being combined, so anything not explicitly permitted by all of them is denied. “Permission priority” is the opposite: it gives precedence to items that are explicitly permitted, so an item permitted in at least one of the policies being combined is permitted in the combined policy. “Classification priority” means that, when several document classifications are assigned to one electronic document, the document follows the policy of the document classification with the highest priority.
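As a minimal sketch of these three combination methods, assume (hypothetically) that, for a single action such as viewing, each document classification's policy is reduced to the set of user IDs it permits. Reject priority then behaves like a logical AND over the applicable policies, permission priority like a logical OR, and classification priority consults only the highest-ranked classification. The function name and the method strings are illustrative, not taken from the source.

```python
from typing import Dict, List, Set

# Hypothetical simplification: for one action (e.g. viewing), each document
# classification's policy is just the set of user IDs it permits.
Policy = Set[str]


def is_permitted(user: str, policies: Dict[str, Policy], labels: List[str],
                 method: str, priority: List[str]) -> bool:
    """Decide whether `user` may act on a document labeled with `labels`."""
    if method == "reject":
        # Reject priority: every applicable policy must permit the user.
        return all(user in policies[c] for c in labels)
    if method == "permit":
        # Permission priority: one permitting policy is enough.
        return any(user in policies[c] for c in labels)
    if method == "classification":
        # Classification priority: `priority` lists classifications highest first.
        top = min(labels, key=priority.index)
        return user in policies[top]
    raise ValueError(f"unknown combination method: {method}")
```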

The classification parameter setting screen 430 further shows a list box 446 and a button 448 for setting the priority order of the document classifications when “classification priority” is selected. When a document classification is selected in the list box 446 (for example, highlighted by the reverse display 446a) and the button 448 is clicked, the priority of the selected document classification is raised.

After one or more document classifications have been designated on the classification parameter setting screen 430, one or more keywords have been designated for each of them, the relationship between document classifications has been selected, and, where applicable, the policy combination method and the priority order of the document classifications have been determined, the “Next” button 440 is clicked. The policy setting start command unit 230 then passes the classification parameters (the document classifications, their keywords, the relationship between document classifications, the policy combination method, the priority order of the document classifications, and so on) together with the list of designated storage locations to the secure document management server 110, and instructs it to execute the classification process according to the designated classification parameters.
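The classification parameters handed to the server could be held in a structure like the following. This is a hypothetical Python sketch: the field names, the method strings, and the validation rules are assumptions, and `raise_priority` imitates the button 448 by moving the selected classification up one place in the priority order.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ClassificationParams:
    # Hypothetical field names for the parameters entered on screen 430.
    keywords: Dict[str, List[str]]       # document classification -> keywords
    relationship: str = "exclusive"      # "exclusive" or "coexistence"
    combination: str = "reject"          # "reject", "permit" or "classification"
    priority: List[str] = field(default_factory=list)   # highest priority first
    locations: List[str] = field(default_factory=list)  # storage location URLs

    def raise_priority(self, name: str) -> None:
        """Move `name` up one place in the priority order (like button 448)."""
        i = self.priority.index(name)
        if i > 0:
            self.priority[i - 1], self.priority[i] = self.priority[i], self.priority[i - 1]

    def validate(self) -> None:
        """Raise ValueError for parameter sets this sketch treats as incomplete."""
        if not self.keywords or not all(self.keywords.values()):
            raise ValueError("each document classification needs at least one keyword")
        if self.relationship not in ("exclusive", "coexistence"):
            raise ValueError("relationship must be 'exclusive' or 'coexistence'")
        if (self.relationship == "coexistence" and self.combination == "classification"
                and sorted(self.priority) != sorted(self.keywords)):
            raise ValueError("the priority order must rank every document classification")
```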

The secure document management server 110 includes a file acquisition unit 210 and a file DB 212 as functional units. In response to the command from the policy setting start command unit 230 of the management terminal 130, the file acquisition unit 210 acquires the electronic document files from the storage locations specified in the storage location list and temporarily stores them in the file DB 212. The method of acquiring the electronic documents is not particularly limited; any known data transfer protocol, such as a file sharing protocol, the File Transfer Protocol (FTP), or the HyperText Transfer Protocol (HTTP), may be used. In the described embodiment, the storage locations are specified when the secure file conversion service is started; however, when setting processing is performed on the same storage locations periodically or irregularly, the storage locations may be registered in advance. In the present embodiment, the file acquisition unit 210 constitutes the means for receiving parameters that characterize the document classifications and the acquisition means for acquiring electronic documents.
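Because the text leaves the transfer protocol open, one way to sketch the file acquisition unit 210 is with Python's standard `urllib.request.urlopen`, which handles `http`, `https`, `ftp`, and `file` URLs. The function name `acquire_files` and the use of a local directory as a stand-in for the file DB 212 are assumptions made for illustration; a file sharing protocol such as SMB would need a separate client.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path
from typing import List
from urllib.parse import urlparse


def acquire_files(urls: List[str], file_db: Path) -> List[Path]:
    """Copy each document at `urls` into `file_db`, a local stand-in for the file DB 212."""
    file_db.mkdir(parents=True, exist_ok=True)
    stored = []
    for i, url in enumerate(urls):
        name = Path(urlparse(url).path).name or f"document_{i}"
        target = file_db / f"{i:04d}_{name}"  # index prefix avoids name clashes
        with urllib.request.urlopen(url) as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        stored.append(target)
    return stored


if __name__ == "__main__":
    # Demo with a file:// URL so it runs without a network server.
    with tempfile.TemporaryDirectory() as tmp:
        doc = Path(tmp) / "plan.txt"
        doc.write_text("budget plan", encoding="utf-8")
        print(acquire_files([doc.as_uri()], Path(tmp) / "file_db"))
```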

The secure document management server 110 further includes a file analysis unit 214 and an analysis result display unit 216 as functional units. When the file acquisition unit 210 has acquired all target electronic document files from the designated storage locations, it causes the file analysis unit 214 to start analyzing the group of files in the file DB 212. According to the designated classification parameters, the file analysis unit 214 extracts text from each electronic document stored in the file DB 212 by applying an appropriate document filter, performs natural language analysis on it, and classifies the electronic documents by similarity of content. In the present embodiment, the file analysis unit 214 constitutes the analysis means.

The file analysis processing in the document security management system 100 of the present embodiment will now be described in detail with reference to FIG. 3(A) and FIG. 7. FIG. 3 is a detailed functional block diagram of the parts of the document security management system of the first embodiment involved in the secure file conversion service, and FIG. 3(A) shows the detailed functional blocks around the file analysis unit 214. FIG. 7 illustrates the structures of the data generated by the file analysis unit of the present embodiment.

As shown in detail in FIG. 3(A), the file analysis unit 214 includes a text extraction unit 260, a syntax analysis unit 262, and a document classification unit 264. The text extraction unit 260 reads an electronic document file from the file DB 212, calls the document filter corresponding to the file format, extracts the text from the electronic document, and passes it to the syntax analysis unit 262. The syntax analysis unit 262 identifies the language used, for example from the bibliographic information of the electronic document, calls an appropriate natural language processing engine, and performs syntax analysis on the extracted text. In the simplest case, the natural language processing engine can be a morphological analysis engine that divides text written in a given language into morphemes and tags each with part-of-speech information. The described embodiment uses a morphological analysis engine, but the natural language analysis is not particularly limited and any known natural language analysis processing may be applied; for example, dependency analysis or an appropriate pattern dictionary can be used to extract higher-order units of meaning.

For example, when morphological analysis is applied to the text 300 shown in FIG. 7(A), morphological analysis result data 310 containing morphemes 310a and part-of-speech information 310b, as shown in FIG. 7(B), is generated. The syntax analysis unit 262 can also apply an appropriate filtering process to remove unneeded words such as particles, dates, terms such as “suru” (do), “naru” (become), “aru” and “iru” (be), adverbs, sentence-final particles, conjunctions, auxiliary verbs, and directional words. In the example shown in FIG. 7(B), the morphemes marked with an asterisk can be removed. The morphemes contained in each electronic document are then tallied into a set of pairs of a morpheme and its number of occurrences, which forms the document vector. In another embodiment, after analysis of all electronic document files is complete, the analysis results for all documents may be further processed with a technique such as tf-idf, so that each document vector is generated as a set of effective morphemes and their weights.
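The filtering and counting steps, and the optional tf-idf reweighting, can be sketched as below. The input is assumed to be a list of (morpheme, part-of-speech) pairs as produced by some morphological analyzer; the part-of-speech labels in `STOP_POS` are hypothetical placeholders, not the tag set of any particular engine, and the tf-idf variant shown (raw count times log(N/df)) is one common choice rather than one prescribed by the text.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical part-of-speech labels; a real analyzer has its own tag set.
STOP_POS = {"particle", "date", "adverb", "conjunction", "auxiliary_verb", "directional"}
# Light verbs named in the text ("suru", "naru", "aru", "iru").
STOP_WORDS = {"する", "なる", "ある", "いる"}


def document_vector(morphemes: List[Tuple[str, str]]) -> Counter:
    """Count the occurrences of each retained morpheme (the plain document vector)."""
    return Counter(m for m, pos in morphemes
                   if pos not in STOP_POS and m not in STOP_WORDS)


def tfidf(vectors: List[Counter]) -> List[Dict[str, float]]:
    """Reweight raw counts as count * log(N / df) once every document is analyzed."""
    n = len(vectors)
    df = Counter(term for v in vectors for term in v)  # documents containing each term
    return [{t: c * math.log(n / df[t]) for t, c in v.items()} for v in vectors]
```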

FIG. 7(C) illustrates the structure of the document vector data containing the document vector calculated for each electronic document. The document vector data 320 shown in FIG. 7(C) consists of morphemes 320a and the occurrence counts 320b to 320d of each morpheme in each electronic document. The document classification unit 264 reads the generated document vector data 320 and, using a classification and clustering algorithm such as the K-means method or the EM (Expectation Maximization) method, labels each electronic document with one of the document classifications according to the classification parameters. As for the clustering method, when many electronic documents already labeled with document classifications exist, classification rules can be learned from that group of labeled documents. Conversely, to handle the case where only a few electronic documents are labeled, such as at initial introduction, the present embodiment uses a technique that improves classification accuracy by also exploiting unlabeled electronic documents. A constrained clustering algorithm can be used for this purpose, and the constrained K-means method is particularly suitable. In the present embodiment, the constraint is that the initial cluster centers are derived from the keywords rather than chosen at random.
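The document vector data 320 of FIG. 7(C) is essentially a morpheme-by-document table of occurrence counts. A minimal sketch of building that table from per-document counters follows, assuming documents are keyed by file name; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def vector_table(vectors: Dict[str, Counter]) -> Tuple[List[str], List[str], List[List[int]]]:
    """Lay per-document counts out as in FIG. 7(C): one row per morpheme,
    one column of occurrence counts per electronic document."""
    docs = sorted(vectors)
    vocab = sorted(set().union(*vectors.values()))
    # A Counter returns 0 for a morpheme that a document does not contain.
    rows = [[vectors[d][m] for d in docs] for m in vocab]
    return vocab, docs, rows
```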

More specifically, when the “exclusive” relationship between document classifications is selected, an initial set of central document vectors W(C_i) is first generated from the keywords designated for each document classification C_i (i = 1, …, N, where N is the number of document classifications). Two steps are then repeated until a steady-state solution is reached: (re)classifying each document according to its distance from the central document vectors W(C_i), and recalculating each central document vector W(C_i) (that is, averaging the occurrence counts in the document vectors over the electronic documents (re)classified into each document classification). This yields a classification that reflects the keywords designated for each document classification. In this case, each electronic document is labeled with exactly one document classification. FIG. 7(D) illustrates the structure of the converged central document vectors. The central document vector data 330 shown in FIG. 7(D) is a set of morphemes 330a and the average occurrence counts 330b to 330d of each morpheme in each document classification.
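The exclusive case can be sketched as a K-means loop whose initial centers are seeded from the keywords, with each keyword given a count of 1 in its classification's initial center. That seeding rule, the squared Euclidean distance, the sparse-dictionary representation, and keeping a center unchanged when its cluster becomes empty are all assumptions; the text does not fix these details.

```python
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in set(a) | set(b))


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise average of sparse vectors (the recalculated center)."""
    total: Counter = Counter()
    for v in vectors:
        total.update(v)
    return {k: c / len(vectors) for k, c in total.items()}


def seeded_kmeans(docs: Dict[str, Vector], keywords: Dict[str, List[str]],
                  max_iter: int = 100) -> Dict[str, str]:
    """Exclusive classification: each document gets exactly one classification."""
    # Constraint: initial centers come from the keywords, not from random picks.
    centers = {c: {k: 1.0 for k in kws} for c, kws in keywords.items()}
    labels: Dict[str, str] = {}
    for _ in range(max_iter):
        new = {d: min(centers, key=lambda c: dist2(v, centers[c]))
               for d, v in docs.items()}
        if new == labels:  # steady state: assignments no longer change
            break
        labels = new
        for c in centers:
            members = [docs[d] for d, lab in labels.items() if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = mean(members)
    return labels
```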

On the other hand, when the “coexistence” relationship between document classifications is selected, the documents are split, for each document classification C_i, into C_i and everything else, Not_C_i, and the K-means algorithm is applied to each document classification C_i separately, so that the classification reflects the keywords designated for each document classification. For one document classification C_i, an initial central document vector W(C_i) is first generated from the keywords designated for C_i, and an initial central document vector W(Not_C_i) representing everything else is generated from the keywords designated for the other document classifications. (Re)classification of each document into C_i or Not_C_i according to its distance from W(C_i) or W(Not_C_i), and recalculation of each central document vector, are then repeated until a steady-state solution is reached, separating the documents of classification C_i from the rest. This operation is performed for every document classification C_i (i = 1, …, N). In this case, one electronic document can be labeled with several document classifications.
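The coexistence case runs one two-center loop per classification, C_i against Not_C_i. In this sketch the `Not_C_i` center is seeded from the keywords of all other classifications, each keyword counts 1 in its seed, distances are squared Euclidean over sparse dictionaries, and a tie goes to C_i; all of these are assumptions. A document collects every classification whose C_i side it lands on, so it may end up with several labels or none.

```python
from collections import Counter
from typing import Dict, List, Set

Vector = Dict[str, float]


def dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in set(a) | set(b))


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise average of sparse vectors."""
    total: Counter = Counter()
    for v in vectors:
        total.update(v)
    return {k: c / len(vectors) for k, c in total.items()}


def one_vs_rest(docs: Dict[str, Vector], keywords: Dict[str, List[str]],
                max_iter: int = 100) -> Dict[str, Set[str]]:
    """Coexistence classification: a document may receive several labels."""
    labels: Dict[str, Set[str]] = {d: set() for d in docs}
    for c, kws in keywords.items():
        rest = [k for other, ks in keywords.items() if other != c for k in ks]
        # Two centers per pass: True stands for C_i, False for Not_C_i.
        centers = {True: {k: 1.0 for k in kws}, False: {k: 1.0 for k in rest}}
        inside: Dict[str, bool] = {}
        for _ in range(max_iter):
            new = {d: dist2(v, centers[True]) <= dist2(v, centers[False])
                   for d, v in docs.items()}
            if new == inside:  # steady state for this classification
                break
            inside = new
            for side in (True, False):
                members = [docs[d] for d, s in inside.items() if s == side]
                if members:
                    centers[side] = mean(members)
        for d, s in inside.items():
            if s:
                labels[d].add(c)
    return labels
```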

Returning to FIG. 2: when the analysis of all target electronic documents is complete, the file analysis unit 214 passes the analysis results to the analysis result display unit 216, which processes them into data in a format the user can easily understand and transmits the data to the management terminal 130. On receiving the analysis result data, the analysis result browsing unit 232 of the management terminal 130 displays, on the display screen of the management terminal 130, a classification result browsing screen showing the result of the classification. FIG. 10 illustrates other GUIs displayed on the display screen of the management terminal according to the first embodiment; FIG. 10(A) shows the classification result browsing screen. The classification result browsing screen 450 shown in FIG. 10(A) includes tabs 452a, 452b, and 452c, each listing the electronic documents belonging to one document classification. In the example shown in FIG. 10(A), the “Management” tab 452a is selected, and a list 454 of the electronic documents belonging to the “Management” document classification is displayed. The list 454 shows the file name of each electronic document and the URL path of its storage location, and can be scrolled with the scroll bar 454b.

The tab 452a further includes a classification information edit button 456 that accepts the user's instruction to edit the classification information generated by the classification process. When the button 456 is clicked, the analysis result browsing unit 232 opens a classification information edit dialog for editing the classification information of the electronic document currently selected in the list 454 (for example, highlighted by the reverse display 454a). FIGS. 10(B) and 10(C) show the classification information edit dialog. FIG. 10(B) shows the dialog 460A used when “coexistence” is selected; it includes a display 462A of the electronic document, check boxes 464A for designating the document classifications into which the document is classified, and a confirm button 466A for fixing the document classification information. FIG. 10(C) shows the dialog 460B used when “exclusive” is selected, in which radio buttons 464B for selecting exactly one document classification replace the check boxes. Editing through these dialogs allows the administrator to correct any electronic document whose classification looks questionable so that it falls into the desired document classification.

Referring again to FIG. 10(A), when the “Next” button 458 is clicked on the classification result browsing screen 450, the analysis result browsing unit 232 hands processing over to the policy setting input unit 234. The policy setting input unit 234 of the management terminal 130 displays a policy setting input screen for entering a security policy for each document classification. FIG. 11 shows the policy setting input screen displayed on the display screen of the management terminal 130 of the first embodiment. In the present embodiment, the security policy applied to an electronic document consists of a policy number, a policy effective date, viewing authority holders, additional viewing conditions, printing authority holders, and additional printing conditions. The policy setting input screen 470 shown in FIG. 11 includes GUI components for entering these items.

The policy setting input screen 470 includes a box 472 that displays the policy number and a group of text boxes 474 for entering the policy effective date. The policy setting input unit 234 assigns a policy number, an identifier that distinguishes multiple policies, and displays it in the box 472. The “policy effective date” is the date on which the security policy created here is actually applied to the secure file and takes effect. In the present embodiment, multiple security policies can be set for one secure file; by giving each of them a different effective date, different policies can be applied to the secure file before and after each effective date.
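A policy record holding the items listed for screen 470, together with one possible rule for choosing among several policies by effective date, might look like this. The rule that the policy in force is the one with the latest effective date not after the given day is an assumption consistent with the description, as are the field names.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set


@dataclass
class SecurityPolicy:
    # Hypothetical field names mirroring the items entered on screen 470.
    number: int
    effective: date
    viewers: Set[str] = field(default_factory=set)
    view_conditions: Set[str] = field(default_factory=set)
    printers: Set[str] = field(default_factory=set)
    print_conditions: Set[str] = field(default_factory=set)


def policy_in_force(policies: List[SecurityPolicy], today: date) -> Optional[SecurityPolicy]:
    """Return the policy with the latest effective date not after `today`, if any."""
    started = [p for p in policies if p.effective <= today]
    return max(started, key=lambda p: p.effective, default=None)
```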

The policy setting input screen 470 further includes a list box 476 for registering the viewing authority holders and a box 478 for registering additional viewing conditions. A “viewing authority holder” is a user who is permitted to view the secure file. In this list, the administrator enters use-request-source identification information, such as the name, user ID, or terminal name, identifying each user who is to be permitted to view the secure files of electronic documents belonging to the given document classification. An “additional viewing condition” is a condition applied in addition when a viewing authority holder views the secure file; that is, it sets the scope within which the holder may handle the secure file when viewing it, such as a condition that calls for careful handling or a condition that restricts handling, as described later. On the security policy setting input screen shown in FIG. 11, the administrator chooses the conditions to be set in the policy from conditions prepared in advance. For example, if the viewing option “Warning message display” is enabled, a warning message such as “Please handle with care.” is displayed on the display screen of the user terminal 140 when the user views the secure file.

The policy setting input screen 470 further includes a list box 480 for registering the printing authority holders and a box 482 for registering additional printing conditions. A “printing authority holder” is a user who, among the users permitted to view the secure file, is additionally permitted to print it. Once registered as a printing authority holder, the user can print the secure file from the user terminal 140. In this list, the administrator enters the use-request-source identification information of each user who is to be permitted to print the secure files of electronic documents belonging to the given document classification. An “additional printing condition” is a condition applied in addition when a printing authority holder prints the secure file.

For example, when the printing option “Warning message display” is enabled, a warning message such as “Please handle with care.” is displayed on the display screen when the secure file is printed at the user terminal 140. When the printing option “Confidential printing” is enabled, the user is required to enter a secret PIN (Personal Identification Number) at the printer before the printout is produced. When “Background pattern printing” is enabled, a background pattern image is printed superimposed on the image of the secure file; if the printout is copied on a copying machine, information such as the user name and the printing date and time emerges from this pattern. When “Stamp printing” is enabled, a mark such as “Top secret” is printed as a stamp superimposed on the image of the secure file. When “Warning printing” is enabled, information such as the user name and the printing date and time is printed when the secure file is printed.
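How a client might turn a policy's printing authority and print options into a print job is sketched below. The option identifiers, the `PrintPolicy` type, and the shape of the returned job description are hypothetical; the sketch only shows the check that the user holds both viewing and printing authority, and a mapping of each enabled option to the behavior described above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Set


@dataclass
class PrintPolicy:
    viewers: Set[str] = field(default_factory=set)
    printers: Set[str] = field(default_factory=set)
    options: Set[str] = field(default_factory=set)  # hypothetical option identifiers


def prepare_print_job(policy: PrintPolicy, user: str, now: datetime) -> Dict[str, object]:
    """Check printing authority and translate enabled options into job settings."""
    # A printing authority holder must also be a viewing authority holder.
    if user not in policy.viewers or user not in policy.printers:
        raise PermissionError(f"{user} may not print this document")
    stamp = f"{user} {now:%Y-%m-%d %H:%M}"
    overlays: List[str] = []
    if "background_pattern" in policy.options:
        overlays.append(f"copy-revealed pattern: {stamp}")
    if "stamp_printing" in policy.options:
        overlays.append("stamp: Top secret")
    if "warning_printing" in policy.options:
        overlays.append(f"printed by {stamp}")
    return {
        "warning": "Please handle with care." if "warning_message" in policy.options else None,
        "require_pin": "confidential_printing" in policy.options,
        "overlays": overlays,
    }
```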

The viewing and printing options described above are examples; additional conditions other than those described may also be set as part of the security policy. For example, the background pattern, stamp, or warning text to be printed may be specified individually for “Background pattern printing”, “Stamp printing”, or “Warning printing”.

Returning again to FIG. 2: after the administrator has set a security policy for every document classification via the security policy setting input screen 470 and clicks the “Next” button 484 on the screen 470 for the last document classification, the policy setting input unit 234 instructs the secure document management server 110 to start the policy setting process. The data describing the security policy for each document classification is transmitted to the secure document management server 110 either each time the security policy setting input screen 470 switches to the next classification or together with the final command to start the policy setting process. The corrections made to the document classifications are also transmitted to the secure document management server 110, and the confirmed classification results are stored in the file DB 212.

The secure document management server 110 further includes a policy setting unit 218 and a file transmission unit 220 as functional units. The policy setting unit 218 receives the data describing the security policy for each document classification, reads the document classification information from the file DB 212, and converts the policies into security policy setting data for each electronic document belonging to each document classification.

More specifically, when the “exclusive” relationship is selected, each electronic document is labeled with only one document classification, so the security policy corresponding to that classification is the policy to be set for the document. When the “coexistence” relationship is selected, an electronic document may be labeled with several document classifications, so the security policies of the labeled classifications are combined according to the selected combination method. An electronic document labeled with only one document classification follows the security policy corresponding to that classification. For an electronic document labeled with several document classifications:
(i) When the “reject priority” combination method is selected, the intersection of the viewing authority holders and the intersection of the printing authority holders are computed, and any additional condition detected as set in one of the policies is carried over as it is. Because “reject priority” is meant to produce the strictest restrictions, it may also be configured to set the union of the additional conditions.
(ii) When the “permission priority” combination method is selected, the union of the viewing authority holders and the union of the printing authority holders are computed, and any additional condition detected as set is likewise carried over as it is. Because “permission priority” is meant to relax restrictions, it may instead be configured to set the intersection of the additional conditions.
(iii) When the “classification priority” combination method is selected, the viewing authority holders, the printing authority holders, and every additional condition follow those of the document classification with the higher priority.

For example, consider an electronic document labeled with both the “Management” and “Production Management” document classifications, where “Taro Yamada” and “Hanako Tanaka” are registered as viewing authority holders for “Management”, and “Taro Yamada”, “Hanako Tanaka”, and “Ryoko Suzuki” are registered for “Production Management”. With (i) the “reject priority” method, only “Taro Yamada” and “Hanako Tanaka”, who have viewing authority in both classifications, become viewing authority holders. With (ii) the “permission priority” method, “Taro Yamada”, “Hanako Tanaka”, and “Ryoko Suzuki”, who have viewing authority in at least one of the classifications, become viewing authority holders. With (iii) the “classification priority” method, the policy of the document classification with the higher priority is followed.
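The combination rules (i) to (iii) can be sketched with set operations, using the example above as a check. The `Policy` type, the single `conditions` set standing for all additional conditions, and the `relax_conditions` flag for the optional intersection under permission priority are assumptions made for illustration. The sketch expects at least one label per document.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Policy:
    viewers: frozenset = frozenset()
    printers: frozenset = frozenset()
    conditions: frozenset = frozenset()  # all additional conditions, as one set


def combine(policies: Dict[str, Policy], labels: List[str], method: str,
            priority: List[str], relax_conditions: bool = False) -> Policy:
    """Combine the policies of every classification a document is labeled with."""
    ps = [policies[c] for c in labels]
    if len(ps) == 1:
        return ps[0]
    if method == "reject":
        # (i) intersect the authority holders; any condition set anywhere is kept.
        return Policy(frozenset.intersection(*(p.viewers for p in ps)),
                      frozenset.intersection(*(p.printers for p in ps)),
                      frozenset.union(*(p.conditions for p in ps)))
    if method == "permit":
        # (ii) unite the authority holders; optionally keep only shared conditions.
        merge = frozenset.intersection if relax_conditions else frozenset.union
        return Policy(frozenset.union(*(p.viewers for p in ps)),
                      frozenset.union(*(p.printers for p in ps)),
                      merge(*(p.conditions for p in ps)))
    if method == "classification":
        # (iii) take the whole policy of the highest-priority classification.
        return policies[min(labels, key=priority.index)]
    raise ValueError(f"unknown combination method: {method}")
```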

FIG. 12 shows data structures used in the document security management system 100 of the first embodiment. FIG. 12(A) shows the data structure of the security policy setting data generated for each electronic document. As shown in FIG. 12(A), the setting data describes three items: a URL indicating the storage location of the electronic document, the file name of the document, and the contents of the security policy that has been set. Once the setting data for the electronic documents has been generated, the policy setting unit 218 pairs the original file of each electronic document in each document class with its setting data. It then sends each pair to the security server 150 via the file transmission unit 220. In this embodiment, the policy setting unit 218 constitutes the setting means and the means for receiving the security policy for each document class.
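One possible shape for the per-document setting data of FIG. 12(A) is sketched below. It assumes the "exclusive" case, where each document belongs to a single class. JSON serialization, the field names, and the example URL are illustrative assumptions. The source does not specify a wire format.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class SettingData:
    """Per-document setting data after Fig. 12(A); field names are hypothetical."""
    url: str        # storage location of the electronic document
    file_name: str  # file name of the document
    policy: dict    # contents of the security policy set for it


def expand(classes: dict[str, list[tuple[str, str]]],
           policies: dict[str, dict]) -> list[SettingData]:
    """Give every (url, file_name) document the policy of the class it belongs to."""
    return [
        SettingData(url, name, policies[cls])
        for cls, docs in classes.items()
        for url, name in docs
    ]


def to_json(data: SettingData) -> str:
    """Serialize one record for transmission alongside the original file."""
    return json.dumps(asdict(data), ensure_ascii=False, sort_keys=True)
```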

In the security server 150, the file receiving unit 250 receives the original file of each electronic document and the corresponding setting data from the file transmission unit 220 of the secure document management server 110. It passes them to the secure file conversion unit 252. The secure file conversion unit 252 converts the original file into a secure file protected according to the corresponding setting data. It also generates security management data, which is the management data needed to use the secure file. The secure file conversion unit 252 constitutes the conversion means of this embodiment.

The secure file conversion process in the document security management system 100 of this embodiment is described in detail below. FIG. 3(B) is a detailed block diagram of the secure file conversion unit 252 and its surroundings. As shown in FIG. 3(B), the secure file conversion unit 252 includes a file encryption unit 270, a key generation unit 272, a security management data creation unit 274, a secure file creation unit 276, and a file ID generation unit 278.

The key generation unit 272 randomly generates a key used both to encrypt the original file of the electronic document with an encryption algorithm and to decrypt it. The file encryption unit 270 encrypts the original file received by the file receiving unit 250 with this key and produces an encrypted file. The file ID generation unit 278 assigns a unique file ID that identifies the file encrypted by the file encryption unit 270. Encryption algorithms usable in this embodiment include symmetric-key algorithms such as DES (Data Encryption Standard), Triple DES, and AES (Advanced Encryption Standard). In other embodiments, a public-key algorithm such as RSA or ElGamal may be used.

The secure file creation unit 276 creates a secure file by attaching the file ID generated by the file ID generation unit 278 to the encrypted file created by the file encryption unit 270. It passes the secure file to the secure file transmission unit 254, which sends it to the secure document management server 110. Meanwhile, the security management data creation unit 274 creates security management data for controlling use of the file. This data is built from the key that decrypts the encrypted file and the setting data received by the file receiving unit 250. The unit passes the data to the security management data storage unit 256, which stores it in the security management DB 258 in association with the secure file.
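The conversion steps can be sketched with the Python standard library. The sketch generates a random key, encrypts, assigns a file ID, prepends the ID to produce the secure file, and builds the management record. The byte layout (2-byte length, then ID, then ciphertext) is a hypothetical choice. The cipher itself is passed in by the caller, because the standard library has no AES. In practice this would be an AES or Triple DES wrapper from a cryptography library, which is an assumption outside this sketch.

```python
from __future__ import annotations

import secrets
import struct
import uuid
from typing import Callable

# (key, data) -> ciphertext; supplied by the caller (e.g. an AES wrapper).
Encryptor = Callable[[bytes, bytes], bytes]


def make_secure_file(original: bytes, policy: dict,
                     encrypt: Encryptor) -> tuple[bytes, dict]:
    key = secrets.token_bytes(32)           # random 256-bit key
    encrypted = encrypt(key, original)      # encrypted file
    file_id = uuid.uuid4().hex              # unique file ID
    raw_id = file_id.encode("ascii")
    # Secure file = 2-byte ID length + file ID + encrypted data (hypothetical layout).
    secure_file = struct.pack(">H", len(raw_id)) + raw_id + encrypted
    # Security management data: file ID, decryption key, security policy.
    record = {"file_id": file_id, "key": key, "policy": policy}
    return secure_file, record


def split_secure_file(secure_file: bytes) -> tuple[str, bytes]:
    """Recover (file ID, encrypted data) from a secure file."""
    (n,) = struct.unpack_from(">H", secure_file, 0)
    return secure_file[2:2 + n].decode("ascii"), secure_file[2 + n:]
```

Keeping the key only in the server-side record means a copied secure file reveals its file ID but not its contents.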

FIG. 12(B) illustrates the data structure of the security management data stored in the security management DB 258. As shown in FIG. 12(B), the security management data describes three items: the file ID uniquely assigned to the electronic document, the key for decrypting the encrypted data in the secure file, and the contents of the security policy that has been set.

Returning to FIG. 2, the secure document management server 110 further includes a secure file receiving unit 222 and a secure file DB 224 as functional units. The secure file receiving unit 222 receives the secure file for each electronic document converted by the security server 150. It stores the file in the secure file DB 224 so that it can be used externally, for example from the user terminal 140. The secure file acquisition unit 240 of the user terminal 140 can then access the secure file DB 224 of the secure document management server 110 and use the secure file of a desired electronic document.

In one embodiment, the secure document management server 110 is configured as a web server, and the management terminal 130 accesses a web-based management interface through a web browser. In that case, the file acquisition unit 210, the analysis result display unit 216, and the policy setting unit 218 of the server are configured as means for generating web pages. These pages describe the user interface of each screen in HTML (HyperText Markup Language) and a scripting language such as JavaScript (registered trademark). The generated web pages are sent to the management terminal 130 over HTTP. Following the page descriptions, the web browser on the management terminal 130 realizes the policy setting start command unit 230, the analysis result browsing unit 232, and the policy setting input unit 234.

The operation of the secure file conversion service of this embodiment is described below with reference to sequence diagrams and flowcharts. FIG. 4 is a sequence diagram showing the operation of the secure file conversion service in the document security management system of the first embodiment. It covers the management terminal 130, the secure document management server 110, and the security server 150.

The batch conversion process starts when an instruction requesting batch conversion into secure files is entered on the input device of the management terminal 130, for example via the start screen 400 shown in FIG. 8(A). In step S100, the management terminal 130 displays the policy setting screens on the display device, for example those shown in FIG. 8(B) and FIG. 9. Through these screens the administrator enters and confirms the target storage locations and the classification parameters. In step S101, the management terminal 130 then passes the list of specified storage locations and the classification parameters to the secure document management server 110. This initializes the classification parameters and instructs the server to run the classification process under the specified conditions.

On receiving the instruction, in step S102 the secure document management server 110 acquires the electronic document files from the storage locations in the list. It then invokes the analysis process that classifies the documents. FIG. 5 is a flowchart of the analysis process executed by the secure document management server 110 of the first embodiment when "exclusive" is selected. The analysis method shown in FIG. 5 is an example based on the constrained k-means method. It is invoked when the relationship between document classes in the classification parameters is "exclusive".

The process in FIG. 5 is invoked from step S102 in FIG. 4 and starts at step S200. In step S201, the secure document management server 110 reads an electronic document file stored in the file DB 212. It invokes an appropriate document filter to extract text from the document and stores the extracted text in a storage area such as memory or an HDD.

In step S202, the secure document management server 110 reads the stored text and invokes an appropriate morphological analysis engine. The engine splits the text into morphemes and tags each morpheme with part-of-speech information. Unnecessary words are then removed by filtering, and the resulting parse data is stored in the storage area. In step S203, the server reads the parse data and counts the occurrences of each morpheme to create a document vector. It stores the document vector in the storage area in association with the electronic document.
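Steps S202 and S203 turn each text into a term-frequency vector. The sketch below shows that idea. The regular-expression tokenizer and the small stop-word list are stand-ins for the morphological analysis engine and its part-of-speech filtering, and both are assumptions. Japanese text has no word spaces, so a real system would call a morphological analyzer at that point instead.

```python
from __future__ import annotations

import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "of", "the", "to"}  # hypothetical stop-word list


def tokenize(text: str) -> list[str]:
    """Stand-in for the morphological analysis engine: lowercase word runs."""
    return re.findall(r"\w+", text.lower())


def document_vector(text: str) -> Counter:
    """Drop unneeded words, then count occurrences of each remaining term."""
    return Counter(t for t in tokenize(text) if t not in STOP_WORDS)


def build_vectors(texts: dict[str, str]) -> dict[str, Counter]:
    """One document vector per extracted document text (the S201 to S205 loop)."""
    return {name: document_vector(text) for name, text in texts.items()}
```

For example, `document_vector("The budget and the budget plan")` counts `budget` twice and `plan` once.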

In step S204, the server determines whether any unprocessed electronic document files remain in the file DB 212, that is, whether document vectors have been created for all target documents. If not (NO), the process proceeds to step S205, takes the next electronic document as the target, and loops back to step S201. If so (YES), the process proceeds to step S206.

In step S206, the server generates an initial central document vector W(C_i) for each document class C_i. Each vector is built from that class's keywords in the classification parameters received from the management terminal 130. The document vectors are modified as needed. For example, if a keyword in the classification parameters is not already a component of the document vectors, it is added as an element with the value 0. In step S207, the server calculates the Euclidean distance between each central document vector W(C_i) and each document vector. It assigns each electronic document to the class whose central document vector is nearest. In step S208, the server computes the mean of the document vectors in each newly formed class and uses it as that class's updated central document vector.
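Steps S206 and S207 can be sketched as follows: seed one centroid per class from its keywords, align all vectors to a shared vocabulary (zero-filling missing terms), and assign each document to the nearest centroid. Giving each keyword the weight 1.0 in the initial centroid is an assumption, since the text does not say how keyword values are set. The function name is hypothetical.

```python
from __future__ import annotations

import math


def seed_and_assign(vectors: dict[str, dict[str, float]],
                    keywords: dict[str, set[str]]) -> dict[str, str]:
    # Shared vocabulary: every term seen in a document or given as a keyword.
    vocab = sorted(set().union(*vectors.values(), *keywords.values()))
    # Dense document vectors; a keyword absent from a document gets the value 0.
    dense = {d: [float(v.get(t, 0)) for t in vocab] for d, v in vectors.items()}
    # Initial W(C_i): 1.0 for each of the class's keywords (assumed weighting).
    centroids = {c: [1.0 if t in kws else 0.0 for t in vocab]
                 for c, kws in keywords.items()}
    # Each document joins the class whose centroid is nearest (Euclidean).
    return {d: min(centroids, key=lambda c: math.dist(x, centroids[c]))
            for d, x in dense.items()}
```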

In step S209, the server compares the previous central document vectors with the newly calculated ones to determine whether the process has converged. For example, the sum of the squared differences between each previous central document vector and the corresponding new one can serve as an index value. Convergence is declared when this value falls to or below a fixed threshold. An upper limit on the number of iterations may also be set. If the process has not converged (NO), it loops back to step S207. Reassignment and recomputation of the central document vectors then repeat until the convergence condition is met. If it has converged (YES), the process proceeds to step S210. There the server fixes the assignment of documents to classes, generates data indicating the classification result, and stores it in the storage area. The analysis process ends at step S211.
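The iteration of steps S207 to S209 is the standard k-means refinement loop, starting from the keyword-seeded centroids. The sketch uses the convergence index described above: the sum of squared differences between old and new centroids, compared against a threshold, with an optional iteration cap. The threshold and cap defaults are arbitrary. Letting an empty class keep its previous centroid is an assumption the text does not address.

```python
from __future__ import annotations

import math


def refine(dense: dict[str, list[float]], centroids: dict[str, list[float]],
           tol: float = 1e-9, max_iter: int = 100):
    """Iterate assignment and update from the given initial centroids W(C_i)."""
    labels: dict[str, str] = {}
    for _ in range(max_iter):                      # optional iteration cap
        # Assign each document to its nearest centroid.
        labels = {d: min(centroids, key=lambda c: math.dist(x, centroids[c]))
                  for d, x in dense.items()}
        # New centroid = mean of member vectors (empty class keeps its old one).
        updated = {}
        for c, old in centroids.items():
            members = [dense[d] for d, lab in labels.items() if lab == c]
            updated[c] = ([sum(col) / len(members) for col in zip(*members)]
                          if members else old)
        # Convergence index: sum of squared differences of old vs new centroids.
        shift = sum(math.dist(centroids[c], updated[c]) ** 2 for c in centroids)
        centroids = updated
        if shift <= tol:
            break
    return labels, centroids
```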

FIG. 6 is a flowchart of the analysis process executed by the secure document management server of the first embodiment when "coexistence" is selected. The analysis method shown in FIG. 6 is likewise an example based on the constrained k-means method. It is invoked when the relationship between document classes in the classification parameters is "coexistence". Steps S300 to S305 are identical to the corresponding steps in FIG. 5, so their description is omitted.

After step S304 determines that all documents have been processed, steps S307 to S311 are repeated for each document class C_i in the loop formed by steps S306 and S312. In step S307, the server generates the initial central document vector W(C_i) of the target class C_i from the keywords of C_i received from the management terminal 130. In step S308, the server generates an initial central document vector W(NOT_C_i) from the keywords of the classes other than C_i. For example, it averages the keyword occurrence counts over all classes other than the one being processed in the loop. In step S309, the server calculates the Euclidean distance between each document vector and the two central document vectors W(C_i) and W(NOT_C_i). It assigns each electronic document to whichever of the two is nearer. In step S310, the server computes the mean of the document vectors in each group under the new assignment and updates W(C_i) and W(NOT_C_i).

In step S311, the server compares the previous central document vectors with the newly calculated ones to determine whether the process has converged. If not (NO), the process loops back to step S309. Reassignment and recomputation of the central document vectors then repeat until the convergence condition is met. If it has converged (YES), the process proceeds to step S312. If a document class remains to be processed, the loop moves on to the next class C_{i+1}. When all configured document classes have been processed, in step S313 the server fixes the assignment of documents to classes, generates data indicating the classification result, and stores it in the storage area. The analysis process ends at step S314.
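Under "coexistence", each class gets its own two-centroid split between W(C_i) and W(NOT_C_i), so one document can end up with several labels. The sketch below shows this one-vs-rest scheme. It compares squared distances, which gives the same ordering as Euclidean distances. Sending ties to C_i and using 0/1 keyword vectors as input are assumptions. The loop stops when recomputing the means no longer changes either centroid.

```python
from __future__ import annotations


def sq_dist(x: list[float], y: list[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def mean(rows: list[list[float]], fallback: list[float]) -> list[float]:
    return [sum(col) / len(rows) for col in zip(*rows)] if rows else fallback


def two_means(dense: dict[str, list[float]], w_in: list[float],
              w_out: list[float], max_iter: int = 100) -> set[str]:
    """2-means between W(C_i) and W(NOT_C_i); returns the documents on the C_i side."""
    inside: set[str] = set()
    for _ in range(max_iter):
        # Ties go to C_i (an assumption; the text does not specify).
        inside = {d for d, x in dense.items()
                  if sq_dist(x, w_in) <= sq_dist(x, w_out)}
        new_in = mean([x for d, x in dense.items() if d in inside], w_in)
        new_out = mean([x for d, x in dense.items() if d not in inside], w_out)
        if new_in == w_in and new_out == w_out:    # centroids stopped moving
            break
        w_in, w_out = new_in, new_out
    return inside


def label_coexisting(dense: dict[str, list[float]],
                     keyword_vecs: dict[str, list[float]]) -> dict[str, set[str]]:
    """One binary split per class; a document may receive several labels."""
    labels: dict[str, set[str]] = {d: set() for d in dense}
    for ci, w_in in keyword_vecs.items():
        others = [v for c, v in keyword_vecs.items() if c != ci]
        w_out = mean(others, [0.0] * len(w_in))    # average of other classes' keywords
        for d in two_means(dense, list(w_in), w_out):
            labels[d].add(ci)
    return labels
```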

Returning to FIG. 4, after the analysis in step S102, the secure document management server 110 converts the classification result into a user-friendly format in step S103 and sends it to the management terminal 130. In step S104, the management terminal 130 displays the classification result browsing screen 450 shown in FIG. 10(A) on a display or similar device. In step S105, the management terminal 130 accepts corrections to and confirmation of the classification result via that screen. The administrator may then enter an instruction to proceed to security policy setting, for example via the classification result browsing screen 450 of FIG. 10(A). In step S106, the management terminal 130 then displays the policy setting input screen 470, for example as shown in FIG. 11. Once the administrator has set security policies for all document classes, the management terminal 130 sends an instruction to start the policy setting process to the secure document management server 110. In step S107, the secure document management server 110 expands the security policy setting data for every electronic document contained in each document class. It sends the original file of each document together with its setting data to the security server 150 and requests secure file conversion.

The security server 150 receives the setting data and original file of each electronic document from the secure document management server 110. In step S108, it randomly generates a key for encrypting and decrypting the original file. In step S109, it encrypts the received original file with this key to create an encrypted file. In step S110, the security server 150 generates a file ID identifying the encrypted file. In step S111, it attaches the file ID to the encrypted file to create the secure file of each electronic document and sends the secure file to the secure document management server 110.

In step S112, the security server 150 creates security management data containing the generated file ID, the key used to decrypt the encrypted file, and the security policy in the received setting data. In step S113, it stores this data in the security management DB 258. In step S114, the security server 150 loops back to step S108 until all electronic documents have been processed (while NO), and ends the process once all have been processed (YES). As a result, security management data identified by file ID is stored in the security management DB 258 for each electronic document, and a secure file for each electronic document is sent to the secure document management server 110.

Meanwhile, when the secure document management server 110 receives a secure file from the security server 150, it stores the file in the secure file DB 224 in step S115, where users can access it. In step S116, the server loops back to step S115 until secure files for all electronic documents have been received (while NO). Once the secure files of all documents have been stored (YES), it notifies the management terminal 130 of completion in step S117 and ends the process. A secure file for each electronic document is thus stored in the secure file DB 224 and becomes available to users.

The operation of the document security management system 100 of the first embodiment when a secure file is used is described below. FIG. 13 and FIG. 14 are sequence diagrams of secure file use in the document security management system of the first embodiment. Note that FIG. 13 and FIG. 14 are connected at points A and B. Both figures show the operations of the user terminal 140 and the security server 150. The description assumes that the user terminal 140 has already obtained a secure file produced by the secure file conversion process described above. The secure file is stored in a storage area such as the hard disk of the user terminal 140.

The secure file use process starts when information designating a secure file is entered on the input device of the user terminal 140. For example, the user may click one of several secure file icons displayed on the terminal's display. In step S401, the user terminal 140 acquires the designation information for the secure file from the input device. This information may be, for example, a path name, a file name, or a URL.

Next, in step S402, the user terminal 140 reads the secure file identified by the designation information from the storage area. In step S403, it extracts the file ID from that secure file. In step S404, the user terminal 140 acquires the user ID of the person using the secure file. For example, it displays a user ID input screen and accepts the user ID that the user enters through it. If the user terminal 140 is dedicated to a specific user and that user's ID is stored in the terminal in advance, the stored user ID may be read out instead. In step S405, the user terminal 140 sends the file ID and the user ID to the security server 150.

In step S406, the security server 150 reads the security management data with the file ID received from the user terminal 140 from the security management DB 258. In step S407, it extracts the policy from that data. In step S408, the security server 150 extracts the viewer information from the policy. In step S409, it converts this information into viewer IDs by referring to an authorized user ID table. This table associates the names (and similar details) of persons who may be granted usage rights to secure files with their authorized user IDs, which serve as their identification information. For example, when the system is used within a company, the table associates the names and IDs of all employees who may use the system. If the viewers are already registered by ID in the security policy itself, this conversion may be omitted.
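The name-to-ID conversion of step S409 amounts to a table lookup. In the sketch below, the table contents and the function name are hypothetical. The text does not say what happens to a name missing from the table. Here such a name is simply dropped, so it can never match a user ID and access fails closed.

```python
from __future__ import annotations

# Hypothetical authorized user ID table: name -> authorized user ID.
USER_ID_TABLE = {"Taro Yamada": "u001", "Hanako Tanaka": "u002", "Ryoko Suzuki": "u003"}


def to_viewer_ids(viewers: list[str], id_table: dict[str, str],
                  registered_by_id: bool = False) -> set[str]:
    """Convert the policy's viewer names into viewer IDs."""
    if registered_by_id:
        return set(viewers)        # policy already lists IDs: no conversion needed
    # Unknown names are dropped so they can never match a user ID (fail closed).
    return {id_table[name] for name in viewers if name in id_table}
```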

Next, in step S410, the security server 150 determines whether the user has viewing rights by checking whether the user ID received from the user terminal 140 is among the converted viewer IDs. If the user does not have viewing rights (step S410; NO), the security server 150 sends viewing-denied information to the user terminal 140 in step S411. In step S412, the user terminal 140 displays a message that viewing is not permitted and ends the secure file use process. A user who is not set as a viewer in the policy therefore cannot view the secure file.

If the user does have viewing rights (step S410; YES), the security server 150 extracts the decryption key from the security management data in step S413 shown in FIG. 14. In step S414, it extracts the additional viewing conditions from the policy. In step S415, the security server 150 sends the key and the additional viewing conditions to the user terminal 140. If the user also has printing rights, a flag indicating that printing is permitted is included in the additional viewing conditions, and the additional printing information is also sent to the user terminal 140.
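The server-side decision in steps S406 to S415 can be sketched as a single function. It assumes the policy already lists viewers and printers by user ID, which is the case where the text says no conversion is needed. The dictionary keys (`viewer_ids`, `printer_ids`, `view_conditions`, `print_conditions`) are hypothetical. Denying access for an unknown file ID is also an assumption.

```python
from __future__ import annotations


def authorize(db: dict[str, dict], file_id: str, user_id: str) -> dict:
    """Decide whether to release the key, assuming the policy lists user IDs."""
    record = db.get(file_id)                        # look up security management data
    if record is None:
        return {"allowed": False}                   # unknown file: deny (assumption)
    policy = record["policy"]
    if user_id not in policy["viewer_ids"]:         # no viewing rights
        return {"allowed": False}                   # viewing-denied information
    conditions = dict(policy.get("view_conditions", {}))
    can_print = user_id in policy.get("printer_ids", ())
    conditions["print_allowed"] = can_print         # print permission flag
    reply = {"allowed": True, "key": record["key"], "conditions": conditions}
    if can_print:
        reply["print_conditions"] = policy.get("print_conditions", {})
    return reply
```

Because the key leaves the server only after this check, possession of the secure file alone grants nothing.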

When the user terminal 140 receives the key and the additional viewing conditions from the security server 150, it extracts the encrypted file from the secure file in step S416. In step S417, it decrypts the encrypted file with the received key, restoring it to a viewable file.

Subsequently, in step S418, the user terminal 140 displays the content of the decrypted electronic document file according to the received additional viewing conditions. For example, when the viewing option "display warning message" is set as an additional viewing condition, the user terminal 140 displays a warning message together with the document content.

When the user issues a print request via the input device, the user terminal 140 controls printing of the document content according to the print settings in the received additional viewing conditions. Printing is allowed when print permission is set in the additional viewing conditions and prohibited when print denial is set. When print options are also set in the additional viewing conditions, the user terminal 140 controls the printer driver or similar component of the printer connected to its input/output interface so that the output follows those print options. In this way, only users set as viewing-authorized users in the policy can view and print the secure file.
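The terminal-side print decision can be expressed as a small function. This is a sketch under assumptions. The conditions are assumed to arrive as a dictionary with a hypothetical `print_allowed` key. The `send_to_printer` callback stands in for whatever printer-driver control the terminal actually uses.

```python
from typing import Callable, Optional


def handle_print_request(conditions: dict,
                         document: str,
                         send_to_printer: Callable[[str, dict], None],
                         print_options: Optional[dict] = None) -> bool:
    """Apply the additional viewing conditions to a user's print request.

    Returns True if the job was handed to the printer, False if printing is prohibited.
    """
    # Assumption: a missing print setting is treated like print denial (default-deny).
    if not conditions.get("print_allowed", False):
        return False
    # Print options, if any, are passed unchanged to the printer-driver layer.
    send_to_printer(document, dict(print_options or {}))
    return True
```

Default-deny is a design choice made for this sketch. It means a malformed or incomplete set of conditions can never accidentally permit printing.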

Subsequently, in step S419, the user terminal 140 determines whether the user has finished viewing the secure file, for example by detecting that the content display window has been closed. It waits until viewing ends (while step S419 is NO). When it determines that viewing has ended (step S419; YES), in step S420 the user terminal 140 deletes from its storage area the key received from the security server 150 and the file decrypted in step S417, and ends the secure file usage process.
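Step S420 amounts to a guarantee that neither the key nor the plaintext outlives the viewing session. One way to express this guarantee is a context manager. It writes the decrypted content to a temporary file, then removes that file together with the key when viewing ends, even if the viewer raises an error. The names `viewing_session` and `key_holder` are hypothetical, and the decryption of step S417 is represented here by bytes that are already decrypted.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def viewing_session(key_holder: dict, plaintext: bytes):
    """Yield the path of a temporary decrypted file; clean up on exit (step S420)."""
    fd, path = tempfile.mkstemp(suffix=".view")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(plaintext)
        yield path                      # the viewer works with the file here (step S418)
    finally:
        # Step S420: delete the decrypted file and drop the key, whatever happened.
        if os.path.exists(path):
            os.remove(path)
        key_holder.pop("key", None)
```

Python cannot reliably overwrite memory, so dropping the reference to the key is only a best effort. A real terminal application would need lower-level measures to wipe key material.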

According to the embodiment described above, existing electronic documents stored in the document repository 160 or elsewhere are classified according to their content. Each document is labeled with a document classification based on content similarity, and a security policy can then be set collectively for each resulting group of electronic documents. This reduces the administrator's workload when the system is introduced, which in turn lowers the barrier to adopting a security management system. In addition, the administrator can customize the document classifications, and the security policies corresponding to them, according to where the electronic documents are stored. The system can therefore also meet the need of document repository administrators to set security policies for electronic documents from their own point of view.

The embodiment above describes in detail the use of the constrained K-means method as the clustering algorithm for document classification. However, the clustering algorithms that can be used in embodiments of the present invention are not limited to it. Other embodiments using other clustering algorithms are described below.

[Second Embodiment]
The document security management system of the second embodiment has substantially the same configuration as the document security management system 100 of the first embodiment, so only the analysis processing, which differs, is described below. The analysis processing of the first embodiment uses the constrained K-means method as its clustering algorithm. The analysis processing of the second embodiment instead uses a method that combines the Naive Bayes method, a probabilistic model, with the EM (expectation-maximization) method. FIG. 15 is a flowchart of the analysis processing executed by the secure document management server 110 of the second embodiment when "exclusive" is selected.

The analysis processing shown in FIG. 15 is called from step S102 shown in FIG. 4 when the relationship between document classifications in the classification parameters is "exclusive", and it starts at step S500. In step S501, the secure document management server 110 reads the file of each electronic document stored in the file DB 212 and analyzes each electronic document. It then obtains N(w_t, D_i), the number of times word w_t appears in a particular electronic document D_i. Here t = 1, ..., |V|, where |V| is the number of words in the entire document set, and i = 1, ..., |D|, where |D| is the number of documents. More specifically, in step S501, and as described in the first embodiment, text is extracted from each read electronic document file with an appropriate document filter. The text is then decomposed into morphemes by morphological analysis, and unnecessary words are filtered out. Finally, the server counts the occurrences of each word w_t in each electronic document D_i and stores the counts N(w_t, D_i) in a storage area such as memory or an HDD.
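The tally N(w_t, D_i) is an ordinary term-frequency table. The sketch below replaces morphological analysis with simple whitespace tokenization and uses a tiny hypothetical stop-word list. It therefore illustrates only the data structure, not the Japanese text processing the specification describes.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and"}   # hypothetical stop-word list


def count_words(documents: dict) -> dict:
    """Map each document ID D_i to a Counter giving N(w_t, D_i) for every word w_t."""
    counts = {}
    for doc_id, text in documents.items():
        # Stand-in for text extraction, morphological analysis and word filtering.
        tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
        counts[doc_id] = Counter(tokens)
    return counts


def vocabulary(counts: dict) -> list:
    """The word set V over the whole document collection, in a stable order."""
    return sorted({w for c in counts.values() for w in c})
```

A `Counter` returns 0 for absent words, so N(w_t, D_i) = 0 needs no special handling.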

In step S502, the secure document management server 110 treats the keyword list of each document classification as a document. For each document classification C_j, it then calculates P(w_t | C_j), the probability that word w_t appears once in document classification C_j, according to equation (1), and P(C_j), the occurrence probability of document classification C_j, according to equation (2).

In these equations, P(C_j | D_i) is the probability that document D_i belongs to document classification C_j, and |C| is the number of document classifications. In the calculation of step S502, where the keyword list of each document classification is treated as a document, |D| = 1. P(C_j | D_i) is the probability that the document representing a keyword list belongs to C_j. It is therefore set to 1 when the document D_i represented by the keyword list corresponds to document classification C_j, and to 0 otherwise. Subsequently, in step S503, the secure document management server 110 uses P(w_t | C_j) and P(C_j) from step S502 to calculate, for every electronic document D_i, the probability P(C_j | D_i) that D_i belongs to document classification C_j according to equation (3). In this embodiment, the initial values of the probabilities P(C_j | D_i) are thus derived from the keywords.
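The images for equations (1) to (3) are not reproduced in this text. The quantities defined above (N(w_t, D_i), |V|, |C|, |D|, P(C_j | D_i)) and the use of keyword lists as labeled seed documents match the widely used multinomial Naive Bayes EM formulation for text with Laplace (add-one) smoothing. Assuming that formulation, the three equations would read as shown below. This is a reconstruction under that assumption, not necessarily the exact form in the original filing.

```latex
P(w_t \mid C_j) = \frac{1 + \sum_{i=1}^{|D|} N(w_t, D_i)\, P(C_j \mid D_i)}
                       {|V| + \sum_{s=1}^{|V|} \sum_{i=1}^{|D|} N(w_s, D_i)\, P(C_j \mid D_i)} \tag{1}

P(C_j) = \frac{1 + \sum_{i=1}^{|D|} P(C_j \mid D_i)}{|C| + |D|} \tag{2}

P(C_j \mid D_i) = \frac{P(C_j) \prod_{t=1}^{|V|} P(w_t \mid C_j)^{N(w_t, D_i)}}
                       {\sum_{r=1}^{|C|} P(C_r) \prod_{t=1}^{|V|} P(w_t \mid C_r)^{N(w_t, D_i)}} \tag{3}
```

Under this form, equation (1) sums to 1 over the vocabulary, and equation (2) sums to 1 over the classes because the posteriors P(C_j | D_i) of each document sum to 1.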

Subsequently, in step S504, the newly obtained values of P(C_j | D_i) for each electronic document D_i (excluding the keyword lists) are used to calculate, for each document classification C_j, P(w_t | C_j) according to equation (1) and P(C_j) according to equation (2). The value of P(w_t | C_j) calculated here is at least 0 and less than 1. In step S505, P(w_t | C_j) and P(C_j) from step S504 are used to recalculate the probability P(C_j | D_i) for every electronic document D_i according to equation (3). In step S506, the secure document management server 110 determines whether the calculation has converged. For example, it can compute the sum of squared differences between the probabilities P(C_j | D_i) before and after the update and judge that the calculation has converged when this index falls below a fixed reference value. Alternatively, convergence can be judged when the average rate of change of P(C_j | D_i) falls below a threshold.

If step S506 determines that the calculation has not converged (NO), the process loops back to step S504, and the recalculation is repeated until the convergence condition is satisfied. If step S506 determines that it has converged (YES), the process proceeds to step S507. In step S507, the secure document management server 110 labels each electronic document with the document classification that has the largest probability P(C_j | D_i), fixes the classification into document sets, and generates data indicating the classification result. The analysis processing then ends at step S508.
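Steps S502 to S507 can be sketched as a compact semi-supervised Naive Bayes EM loop. The sketch makes several assumptions. Equations (1) to (3) are assumed to take the common Laplace-smoothed multinomial Naive Bayes form. The calculation runs in log space to avoid numerical underflow on long documents, and it uses the sum-of-squared-differences convergence test given as an example for step S506. All keyword lists are treated together as the labeled seed set, which gives a uniform initial class prior. Names such as `nb_em`, `tol`, and `max_iter` are hypothetical.

```python
import math
from collections import Counter


def nb_em(seed_lists: dict, docs: dict, tol: float = 1e-6, max_iter: int = 100) -> dict:
    """Label each document with one class via semi-supervised Naive Bayes + EM.

    seed_lists: class name -> list of keywords (treated as one labeled document each).
    docs:       document ID -> list of tokens (the unlabeled documents).
    Returns document ID -> class with the largest P(C_j | D_i).
    """
    classes = list(seed_lists)
    seed_counts = {c: Counter(kw) for c, kw in seed_lists.items()}
    doc_counts = {d: Counter(toks) for d, toks in docs.items()}
    vocab = sorted({w for cnt in list(seed_counts.values()) + list(doc_counts.values())
                    for w in cnt})

    def m_step(weighted):
        # weighted: list of (Counter, {class: P(C_j|D_i)}); assumed equations (1) and (2).
        prior, word_prob = {}, {}
        for c in classes:
            mass = sum(p[c] for _, p in weighted)
            prior[c] = (1 + mass) / (len(classes) + len(weighted))
            totals = {w: sum(cnt[w] * p[c] for cnt, p in weighted) for w in vocab}
            denom = len(vocab) + sum(totals.values())
            word_prob[c] = {w: (1 + totals[w]) / denom for w in vocab}
        return prior, word_prob

    def e_step(prior, word_prob):
        # Assumed equation (3), in log space and normalized with the log-sum-exp trick.
        post = {}
        for d, cnt in doc_counts.items():
            logs = {c: math.log(prior[c])
                    + sum(n * math.log(word_prob[c][w]) for w, n in cnt.items())
                    for c in classes}
            top = max(logs.values())
            z = sum(math.exp(v - top) for v in logs.values())
            post[d] = {c: math.exp(logs[c] - top) / z for c in classes}
        return post

    # Step S502: keyword lists as labeled documents with P(C_j|D_i) = 1 or 0.
    seeds = [(seed_counts[c], {k: 1.0 if k == c else 0.0 for k in classes}) for c in classes]
    post = e_step(*m_step(seeds))                       # step S503: initial posteriors
    for _ in range(max_iter):
        weighted = [(doc_counts[d], post[d]) for d in doc_counts]  # keyword lists excluded
        new_post = e_step(*m_step(weighted))            # steps S504-S505
        # Step S506: sum of squared changes as the convergence index.
        delta = sum((new_post[d][c] - post[d][c]) ** 2 for d in post for c in classes)
        post = new_post
        if delta < tol:
            break
    # Step S507: label each document with its most probable class.
    return {d: max(p, key=p.get) for d, p in post.items()}
```

Words that appear only in unlabeled documents, such as "bonus" in the test below, pick up class-specific probability through the EM iterations. This is the practical benefit of iterating rather than classifying once from the keywords alone.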

FIG. 16 is a flowchart of the analysis processing executed by the secure document management server 110 of the second embodiment when "coexistence" is selected. The analysis processing shown in FIG. 16 is called from step S102 shown in FIG. 4 when the relationship between document classifications in the classification parameters is "coexistence", and it starts at step S600. In step S601, the secure document management server 110 analyzes each electronic document stored in the file DB 212 and tallies N(w_t, D_i).

Next comes the loop of steps S602 to S610. For each document classification C_j, it performs processing that treats C_j separately from the remaining document classifications, NOT_C_j. In step S603, the secure document management server 110 creates a keyword list for NOT_C_j from the keywords of the document classifications other than the one being processed. In step S604, the secure document management server 110 treats the keyword lists of the target document classification C_j and of NOT_C_j as documents; where no distinction is needed, C_j and NOT_C_j are referred to collectively as C. For each C, the server calculates P(w_t | C), the probability that word w_t appears once in classification C, according to equation (1), and P(C), the occurrence probability of classification C, according to equation (2). Subsequently, in step S605, the secure document management server 110 uses P(w_t | C) and P(C) from step S604 to calculate, for every electronic document D_i, the probabilities P(C_j | D_i) and P(NOT_C_j | D_i) that D_i belongs to C_j and to NOT_C_j, respectively, according to equation (3).

Subsequently, in step S606, the newly obtained values of P(C | D_i) for each electronic document D_i (excluding the keyword lists) are used to calculate P(w_t | C) according to equation (1) and P(C) according to equation (2), for both C_j and NOT_C_j. In step S607, P(w_t | C) and P(C) from step S606 are used to recalculate the probability P(C | D_i) for every electronic document D_i according to equation (3). In step S608, the secure document management server 110 compares the previous probabilities P(C | D_i) with the newly calculated ones to determine whether the calculation has converged. If it has not (NO), the process loops back to step S606, and the recalculation is repeated until the convergence condition is satisfied. If it has converged (YES), the process proceeds to step S609.

In step S609, the secure document management server 110 decides for each electronic document D_i whether to label it with document classification C_j, based on the relative magnitudes of P(C_j | D_i) and P(NOT_C_j | D_i). For example, when P(C_j | D_i) is greater than P(NOT_C_j | D_i), the document is judged to belong to C_j and is labeled accordingly. When steps S602 to S610 have been completed for all document classifications C_j, in step S611 the secure document management server 110 fixes the classification into document sets and stores data indicating the classification result. The analysis processing then ends at step S612.
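Steps S603 and S609 are the parts specific to the "coexistence" mode, and they can be sketched as follows. The binary posteriors P(C_j | D_i) and P(NOT_C_j | D_i) are taken as input. They are assumed to come from running the same EM procedure as in the exclusive case on the two-class problem {C_j, NOT_C_j}. The function names are hypothetical.

```python
def complement_keywords(keyword_lists: dict, target: str) -> list:
    """Step S603: pool the keywords of every class except `target` into NOT_<target>."""
    return [kw for name, kws in keyword_lists.items() if name != target for kw in kws]


def multilabel(binary_posteriors: dict) -> dict:
    """Step S609, applied for every class C_j.

    binary_posteriors: class C_j -> {document ID: (P(C_j|D_i), P(NOT_C_j|D_i))}
    Returns document ID -> sorted list of attached classes (possibly several, possibly none).
    """
    labels = {}
    for cls, per_doc in binary_posteriors.items():
        for doc_id, (p_in, p_out) in per_doc.items():
            labels.setdefault(doc_id, [])
            if p_in > p_out:            # attach C_j only when it beats its complement
                labels[doc_id].append(cls)
    return {d: sorted(cs) for d, cs in labels.items()}
```

Each class is decided independently, so one document can carry several labels. This is what lets the setting means later combine several security policies for the same document.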

[Third Embodiment]
An embodiment using yet another clustering algorithm is described below. The document security management system of the third embodiment has substantially the same configuration as the document security management system 100 of the first embodiment, so only the analysis processing, which differs, is described. The analysis processing of the second embodiment combines the Naive Bayes method with the EM method. The third embodiment instead uses a graph cut method. FIG. 17 is a flowchart of the analysis processing executed by the secure document management server 110 of the third embodiment.

The analysis processing shown in FIG. 17 is called from step S102 shown in FIG. 4 and starts at step S700. In step S701, the secure document management server 110 analyzes each electronic document and creates a document vector for each electronic document D_i from the words it contains. A document vector is also created from the keyword list L_j of each document classification C_j. The subsequent processing builds a k-nearest-neighbor graph over the union of the documents D and the keyword lists L. In step S702, the secure document management server 110 calculates the Euclidean distance E between document vectors. In step S703, it draws weighted edges from each electronic document D_i to the k document vectors closest to it in Euclidean distance. The edge weights can be determined from equation (4), in which E(D_i, D_j) is the Euclidean distance between documents D_i and D_j and n is an arbitrary positive constant.
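Steps S702 and S703 can be sketched over sparse word-count vectors as follows. Equation (4) is not reproduced in this text. The sketch therefore uses a hypothetical weight, 1 / (1 + E)^n, chosen only because it decreases with distance and stays finite at E = 0. It should not be read as the patent's formula. The function names are also illustrative.

```python
import math


def euclidean(u: dict, v: dict) -> float:
    """Euclidean distance between sparse word-count vectors (word -> count)."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0) - v.get(k, 0)) ** 2 for k in keys))


def edge_weight(dist: float, n: float = 1.0) -> float:
    # HYPOTHETICAL stand-in for equation (4): larger weight for closer documents.
    return 1.0 / (1.0 + dist) ** n


def knn_graph(vectors: dict, k: int, n: float = 1.0) -> dict:
    """Steps S702-S703: connect each node to its k nearest neighbours.

    vectors: node ID -> sparse vector; nodes are documents and keyword lists alike.
    Returns an undirected weighted adjacency map {node: {neighbour: weight}}.
    """
    graph = {node: {} for node in vectors}
    for a in vectors:
        dists = sorted((euclidean(vectors[a], vectors[b]), b) for b in vectors if b != a)
        for dist, b in dists[:k]:
            w = edge_weight(dist, n)
            graph[a][b] = w          # store both directions so the graph stays undirected
            graph[b][a] = w
    return graph
```

Because edges are stored in both directions, a node can end up with more than k neighbours when it is among the k nearest of other nodes. For a cut-based method this symmetric form is the usual choice.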

Next, the loop of steps S704 to S708 processes the generated k-nearest-neighbor graph for each document classification C_j. In step S705, the secure document management server 110 searches the keyword lists L for those that belong to the document classification C_j currently being processed and those that do not. In step S706, the secure document management server 110 adds a positive node and a negative node to the graph. It connects the keyword list belonging to C_j to the positive node with an edge of very large weight, and connects the keyword lists not belonging to C_j to the negative node with edges of very large weight. For this large weight, the largest positive floating-point value representable on the computer may be used, for example. Subsequently, in step S707, the secure document management server 110 cuts the graph, for example with the Ford-Fulkerson algorithm, and classifies each electronic document D_i as belonging or not belonging to C_j. The graph cut removes the set of edges with the smallest total weight whose removal splits the graph into disconnected parts, one containing the positive node and the other the negative node.
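Steps S706 and S707 can be sketched as an s-t minimum cut computed with the Edmonds-Karp variant of Ford-Fulkerson, which finds augmenting paths by breadth-first search. Once the maximum flow has been found, the nodes still reachable from the positive node in the residual graph form the C_j side of a minimum cut. The terminal names `"+"` and `"-"` and the function name are hypothetical. The graph is assumed to be an adjacency map {node: {neighbour: weight}} in which every node is a key.

```python
import sys
from collections import deque

BIG = sys.float_info.max   # "very large weight" for the keyword-list anchor edges


def min_cut_side(graph: dict, positive: list, negative: list) -> set:
    """Steps S706-S707: return the nodes on the positive side of a minimum s-t cut.

    graph:    undirected weighted adjacency {node: {neighbour: weight}}
    positive: keyword-list nodes belonging to the current class C_j
    negative: keyword-list nodes of the other classes
    """
    src, sink = "+", "-"     # hypothetical terminal names; must not clash with node IDs
    residual = {u: dict(nbrs) for u, nbrs in graph.items()}
    residual[src], residual[sink] = {}, {}
    for u in positive:
        residual[src][u] = BIG
        residual[u].setdefault(src, 0.0)
    for u in negative:
        residual[u][sink] = BIG
        residual[sink].setdefault(u, 0.0)
    # Edmonds-Karp: push flow along shortest augmenting paths until none remain.
    while True:
        parent = {src: None}
        queue = deque([src])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(residual[u][v] for u, v in path)   # bottleneck capacity of the path
        for u, v in path:
            residual[u][v] -= flow
            residual[v][u] = residual[v].get(u, 0.0) + flow
    # Nodes reachable from the source in the residual graph form the C_j side.
    side, queue = {src}, deque([src])
    while queue:
        u = queue.popleft()
        for v, cap in residual[u].items():
            if cap > 0 and v not in side:
                side.add(v)
                queue.append(v)
    side.discard(src)
    return side
```

The anchor edges are effectively uncuttable, so the minimum cut always runs through document-to-document edges. A document's class membership is then decided by how strongly it is linked to each group of keyword lists.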

FIG. 18 is a conceptual diagram of the clustering in the third embodiment. FIG. 18(A) shows the state of the graph after steps S705 to S707 have been executed for one document classification C_j. In FIG. 18(A), the keyword lists L_j (j = 1, 2, 3) of three document classifications are represented by ●, ▲, and ■, respectively, and documents D other than the keyword lists are represented by ○. Neighboring documents (including keyword lists) are connected by edges shown as solid lines. FIG. 18(A) also shows a positive node, marked "+", which is connected with a large weight to the ■ keyword list corresponding to document classification C_j. It likewise shows a negative node, marked "−", which is connected with large weights to the ● and ▲ keyword lists, which do not correspond to C_j. The thick dash-dot line schematically shows the boundary of the graph cut that separates the documents belonging to C_j from those not belonging to C_j.

FIG. 18(B) shows the state of the graph after steps S705 to S707 have been executed for all document classifications C_j. As shown in FIG. 18(B), m graph cuts are created, one per classification (three in FIG. 18(B)). Each cut separates the documents that belong to a document classification C_j, together with its keyword list, from those that do not.

When steps S704 to S708 have been completed for each document classification C_j, the process proceeds to step S709, where it branches according to the selected relationship between document classifications. If "exclusive" is selected, the process proceeds to step S710. In step S710, the secure document management server 110 excludes, from the m graph cuts obtained, the one with the largest sum of edge weights. It divides the graph into m regions with the remaining m−1 graph cuts and labels the electronic documents D in each region with the corresponding document classification. FIG. 18(C-1) shows the clusters formed when "exclusive" is selected and the graph is divided by m−1 graph cuts. FIG. 18(D-1) shows the state after the graph has been divided by the m−1 graph cuts and the electronic documents in each region have been labeled with the corresponding document classification.

If, on the other hand, "coexistence" is selected in step S709, the process proceeds to step S711. In step S711, the secure document management server 110 uses the m graph cuts obtained as boundaries to decide whether each document belongs to each document classification C_j, and it attaches labels accordingly. A document that belongs to no document classification receives the label of the document classification assigned to the other electronic document closest to it in Euclidean distance. FIG. 18(C-2) shows the clusters formed when "coexistence" is selected and the graph is divided along the boundaries of the m graph cuts. FIG. 18(D-2) shows the state after the graph has been divided along those boundaries and the electronic documents in each region have been labeled with the corresponding document classification. As FIG. 18(D-2) shows, a document that belongs to no cluster is labeled with the ▲ document classification, which is the classification of the document closest to it in Euclidean distance. After each electronic document has been labeled with a document classification in step S710 or S711, in step S712 the classification into document sets is fixed and data indicating the classification result is stored. The analysis processing then ends at step S713.
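The fallback rule of step S711, under which a document outside every cluster inherits the classification of its nearest neighbour, can be sketched as shown below. The sketch interprets "the closest other electronic document" as the closest document that already carries at least one label. It copies that document's whole label set. Both choices are assumptions, and the function name is hypothetical.

```python
import math


def assign_unlabeled(vectors: dict, labels: dict) -> dict:
    """Step S711 fallback: unlabeled documents take the labels of the nearest labeled one.

    vectors: document ID -> sparse word-count vector (word -> count)
    labels:  document ID -> set of class names from the m graph cuts (may be empty)
    """
    def dist(u, v):
        keys = set(u) | set(v)
        return math.sqrt(sum((u.get(k, 0) - v.get(k, 0)) ** 2 for k in keys))

    labeled = [d for d, cs in labels.items() if cs]
    result = {d: set(cs) for d, cs in labels.items()}
    if not labeled:
        return result                     # nothing to copy from
    for d, cs in labels.items():
        if not cs:
            nearest = min(labeled, key=lambda o: dist(vectors[d], vectors[o]))
            result[d] = set(labels[nearest])
    return result
```

The guarantee that every document ends up with at least one label matters here, because a document without a classification would otherwise have no security policy applied to it.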

As described above, the embodiments rest on the idea that electronic documents with similar content can be given the same security policy. Within a given group of electronic documents, they allow a security policy to be set collectively for the documents classified together by content similarity. Security policies for electronic documents can therefore be set efficiently, which reduces the operator's workload. The embodiments thus provide a security management system, a server device, a security management method, a program, and a recording medium that achieve these effects.

The functions described above can be implemented by a computer-executable program written in a legacy or object-oriented programming language such as assembler, C, C++, C#, or Java (registered trademark). The program can be stored in and distributed on a device-readable recording medium such as a ROM, EEPROM, EPROM, flash memory, flexible disk, CD-ROM, CD-RW, DVD, SD card, or MO.

Although embodiments of the present invention have been described above, the present invention is not limited to them. The embodiments may be modified within the range conceivable by those skilled in the art, including through other embodiments, additions, changes, and deletions. Any such aspect is included in the scope of the present invention as long as it provides the operation and effects of the present invention.

DESCRIPTION OF SYMBOLS
100 ... document security management system, 102 ... network, 110 ... secure document management server, 120 ... firewall, 130 ... management terminal, 140 ... user terminal, 150 ... security server, 160 ... document repository, 210 ... file acquisition unit, 212 ... file DB, 214 ... file analysis unit, 216 ... analysis result display unit, 218 ... policy setting unit, 220 ... file transmission unit, 222 ... secure file reception unit, 224 ... secure file DB, 230 ... policy setting start command unit, 232 ... analysis result viewing unit, 234 ... policy setting input unit, 240 ... secure file acquisition unit, 250 ... file reception unit, 252 ... secure file conversion unit, 254 ... secure file transmission unit, 256 ... security management data storage unit, 258 ... security management DB, 260 ... text extraction unit, 262 ... syntax analysis unit, 264 ... document classification unit, 270 ... file encryption unit, 272 ... key generation unit, 274 ... security management data creation unit, 276 ... secure file creation unit, 278 ... file ID generation unit, 300 ... text, 310 ... morphological analysis result data, 310a ... morpheme, 310b ... part-of-speech information, 320 ... document vector data, 320a ... morpheme, 320b-d ... number of occurrences, 330 ... centroid document vector data, 330a ... morpheme, 330b-d ... average value, 400 ... start screen, 402 ... policy setting start button, 404 ... converted document list button, 406 ... manual button, 408 ... cancel button, 410 ... storage location designation screen, 412 ... list box, 414 ... add button, 416 ... delete button, 418 ... highlighted display, 420 ... "Next" button, 430 ... classification parameter setting screen, 432 ... table, 432a ... document classification name, 432b ... keyword, 434 ... add button, 436 ... edit button, 438 ... delete button, 440 ... "Next" button, 442, 444 ... radio buttons, 446 ... list box, 448 ... button, 450 ... classification result viewing screen, 452 ... tab, 454 ... list, 456 ... edit button, 458 ... "Next" button, 460 ... classification information editing screen, 462 ... display, 464A ... check box, 464B ... radio box, 470 ... policy setting input screen, 472 ... box, 474 ... text box group, 476 ... list box, 478 ... box, 480 ... list box, 482 ... box, 484 ... "Next" button, 486 ... cancel button

JP 2004-152261 A; JP 2007-4616 A; JP 2008-40659 A

Claims

A security management system including a server device that is connected to a network and sets a security policy that defines usage rights for an electronic document, the server device comprising:
Means for receiving parameters characterizing one or more document classifications;
Acquisition means for acquiring, from resources on the network, a plurality of electronic documents to which policies are to be set;
Analysis means for extracting text from each of the acquired electronic documents, applying natural language analysis, grouping documents having similar content according to the received parameters, and attaching the document classification to each of the electronic documents;
Means for receiving a security policy to be applied to each electronic document to which the document classification is attached; and
Setting means for generating setting data in which the usage rights are defined for each of the classified electronic documents according to the security policy corresponding to the attached document classification.
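The data flow of claim 1 (classify each acquired document, look up the policy entered for its classification, emit setting data) can be sketched as below. This is a minimal illustration under stated assumptions: the `SecurityPolicy` shape (user group mapped to permitted operations), the setting-data dictionary layout, and the toy keyword classifier are all hypothetical and stand in for the natural language analysis the claim describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class SecurityPolicy:
    # Hypothetical policy shape: user group -> operations that group may perform.
    permissions: Dict[str, FrozenSet[str]]


def generate_setting_data(
    documents: Dict[str, str],
    classify: Callable[[str], str],
    policies: Dict[str, SecurityPolicy],
) -> Dict[str, dict]:
    """Attach a classification to each document and record the usage rights
    of the policy registered for that classification."""
    setting_data = {}
    for doc_id, text in documents.items():
        label = classify(text)       # analysis step (stand-in classifier)
        policy = policies[label]     # policy entered for this classification
        setting_data[doc_id] = {
            "classification": label,
            "permissions": {g: sorted(ops) for g, ops in policy.permissions.items()},
        }
    return setting_data


# Usage with a toy keyword classifier (NOT the natural-language analysis of the claims).
def toy_classify(text: str) -> str:
    return "HR" if "salary" in text else "General"


policies = {
    "HR": SecurityPolicy({"hr_staff": frozenset({"view", "print"})}),
    "General": SecurityPolicy({"all_staff": frozenset({"view"})}),
}
docs = {"doc-1": "salary table for 2009", "doc-2": "cafeteria menu"}
print(generate_setting_data(docs, toy_classify, policies))
```

Keeping the classifier as a parameter separates the analysis means from the setting means, mirroring how the claim lists them as distinct elements.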

The security management system according to claim 1, wherein the analysis means classifies the electronic documents such that a plurality of the document classifications can be attached to one electronic document, and the setting means combines the authority information included in the security policies of the plurality of document classifications according to a designated priority rule for combining the security policies.
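Claim 2 lets one document carry several classifications but does not say how they are chosen. One plausible rule, assumed here purely for illustration, is to attach every classification whose center vector is at least `threshold` similar (cosine similarity) to the document's term-count vector, falling back to the single closest classification. The threshold value and the fallback are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Mapping


def cosine(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def attach_classifications(doc: Mapping[str, float],
                           centers: Dict[str, Mapping[str, float]],
                           threshold: float = 0.3) -> List[str]:
    """Attach every classification whose center is similar enough (hypothetical rule)."""
    scores = {label: cosine(doc, c) for label, c in centers.items()}
    labels = [label for label, s in scores.items() if s >= threshold]
    if not labels:
        # Fall back to the single closest classification so no document is left unlabeled.
        labels = [max(scores, key=scores.get)]
    return sorted(labels, key=lambda label: -scores[label])


centers = {
    "HR": Counter({"salary": 3, "employee": 1}),
    "Legal": Counter({"contract": 3, "clause": 1}),
    "IT": Counter({"server": 2}),
}
doc = Counter({"salary": 2, "contract": 2})
print(attach_classifications(doc, centers))  # → ['HR', 'Legal']
```

A document attached to both "HR" and "Legal" then needs its two policies merged, which is where the priority rule of claim 7 comes in.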

The security management system according to claim 1 or 2, wherein, for each of the acquired electronic documents, the analysis means tallies the words contained in that document and generates a data set of the appearance frequencies of those words, and groups documents having the similarity according to a clustering algorithm.
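The word-frequency data set and clustering step of claim 3 can be sketched with plain k-means, where each cluster center is the average of its members' occurrence counts (compare the "document vector data" and "central document vector data ... average value" entries in the list of symbols). The claim does not name a specific clustering algorithm, so k-means is a swapped-in choice, and whitespace tokenization is only a stand-in for morphological analysis.

```python
from collections import Counter
from typing import Dict, List, Mapping, Sequence, Tuple


def term_frequencies(text: str) -> Counter:
    # Stand-in for morphological analysis: lowercase whitespace tokenization.
    return Counter(text.lower().split())


def distance2(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Squared Euclidean distance between sparse vectors."""
    return sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in a.keys() | b.keys())


def mean_vector(vectors: List[Mapping[str, float]]) -> Dict[str, float]:
    """Center vector: per-term average of the members' occurrence counts."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {k: c / len(vectors) for k, c in total.items()}


def kmeans(vectors: List[Counter], initial_centers: Sequence[Mapping[str, float]],
           max_iter: int = 20) -> Tuple[List[int], List[Dict[str, float]]]:
    centers = [dict(c) for c in initial_centers]
    assignment: List[int] = []
    for _ in range(max_iter):
        # Assign each document to its nearest center.
        new_assignment = [min(range(len(centers)), key=lambda j: distance2(v, centers[j]))
                          for v in vectors]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        for j in range(len(centers)):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = mean_vector(members)
    return assignment, centers


docs = ["salary bonus salary", "bonus salary payroll",
        "server network server", "network firewall server"]
vectors = [term_frequencies(d) for d in docs]
labels, centers = kmeans(vectors, initial_centers=[vectors[0], vectors[2]])
print(labels)  # → [0, 0, 1, 1]
```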

The security management system according to claim 1, wherein the setting means passes each acquired electronic document and the corresponding setting data to a conversion means that converts the original electronic document into a secured electronic document, and makes each generated secure electronic document available.
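The conversion of claim 4 can be pictured using the units named in the list of symbols (file ID generation, key generation, file encryption, security management data creation). The sketch below is an assumption-laden illustration: the record layout, the in-memory `db` dictionary standing in for the security management DB, and the permission check in `open_secure_file` are hypothetical. The cipher is deliberately left as a caller-supplied function; in practice it should be a vetted authenticated cipher such as AES-GCM from the `cryptography` package.

```python
import secrets
import uuid
from dataclasses import dataclass
from typing import Callable, Dict

Cipher = Callable[[bytes, bytes], bytes]  # (key, data) -> data


@dataclass
class SecureFile:
    file_id: str
    ciphertext: bytes


@dataclass
class ManagementRecord:
    key: bytes
    setting_data: dict


def convert(plaintext: bytes, setting_data: dict, encrypt: Cipher,
            db: Dict[str, ManagementRecord]) -> SecureFile:
    file_id = str(uuid.uuid4())                        # file ID generation
    key = secrets.token_bytes(32)                      # key generation (256-bit random key)
    db[file_id] = ManagementRecord(key, setting_data)  # security management data, kept server-side
    return SecureFile(file_id, encrypt(key, plaintext))  # secure file holds only ID + ciphertext


def open_secure_file(sf: SecureFile, group: str, operation: str, decrypt: Cipher,
                     db: Dict[str, ManagementRecord]) -> bytes:
    """Release the plaintext only if the setting data grants `operation` to `group`."""
    record = db[sf.file_id]
    allowed = record.setting_data.get("permissions", {}).get(group, [])
    if operation not in allowed:
        raise PermissionError(f"{group} may not {operation} {sf.file_id}")
    return decrypt(record.key, sf.ciphertext)
```

Because the key never travels with the secure file, usage rights can be enforced (or later revoked) by whoever holds the management records.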

The security management system according to any one of claims 1 to 4, wherein the parameters include a keyword defined for each of the document classifications, and the keyword is used to obtain an initial condition for the classification processing.
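Claim 5 uses the per-classification keywords only to obtain an initial condition for classification. One way to do that, assumed here for illustration, is to seed each classification's starting center with the average term vector of the documents that mention any of its keywords, falling back to the keywords themselves when no document matches. Such centers could then serve as the initial centers of a clustering run.

```python
from collections import Counter
from typing import Dict, List


def initial_centers(vectors: List[Counter],
                    keywords: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Derive one starting center per classification from its keywords (hypothetical rule)."""
    centers = {}
    for label, words in keywords.items():
        # Documents mentioning at least one keyword of this classification.
        hits = [v for v in vectors if any(v[w] > 0 for w in words)]
        if hits:
            total = Counter()
            for v in hits:
                total.update(v)
            centers[label] = {t: c / len(hits) for t, c in total.items()}
        else:
            # No document mentions the keywords: start from the keywords themselves.
            centers[label] = {w: 1.0 for w in words}
    return centers


vectors = [Counter("salary bonus".split()), Counter("salary payroll".split()),
           Counter("server network".split())]
keywords = {"HR": ["salary"], "IT": ["server", "firewall"], "Legal": ["contract"]}
print(initial_centers(vectors, keywords))
```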

The security management system according to claim 1, further comprising: a management terminal that provides a user interface for inputting the parameters, a user interface for inputting the security policy to be applied to each electronic document to which the document classification is attached, and a user interface for editing the document classification attached to the electronic document; and a conversion server device that implements the conversion means.

The security management system according to claim 2, wherein the designated priority rule is selected from: a deny-priority method that, when combining, gives priority to items that are not permitted; a permit-priority method that, when combining, gives priority to items that are permitted; and a classification-priority method that combines according to a predetermined priority order of the document classifications.
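The three priority rules of claim 7 can be sketched as a merge over per-classification policies. The policy representation (operation mapped to `True` for permit, `False` for deny) and the `Rule` names are assumptions for illustration; the claim defines only the behavior of each rule.

```python
from enum import Enum
from typing import Dict, List, Optional

Policy = Dict[str, bool]  # operation -> True (permit) / False (deny)


class Rule(Enum):
    DENY_PRIORITY = "deny"
    PERMIT_PRIORITY = "permit"
    CLASSIFICATION_PRIORITY = "classification"


def combine(policies: Dict[str, Policy], rule: Rule,
            order: Optional[List[str]] = None) -> Policy:
    """Merge the policies of all classifications attached to one document."""
    if rule is Rule.CLASSIFICATION_PRIORITY and (order is None or not set(policies) <= set(order)):
        raise ValueError("classification priority needs an order covering every classification")
    operations = set().union(*policies.values())
    merged: Policy = {}
    for op in sorted(operations):
        if rule is Rule.DENY_PRIORITY:
            # Any deny wins.
            merged[op] = all(p[op] for p in policies.values() if op in p)
        elif rule is Rule.PERMIT_PRIORITY:
            # Any permit wins.
            merged[op] = any(p[op] for p in policies.values() if op in p)
        else:
            # The highest-ranked classification that mentions the operation decides.
            first = next(label for label in order if op in policies.get(label, {}))
            merged[op] = policies[first][op]
    return merged


policies = {
    "Confidential": {"view": True, "print": False, "copy": False},
    "Project X": {"view": True, "print": True},
}
print(combine(policies, Rule.DENY_PRIORITY))  # → {'copy': False, 'print': False, 'view': True}
```

Deny priority is the conservative default; classification priority lets an administrator state explicitly which classification's rights should prevail.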

A server device that is connected to a network and sets a security policy that regulates usage rights for an electronic document, the server device comprising:
Means for receiving parameters characterizing one or more document classifications;
Acquisition means for acquiring, from resources on the network, a plurality of electronic documents to which policies are to be set;
Analysis means for extracting text from each of the acquired electronic documents, applying natural language analysis, grouping documents having similar content according to the received parameters, and attaching the document classification to each of the electronic documents;
Means for receiving a security policy to be applied to each electronic document to which the document classification is attached; and
Setting means for generating setting data in which the usage rights are defined for each of the classified electronic documents according to the security policy corresponding to the attached document classification.

The server device according to claim 8, wherein the analysis means classifies the electronic documents such that a plurality of the document classifications can be attached to one electronic document, and the setting means combines the authority information included in the security policies of the plurality of document classifications according to a designated priority rule for combining the security policies.

The server device according to claim 8 or 9, wherein, for each of the acquired electronic documents, the analysis means tallies the words contained in that document and generates a data set of the appearance frequencies of those words, and groups documents having the similarity according to a clustering algorithm.

The server device according to claim 8, wherein the setting means passes each acquired electronic document and the corresponding setting data to a conversion means that converts the original electronic document into a secured electronic document, and makes each generated secure electronic document available.

A security management method for setting a security policy that defines usage rights for an electronic document, comprising:
Receiving parameters characterizing one or more document classifications;
Acquiring, from resources on the network, a plurality of electronic documents to which policies are to be set;
Extracting text from each of the acquired electronic documents, applying natural language analysis, grouping documents having similar content according to the received parameters, and attaching the document classification to each of the electronic documents; and
Generating, for each of the classified electronic documents, setting data in which the usage rights are defined according to the received security policy corresponding to the attached document classification.

The security management method according to claim 12, wherein the step of attaching the document classification includes a sub-step of classifying the electronic documents such that a plurality of the document classifications can be attached to one electronic document, and the generating step includes a sub-step of combining the authority information included in the security policies of the plurality of document classifications according to a designated priority rule for combining the security policies.

The security management method according to claim 12 or 13, wherein the step of attaching the document classification includes a sub-step of tallying, for each of the acquired electronic documents, the words contained in that document and generating a data set of the appearance frequencies of those words, and a sub-step of grouping documents having the similarity according to a clustering algorithm.

A computer-executable program for causing a computer to implement each means of the server device according to any one of claims 8 to 11.

A recording medium storing the computer-executable program according to claim 15 in a device-readable manner.