JP2018159986A

JP2018159986A - Information management apparatus, information management method and program

Info

Publication number: JP2018159986A
Application number: JP2017055491A
Authority: JP
Inventors: 晃滝上野; Akiro Ueno
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2017-03-22
Filing date: 2017-03-22
Publication date: 2018-10-11
Anticipated expiration: 2037-03-22
Also published as: JP6984147B2

Abstract

PROBLEM TO BE SOLVED: To provide an information management apparatus, an information management method and a program capable of identifying the outflow source of information, even when the original file has been modified, without increasing the burden on the system.
SOLUTION: An information management apparatus 1 includes: a feature quantity registration unit 2 that registers a feature quantity for each of a plurality of areas obtained by dividing management target information on the basis of a preset rule; an area division unit 3 that divides information inputted from outside into a plurality of areas on the basis of the same rule; a feature quantity extraction unit 4 that extracts a feature quantity for each area obtained by the division; and a determination unit 5 that compares the extracted feature quantities with the registered feature quantities and determines whether the information inputted from outside matches the management target information.
SELECTED DRAWING: Figure 1

Description

The present invention relates to an information management apparatus and an information management method for managing information, in particular personal information, and further relates to a program for realizing them.

In recent years, leakage of personal information has become a social problem. In particular, companies that provide goods or services to individuals hold large amounts of personal information on their file servers, and that information may leak. Moreover, leaked personal information cannot be recalled. When a leak occurs, it is therefore necessary to identify its source and take countermeasures such as changing the information.

For this reason, Patent Document 1, for example, discloses a technique that periodically generates dummy personal information for non-existent persons and registers it in a database, so that both a leak of personal information and its source can be identified. Patent Document 2 discloses a technique that registers feature amounts extracted from images of confidential documents. It then identifies the outflow source by comparing the feature amounts of an image of a leaked confidential document with the registered feature amounts.

Patent Document 1: JP 2006-79233 A
Patent Document 2: JP 2008-42636 A

However, the technique disclosed in Patent Document 1 must periodically generate dummy information and then manage it, which places a burden on the system. With the technique disclosed in Patent Document 2, the feature amounts are not preserved when the original file is modified, for example by splitting it or rearranging its columns. This makes it difficult to identify the outflow source.

An example object of the present invention is to solve the above problems. To that end, it provides an information management apparatus, an information management method, and a program that make it possible to identify the outflow source of information even when the original file has been modified, without increasing the burden on the system.

In order to achieve the above object, an information management apparatus according to one aspect of the present invention includes:
a feature amount registration unit that registers feature amounts for each of a plurality of areas obtained by dividing information to be managed based on a preset rule;
an area dividing unit that divides information input from outside into a plurality of areas based on the rule;
a feature amount extraction unit that extracts a feature amount for each area obtained by the division; and
a determination unit that compares the extracted feature amounts with the registered feature amounts to determine whether the information input from outside matches the information to be managed.

In order to achieve the above object, an information management method according to one aspect of the present invention includes:
(a) a step of registering feature amounts for each of a plurality of areas obtained by dividing information to be managed based on a preset rule;
(b) a step of dividing information input from outside into a plurality of areas based on the rule;
(c) a step of extracting a feature amount for each area obtained by the division; and
(d) a step of comparing the extracted feature amounts with the registered feature amounts to determine whether the information input from outside matches the information to be managed.

Furthermore, in order to achieve the above object, a program according to one aspect of the present invention causes a computer to execute:
(a) a step of registering feature amounts for each of a plurality of areas obtained by dividing information to be managed based on a preset rule;
(b) a step of dividing information input from outside into a plurality of areas based on the rule;
(c) a step of extracting a feature amount for each area obtained by the division; and
(d) a step of comparing the extracted feature amounts with the registered feature amounts to determine whether the information input from outside matches the information to be managed.

As described above, according to the present invention, the source of an information leak can be identified even when the original file has been modified, without increasing the burden on the system.

FIG. 1 is a block diagram schematically showing the configuration of an information management apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram showing a specific configuration of the information management apparatus according to the embodiment.
FIG. 3 is a diagram showing an example of the division rules used in the embodiment.
FIG. 4 is a diagram showing an example of the information registered in the database used in the embodiment.
FIG. 5 is a flowchart showing the operation of the information management apparatus when extracting personal information to be managed from a file.
FIG. 6 is a flowchart showing the operation of the information management apparatus when dividing the personal information to be managed.
FIG. 7 is a flowchart showing the operation of the information management apparatus when extracting feature amounts from the divided areas.
FIG. 8 is a flowchart showing the operation of the information management apparatus when extracting, from a leaked file, the personal information to be judged.
FIG. 9 is a flowchart showing the operation of the information management apparatus when dividing the personal information to be judged that was extracted from the leaked file.
FIG. 10 is a flowchart showing the operation of the information management apparatus when extracting feature amounts from the areas obtained from the leaked file.
FIG. 11 is a flowchart showing the operation of the information management apparatus when executing the determination process by comparing feature amounts.
FIG. 12 is a block diagram showing a specific configuration of Modification 1 of the information management apparatus according to the embodiment.
FIG. 13 is a block diagram showing an example of a computer that implements the information management apparatus according to the embodiment.

(Summary of Invention)
The present invention uses a management device that can scan the files on a file server and detect the files containing personal information. Such files can be detected using, for example, the technique disclosed in Japanese Patent No. 5013081.

The management device runs personal information detection on the files stored in the file server. The detected personal information includes items such as surname, given name, gender, and prefecture. The management device tags these items with attributes such as a surname attribute, a given-name attribute, a gender attribute, and a prefecture attribute.

Next, the management device uses the attributes of the detected personal information and their values to divide the personal information in a file into a plurality of areas. Specifically, the division is performed, for example, per attribute or per value of a specific attribute. The more areas the information is divided into, the more kinds of file modification can be tolerated.

However, the amount of computation also grows with the number of areas. The number of areas may therefore be increased or decreased according to the importance of the target file. For example, a document containing the keyword "confidential information" may be judged to be a more important file. It is then divided by both the attributes and the attribute values of its personal information. A document without the keyword may be divided by attributes only.
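The importance-based choice of division granularity can be sketched as a small rule selector. This is a minimal sketch, not the patent's implementation. The attribute names, the English keyword string, and the function name `select_division_rules` are all hypothetical.

```python
# Hypothetical attribute names, taken from the examples given in the text.
ATTRIBUTE_RULES = ["surname", "given_name", "town"]
VALUE_RULES = ["prefecture", "gender", "email"]


def select_division_rules(document_text: str, keyword: str = "confidential information") -> dict:
    """Choose how finely to divide a file's personal information (hypothetical helper).

    Files containing the keyword are treated as more important and are divided by
    both attributes and attribute values (more areas, more computation);
    all other files are divided by attributes only.
    """
    by_value = list(VALUE_RULES) if keyword in document_text else []
    return {"by_attribute": list(ATTRIBUTE_RULES), "by_value": by_value}
```

With this design the cost trade-off is explicit: only flagged files pay for the extra per-value areas.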

After the personal information has been divided into areas, a feature amount is extracted for each area from the attribute values it contains. Existing algorithms can be used for the extraction, for example Huffman coding or mutual information. The extracted feature amounts are retained. When personal information leaks, the feature amounts of the leaked information are compared with the retained ones to identify the source of the leak.

(Embodiment)
Hereinafter, an information management apparatus, an information management method, and a program according to an embodiment of the present invention will be described with reference to FIGS. 1 to 13.

[Device configuration]
First, the schematic configuration of the information management apparatus according to the present embodiment will be described with reference to FIG. 1. FIG. 1 is a block diagram schematically showing the configuration of the information management apparatus according to the embodiment of the present invention.

The information management apparatus 1 shown in FIG. 1 is an apparatus for managing information, in particular personal information. As shown in FIG. 1, the information management apparatus 1 includes a feature amount registration unit 2, an area dividing unit 3, a feature amount extraction unit 4, and a determination unit 5.

The feature amount registration unit 2 holds registered feature amounts for each of a plurality of areas obtained by dividing the information to be managed based on a preset rule. The area dividing unit 3 divides information input from outside into a plurality of areas based on the rule. The feature amount extraction unit 4 extracts a feature amount for each area obtained by the division. The determination unit 5 compares the feature amounts extracted by the feature amount extraction unit 4 with the registered feature amounts. From this comparison it determines whether the information input from outside matches the information to be managed.
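The interplay of units 2 to 5 can be sketched as a small class in which the division rule and the feature extractor are pluggable. This is a minimal sketch under stated assumptions. The class and helper names are hypothetical. The same division function is applied at registration and at judgment time, as the text requires. An exact per-area comparison of feature strings is assumed, because the determination details (FIG. 11) are not covered here. The `sorted_join` placeholder merely stands in for a real extractor such as the Huffman-based one.

```python
from typing import Callable, Dict, Hashable, List

Records = List[dict]                  # one dict per person: attribute -> value
Areas = Dict[Hashable, list]          # area key -> contents of that area


class InformationManager:
    """Hypothetical wiring of units 2 to 5 around pluggable divide/extract functions."""

    def __init__(self, divide: Callable[[Records], Areas], extract: Callable[[list], str]):
        self.divide = divide                          # area dividing unit 3 (same rule both times)
        self.extract = extract                        # feature amount extraction unit 4
        self.registered: Dict[Hashable, str] = {}     # feature amount registration unit 2

    def register(self, managed: Records) -> None:
        """Divide the managed information and store one feature per area."""
        for key, area in self.divide(managed).items():
            self.registered[key] = self.extract(area)

    def determine(self, external: Records) -> Dict[Hashable, bool]:
        """Determination unit 5: compare area by area (exact match is an assumption)."""
        return {
            key: self.registered.get(key) == self.extract(area)
            for key, area in self.divide(external).items()
        }


def by_column(records: Records) -> Areas:
    """Placeholder rule: one area per attribute column (hypothetical)."""
    return {attr: [r[attr] for r in records] for attr in ("surname", "given_name")}


def sorted_join(area: list) -> str:
    """Placeholder extractor that ignores row order (stands in for Huffman coding)."""
    return "|".join(sorted(area))
```

Because areas are keyed by attribute name and the placeholder feature ignores row order, reordering rows or columns in a leaked copy still yields matching areas.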

Thus, in the present embodiment there is no need to create and manage dummy information for the information to be managed. In addition, the feature amounts of the divided areas of the information to be managed remain unchanged even when the original file is modified. Therefore, according to the present embodiment, the outflow source of information can be identified even when the original file has been modified, without increasing the burden on the system.

Next, the configuration of the information management apparatus according to the present embodiment will be described more specifically with reference to FIGS. 2 to 4. FIG. 2 is a block diagram showing a specific configuration of the information management apparatus according to the embodiment of the present invention.

In the present embodiment, the information to be managed is assumed to be personal information. As shown in FIG. 2, the information management apparatus 1 is connected via a network 10 to a terminal device 20 used by an administrator and to a file server 30.

The file server 30 stores files containing the personal information to be managed. The terminal device 20 inputs a leaked file to the information management apparatus 1 in accordance with the administrator's instructions. When the information management apparatus 1 has made a determination, the terminal device 20 acquires the determination result and displays it on its screen.

As shown in FIG. 2, in the present embodiment the information management apparatus 1 also includes a file acquisition unit 6 and a personal information detection unit 7. These are in addition to the feature amount registration unit 2, area dividing unit 3, feature amount extraction unit 4, and determination unit 5 described above.

The file acquisition unit 6 accesses the file server 30 in accordance with an instruction from the administrator via the terminal device 20 and acquires the specified file. The personal information detection unit 7 detects personal information in the acquired file.

In the present embodiment, the feature amount registration unit 2 registers, in a database 9, the feature amounts of each of the areas obtained by dividing the personal information to be managed. These feature amounts are produced by the area dividing unit 3 and the feature amount extraction unit 4. The area dividing unit 3 divides the personal information to be managed, and the feature amount extraction unit 4 extracts a feature amount for each resulting area. The feature amount registration unit 2 then registers the feature amounts extracted from the divided areas. Alternatively, the feature amounts registered by the feature amount registration unit 2 may be created by an external device.

The area dividing unit 3 divides personal information into a plurality of areas using a division rule 8 created in advance. In the present embodiment, the division rule 8 includes at least one of two kinds of rule: a rule that divides information based on a specific attribute constituting the personal information, and a rule that divides information based on a specific attribute value contained in the personal information. The rule used to obtain the feature amounts registered in the feature amount registration unit 2 is the same as the rule used by the area dividing unit 3.

FIG. 3 is a diagram showing an example of the division rules used in the embodiment of the present invention. In the example of FIG. 3, there are three division rules:
(1) a rule that divides by the attribute "surname";
(2) a rule that divides by the attribute "given name"; and
(3) a rule that divides by the values of the attribute "prefecture".
Rule (1) generates an area consisting of the information whose attribute is surname, and rule (2) generates an area consisting of the information whose attribute is given name. Rule (3) generates one area consisting of the rows whose prefecture is Kanagawa and another consisting of the rows whose prefecture is Tokyo.
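The two kinds of rule in FIG. 3 can be sketched as a function over a list of records. This is a minimal sketch, assuming each record is a dict keyed by hypothetical attribute names. Rules (1) and (2) take a whole column as one area. Rule (3) groups whole rows by the value of one attribute.

```python
from collections import defaultdict


def divide_by_rules(records, attribute_rules, value_rules):
    """Divide records (a list of dicts, one per person) into keyed areas."""
    areas = {}
    # Rules (1), (2): the whole column of the attribute becomes one area.
    for attr in attribute_rules:
        areas[(attr,)] = [r[attr] for r in records]
    # Rule (3): rows sharing the same value of the attribute form one area.
    for attr in value_rules:
        groups = defaultdict(list)
        for r in records:
            groups[r[attr]].append(r)
        for value, rows in groups.items():
            areas[(attr, value)] = rows
    return areas
```

For example, with three people in Kanagawa and Tokyo, calling `divide_by_rules(records, ["surname", "given_name"], ["prefecture"])` yields four areas. Their keys are `("surname",)`, `("given_name",)`, `("prefecture", "Kanagawa")` and `("prefecture", "Tokyo")`.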

In the present embodiment, the feature amount extraction unit 4 extracts a feature amount for each area obtained by the division. For example, it uses the Huffman coding algorithm to derive a character string representing the area. The feature amount extraction unit 4 may also use an algorithm other than Huffman coding, for example one that extracts a feature amount such as mutual information. A specific example of the processing by the feature amount extraction unit 4 is described later with reference to FIG. 7.

The feature amount extraction unit 4 also registers the extracted feature amounts in the database 9, as shown in FIG. 4. FIG. 4 is a diagram showing an example of the information registered in the database used in the embodiment of the present invention. As shown in FIG. 4, the database 9 stores a personal information file table and a personal information feature table.

The personal information file table stores the paths of the files in which personal information was detected. The personal information feature table stores, for each attribute and attribute value of a file containing personal information, a character string that serves as the feature amount. As described later, entries in the personal information file table are written by the personal information detection unit 7.
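The two tables of FIG. 4 can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch, not the actual schema. The table and column names are hypothetical. Looking up a leaked file by an exact match on the feature string is an assumption made only for illustration.

```python
import sqlite3


def create_tables(conn: sqlite3.Connection) -> None:
    """Create hypothetical versions of the two tables shown in FIG. 4."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS personal_info_file (
            doc_id    INTEGER PRIMARY KEY,   -- document ID assigned in step A4
            file_path TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS personal_info_feature (
            doc_id          INTEGER NOT NULL REFERENCES personal_info_file(doc_id),
            attribute       TEXT NOT NULL,
            attribute_value TEXT,            -- NULL for areas divided by attribute only
            feature         TEXT NOT NULL    -- feature amount as a character string
        );
    """)


def register_file(conn: sqlite3.Connection, file_path: str) -> int:
    """Register a file path and return the newly assigned document ID."""
    cur = conn.execute("INSERT INTO personal_info_file (file_path) VALUES (?)", (file_path,))
    return cur.lastrowid


def register_feature(conn, doc_id, attribute, attribute_value, feature) -> None:
    """Store the feature string of one area of a registered file."""
    conn.execute(
        "INSERT INTO personal_info_feature VALUES (?, ?, ?, ?)",
        (doc_id, attribute, attribute_value, feature),
    )


def files_with_feature(conn, attribute, attribute_value, feature):
    """Return paths of registered files having an area with this exact feature string."""
    rows = conn.execute(
        """SELECT f.file_path FROM personal_info_feature AS p
           JOIN personal_info_file AS f ON f.doc_id = p.doc_id
           WHERE p.attribute = ? AND p.attribute_value IS ? AND p.feature = ?""",
        (attribute, attribute_value, feature),
    ).fetchall()
    return [r[0] for r in rows]
```

The `IS ?` comparison lets the same query match attribute-only areas, where the value is NULL, as well as areas keyed by a value.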

[Device operation]
Next, the operation of the information management apparatus 1 according to the embodiment of the present invention will be described with reference to FIGS. 5 to 11. In the present embodiment, the information management method is carried out by operating the information management apparatus 1. The following description of the operation of the information management apparatus 1 therefore also serves as the description of the information management method of the present embodiment.

The information management apparatus 1 mainly performs two processes. The first extracts and registers the feature amounts of the personal information to be managed. The second determines the outflow source by comparing the feature amounts extracted from a leaked file with the registered feature amounts. The extraction of feature amounts from the personal information to be managed is described below with reference to FIGS. 5 to 7. The outflow source determination process is described with reference to FIGS. 8 to 11.

Feature amount registration processing
To register the feature amounts of the personal information to be managed, three processes are executed:
- extraction of personal information from files stored in the file server 30 (FIG. 5);
- division of the personal information (FIG. 6); and
- extraction of feature amounts from the divided areas (FIG. 7).
Each process is described in turn below.

FIG. 5 is a flowchart showing the operation of the information management apparatus according to the embodiment of the present invention when extracting personal information to be managed from a file.

As shown in FIG. 5, the file acquisition unit 6 first accesses the file server 30 and acquires a file (step A1). Next, the file acquisition unit 6 passes the acquired file to the personal information detection unit 7 (step A2).

Next, the personal information detection unit 7 checks whether the file acquired in step A1 contains personal information such as names, addresses, e-mail addresses, telephone numbers, gender, and credit card numbers. If it does, the unit detects that personal information (step A3). The personal information can be detected using the existing technique disclosed in the above-mentioned Japanese Patent No. 5013081.

The personal information detection unit 7 then assigns a document ID to the detected personal information. It registers the path of the file in which the information was detected, together with the assigned document ID, in the personal information file table stored in the database 9 (step A4).

FIG. 6 is a flowchart showing the operation of the information management apparatus according to the embodiment of the present invention when dividing the personal information to be managed.

As shown in FIG. 6, the area dividing unit 3 first acquires the personal information detected by the personal information detection unit 7 (step B1).

Next, the area dividing unit 3 creates a table from the personal information acquired in step B1 (step B2). In this table, each vertically arranged row (record) corresponds to one person's personal information, and the values of that person's attributes are arranged horizontally along the row. Examples of attributes forming the columns of the table include surname (name), given name (name), prefecture (address), city (address), ward (address), town/village (address), e-mail address, and gender.

Next, the area dividing unit 3 acquires the division rule 8 (step B3). In the example of FIG. 6, the acquired rules fall into two groups:
- rules that divide by the attributes surname (name), given name (name), and town/village (address); and
- rules that divide by the attribute values of prefecture (address), gender, and e-mail address.

Next, following the rules that divide by attribute, the area dividing unit 3 executes steps B5 to B7 for each attribute, such as surname (name), given name (name), and town/village (address), to perform the division (step B4).

In step B5, the area dividing unit 3 determines whether the attribute being processed has at least a certain number of values (for example, 100 or more).

If step B5 finds fewer values than that number, processing of the current attribute ends and processing of the next attribute begins.

If, on the other hand, step B5 finds at least that number of values, the area dividing unit 3 divides the table so that the entire column of the attribute being processed forms one area (step B6). The area dividing unit 3 then attaches the attribute used as the division key to the area generated by the division (step B7).

Next, following the rules that divide by attribute value, the area dividing unit 3 executes steps B9 to B16 for each attribute, such as prefecture (address), gender, and e-mail address, to perform the division (step B8).

In step B9, the area dividing unit 3 determines whether the attribute being processed has any values.

If step B9 finds no values, the area dividing unit 3 selects the next attribute and executes step B9 again.

If, on the other hand, values exist, the area dividing unit 3 determines whether the attribute being processed is the e-mail address (step B10).

If step B10 finds that the attribute is the e-mail address, the part to the right of the at sign (@) in each e-mail address is taken out as the domain name. The personal information is then grouped using the domain name as the attribute value, and each group of personal information becomes an area (step B11).

If step B10 finds that the attribute is not the e-mail address, the area dividing unit 3 groups the personal information by attribute value to generate areas (step B12).

After step B11 or B12 has been executed, the area dividing unit 3 executes steps B14 to B16 on each generated group for each of the attributes surname (name), given name (name), and town/village (address) (step B13).

In step B14, the area dividing unit 3 determines whether, within the group, the attribute being processed has at least a certain number of values (for example, 100 or more).

ステップＢ１４の判定の結果、値が一定数以上存在していない場合は、領域分割部３は、次の属性を選択して、再度ステップＢ１４を実行する。 As a result of the determination in step B14, if there are not more than a certain number of values, the area dividing unit 3 selects the next attribute and executes step B14 again.

一方、ステップＢ１４の判定の結果、値が一定数以上存在している場合は、領域分割部３は、グループ化した個人情報の中で、処理対象となっている属性全体を１つの領域として分割する（ステップＢ１５）。 On the other hand, as a result of the determination in step B14, if there are more than a certain number of values, the area dividing unit 3 divides the entire processing target attribute into one area in the grouped personal information. (Step B15).

続いて、領域分割部３は、グループ化においてキーとなった属性とその属性値とを、分割によって生成された領域に付加する（ステップＢ１６）。 Subsequently, the area dividing unit 3 adds the attribute which is a key in the grouping and the attribute value to the area generated by the division (step B16).

その後、領域分割部３は、分割によって生成した領域を、特徴量抽出部４に渡す（ステップＢ１７）。 Thereafter, the region dividing unit 3 passes the region generated by the division to the feature amount extracting unit 4 (step B17).
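A sketch of steps B13 to B16 follows, again using hypothetical dict records. Each group produced in step B11 or B12 is split into one area per attribute column. An area is kept only if the group holds at least `threshold` values (100 in the example above), and it is tagged with the grouping key. The area layout (`key_attribute`, `key_value`, `attribute`, `values`) is an assumption made for illustration.

```python
def split_groups_into_areas(groups, key_attribute, attributes, threshold=100):
    """Steps B13-B16: split each group into one area per attribute column.

    `groups` maps a grouping key value (a prefecture, a mail domain, ...) to
    its records. A column becomes an area only if the group holds at least
    `threshold` non-empty values for it (step B14); each area is tagged with
    the grouping attribute and its value (step B16).
    """
    areas = []
    for key_value, records in groups.items():
        for attribute in attributes:
            values = [r[attribute] for r in records if r.get(attribute)]
            if len(values) < threshold:
                continue                      # too few values: next attribute
            areas.append({
                "key_attribute": key_attribute,
                "key_value": key_value,
                "attribute": attribute,
                "values": values,
            })
    return areas
```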

FIG. 7 is a flowchart showing how the information management apparatus according to the embodiment of the present invention extracts feature amounts from the divided areas. In the example of FIG. 7, the feature amounts are extracted as character strings by using the Huffman coding algorithm.

As shown in FIG. 7, the feature amount extraction unit 4 extracts feature amounts by executing steps C2 to C6 for each area generated by the division (step C1).

In step C2, the feature amount extraction unit 4 calculates the frequency of occurrence of each attribute value in the area being processed.

Next, the feature amount extraction unit 4 builds a Huffman tree from the frequencies calculated in step C2, using the Huffman coding algorithm. It then assigns a binary code to each value of the attribute (step C3).

Next, the feature amount extraction unit 4 determines whether the resulting ratio "shortest code length / longest code length" is equal to or less than a certain value (for example, 0.2) (step C4).
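Steps C2 to C4 amount to computing Huffman code lengths from value frequencies and checking how skewed those lengths are. The following sketch uses Python's standard `heapq` module. It tracks only code lengths, because the ratio tests never need the bit patterns themselves. Frequent values receive short codes, so a small shortest/longest ratio indicates that a few values dominate the area while others are rare.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code_lengths(values):
    """Steps C2-C3: {value: Huffman code length in bits} for one area."""
    freq = Counter(values)                      # step C2: occurrence counts
    if len(freq) <= 1:                          # empty area, or a lone symbol (1 bit)
        return {v: 1 for v in freq}
    tie = count()                               # tiebreaker: never compare dicts
    heap = [(n, next(tie), {v: 0}) for v, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:                        # merge the two rarest subtrees
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {v: d + 1 for v, d in {**left, **right}.items()}
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]

def is_skewed(lengths, max_ratio=0.2):
    """Step C4: shortest / longest code length is at most max_ratio."""
    return min(lengths.values()) / max(lengths.values()) <= max_ratio
```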

If the determination in step C4 finds that "shortest code length / longest code length" exceeds that value, the feature amount extraction unit 4 selects the next area and executes step C2 again.

If, on the other hand, the determination in step C4 finds that "shortest code length / longest code length" is equal to or less than that value, the feature amount extraction unit 4 judges that characteristic character strings exist. The feature amount extraction unit 4 then determines whether any character string has a ratio "code length / longest code length" equal to or greater than a certain value (for example, 0.8) (step C5).

If the determination in step C5 finds no character string whose "code length / longest code length" is equal to or greater than that value, the feature amount extraction unit 4 selects the next area and executes step C2 again.

If, on the other hand, the determination in step C5 finds such character strings, the feature amount extraction unit 4 determines them to be characteristic character strings (step C6).
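Continuing the sketch, steps C5 and C6 keep the values whose codes are nearly as long as the longest code, that is, the rarest values in the area. The `lengths` mapping is assumed to come from a Huffman construction as described in step C3. The default of 0.8 is the example threshold from the text.

```python
def characteristic_strings(lengths, min_ratio=0.8):
    """Steps C5-C6: values whose code length / longest code length >= min_ratio.

    `lengths` maps each attribute value to its Huffman code length in bits.
    An empty result means step C5 failed and the next area is processed.
    """
    longest = max(lengths.values())
    return sorted(value for value, n in lengths.items() if n / longest >= min_ratio)

# Code lengths for a skewed area (frequencies 64, 32, 16, 8, 4, 2, 1, 1).
example = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6, "g": 7, "h": 7}
print(characteristic_strings(example))  # ['f', 'g', 'h']
```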

Next, the feature amount extraction unit 4 stores the document ID, the attribute, the attribute value, and the feature amount (the character string determined in step C6) in the personal information feature table in the database 9 (step C7).

In step C7, the attribute and attribute value stored in the personal information feature table are the values that were stored in the table when it was divided into areas. If the original table had no attribute value, the attribute value in the corresponding column of the personal information feature table is left empty. Furthermore, the feature amount extraction unit 4 stores the feature amounts so that each character string extracted as a feature amount forms one record of the personal information feature table (see FIG. 4).
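Step C7 can be illustrated with Python's built-in `sqlite3` module. The table and column names below are hypothetical stand-ins for the personal information feature table of FIG. 4. Storing a missing attribute value as an empty string rather than NULL is also an assumption, chosen so that a later search for empty values can use a plain equality test.

```python
import sqlite3

# Hypothetical schema for the personal information feature table.
SCHEMA = ("CREATE TABLE IF NOT EXISTS personal_info_feature "
          "(doc_id TEXT, attribute TEXT, attribute_value TEXT, feature TEXT)")

def store_features(conn, doc_id, attribute, attribute_value, features):
    """Step C7: insert one record per characteristic string.

    A missing attribute value is stored as '' (assumed convention).
    """
    conn.executemany(
        "INSERT INTO personal_info_feature VALUES (?, ?, ?, ?)",
        [(doc_id, attribute, attribute_value or "", f) for f in features],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
store_features(conn, "DOC-1", "surname", None, ["Kikuchi", "Ueda"])
```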

Outflow source determination processing
To determine the outflow source of a file, four processes are executed: extraction of personal information from the leaked file (FIG. 8), division of the personal information (FIG. 9), extraction of feature amounts from the divided areas (FIG. 10), and determination by comparison of feature amounts (FIG. 11). Each process is described in turn below.

FIG. 8 is a flowchart showing how the information management apparatus according to the embodiment of the present invention extracts, from a leaked file, the personal information to be examined.

First, the administrator sends the leaked file to the information management apparatus 1 via the terminal device 20. As shown in FIG. 8, the file acquisition unit 6 then acquires the transmitted file (step D1).

Next, the file acquisition unit 6 passes the acquired file to the personal information detection unit 7 (step D2).

Next, the personal information detection unit 7 checks whether the file acquired in step D1 contains personal information such as names, addresses, e-mail addresses, telephone numbers, gender, or credit card numbers. If it does, the unit detects that personal information (step D3). Step D3 is performed in the same manner as step A3 shown in FIG. 5.
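The detection method used in step D3 is not described in this section. As a purely illustrative sketch, regular expressions can catch the more structured items, such as e-mail addresses and hyphenated Japanese telephone numbers. The patterns below are assumptions. Names and addresses would need a dictionary-based or model-based approach, which is not shown here.

```python
import re

# Hypothetical patterns; the source does not specify how step D3 detects data.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"0\d{1,4}-\d{1,4}-\d{3,4}"),
}

def detect_personal_info(text):
    """Return {kind: [matches]} for every pattern that matches `text`."""
    found = {}
    for kind, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[kind] = matches
    return found
```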

Thereafter, the personal information detection unit 7 passes the personal information detected in step D3 to the area dividing unit 3 (step D4).

FIG. 9 is a flowchart showing how the information management apparatus according to the embodiment of the present invention divides the personal information extracted from a leaked file for examination. Each step shown in FIG. 9 is performed in the same way as the corresponding step shown in FIG. 6.

As shown in FIG. 9, the area dividing unit 3 first acquires the personal information that the personal information detection unit 7 detected in the leaked file (step E1).

Next, the area dividing unit 3 creates a table from the personal information in the leaked file (step E2). As in step B2 shown in FIG. 6, each row (record) of the table corresponds to one person's personal information, and the values of that person's attributes are arranged horizontally across the row. The attributes that form the columns of the table include surname (name), given name (name), prefecture (address), city (address), ward (address), town/village (address), e-mail address, and gender.

Next, the area dividing unit 3 acquires the division rules 8 (step E3). As in the example of FIG. 6, six division rules are acquired:
- a rule for dividing by the attribute surname (name);
- a rule for dividing by the attribute given name (name);
- a rule for dividing by the attribute town/village (address);
- a rule for dividing by the values of the attribute prefecture (address);
- a rule for dividing by the values of the attribute gender;
- a rule for dividing by the values of the attribute e-mail address.

Next, the area dividing unit 3 applies the rules for dividing by attribute. It performs the division by executing steps E5 to E7 for each of the attributes surname (name), given name (name), and town/village (address) (step E4).

In step E5, the area dividing unit 3 determines whether there are at least a certain number (for example, 100) of values of the attribute being processed.

If the determination in step E5 finds fewer values than that number, processing of the current attribute ends and processing of the next attribute begins.

If, on the other hand, the determination in step E5 finds at least that number of values, the area dividing unit 3 divides the table so that the entire column of the attribute being processed becomes one area (step E6). Subsequently, the area dividing unit 3 attaches the attribute used as the division key to the area generated by the division (step E7).
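Steps E4 to E7, and the corresponding attribute-based steps used for registration in FIG. 6, can be sketched as follows, again assuming hypothetical dict records. Each attribute column with at least `threshold` non-empty values becomes a single area tagged only with its attribute. Because no attribute value is attached, such an area would presumably correspond to an empty attribute value in the feature table.

```python
def split_by_attribute(records, attributes, threshold=100):
    """Steps E4-E7: each attribute column with at least `threshold`
    non-empty values becomes one area tagged with that attribute."""
    areas = []
    for attribute in attributes:
        values = [r[attribute] for r in records if r.get(attribute)]
        if len(values) >= threshold:                              # step E5
            areas.append({"attribute": attribute, "values": values})  # steps E6-E7
    return areas
```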

Next, the area dividing unit 3 applies the rules for dividing by attribute value. It performs the division by executing steps E9 to E16 for each of the attributes prefecture (address), gender, and e-mail address (step E8).

In step E9, the area dividing unit 3 determines whether the attribute being processed has a value.

If the determination in step E9 finds no value, the area dividing unit 3 selects the next attribute and executes step E9 again.

If, on the other hand, the determination in step E9 finds a value, the area dividing unit 3 determines whether the attribute being processed is the e-mail address (step E10).

If the determination in step E10 finds that the attribute is the e-mail address, the area dividing unit 3 first extracts the part of each e-mail address to the right of the at sign (@) as the domain name. It then uses the domain name as the attribute value, groups the personal information by domain name, and treats each group of personal information as an area (step E11).

If, on the other hand, the determination in step E10 finds that the attribute is not the e-mail address, the area dividing unit 3 groups the personal information by attribute value to generate areas (step E12).

Next, once step E11 or E12 has been executed, the area dividing unit 3 processes each generated group. For each of the attributes surname (name), given name (name), and town/village (address), it executes steps E14 to E16 (step E13).

In step E14, the area dividing unit 3 determines whether the group contains at least a certain number (for example, 100) of values of the attribute being processed.

If the determination in step E14 finds fewer values than that number, the area dividing unit 3 selects the next attribute and executes step E14 again.

If, on the other hand, the determination in step E14 finds at least that number of values, the area dividing unit 3 takes the entire column of the attribute being processed, within the grouped personal information, as one area (step E15).

Subsequently, the area dividing unit 3 attaches the attribute used as the grouping key, together with its attribute value, to the area generated by the division (step E16).

Thereafter, the area dividing unit 3 passes the areas generated by the division to the feature amount extraction unit 4 (step E17).

FIG. 10 is a flowchart showing how the information management apparatus according to the embodiment of the present invention extracts feature amounts from the areas obtained from a leaked file. Each step shown in FIG. 10 is performed in the same way as the corresponding step shown in FIG. 7. In the example of FIG. 10, too, the feature amounts are extracted as character strings by using the Huffman coding algorithm.

As shown in FIG. 10, the feature amount extraction unit 4 extracts feature amounts by executing steps F2 to F6 for each area generated from the leaked file (step F1).

In step F2, the feature amount extraction unit 4 calculates the frequency of occurrence of each attribute value in the area being processed.

Next, the feature amount extraction unit 4 builds a Huffman tree from the frequencies calculated in step F2, using the Huffman coding algorithm. It then assigns a binary code to each value of the attribute (step F3).

Next, the feature amount extraction unit 4 determines whether the resulting ratio "shortest code length / longest code length" is equal to or less than a certain value (for example, 0.2) (step F4).

If the determination in step F4 finds that "shortest code length / longest code length" exceeds that value, the feature amount extraction unit 4 selects the next area and executes step F2 again.

If, on the other hand, the determination in step F4 finds that "shortest code length / longest code length" is equal to or less than that value, the feature amount extraction unit 4 judges that characteristic character strings exist. The feature amount extraction unit 4 then determines whether any character string has a ratio "code length / longest code length" equal to or greater than a certain value (for example, 0.8) (step F5).

If the determination in step F5 finds no character string whose "code length / longest code length" is equal to or greater than that value, the feature amount extraction unit 4 selects the next area and executes step F2 again.

If, on the other hand, the determination in step F5 finds such character strings, the feature amount extraction unit 4 determines them to be characteristic character strings (step F6). After step F6 ends, the feature amount extraction unit 4 passes the extracted feature amounts of each area to the determination unit 5.

FIG. 11 is a flowchart showing how the information management apparatus according to the embodiment of the present invention executes the determination processing based on comparison of feature amounts.

As shown in FIG. 11, the determination unit 5 works with the feature amounts it received from the feature amount extraction unit 4 for each area obtained from the leaked file. Using these feature amounts, it executes steps G2 to G10 for each area (step G1).

In step G2, the determination unit 5 determines whether the area being processed has an attribute whose attribute value is empty.

If the determination in step G2 finds such an attribute in the area being processed, the determination unit 5 searches the personal information feature table using that attribute, with an empty attribute value, as the search condition. It thereby identifies the records whose attribute value is empty (step G3).

If, on the other hand, the determination in step G2 finds no attribute with an empty attribute value in the area being processed, the determination unit 5 searches the personal information feature table using the attribute of the area and its attribute value as search conditions (step G4).
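Steps G2 to G4 can be sketched as a single parameterized SQL query using Python's `sqlite3` module. The table and column names are hypothetical. Suppose a missing attribute value is stored as an empty string, which is an assumption. The empty-value search of step G3 and the attribute-and-value search of step G4 then reduce to the same query.

```python
import sqlite3

# Hypothetical schema standing in for the personal information feature table.
SCHEMA = ("CREATE TABLE IF NOT EXISTS personal_info_feature "
          "(doc_id TEXT, attribute TEXT, attribute_value TEXT, feature TEXT)")

def search_features(conn, attribute, attribute_value=None):
    """Steps G2-G4: fetch (doc_id, feature) rows registered for an area.

    attribute_value None or '' selects rows with an empty value (step G3);
    otherwise both attribute and value must match (step G4).
    """
    return conn.execute(
        "SELECT doc_id, feature FROM personal_info_feature "
        "WHERE attribute = ? AND attribute_value = ? ORDER BY doc_id, feature",
        (attribute, attribute_value or ""),
    ).fetchall()
```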

Next, once step G3 or G4 has been executed, the determination unit 5 sorts the records extracted by the search by document ID (step G5). The determination unit 5 then extracts the feature amounts (character strings) from the records of each document ID (step G6).

Next, for each document ID, the determination unit 5 compares the feature amounts (character strings) extracted in step G6 with the feature amounts of the area received from the feature amount extraction unit 4. It then determines whether any document ID has matching feature amounts (step G7).

If the determination in step G7 finds no document ID with matching feature amounts, the determination unit 5 selects the next area and executes step G2 again. If, on the other hand, the determination in step G7 finds a document ID with matching feature amounts, the determination unit 5 takes that document ID as an outflow source candidate (step G8). Step G8 thus identifies, for each divided area, the document IDs that are outflow source candidates.

Next, the determination unit 5 searches the personal information file table (see FIG. 4) for the file of the document with the identified document ID and takes that file as an outflow source candidate (step G9). Next, the determination unit 5 counts, for each document ID, the number of times it has been a candidate (step G10).

When steps G2 to G10 have been completed for every area, the determination unit 5 adds up the per-area counts of each document ID to obtain a total for each document ID (step G11). A single divided area can yield several candidate document IDs, and the candidates may differ from area to area. However, the more often a document ID has been a candidate, the more likely it is to be the outflow source. For this reason, the total is calculated in step G11.

Next, the determination unit 5 determines that the document ID with the highest total identifies the outflow source document and outputs the determination result (step G12). Specifically, the determination result is transmitted to the administrator's terminal device 20 via the network 10. If the information management apparatus 1 includes a display device, the determination result may also be output to that display device.
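Steps G5 to G12 amount to a vote across areas. The sketch below assumes that a document ID "matches" an area when at least one of its registered strings appears among the area's characteristic strings. The source does not define the matching criterion more precisely, and the input and output shapes are illustrative only.

```python
from collections import Counter, defaultdict

def vote_outflow_source(area_results):
    """Steps G5-G12 as a vote.

    area_results: one (leaked_features, hits) pair per leaked-file area,
    where hits are the (doc_id, feature) rows found by the table search.
    Returns (best_doc_id, totals), or (None, totals) if nothing matched.
    """
    totals = Counter()
    for leaked_features, hits in area_results:
        by_doc = defaultdict(set)                       # steps G5-G6
        for doc_id, feature in hits:
            by_doc[doc_id].add(feature)
        for doc_id, features in by_doc.items():
            if features & set(leaked_features):         # step G7 (assumed criterion)
                totals[doc_id] += 1                     # steps G8-G11
    if not totals:
        return None, totals
    best, _ = totals.most_common(1)[0]                  # step G12
    return best, totals
```

`Counter.most_common` orders equal counts by first insertion, so a tie is resolved silently here. A practical system would probably report ties to the administrator instead.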

[Effects of the embodiment]
In the present embodiment described above, the personal information is divided into areas using attributes and attribute values, and the leaked file is divided in the same way. The outflow source file is identified by comparing the feature amounts of the two sets of areas. Therefore, the outflow source file can be identified even if the file was split or its records were reordered before it leaked.

Further, as described above, the present embodiment uses the feature amounts of the divided areas as clues for identifying the outflow source file. It can therefore cope even with the leak of a file in which a digital watermark is difficult to embed, such as a text file.

[Modification 1]
Modification 1 of the present embodiment is described below. In Modification 1, the number of divisions performed by the area dividing unit 3 is set according to the importance of the information (personal information) to be managed. FIG. 12 is a block diagram showing a specific configuration of the information management apparatus according to Modification 1 of the embodiment of the present invention.

As shown in FIG. 12, in Modification 1 the information management apparatus 1 further includes an importance calculation unit 11. When the personal information detection unit 7 detects personal information, the importance calculation unit 11 calculates the importance of that personal information. According to the calculated importance, it increases or decreases the number of division rules 8 that the area dividing unit 3 uses for division. For example, the importance calculation unit 11 can make the outflow source easier to identify accurately by increasing the number of division rules 8 used by the area dividing unit 3 as the importance increases.

Specifically, for each importance level, the importance calculation unit 11 sets, for example, keywords (specific personal names, specific addresses, and the like) and keyword occurrence frequencies. It then assigns an importance to the personal information to be managed according to which keywords that information contains and how often. Each division rule 8 may also specify the importance level to which it applies. In this case, the area dividing unit 3 performs the division using the division rules 8 that apply to the assigned importance.
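Modification 1 does not specify the keywords, the frequencies, or the rule sets. The sketch below uses made-up values for all three to show the mechanism. It counts keyword occurrences in the detected personal information, picks the highest importance level whose threshold is met, and looks up the division rules configured for that level. Higher levels use more rules.

```python
from collections import Counter

# Hypothetical settings, highest level first: (level, keywords, minimum hits).
IMPORTANCE_LEVELS = [
    (3, {"Chiyoda"}, 5),
    (2, {"Chiyoda", "Minato"}, 2),
]
# Hypothetical division rules per level; more important data gets more rules.
RULES_BY_LEVEL = {
    1: ["surname", "prefecture"],
    2: ["surname", "given_name", "prefecture", "gender"],
    3: ["surname", "given_name", "town", "prefecture", "gender", "email"],
}

def importance(values):
    """Return the highest level whose keywords occur at least `min_hits` times."""
    counts = Counter(values)
    for level, keywords, min_hits in IMPORTANCE_LEVELS:
        if sum(counts[k] for k in keywords) >= min_hits:
            return level
    return 1  # default: lowest importance

def rules_for(values):
    """Division rules to apply for personal information with these values."""
    return RULES_BY_LEVEL[importance(values)]
```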

[Modification 2]
Next, Modification 2 of the present embodiment is described. In the embodiment above, personal information is grouped by attribute value, so that records with the same value of an attribute such as prefecture, gender, or e-mail address fall into the same group. In Modification 2, by contrast, grouping is not based on shared attribute values. Instead, for example, the records are sorted by surname in dictionary order and the sorted result is grouped. The outflow source can be identified in this case as well.
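The following is a sketch of Modification 2. The text says only that records are sorted by surname in dictionary order and the result is grouped. Grouping by the first character of the surname is an assumption chosen here because group boundaries then do not shift when records are added or removed. That property may matter when a leaked file contains only part of the original.

```python
from itertools import groupby

def group_by_sorted_surname(records):
    """Sort records by surname in dictionary order, then group consecutive
    records sharing the surname's first character (the grouping boundary is
    an assumption; the source does not specify it)."""
    ordered = sorted(records, key=lambda r: r["surname"])
    return {
        initial: list(group)
        for initial, group in groupby(ordered, key=lambda r: r["surname"][:1])
    }
```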

[Program]
The program according to the present embodiment may be any program that causes a computer to execute the following steps: steps A1 to A4 shown in FIG. 5, steps B1 to B17 shown in FIG. 6, steps C1 to C7 shown in FIG. 7, steps D1 to D4 shown in FIG. 8, steps E1 to E17 shown in FIG. 9, steps F1 to F6 shown in FIG. 10, and steps G1 to G12 shown in FIG. 11.

Installing this program in a computer and executing it realizes the information management apparatus 1 and the information management method according to the present embodiment. In this case, the CPU (Central Processing Unit) of the computer functions as the feature amount registration unit 2, the area dividing unit 3, the feature amount extraction unit 4, the determination unit 5, the file acquisition unit 6, and the personal information detection unit 7, and performs the processing.

The program according to the present embodiment may also be executed by a computer system built from a plurality of computers. In this case, for example, each computer may function as any one of the feature amount registration unit 2, the area dividing unit 3, the feature amount extraction unit 4, the determination unit 5, the file acquisition unit 6, and the personal information detection unit 7.

A computer that realizes the information management apparatus 1 by executing the program according to the embodiment is described here with reference to FIG. 13. FIG. 13 is a block diagram showing an example of a computer that realizes the information management apparatus according to the embodiment of the present invention.

As shown in FIG. 13, the computer 110 includes a CPU 111, a main memory 112, a storage device 113, an input interface 114, a display controller 115, a data reader/writer 116, and a communication interface 117. These units are connected to one another via a bus 121 so that they can exchange data.

The CPU 111 performs various calculations by loading the program (code) according to the present embodiment from the storage device 113 into the main memory 112 and executing it in a predetermined order. The main memory 112 is typically a volatile storage device such as a DRAM (Dynamic Random Access Memory). The program according to the present embodiment is provided on a computer-readable recording medium 120. The program may also be distributed over the Internet, accessed via the communication interface 117.

Specific examples of the storage device 113 include a hard disk drive and semiconductor storage devices such as flash memory. The input interface 114 mediates data transfer between the CPU 111 and input devices 118 such as a keyboard and a mouse. The display controller 115 is connected to a display device 119 and controls what the display device 119 shows.

The data reader/writer 116 mediates data transfer between the CPU 111 and the recording medium 120. It reads the program from the recording medium 120 and writes the processing results of the computer 110 to the recording medium 120. The communication interface 117 mediates data transfer between the CPU 111 and other computers.

Specific examples of the recording medium 120 include general-purpose semiconductor storage devices such as CF (Compact Flash (registered trademark)) and SD (Secure Digital), magnetic recording media such as a flexible disk, and optical recording media such as a CD-ROM (Compact Disk Read Only Memory).

The information management apparatus 1 according to the present embodiment can also be realized with hardware corresponding to each unit, instead of a computer with the program installed. Furthermore, the information management apparatus 1 may be realized partly by a program and partly by hardware.

Part or all of the above-described embodiment can be expressed as (Appendix 1) to (Appendix 15) below, but is not limited to the following description.

(Appendix 1)
An information management apparatus comprising:
a feature amount registration unit that registers a feature amount for each of a plurality of areas obtained by dividing information to be managed based on a preset rule;
an area division unit that divides information input from outside into a plurality of areas based on the rule;
a feature amount extraction unit that extracts a feature amount for each area obtained by the division; and
a determination unit that compares the extracted feature amounts with the registered feature amounts to determine whether the information input from outside matches the information to be managed.
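The four units of Appendix 1 can be illustrated with a short sketch. This section does not fix what the feature amount is or how the determination unit decides a match, so the following are assumptions made only for illustration: the managed information is a table of records, the division rule is a hypothetical `split_by_attribute` that makes one area per column, the feature amount of an area is a SHA-256 hash over its sorted values, and a registered file is judged to match when at least a hypothetical `threshold` fraction of its areas have identical hashes. The class `InformationManager` and its methods are likewise hypothetical names.

```python
import hashlib
from typing import Callable, Dict, List

Record = Dict[str, str]                 # one row of a tabular file, e.g. a CSV line
Regions = Dict[str, List[str]]          # area name -> values that fall in that area
Rule = Callable[[List[Record]], Regions]


def split_by_attribute(records: List[Record]) -> Regions:
    """Hypothetical division rule: one area per attribute (column)."""
    regions: Regions = {}
    for rec in records:
        for attr, value in rec.items():
            regions.setdefault(attr, []).append(value)
    return regions


def extract_features(regions: Regions) -> Dict[str, str]:
    """Assumed feature amount: SHA-256 over the sorted values of each area.

    Sorting makes the feature insensitive to row order, so a leaked copy whose
    rows were shuffled still produces the same per-area hashes.
    """
    return {
        name: hashlib.sha256("\n".join(sorted(values)).encode("utf-8")).hexdigest()
        for name, values in regions.items()
    }


class InformationManager:
    """Hypothetical stand-in for units 2 to 5 (registration, division, extraction, determination)."""

    def __init__(self, rule: Rule, threshold: float = 0.5) -> None:
        self.rule = rule                 # the same rule is used for registration and checking
        self.threshold = threshold       # assumed fraction of areas that must match
        self.registry: Dict[str, Dict[str, str]] = {}  # file id -> registered features

    def register(self, file_id: str, records: List[Record]) -> None:
        self.registry[file_id] = extract_features(self.rule(records))

    def identify_source(self, records: List[Record]) -> List[str]:
        """Return the ids of registered files judged to match the input information."""
        observed = extract_features(self.rule(records))
        matches = []
        for file_id, registered in self.registry.items():
            same = sum(1 for name, h in registered.items() if observed.get(name) == h)
            if registered and same / len(registered) >= self.threshold:
                matches.append(file_id)
        return matches


if __name__ == "__main__":
    customers = [
        {"name": "Sato", "phone": "03-0000-0001", "city": "Tokyo"},
        {"name": "Suzuki", "phone": "06-0000-0002", "city": "Osaka"},
    ]
    mgr = InformationManager(split_by_attribute)
    mgr.register("customers.csv", customers)
    mgr.register("staff.csv", [{"name": "Tanaka", "phone": "052-000-0003", "city": "Nagoya"}])

    leaked = [dict(r) for r in reversed(customers)]  # rows reordered
    leaked[0]["phone"] = "06-9999-9999"              # one value altered
    print(mgr.identify_source(leaked))  # ['customers.csv']
```

In this sketch, altering one column changes only that area's hash, so two of the three areas of `customers.csv` still match and the altered copy is still traced to its source. This mirrors, under the stated assumptions, the stated goal of identifying the source even when the original file has been modified.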

(Appendix 2)
The information management apparatus according to Appendix 1, wherein
the rule includes at least one of a rule for dividing information based on a specific attribute constituting the information and a rule for dividing information based on a specific attribute value contained in the information, and
the rule used to obtain the feature amounts registered in the feature amount registration unit is the same as the rule used by the area division unit.
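The two rule types named in Appendix 2 can be sketched as follows, again assuming the information is a table of records. `split_by_attribute` yields one area per attribute, `split_by_attribute_value` yields one area per distinct value of a chosen attribute (for example, one area per city), and `combine` applies several rules at once. All three names, the `attr:` key prefix, and the `key=value|...` row serialization are hypothetical choices for illustration. Appendix 2 requires that the same rule be used at registration time and at check time; in this sketch that means passing the same `rule` object to both steps.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Regions = Dict[str, List[str]]


def split_by_attribute(records: List[Record]) -> Regions:
    """Rule type 1: one area per attribute (column) of the information."""
    regions: Regions = {}
    for rec in records:
        for attr, value in rec.items():
            regions.setdefault(f"attr:{attr}", []).append(value)
    return regions


def split_by_attribute_value(attr: str) -> Callable[[List[Record]], Regions]:
    """Rule type 2: one area per distinct value of `attr` (e.g. one area per city)."""
    def rule(records: List[Record]) -> Regions:
        regions: Regions = {}
        for rec in records:
            row = "|".join(f"{k}={rec[k]}" for k in sorted(rec))  # whole row as area content
            regions.setdefault(f"{attr}={rec.get(attr, '')}", []).append(row)
        return regions
    return rule


def combine(*rules: Callable[[List[Record]], Regions]) -> Callable[[List[Record]], Regions]:
    """Apply several rules; the combined rule must be reused for registration and checking."""
    def rule(records: List[Record]) -> Regions:
        out: Regions = {}
        for r in rules:
            out.update(r(records))
        return out
    return rule


rows = [
    {"name": "Sato", "city": "Tokyo"},
    {"name": "Suzuki", "city": "Osaka"},
    {"name": "Ito", "city": "Tokyo"},
]
rule = combine(split_by_attribute, split_by_attribute_value("city"))
print(sorted(rule(rows)))  # ['attr:city', 'attr:name', 'city=Osaka', 'city=Tokyo']
```

Attribute-value areas are finer than per-column areas: if a leaked extract contained only the Tokyo rows, the `city=Tokyo` area would produce the same content as at registration, whereas every per-column area would differ.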

(Appendix 3)
The information management apparatus according to Appendix 1 or 2, wherein
the area division unit divides the information to be managed,
the feature amount extraction unit extracts a feature amount for each area obtained by dividing the information to be managed, and
the feature amount registration unit registers the feature amounts that the feature amount extraction unit extracted from the areas obtained by dividing the information to be managed.

(Appendix 4)
The information management apparatus according to Appendix 3, further comprising an importance calculation unit that calculates the importance of the information to be managed and, according to the calculated importance, increases or decreases the number of rules used for division by the area division unit.
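Appendix 4 states only that the number of division rules grows or shrinks with the importance of the information; it does not say how importance is computed. The sketch below assumes a hypothetical scoring scheme: each kind of personal attribute carries a weight in `SENSITIVITY`, the importance is the summed weight of the attributes present multiplied by the number of records, and one more rule from the hypothetical list `CANDIDATE_RULES` is enabled for each threshold in `thresholds` that the score reaches. The weights, thresholds, and rule names are illustrative only.

```python
from typing import Dict, List, Sequence

Record = Dict[str, str]

# Hypothetical sensitivity weights for kinds of personal attributes.
SENSITIVITY = {"name": 1, "address": 2, "phone": 2, "card_number": 5}

# Hypothetical candidate division rules, ordered from coarse to fine.
CANDIDATE_RULES = ["by_attribute", "by_city_value", "by_birth_year_value", "by_row"]


def importance(records: List[Record]) -> int:
    """Assumed importance: summed sensitivity of the attributes present times record count."""
    if not records:
        return 0
    attrs = set().union(*(rec.keys() for rec in records))
    return sum(SENSITIVITY.get(a, 0) for a in attrs) * len(records)


def rules_for(records: List[Record], thresholds: Sequence[int] = (10, 100, 1000)) -> List[str]:
    """Enable one extra division rule for every importance threshold reached."""
    score = importance(records)
    count = 1 + sum(1 for t in thresholds if score >= t)
    return CANDIDATE_RULES[:count]


small = [{"name": "Sato"}]
large = [{"name": f"user{i}", "phone": "000", "card_number": "0000"} for i in range(200)]
print(rules_for(small))       # ['by_attribute']
print(len(rules_for(large)))  # 4
```

Under this reading, the selected rule list would have to be applied both when the managed file is divided for registration and when external input is divided for checking, so that the registered and observed areas line up. Limiting the finer, more costly rules to important files is one plausible way to keep the added load on the system small.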

(Appendix 5)
The information management apparatus according to any one of Appendices 1 to 4, wherein the information to be managed is personal information.

(Appendix 6)
An information management method comprising:
(a) registering a feature amount for each of a plurality of areas obtained by dividing information to be managed based on a preset rule;
(b) dividing information input from outside into a plurality of areas based on the rule;
(c) extracting a feature amount for each area obtained by the division; and
(d) comparing the extracted feature amounts with the registered feature amounts to determine whether the information input from outside matches the information to be managed.

(Appendix 7)
The information management method according to Appendix 6, wherein
the rule includes at least one of a rule for dividing information based on a specific attribute constituting the information and a rule for dividing information based on a specific attribute value contained in the information, and
the rule used to obtain the feature amounts registered in step (a) is the same as the rule used in step (b).

(Appendix 8)
The information management method according to Appendix 6 or 7, further comprising:
(e) dividing the information to be managed into a plurality of areas based on the rule; and
(f) extracting a feature amount for each area obtained by dividing the information to be managed,
wherein, in step (a), the feature amounts extracted in step (f) are registered.

(Appendix 9)
The information management method according to Appendix 8, further comprising:
(g) calculating the importance of the information to be managed and, according to the calculated importance, increasing or decreasing the number of rules used for division in step (b) and step (e).

(Appendix 10)
The information management method according to any one of Appendices 6 to 9, wherein the information to be managed is personal information.

(Appendix 11)
A program that causes a computer to execute:
(a) registering a feature amount for each of a plurality of areas obtained by dividing information to be managed based on a preset rule;
(b) dividing information input from outside into a plurality of areas based on the rule;
(c) extracting a feature amount for each area obtained by the division; and
(d) comparing the extracted feature amounts with the registered feature amounts to determine whether the information input from outside matches the information to be managed.

(Appendix 12)
The program according to Appendix 11, wherein
the rule includes at least one of a rule for dividing information based on a specific attribute constituting the information and a rule for dividing information based on a specific attribute value contained in the information, and
the rule used to obtain the feature amounts registered in step (a) is the same as the rule used in step (b).

(Appendix 13)
The program according to Appendix 11 or 12, further causing the computer to execute:
(e) dividing the information to be managed into a plurality of areas based on the rule; and
(f) extracting a feature amount for each area obtained by dividing the information to be managed,
wherein, in step (a), the feature amounts extracted in step (f) are registered.

(Appendix 14)
The program according to Appendix 13, further causing the computer to execute:
(g) calculating the importance of the information to be managed and, according to the calculated importance, increasing or decreasing the number of rules used for division in step (b) and step (e).

(Appendix 15)
The program according to any one of Appendices 11 to 14, wherein the information to be managed is personal information.

As described above, the present invention makes it possible to identify the source of leaked information even when the original file has been altered, without increasing the burden on the system. The invention can be used by a company that operates a file server on which files containing personal information are stored (or have been stored by mistake): if such a file leaks, the invention allows the source file to be identified efficiently.

DESCRIPTION OF SYMBOLS
1 Information management apparatus
2 Feature amount registration unit
3 Area division unit
4 Feature amount extraction unit
5 Determination unit
6 File acquisition unit
7 Personal information detection unit
8 Division rule
9 Database
10 Network
11 Importance calculation unit
20 Terminal device
30 File server
110 Computer
111 CPU
112 Main memory
113 Storage device
114 Input interface
115 Display controller
116 Data reader/writer
117 Communication interface
118 Input device
119 Display device
120 Recording medium
121 Bus

Claims

An information management apparatus comprising:
a feature amount registration unit that registers a feature amount for each of a plurality of areas obtained by dividing information to be managed based on a preset rule;
an area division unit that divides information input from outside into a plurality of areas based on the rule;
a feature amount extraction unit that extracts a feature amount for each area obtained by the division; and
a determination unit that compares the extracted feature amounts with the registered feature amounts to determine whether the information input from outside matches the information to be managed.

The information management apparatus according to claim 1, wherein
the rule includes at least one of a rule for dividing information based on a specific attribute constituting the information and a rule for dividing information based on a specific attribute value contained in the information, and
the rule used to obtain the feature amounts registered in the feature amount registration unit is the same as the rule used by the area division unit.

The information management apparatus according to claim 1 or 2, wherein
the area division unit divides the information to be managed,
the feature amount extraction unit extracts a feature amount for each area obtained by dividing the information to be managed, and
the feature amount registration unit registers the feature amounts that the feature amount extraction unit extracted from the areas obtained by dividing the information to be managed.

The information management apparatus according to claim 3, further comprising an importance calculation unit that calculates the importance of the information to be managed and, according to the calculated importance, increases or decreases the number of rules used for division by the area division unit.

The information management apparatus according to any one of claims 1 to 4, wherein the information to be managed is personal information.

An information management method comprising:
(a) registering a feature amount for each of a plurality of areas obtained by dividing information to be managed based on a preset rule;
(b) dividing information input from outside into a plurality of areas based on the rule;
(c) extracting a feature amount for each area obtained by the division; and
(d) comparing the extracted feature amounts with the registered feature amounts to determine whether the information input from outside matches the information to be managed.

A program that causes a computer to execute:
(a) registering a feature amount for each of a plurality of areas obtained by dividing information to be managed based on a preset rule;
(b) dividing information input from outside into a plurality of areas based on the rule;
(c) extracting a feature amount for each area obtained by the division; and
(d) comparing the extracted feature amounts with the registered feature amounts to determine whether the information input from outside matches the information to be managed.