JP2014026305A

JP2014026305A - Data processing device, database system, data processing method, and program

Info

Publication number: JP2014026305A
Application number: JP2012163583A
Authority: JP
Inventors: Mitsuhiro Hattori; 充洋服部; Takumi Mori; 拓海森; Tadashi Matsuda; 規松田; Takashi Ito; 伊藤　　隆; Takahito Hirano; 貴人平野
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2012-07-24
Filing date: 2012-07-24
Publication date: 2014-02-06

Abstract

PROBLEM TO BE SOLVED: To store a large amount of data while protecting personal information and reducing the processing load required for data analysis. SOLUTION: A data distribution server device 104 stores data including three or more attributes, and selects, from the attributes included in the data, two or more attributes to be separated from each other as quasi-identifiers. Then, for each quasi-identifier, the data distribution server device 104 generates partial data by combining the values of the attributes that are not quasi-identifiers with the value of that quasi-identifier, and stores the two or more pieces of generated partial data in different sub-DBs.

Description

The present invention relates to a technique for dividing and storing data.

In the field of information systems, the amount of data handled is increasing year by year.
Accordingly, there is a growing need for technologies that store large amounts of data and use them efficiently.
Log data such as access logs and room entry/exit logs, in particular, grows with every data access and every entry to or exit from a building, so the amount of data increases over time; a mechanism that can store all of the data even under such conditions is required.
Log data is not merely stored. It is also used for analysis, for example to determine the weekly number of accesses or of people entering and leaving, or to detect irregular behavior such as entering or leaving a room late at night, so a mechanism that allows efficient analysis is also required.
Furthermore, log data may include personal information such as the name and age of the person accessing the data.
For example, log data of an information system in a hospital may include sensitive information such as disease names, medical history, and family structure in addition to patient names and ages.
For this reason, a mechanism that appropriately protects personal information is also required.
Against this background, methods for analyzing data efficiently and methods for protecting personal information have been devised in the field of databases for storing data (for example, Patent Document 1, Patent Document 2, and Patent Document 3).

Patent Document 1: JP 2011-209974 A
Patent Document 2: Japanese Patent Laid-Open No. 11-15797
Patent Document 3: Japanese Patent Laid-Open No. 07-182368

Non-Patent Document 1: Rajeev Motwani and Ying Xu, “Efficient algorithms for masking and finding quasi-identifiers,” VLDB 2007, ACM, 2007.

The distributed database system described in Patent Document 1 protects personal information by dividing one database into a plurality of sub-databases by attribute and additionally encrypting them.
However, that system is configured so that data can be analyzed only after all of the distributed encrypted data has been decrypted and gathered again, so the processing load during data analysis is high.
This is particularly problematic for the log data described above.

The data transfer method described in Patent Document 2 reduces the load of analysis processing by dividing the contents of one database into a plurality of databases with partially overlapping attributes and analyzing the data on each sub-database.
However, it does not disclose a method of analysis that protects personal information, and therefore it cannot be used with confidence in a system in which the protection of personal information is important.

The data processing system described in Patent Document 3 divides the contents of one database into a plurality of sub-databases for storage and, at analysis time, compresses information in each sub-database, thereby reducing the communication load between databases and shortening the processing time.
However, it likewise does not disclose a method of analysis that protects personal information, and therefore it cannot be used with confidence in a system in which the protection of personal information is important.

The main object of the present invention is to solve these problems, namely to store a large amount of data while protecting personal information and keeping the processing load of data analysis from becoming high.

A data processing device according to the present invention includes:
a data storage unit that stores data including three or more items, each having an item value described therein;
an item selection unit that selects, from among the items included in the data, two or more items to be separated from each other as separation target items;
a partial data generation unit that generates, for each separation target item, partial data combining the item values of the items that are not separation target items with the item value of that separation target item, thereby generating as many pieces of partial data as there are separation target items; and
a partial data storage unit that stores the two or more pieces of partial data generated by the partial data generation unit in mutually different databases.

According to the present invention, by selecting two or more items related to personal information as separation target items, the item values of those items can be placed in mutually different pieces of partial data and stored in different databases.
Personal information can therefore be protected.
In addition, fine-grained data analysis can be performed that combines the item value of the separation target item in a piece of partial data with the item values of the items that are not separation target items, and because each piece of partial data is only part of the data before division, the processing load during data analysis can be kept low.
Furthermore, because the partial data is distributed across different databases, a large amount of data can be stored.

FIG. 1 is a diagram showing an example of a system configuration according to Embodiment 1.
FIG. 2 is a diagram showing a configuration example of the data distribution server device according to Embodiment 1.
FIG. 3 is a diagram showing an example of log data according to Embodiment 1.
FIG. 4 is a flowchart showing an operation example of the data distribution server device according to Embodiment 1.
FIG. 5 is a diagram showing an example of partial log data according to Embodiment 1.
FIG. 6 is a diagram showing an example of partial log data according to Embodiment 1.
FIG. 7 is a flowchart showing an operation example of the partial analysis device according to Embodiment 1.
FIG. 8 is a diagram showing an example of partial analysis results according to Embodiment 1.
FIG. 9 is a diagram showing a hardware configuration example of the data distribution server device according to Embodiment 1.

Embodiment 1.
This embodiment describes a configuration in the database field that can store a large amount of data, reduce the processing load of data analysis, and protect personal information.

For ease of explanation, this embodiment assumes application to a log management system in a hospital.
The invention may of course be applied to other systems.

First, the system configuration of this embodiment is described with reference to FIG. 1.

In FIG. 1, a data management target system 100 is an abstract representation of the system whose data is to be managed.
This is, for example, the entire information system of a hospital.
The data management target system 100 contains, among others, a file server device 101 that stores data files such as electronic medical records and an entrance/exit management server device 102 that manages the entry and exit of doctors and nurses to and from the hospital; these are connected to a data collection server device 103 that collects the log data of each server device.
The data collection server device 103 is in turn connected to a data distribution server device 104.
The data distribution server device 104 is a server device that, according to a method described later, divides log data containing personal information, such as access logs and entrance/exit logs, and stores it across a plurality of sub-databases.
The data distribution server device 104 corresponds to an example of a data processing device.

In FIG. 1, a data distributed storage analysis system 110 is a system that stores log data in a distributed manner and analyzes the log data.
This is, for example, the system of a cloud service provider commissioned by the hospital.
The inside of the data distributed storage analysis system 110 is divided into a plurality of data distributed storage analysis subsystems.
Each subsystem consists of a sub-database (sub-DB) 111 that stores divided log data and a partial analysis device 114 that performs partial analysis of the log data.
The data distributed storage analysis system 110 also includes an analysis result totaling server device 117.
The analysis result totaling server device 117 is a server device that totals the partial analysis results obtained in the subsystems and compiles the analysis result for the log data as a whole.
The data distributed storage analysis system 110 corresponds to an example of a database system.

In FIG. 1, a data monitoring system 120 is an abstract representation of a system for monitoring log data.
This is, for example, the information systems department of the hospital.
The data monitoring system 120 contains a data monitoring server device 121, which creates periodic reports from the analysis results and detects, based on the analysis results, suspicious behavior by doctors or nurses (for example, viewing the electronic medical record of a celebrity who is not under their care).

The following description mainly concerns the data distribution server device 104, the partial analysis devices 114 to 116, and the analysis result totaling server device 117.

Next, an example of the internal configuration of the data distribution server device 104 is described with reference to FIG. 2.

In FIG. 2, an identifier/quasi-identifier determination unit 201 determines which attributes (items) in the data are what are called identifiers and quasi-identifiers.
The identifier/quasi-identifier determination unit 201 corresponds to an example of an item selection unit.
Details of the identifier/quasi-identifier determination unit 201 are described later.
Identifiers and quasi-identifiers are also explained in detail in the description of the typical data example of FIG. 3.

A data dividing unit 202 divides the data based on the determination made by the identifier/quasi-identifier determination unit 201.
The data dividing unit 202 corresponds to an example of a partial data generation unit.
Details of the data dividing unit 202 are also described later.
Data that has been divided by the data dividing unit 202 is referred to as partial data or partial log data.

An identifier encryption unit 203 encrypts the portion of the divided data called the identifier.
The identifier encryption unit 203 corresponds to an example of a concealment unit.
Details of the identifier encryption unit 203 are also described later.

A data storage unit 204 stores the data from which the divided data is produced.

An input unit 205 receives, from the data collection server device 103, the data to be stored in the data storage unit 204.
The input unit 205 also receives instruction commands from a system administrator or the like.

An output unit 206 stores the pieces of partial data produced by the data dividing unit 202 in different sub-DBs within the data distributed storage analysis system 110.
The output unit 206 corresponds to an example of a partial data storage unit.

Next, an example of typical data is described with reference to FIG. 3.

FIG. 3 shows an example of log data 301 in a log management system in a hospital.
The data consists of three or more attributes (items).
In FIG. 3, the log data 301 consists of “date and time”, “doctor ID”, “operation history”, and “operation result”, of which “operation result” consists in more detail of “patient name”, “age”, “ward”, and “disease name”.
A row is added each time a doctor performs some operation.
For example, a row is added when an electronic medical record is viewed or written to.

In FIG. 3, “patient name” is information (an ID (identifier) value) that can, in principle, uniquely identify a patient.
For example, within a hospital, “Ichiro Suzuki” in most cases refers to just one person.
There may of course be people with the same family and given names, but even then the name does not refer to an unspecified large number of people; it narrows the candidates down to a very small group.
An attribute (item) that can uniquely identify an individual in this way is called an identifier.
That is, “patient name” is an identifier.
An attribute (item) treated as an identifier corresponds to an ID value item.
An attribute (item) treated as an identifier is an item that should be concealed and corresponds to a concealment target item.

On the other hand, in FIG. 3, “age”, “ward”, and “disease name” cannot each, on their own, uniquely identify a patient.
For example, a medium-sized hospital can be expected to have several 51-year-old patients, Ward 2 normally has several patients, and a medium-sized hospital can likewise be expected to have several lung cancer patients.
In combination, however, for example a 51-year-old patient in Ward 2, these attributes can become information that uniquely identifies a patient.
Similarly, a lung cancer patient in Ward 2 can be information that uniquely identifies a patient.
Similarly, a 51-year-old lung cancer patient can be information that uniquely identifies a patient.
An attribute that cannot uniquely identify an individual on its own but can become identifying information when combined with other attributes is called a quasi-identifier.
That is, “age”, “ward”, and “disease name” are quasi-identifiers.
The value of a quasi-identifier is a characteristic value representing a characteristic of the patient designated by the identifier “patient name”.
Because an individual may be uniquely identified if attributes (items) treated as quasi-identifiers are contained in the same data, they should be managed separately from each other.
Attributes (items) treated as quasi-identifiers correspond to characteristic value items and separation target items.
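The point that quasi-identifiers identify nobody on their own but can do so in combination can be checked mechanically. The following is a minimal sketch, not part of the disclosed method; the patient records, the English attribute names, and the helper `smallest_group` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical patient records; these values do not come from FIG. 3.
patients = [
    {"age": 51, "ward": "Ward 2", "disease": "lung cancer"},
    {"age": 51, "ward": "Ward 1", "disease": "heart disease"},
    {"age": 34, "ward": "Ward 2", "disease": "heart disease"},
    {"age": 34, "ward": "Ward 1", "disease": "lung cancer"},
]

def smallest_group(records, attributes):
    """Size of the smallest group of records sharing the same values for `attributes`."""
    groups = Counter(tuple(r[a] for a in attributes) for r in records)
    return min(groups.values())

# Each attribute alone leaves at least two candidate patients.
single = {a: smallest_group(patients, [a]) for a in ("age", "ward", "disease")}

# Age and ward together narrow the candidates down to a single patient.
pair = smallest_group(patients, ["age", "ward"])
```

A smallest group size of 1 means that at least one patient is uniquely identified by the chosen attributes, which is why the text requires such attributes to be kept in separate data.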

In FIG. 3, the information other than the identifier and quasi-identifiers, namely the date and time, doctor ID, and operation history, cannot uniquely identify a patient.
Such information is called other attributes.
Note that the doctor ID does not uniquely identify a patient, but it does uniquely identify a doctor.
However, the personal information to be protected here is that of patients. Attention is therefore paid only to whether a patient can be uniquely identified, not to whether a doctor can, and so the doctor ID is treated as one of the other attributes.
Thus, how identifiers, quasi-identifiers, and other attributes are determined depends on the personal information of interest.
Attributes (items) treated as other attributes correspond to normal items.

Next, the procedure by which the data distribution server device 104 distributes data such as that of FIG. 3 across a plurality of sub-databases is described with reference to FIG. 4.
It is assumed that the data has already been acquired through the data collection server device 103 in the format shown in FIG. 3.
That is, the input unit 205 of the data distribution server device 104 has received the log data 301 from the data collection server device 103, and the data storage unit 204 stores the log data 301 in the format of FIG. 3.

FIG. 4 is a flowchart illustrating the procedure by which the data distribution server device 104 distributes log data such as that of FIG. 3 across a plurality of sub-databases.

In FIG. 4, first, in step S401, the identifier/quasi-identifier determination unit 201 determines which of the attributes of the data in FIG. 3 are identifiers and quasi-identifiers.
This can be done either by automatic determination or by designation by a person such as a system administrator.
A method of automatic determination is described in, for example, Non-Patent Document 1 and is therefore omitted here.
In the case of designation by a person such as a system administrator, the person makes the determination and designation based on the attributes of the actual data.
The kinds of attributes are fixed when the data management target system 100 is designed, and which information should not be uniquely identifiable depends, as described above, on the personal information of interest, so the determination and designation take the actual usage and other factors into account.
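For the case in which an administrator designates the roles by hand, the outcome of step S401 can be pictured as a simple mapping from attribute to role. The sketch below is illustrative only: the `Role` enum, the English attribute names, and the `validate` helper are assumptions, while the role assigned to each attribute follows the classification the text gives for FIG. 3.

```python
from enum import Enum

class Role(Enum):
    IDENTIFIER = "identifier"              # to be concealed
    QUASI_IDENTIFIER = "quasi-identifier"  # to be separated from each other
    OTHER = "other"                        # normal items

# Administrator-supplied classification for the log data of FIG. 3.
# The English attribute names are hypothetical.
ATTRIBUTE_ROLES = {
    "datetime": Role.OTHER,
    "doctor_id": Role.OTHER,
    "operation": Role.OTHER,
    "patient_name": Role.IDENTIFIER,
    "age": Role.QUASI_IDENTIFIER,
    "ward": Role.QUASI_IDENTIFIER,
    "disease": Role.QUASI_IDENTIFIER,
}

def attributes_with(role, roles=ATTRIBUTE_ROLES):
    """Names of the attributes that were assigned the given role."""
    return [name for name, assigned in roles.items() if assigned is role]

def validate(roles):
    """Check the preconditions stated in the text: three or more attributes,
    of which two or more are to be separated from each other."""
    if len(roles) < 3:
        raise ValueError("the data must contain three or more attributes")
    if len(attributes_with(Role.QUASI_IDENTIFIER, roles)) < 2:
        raise ValueError("two or more quasi-identifiers are required")

validate(ATTRIBUTE_ROLES)
quasi_identifiers = attributes_with(Role.QUASI_IDENTIFIER)
```

Because the classification depends on which personal information is of interest, a different protection target (for example, doctors instead of patients) would simply use a different mapping.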

Next, in step S402, the data dividing unit 202 creates partial log data consisting only of the other attributes and identifier 1.
In the example of FIG. 3, partial log data (first partial data) consisting only of “date and time”, “doctor ID”, “operation history”, and “patient name” is created.

Next, in step S403, the data dividing unit 202 creates partial log data (second partial data) consisting only of the other attributes and quasi-identifier 1.
In the example of FIG. 3, partial log data consisting only of “date and time”, “doctor ID”, “operation history”, and “age” is created.
This is repeated for each quasi-identifier, and in step S405 the data dividing unit 202 creates partial log data (second partial data) consisting only of the other attributes and quasi-identifier 3.
In the example of FIG. 3, partial log data consisting only of “date and time”, “doctor ID”, “operation history”, and “disease name” is created.
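The division in steps S402 to S405 amounts to projecting the log onto the other attributes plus exactly one identifier or quasi-identifier at a time. The following is a minimal sketch under stated assumptions: the English column names, the sample rows, and the functions `project` and `split_log` are hypothetical, and the identifier is still in plain text here because encryption is a separate step.

```python
# Attribute roles follow the classification given in the text for FIG. 3.
# The English column names and the sample rows are hypothetical.
OTHER = ["datetime", "doctor_id", "operation"]
IDENTIFIERS = ["patient_name"]
QUASI_IDENTIFIERS = ["age", "ward", "disease"]

log_data = [
    {"datetime": "2012-03-26 13:05", "doctor_id": "D001", "operation": "view chart",
     "patient_name": "Patient A", "age": 51, "ward": "Ward 2", "disease": "lung cancer"},
    {"datetime": "2012-03-26 14:30", "doctor_id": "D002", "operation": "create chart",
     "patient_name": "Patient B", "age": 72, "ward": "Ward 1", "disease": "heart disease"},
]

def project(rows, columns):
    """Keep only the given columns of every row."""
    return [{c: row[c] for c in columns} for row in rows]

def split_log(rows):
    """Build one piece of partial data per identifier and per quasi-identifier.

    Every piece contains the other attributes plus exactly one identifier
    or quasi-identifier, so no two quasi-identifiers share a piece.
    """
    parts = {}
    for name in IDENTIFIERS + QUASI_IDENTIFIERS:
        parts[name] = project(rows, OTHER + [name])
    return parts

parts = split_log(log_data)
```

Here `parts["patient_name"]` plays the role of the first partial data and the remaining entries the role of the second partial data, one per quasi-identifier.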

Next, in step S406, the identifier encryption unit 203 encrypts the identifier.
That is, it encrypts the identifier values in the partial log data generated in step S402.
The encryption key used here is managed within the data management target system 100.

Finally, in step S407, the output unit 206 transmits each piece of partial log data to its sub-database.
That is, the output unit 206 stores the pieces of partial data in mutually different sub-databases.
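Step S407 can be pictured with one independent database per piece of partial data. The sketch below uses separate in-memory SQLite databases as a stand-in for the sub-databases; the table name `partial_log`, the function `store_partial`, and the sample rows are assumptions, and a real deployment would send the data to remote sub-DBs instead.

```python
import sqlite3

def store_partial(rows, columns):
    """Create a separate in-memory SQLite database holding one piece of partial data."""
    conn = sqlite3.connect(":memory:")
    column_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE partial_log ({column_list})")
    conn.executemany(
        f"INSERT INTO partial_log ({column_list}) VALUES ({placeholders})",
        [tuple(row[c] for c in columns) for row in rows],
    )
    conn.commit()
    return conn

# Hypothetical pieces of partial data, each holding a single quasi-identifier.
partials = {
    "age": [{"datetime": "2012-03-26 13:05", "doctor_id": "D001",
             "operation": "view chart", "age": 51}],
    "ward": [{"datetime": "2012-03-26 13:05", "doctor_id": "D001",
              "operation": "view chart", "ward": "Ward 2"}],
}

# One database connection per piece, so no database holds two quasi-identifiers.
sub_dbs = {name: store_partial(rows, list(rows[0])) for name, rows in partials.items()}
```

Keeping the connections separate mirrors the requirement that the pieces be stored in mutually different sub-databases; querying `sub_dbs["age"]` cannot reveal any ward value.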

In this way, in the data distribution server device 104, the identifier/quasi-identifier determination unit 201 selects, from the attributes included in the log data 301, the attribute to be concealed as an identifier and two or more attributes to be separated from each other as quasi-identifiers.
The data dividing unit 202 generates partial data (first partial data) combining the other attributes with the identifier.
The data dividing unit 202 also generates, for each quasi-identifier, partial data (second partial data) combining the other attributes with that quasi-identifier.
Furthermore, the identifier encryption unit 203 encrypts the identifier values.
The output unit 206 then stores the two or more pieces of partial data generated in this way in mutually different sub-databases.

Although FIG. 4 shows the case of a single identifier, when there are a plurality of identifiers, step S402 may be repeated once per identifier to create as many pieces of partial data as there are identifiers.
Alternatively, the plurality of identifiers may be regarded as a single identifier as a whole, and step S402 may be performed only once, as in FIG. 4, to create one large piece of partial data.
With the former configuration, using a different encryption key for each kind of identifier makes it possible to finely control the information disclosed to the administrator of the data monitoring system when, as described later, suspicious behavior by a doctor is detected on the data monitoring server device 121 and investigated further.
The latter configuration has the advantage of reducing the number of sub-databases.

FIG. 5 shows an example of partial data created by the process of FIG. 4.

In FIG. 5, the values of “patient name” are encrypted because it is an identifier.
The quasi-identifier information has been removed.
This partial log data 501 is stored by the process of FIG. 4 in, for example, sub-database 1.

FIG. 6 likewise shows an example of partial data created by the process of FIG. 4.

In FIG. 6, each piece of partial data contains only one quasi-identifier besides the other attributes.
By the process of FIG. 4, for example, partial log data 601 is stored in sub-database 2, partial log data 602 in sub-database 3, and partial log data 603 in sub-database 4.

Next, an example of the partial analysis method used by the partial analysis devices 114, 115, and 116 is described with reference to FIG. 7.
The description below is given for the partial analysis device 114, but the partial analysis devices 115 and 116 operate in the same way.

FIG. 7 is a flowchart showing the partial analysis procedure in the partial analysis device 114.

In FIG. 7, the partial analysis device 114 first specifies the identifiers, quasi-identifiers, and other attributes in step S701.
Because these were already specified when the data was distributed on the data distribution server device 104, the device uses the information on identifiers, quasi-identifiers, and other attributes received in advance from the data distribution server device 104.
Next, in step S702, the partial analysis device 114 determines whether a quasi-identifier is included.
If a quasi-identifier is included, in step S703 it extracts statistical information from the combination of the other attributes and the quasi-identifier.
If no quasi-identifier is included, in step S704 it extracts statistical information from the other attributes.
Then, in step S705, the partial analysis device 114 transmits the extracted statistical information to the analysis result totaling server device 117 and ends the procedure.
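The branching of steps S701 to S705 can be sketched as follows. The text leaves the concrete statistics open, so the choice made here (counting rows per doctor and quasi-identifier value, or per doctor and operation) is an assumption, as are the function name `partial_analysis`, the `send` callback standing in for transmission to the totaling server, and the sample rows.

```python
from collections import Counter

def partial_analysis(rows, roles, send):
    """Sketch of steps S701 to S705 for one partial analysis device."""
    # S701: `roles` is the classification received in advance from the
    # data distribution server device.
    columns = list(rows[0]) if rows else []
    # S702: does this piece of partial data contain a quasi-identifier?
    quasi = [c for c in columns if roles[c] == "quasi-identifier"]
    if quasi:
        keys = ["doctor_id", quasi[0]]     # S703: other attribute + quasi-identifier
    else:
        keys = ["doctor_id", "operation"]  # S704: other attributes only
    stats = Counter(tuple(row[k] for k in keys) for row in rows)
    # S705: hand the statistics to the totaling server (here, a callback).
    send({"keys": keys, "counts": dict(stats)})

roles = {"datetime": "other", "doctor_id": "other", "operation": "other",
         "patient_name": "identifier", "ward": "quasi-identifier"}

ward_rows = [
    {"datetime": "2012-03-26 13:05", "doctor_id": "D001", "operation": "view chart", "ward": "Ward 2"},
    {"datetime": "2012-03-26 13:40", "doctor_id": "D001", "operation": "view chart", "ward": "Ward 2"},
    {"datetime": "2012-03-26 14:30", "doctor_id": "D002", "operation": "create chart", "ward": "Ward 1"},
]
name_rows = [
    {"datetime": "2012-03-26 13:05", "doctor_id": "D001", "operation": "view chart", "patient_name": "<encrypted>"},
]

sent = []
partial_analysis(ward_rows, roles, sent.append)  # takes the S703 branch
partial_analysis(name_rows, roles, sent.append)  # takes the S704 branch
```

Only aggregated counts leave the device in this sketch; the rows themselves stay in the sub-database.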

In the procedure of FIG. 7, the method of extracting statistical information in steps S703 and S704 varies with the content of the data and the requirements of the administrator of the data monitoring system 120.
FIG. 8 shows an example of the extracted statistical information.

In FIG. 8, the partial analysis result 801 of subsystem 1 gives, as statistical information derived from the doctor ID and the operation history, the number and types of operations performed by each doctor on 12/03/26 p.m.
This is obtained by selecting the relevant rows of the partial log data of FIG. 5 by the “date and time” attribute and then compiling statistics from “doctor ID” and “operation history”.
The partial analysis result 802 of subsystem 2 gives the number of inpatients on 12/03/26 p.m. as statistical information.
This is obtained by selecting the relevant rows of the partial log data 601 of FIG. 6 by the “date and time” attribute, narrowing them down to “chart creation” in “operation history”, and then extracting the age group of the inpatients from “age”.
The partial analysis result 803 of subsystem 3 gives, as statistical information, each ward and the number of times the charts of patients in that ward were viewed.
This is obtained by selecting the relevant rows of the partial log data 602 of FIG. 6 by the “date and time” attribute and then examining the “ward” attribute.
The partial analysis result 804 of subsystem 4 gives, as statistical information, each doctor ID and the disease types of the patients in that doctor's charge.
This is obtained by selecting the relevant rows of the partial log data 603 of FIG. 6 by the “date and time” attribute and then compiling statistics from “doctor ID” and “disease name”.
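The extraction described for subsystem 1 (select rows by date and time, then count by doctor ID and operation) can be sketched as follows. This is only an illustration under stated assumptions: the field names `datetime`, `doctor_id`, and `operation`, the timestamp format, the reading of 12/03/26 as 26 March 2012, and every row in `PARTIAL_LOG` except the doctor ID FG82096 are hypothetical and are not taken from FIG. 5.

```python
from collections import Counter
from datetime import datetime

# Hypothetical partial log rows; the values are illustrative only.
PARTIAL_LOG = [
    {"datetime": "2012-03-26 13:05", "doctor_id": "AB10001", "operation": "view chart"},
    {"datetime": "2012-03-26 14:40", "doctor_id": "AB10001", "operation": "create chart"},
    {"datetime": "2012-03-26 23:10", "doctor_id": "FG82096", "operation": "view chart"},
    {"datetime": "2012-03-26 23:45", "doctor_id": "FG82096", "operation": "view chart"},
    {"datetime": "2012-03-27 09:00", "doctor_id": "AB10001", "operation": "view chart"},
]


def extract_statistics(rows, start, end):
    """Count operations per (doctor ID, operation) for rows with start <= time < end."""
    counts = Counter()
    for row in rows:
        ts = datetime.strptime(row["datetime"], "%Y-%m-%d %H:%M")
        if start <= ts < end:
            counts[(row["doctor_id"], row["operation"])] += 1
    return counts


# Statistics for the p.m. hours of the assumed date.
stats = extract_statistics(
    PARTIAL_LOG,
    start=datetime(2012, 3, 26, 12, 0),
    end=datetime(2012, 3, 27, 0, 0),
)
for (doctor_id, operation), n in sorted(stats.items()):
    print(doctor_id, operation, n)
```

Only the resulting counts, not the rows themselves, would need to leave the subsystem.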

The partial analysis results obtained by the procedure of FIG. 7, such as those shown in FIG. 8, are aggregated by the analysis result aggregation server device 117 and sent to the data monitoring server device 121.

From the received partial analysis results, the data monitoring server device 121 can determine the following.
On 12/03/26 p.m., four doctors viewed or created medical charts.
There was only one inpatient, a patient in their 70s.

From the received partial analysis results, the data monitoring server device 121 can also determine, for example, the following.
Assume that in this hospital no medical practice normally involves viewing electronic medical charts late at night.
In that case, partial analysis result 1 shows that the doctor “FG82096” performed unauthorized viewing twice on 12/03/26 p.m.
In addition, this doctor “FG82096” is recorded as having been in charge of a patient with heart disease. If the doctor “FG82096” is in fact a psychiatrist, it becomes all the more evident that this history does not reflect legitimate medical practice.
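The kind of check described above, flagging chart views that fall in hours when no such medical practice is expected, might look like the sketch below. The late-night window (22:00 to 05:00), the field names, and the sample rows are assumptions made for illustration; the source does not define what counts as late at night.

```python
from collections import Counter
from datetime import datetime, time


def count_late_night_views(rows, night_start=time(22, 0), night_end=time(5, 0)):
    """Count chart views per doctor inside an assumed late-night window.

    The window wraps around midnight, so a time qualifies when it is at or
    after night_start OR before night_end.
    """
    flagged = Counter()
    for row in rows:
        t = datetime.strptime(row["datetime"], "%Y-%m-%d %H:%M").time()
        in_window = t >= night_start or t < night_end
        if in_window and row["operation"] == "view chart":
            flagged[row["doctor_id"]] += 1
    return flagged


# Hypothetical rows; only the doctor ID FG82096 appears in the source text.
rows = [
    {"datetime": "2012-03-26 14:40", "doctor_id": "AB10001", "operation": "view chart"},
    {"datetime": "2012-03-26 23:10", "doctor_id": "FG82096", "operation": "view chart"},
    {"datetime": "2012-03-26 23:45", "doctor_id": "FG82096", "operation": "view chart"},
]
suspicious = count_late_night_views(rows)
print(dict(suspicious))
```

A real deployment would take the window from hospital policy rather than from fixed defaults.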

When unauthorized behavior is found in this way, the administrator of the data monitoring system 120 can investigate further by collecting the log data for the relevant date and time from each sub-database.
The administrator can also look up the patient's name by receiving the key stored in the data management target system 100 and decrypting the identifier data.

The system according to the present embodiment has been described above.
With this configuration, large data such as that of FIG. 3 is divided into small pieces of partial data such as those of FIGS. 5 and 6 and stored in a plurality of sub-databases, which has the effect that a large amount of data can be stored.
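The division into partial data can be sketched as follows, following the structure stated in the claims: one piece of partial data combines the ordinary attributes with the concealed identifier, and one further piece is generated per quasi-identifier. The attribute names, the sample record, and the `conceal` placeholder are assumptions for illustration. In particular, `conceal` here only masks the value and is not encryption; the embodiment encrypts the identifier with a key so that it can later be decrypted.

```python
def split_record(record, identifier, quasi_identifiers, conceal):
    """Split one record into partial records, one per target sub-database.

    The first partial record holds the ordinary attributes plus the concealed
    identifier. Each later one holds the ordinary attributes plus exactly one
    quasi-identifier, so no partial record contains two quasi-identifiers.
    """
    ordinary = {
        key: value
        for key, value in record.items()
        if key != identifier and key not in quasi_identifiers
    }
    parts = [{**ordinary, identifier: conceal(record[identifier])}]
    for name in quasi_identifiers:
        parts.append({**ordinary, name: record[name]})
    return parts


# Hypothetical log record; attribute names and values are illustrative only.
record = {
    "datetime": "2012-03-26 23:10",
    "doctor_id": "FG82096",
    "operation": "view chart",
    "patient_name": "Patient A",
    "age": 72,
    "ward": "East 3",
    "disease": "heart disease",
}
parts = split_record(
    record,
    identifier="patient_name",
    quasi_identifiers=("age", "ward", "disease"),
    conceal=lambda value: "<concealed>",  # placeholder only, NOT real encryption
)
for i, part in enumerate(parts, start=1):
    print(f"sub-DB {i}: {part}")
```

Each element of `parts` would then be written to a different sub-database.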

In addition, each sub-database only has to analyze its own small partial data to obtain statistical information, which has the effect of reducing the processing load of data analysis.

In addition, the analysis result aggregation server device aggregates statistical information rather than the partial data itself.
The volume of communication within the data distributed storage analysis system is therefore small.
This has the effect of reducing the volume of communication required for data aggregation.

In addition, fine-grained data analysis can be performed by combining the value of the quasi-identifier in each piece of partial data with the values of the other attributes.

In addition, in each sub-database the identifier is encrypted and the quasi-identifiers are separated from one another, so an individual cannot be uniquely identified from the partial data alone. This has the effect of protecting personal information.

An embodiment of the present invention has been described above.
However, the present invention is not limited to this embodiment, and it will be apparent to those skilled in the art that various applications are possible.
For example, further subsystems may be provided beneath a subsystem, or a plurality of subsystems may be merged to the extent that the quasi-identifiers do not lead to the identification of an individual.
In addition, although the embodiment collects the data of both the file server device and the entry/exit management server device within the data management target system before distributing it, each server device may instead store its data in the data distributed storage analysis system independently.

For ease of explanation, the present embodiment has been described on the assumption that it is applied to a log management system in a hospital.
However, the range of application is extremely wide and includes log management systems in factories and in office buildings.
It is also obvious to those skilled in the art that the invention is applicable not only to log management systems but to data management systems in general.

Finally, an example of the hardware configuration of the data distribution server device 104 shown in the present embodiment will be described.
FIG. 9 shows an example of the hardware resources of the data distribution server device 104 shown in the present embodiment.
The configuration of FIG. 9 is merely one example of the hardware configuration of the data distribution server device 104, which is not limited to the configuration shown in FIG. 9 and may be another configuration.

In FIG. 9, the data distribution server device 104 includes a CPU 911 (Central Processing Unit, also called a central processing device, processing device, arithmetic device, microprocessor, microcomputer, or processor) that executes programs.
The CPU 911 is connected via a bus 912 to, for example, a ROM (Read Only Memory) 913, a RAM (Random Access Memory) 914, a communication board 915, a display device 901, a keyboard 902, a mouse 903, and a magnetic disk device 920, and controls these hardware devices.
The CPU 911 may further be connected to an FDD 904 (Flexible Disk Drive), a compact disk device 905 (CDD), a printer device 906, and a scanner device 907. A storage device such as an SSD (Solid State Drive), an optical disk device, or a memory card (registered trademark) read/write device may be used instead of the magnetic disk device 920.
The RAM 914 is an example of a volatile memory. The storage media of the ROM 913, the FDD 904, the CDD 905, and the magnetic disk device 920 are examples of nonvolatile memories. All of these are examples of storage devices.
The “data storage unit 204” described in the present embodiment is realized by the RAM 914, the magnetic disk device 920, and the like.
The communication board 915, the keyboard 902, the mouse 903, the scanner device 907, and the like are examples of input devices.
The communication board 915, the display device 901, the printer device 906, and the like are examples of output devices.

The communication board 915 is connected to a network.
For example, the communication board 915 is connected to a LAN (local area network), the Internet, a WAN (wide area network), a SAN (storage area network), or the like.

The magnetic disk device 920 stores an operating system 921 (OS), a window system 922, a program group 923, and a file group 924.
The programs in the program group 923 are executed by the CPU 911 using the operating system 921 and the window system 922.

The RAM 914 temporarily stores at least part of the programs of the operating system 921 and of the application programs to be executed by the CPU 911.
The RAM 914 also stores various data required for processing by the CPU 911.

The ROM 913 stores a BIOS (Basic Input Output System) program, and the magnetic disk device 920 stores a boot program.
When the data distribution server device 104 starts up, the BIOS program in the ROM 913 and the boot program in the magnetic disk device 920 are executed, and these two programs start the operating system 921.

The program group 923 stores programs that execute the functions described as “~ unit” in the description of the present embodiment (excluding the “data storage unit 204”; the same applies hereinafter). The programs are read and executed by the CPU 911.

In the file group 924, information, data, signal values, and variable values indicating the results of the processing described in the present embodiment as “determination of ~”, “generation of ~”, “creation of ~”, “encryption of ~”, “setting of ~”, “selection of ~”, “input of ~”, “output of ~”, and the like are stored as files on a storage medium such as a disk or memory.
Encryption keys, decryption keys, random number values, and parameters may also be stored as files on a storage medium such as a disk or memory.
A “~ file” or “~ database” is stored on a storage medium such as a disk or memory.
The information, data, signal values, variable values, and parameters stored on a storage medium such as a disk or memory are read into the main memory or cache memory by the CPU 911 via a read/write circuit.
The information, data, signal values, variable values, and parameters that have been read are then used in CPU operations such as extraction, search, reference, comparison, arithmetic, calculation, processing, editing, output, printing, and display.
During these CPU operations, the information, data, signal values, variable values, and parameters are temporarily stored in the main memory, registers, cache memory, buffer memory, or the like.
The arrows in the flowcharts described in the present embodiment mainly indicate the input and output of data and signals.
Data and signal values are recorded on storage media such as the memory of the RAM 914, the flexible disk of the FDD 904, the compact disk of the CDD 905, the magnetic disk of the magnetic disk device 920, and other optical disks, Blu-ray (registered trademark) disks, and DVDs.
Data and signals are transmitted online via the bus 912, signal lines, cables, and other transmission media.

What is described as a “~ unit” in the description of the present embodiment may be a “~ circuit”, “~ device”, or “~ equipment”, and may also be a “~ step”, “~ procedure”, or “~ process”.
That is, the “data processing method” according to the present invention can be realized by the steps, procedures, and processes shown in the flowcharts described in the present embodiment.
What is described as a “~ unit” may also be realized by firmware stored in the ROM 913.
Alternatively, it may be implemented by software alone, by hardware alone such as elements, devices, boards, and wiring, by a combination of software and hardware, or by a combination that further includes firmware.
Firmware and software are stored as programs on storage media such as magnetic disks, flexible disks, optical disks, compact disks, Blu-ray (registered trademark) disks, and DVDs.
The programs are read and executed by the CPU 911.
That is, a program causes a computer to function as a “~ unit” of the present embodiment, or causes a computer to execute the procedure or method of a “~ unit” of the present embodiment.

As described above, the data distribution server device 104 shown in the present embodiment is a computer that includes a CPU as a processing device; a memory, a magnetic disk, and the like as storage devices; a keyboard, a mouse, a communication board, and the like as input devices; and a display device, a communication board, and the like as output devices.
As described above, the functions indicated as “~ units” are realized using these processing devices, storage devices, input devices, and output devices.

100 data management target system, 101 file server device, 102 entry/exit management server device, 103 data collection server device, 104 data distribution server device, 110 data distributed storage analysis system, 111 sub-DB 1, 112 sub-DB 2, 113 sub-DB n, 114 partial analysis device, 115 partial analysis device, 116 partial analysis device, 117 analysis result aggregation server device, 120 data monitoring system, 121 data monitoring server device, 201 identifier/quasi-identifier determination unit, 202 data division unit, 203 identifier encryption unit, 204 data storage unit, 205 input unit, 206 output unit.

Claims

A data storage unit for storing data including three or more items each having item values described therein;
An item selection unit that selects two or more items to be separated from each other as items to be separated among the items included in the data;
A partial data generation unit that generates, for each item to be separated, partial data combining the item value of an item included in the data that is not an item to be separated with the item value of that item to be separated, so that as many pieces of partial data are generated as there are items to be separated;
A data processing apparatus comprising: a partial data storage unit that stores two or more partial data generated by the partial data generation unit in different databases.

The data storage unit
Stores data that includes an ID value item in which an ID (Identifier) value is described and two or more characteristic value items in each of which a characteristic value is described, the characteristic value representing a characteristic of the object identified by the ID value of the ID value item,
The item selection unit includes:
The data processing apparatus according to claim 1, wherein each of the characteristic value items is selected as a separation target item.

The item selection unit includes:
Selects, from among the items included in the data, an item whose item value should be concealed as a concealment target item, and selects two or more items from among the items other than the concealment target item as separation target items,
The partial data generation unit
Generates, as first partial data, partial data combining the item value of a normal item, which is an item included in the data that is neither a concealment target item nor a separation target item, with the item value of the concealment target item,
Generates, for each separation target item, partial data combining the item value of the normal item with the item value of that separation target item as second partial data, so that as many pieces of second partial data are generated as there are separation target items,
The data processing device further includes:
A concealment unit for concealing the item value of the concealment target item;
The partial data storage unit
The first partial data after the item value of the item to be concealed is concealed and two or more second partial data are stored in different databases, respectively. The data processing apparatus described in 1.

The item selection unit includes:
Two or more items may be selected as items to be concealed,
The partial data generation unit
When two or more items are selected as concealment target items by the item selection unit, generates first partial data for each concealment target item, so that as many pieces of first partial data are generated as there are concealment target items,
The concealment unit
When two or more items are selected as a concealment target item by the item selection unit, conceal the item value of each concealment target item,
The partial data storage unit
Two or more first partial data after the item value of each concealment target item is concealed and two or more second partial data are stored in different databases, respectively. The data processing apparatus according to claim 3.

The data storage unit
Stores data that includes an ID value item in which an ID (Identifier) value is described and two or more characteristic value items in each of which a characteristic value is described, the characteristic value representing a characteristic of the object identified by the ID value of the ID value item,
The item selection unit includes:
The data processing apparatus according to claim 3 or 4, wherein the ID value item is selected as a concealment target item, and each of the characteristic value items is selected as a separation target item.

6. A database system comprising a plurality of databases for distributing and storing the partial data according to claim 1.

The database system further includes:
A plurality of partial analysis devices arranged for each database and analyzing partial data stored in the target database;
The database system according to claim 6, further comprising an analysis result totaling server device that collects analysis results from the plurality of partial analysis devices and outputs the collected analysis results to a predetermined external device.

A computer that stores data including three or more items each having item values described therein,
An item selection step of selecting two or more items to be separated from each other as items to be separated among the items included in the data;
A partial data generation step of generating, for each item to be separated, partial data combining the item value of an item included in the data that is not an item to be separated with the item value of that item to be separated, so that as many pieces of partial data are generated as there are items to be separated;
A partial data storage step of storing two or more partial data generated by the partial data generation unit in different databases, respectively.

In a computer that stores data including three or more items each having item values described therein,
An item selection step of selecting two or more items to be separated from each other as items to be separated among the items included in the data;
A partial data generation step of generating, for each item to be separated, partial data combining the item value of an item included in the data that is not an item to be separated with the item value of that item to be separated, so that as many pieces of partial data are generated as there are items to be separated;
A program for executing a partial data storage step of storing two or more partial data generated by the partial data generation unit in different databases.