JP2013008175A - Conversion processing method, device and program and restoration processing method, device and program

Info

Publication number: JP2013008175A
Application number: JP2011140070A
Authority: JP
Inventors: Yoshinori Katayama; 佳則片山; Mebae Ushida; 芽生恵牛田; Hiroshi Tsuda; 宏津田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2011-06-24
Filing date: 2011-06-24
Publication date: 2013-01-10
Anticipated expiration: 2031-06-24
Also published as: JP5655718B2

Abstract

PROBLEM TO BE SOLVED: To prevent data leakage arising from the distribution of data. SOLUTION: A conversion processing method includes: (A) a step of detecting, for a plurality of records that are stored in a first data storage section and each have attribute values for a plurality of attribute items, a first distribution of the appearance frequencies of the attribute values that appear for a specific attribute item among the plurality of attribute items; and (B) a step of converting each of at least some of those attribute values into one of a plurality of second attribute values specified according to a predetermined first rule, so that the first distribution becomes a second distribution different from the first distribution.

Description

The present technology relates to data concealment.

For example, as shown in FIGS. 1A to 1C, consider a case where, for each student in a university's faculties of engineering, pharmacy, and commerce, attribute values are held for a plurality of attribute items (for example, gender, grade level (high or low), physical fitness score, and presence or absence of disease (1 = present, 0 = absent)). Further consider performing, on the entire data set (here, the whole university), a cross tabulation that identifies, for each attribute value of a specific attribute item, the group of records in which that attribute value appears, and calculates the number of occurrences of the attribute values of another attribute item within that group.

If data such as that shown in FIGS. 1A to 1C were passed as-is to the party performing the cross tabulation, individuals could be identified. To prevent leakage of private information, a simple approach is to conceal some attribute items by encryption or with a hash function. For example, if the attribute item "gender" is encrypted so that M (male) becomes abc and F (female) becomes def, data such as that shown in FIGS. 2A to 2C is obtained. Individuals cannot be identified from such data alone. If gender and disease are cross-tabulated, a result such as that shown in FIG. 3 is obtained.

However, if the tabulator has background knowledge that, for example, the engineering faculty has few women or the pharmacy faculty has few men, the tabulator can infer that abc = M (male) and def = F (female). Individuals may then be identified, and sensitive information such as whether a given individual has a disease may be revealed as well. The problem arises because, even when the values themselves are concealed, a one-to-one correspondence between the values before and after concealment leaves the distribution of values completely unchanged.
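The point about one-to-one concealment can be checked with a small Python sketch. The sample column and the use of SHA-256 as the concealment function are assumptions made for illustration only; the text does not specify them.

```python
from collections import Counter
import hashlib


def conceal(value: str) -> str:
    """Deterministic one-to-one concealment (illustrative stand-in only)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical gender column for a faculty with few women.
genders = ["M"] * 9 + ["F"] * 1
concealed = [conceal(g) for g in genders]

# The multiset of frequencies is identical before and after concealment,
# so background knowledge about the skew still reveals which token is "M".
before = sorted(Counter(genders).values())
after = sorted(Counter(concealed).values())
print(before, after)
```

Because the frequencies survive unchanged, the most frequent concealed token can be matched to the value that is known to dominate.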

Note that, for methods that convert letters and numbers into numbers or symbol strings through an arbitrarily defined correspondence, there is prior art that addresses the risk that the ciphertext is easily deciphered because of regularities in the appearance of the same numbers or symbol strings. In this prior art, when a concealed character string is identical to one that has already appeared, the conversion rule is partially modified to make the rule hard to determine. However, it does not consider the distribution of the data after concealment, or aggregation processing such as cross tabulation performed after concealment.

There is also a technique for converting a character string in an electronic document, designated as a conversion target, into another character string. Specifically, a character list that defines conversion characters and identifiers uniquely identifying each conversion character is held in advance. When a conversion instruction is received, each character in the target character string is looked up in the character list and, based on the identifier assigned to that character, is converted into another character in the list according to a predefined conversion rule. When a restoration instruction is received, each converted character is restored to the original character in the character list by a restoration rule corresponding to the conversion rule. However, this technique does not consider the distribution of the data after concealment or aggregation processing such as cross tabulation.

Further, there is a technique that protects the confidential portions of a file while allowing the remaining portions to be readily disclosed to the public, and that allows an administrator to restore the original file. In this technique, a replacement character for each character to be replaced is selected from characters that do not exist in the file, a replacement map showing the correspondence between the selected replacement characters and the characters to be replaced is created, and the characters to be replaced in the target character strings in the file are replaced with the replacement characters based on the map. The replacement characters in the file are returned to the original characters based on the same map. However, this technique does not consider the distribution of the data after concealment or aggregation processing such as cross tabulation.

There is also a technique that, in order to strengthen or maintain encryption strength, uses probabilistic encryption to scatter one value into a plurality of values while preserving the order relation. However, it does not take into account the distribution of the data after concealment or aggregation processing such as cross tabulation.

JP 2002-374243 A; JP 2007-102540 A; JP 2007-156861 A

"Order-Preserving Symmetric Encryption", EUROCRYPT '09: Proceedings of the 28th Annual International Conference on Advances in Cryptology: the Theory and Applications of Cryptographic Techniques, Springer-Verlag, Berlin, Heidelberg, 2009, ISBN 978-3-642-01000-2

Accordingly, an object of the present technology, in one aspect, is to provide a technique for preventing data leakage arising from the distribution of data.

A conversion processing method according to a first aspect of the present technology includes: (A) a step of detecting, for a plurality of records that are stored in a first data storage unit and each have attribute values for a plurality of attribute items, a first distribution of the appearance frequencies of the attribute values that appear for a specific attribute item among the plurality of attribute items; and (B) a step of converting each of at least some of those attribute values into one of a plurality of second attribute values specified according to a predetermined first rule, so that the first distribution becomes a second distribution different from the first distribution.

A restoration processing method according to a second aspect of the present technology includes: (A) a step of generating a plurality of second attribute values, according to a predetermined rule, from a first attribute value that appears for a first attribute item in a plurality of records to be aggregated; and (B) a step of determining, for each of the plurality of second attribute values, whether a matching third attribute value exists in a first data storage unit, which stores an aggregation result containing, for each third attribute value that appeared for the first attribute item, either a count for each attribute value of a second attribute item or a total of the attribute values of the second attribute item; if a matching third attribute value exists, reading from the first data storage unit the per-attribute-value counts or the total associated with that third attribute value; accumulating the counts for each attribute value of the second attribute item, or accumulating the totals; and storing the accumulated result in a second data storage unit in association with the second attribute value.
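As a rough illustration of the restoration method of the second aspect, the following Python sketch generates candidate converted values for each original value, looks each candidate up in an aggregation result, and accumulates the per-value counts it finds. The function names, the serial-number candidate rule, the cap `max_expansion`, and the sample counts are all assumptions made for illustration. Concealment by encryption is omitted, and the result is keyed by the original value here purely for readability.

```python
def serial_candidates(value, max_expansion=4):
    """Hypothetical rule: candidates are the value plus a serial number."""
    return [f"{value}{i}" for i in range(1, max_expansion + 1)]


def restore_counts(original_values, aggregated, candidates_for):
    """Fold the counts of all matching converted values back together."""
    restored = {}
    for original in original_values:
        totals = {}
        for candidate in candidates_for(original):
            row = aggregated.get(candidate)
            if row is None:
                continue  # this candidate never appeared in the aggregation
            for other_value, count in row.items():
                totals[other_value] = totals.get(other_value, 0) + count
        restored[original] = totals
    return restored


# Hypothetical cross tabulation: converted gender value -> disease flag -> count.
aggregated = {
    "M1": {1: 2, 0: 3},
    "M2": {1: 1, 0: 4},
    "F1": {1: 1, 0: 2},
}
restored = restore_counts(["M", "F"], aggregated, serial_candidates)
print(restored)
```

Candidates that do not occur in the aggregation result are simply skipped, which is why generating more candidates than were actually used does not change the totals.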

This makes it possible to prevent data leakage arising from the distribution of data.

FIG. 1A is a diagram for explaining a problem of the related art.
FIG. 1B is a diagram for explaining a problem of the related art.
FIG. 1C is a diagram for explaining a problem of the related art.
FIG. 2A is a diagram for explaining a problem of the related art.
FIG. 2B is a diagram for explaining a problem of the related art.
FIG. 2C is a diagram for explaining a problem of the related art.
FIG. 3 is a diagram illustrating an example of cross tabulation.
FIG. 4 is a diagram showing a system overview in the present embodiment.
FIG. 5 is a functional block diagram of the information processing apparatus.
FIG. 6 is a diagram showing a processing flow in the present embodiment.
FIG. 7 is a diagram illustrating an example of provided data.
FIG. 8 is a diagram illustrating an example of a distribution obtained from provided data.
FIG. 9 is a diagram illustrating a processing flow of distribution conversion data generation processing.
FIG. 10A is a diagram illustrating a process of generating distribution conversion data.
FIG. 10B is a diagram illustrating a process of generating distribution conversion data.
FIG. 10C is a diagram for explaining distribution conversion.
FIG. 11 is a diagram illustrating a processing flow of conversion processing.
FIG. 12 is a diagram illustrating an example of data after conversion processing.
FIG. 13A is a diagram schematically illustrating an encryption operation.
FIG. 13B is a diagram illustrating an example of encrypted converted attribute values.
FIG. 14A is a diagram illustrating a conversion example of the data in FIG. 1A.
FIG. 14B is a diagram illustrating a conversion example of the data in FIG. 1B.
FIG. 14C is a diagram illustrating a conversion example of the data in FIG. 1C.
FIG. 15A is a diagram illustrating an example of encrypted data of the data in FIG. 1A.
FIG. 15B is a diagram illustrating an example of encrypted data of the data in FIG. 1B.
FIG. 15C is a diagram illustrating an example of encrypted data of the data in FIG. 1C.
FIG. 16A is a diagram illustrating an example of the cross tabulation of FIGS. 15A to 15C.
FIG. 16B is a diagram illustrating an example of the cross tabulation of FIGS. 15A to 15C.
FIG. 16C is a diagram illustrating an example of the cross tabulation of FIGS. 15A to 15C.
FIG. 16D is a diagram illustrating an example of the cross tabulation of FIGS. 15A to 15C.
FIG. 17 is a diagram illustrating another example of distribution data.
FIG. 18 is a diagram illustrating another example of distribution conversion data.
FIG. 19 is a diagram for explaining distribution conversion.
FIG. 20 is a diagram showing a processing flow in the present embodiment.
FIG. 21A is a diagram illustrating processing contents of the attribute value generation processing unit.
FIG. 21B is a diagram illustrating processing contents of the attribute value generation processing unit.
FIG. 22 is a diagram showing a processing flow in the present embodiment.
FIG. 23 is a diagram illustrating an example of a restored aggregation result.
FIG. 24 is a diagram illustrating another example of a restored aggregation result.
FIG. 25 is a diagram illustrating another example of distribution conversion data.
FIG. 26 is a functional block diagram of a computer.

FIG. 4 shows an example system configuration according to the present embodiment. Connected to a network 1, such as the Internet, are an aggregation server 300, which performs aggregation processing such as cross tabulation, and a plurality of information processing apparatuses (information processing apparatuses A to C in FIG. 4), which provide data to the aggregation server 300 and perform processing that uses the aggregation results. The aggregation server 300 has a data receiving unit 310 that receives data to be aggregated from the information processing apparatuses, a data storage unit 320 that stores the data received by the data receiving unit 310, and an aggregation processing unit 330 that performs aggregation processing such as cross tabulation in response to an aggregation request from an information processing apparatus and returns the aggregation result to the requesting apparatus. Because the processing of the aggregation server 300 is the same as in conventional systems, it is not described further.

FIG. 5 shows a functional block diagram of the information processing apparatus. The information processing apparatus has a provided data storage unit 101, a distribution detection unit 102, a distribution data storage unit 103, a distribution conversion data generation unit 104, a rule storage unit 105, a distribution conversion data storage unit 106, a conversion processing unit 107, a converted data storage unit 108, an encryption processing unit 109, a key storage unit 110, an encrypted data storage unit 111, a transmission unit 112, an aggregation request processing unit 121, an aggregation result storage unit 122, a restoration processing unit 123, an attribute value generation unit 124, and a restored aggregation result storage unit 126. The information processing apparatus may also have an attribute value storage unit 125 that stores the attribute values that appear for each attribute item.

A terminal device operated by an administrator or other person who manages the information processing apparatus is connected to the apparatus via another network, and the apparatus performs processing in accordance with instructions from the terminal device. For example, when the apparatus receives provided data sent from the terminal device, it stores the data in the provided data storage unit 101.

The provided data storage unit 101 stores provided data, that is, the data to be transmitted to the aggregation server 300. The distribution detection unit 102 performs, on the plurality of records stored in the provided data storage unit 101, processing such as counting the appearance frequency of each attribute value, or summing attribute values, for a specific attribute item designated by, for example, the administrator, and stores the result in the distribution data storage unit 103.

The distribution conversion data generation unit 104 processes the distribution data stored in the distribution data storage unit 103 in accordance with the rule data stored in the rule storage unit 105, thereby generating data for converting the attribute values of the specific attribute item in the records included in the provided data, and stores that data in the distribution conversion data storage unit 106. The conversion processing unit 107 converts the attribute values of the specific attribute item in the records stored in the provided data storage unit 101 in accordance with the data stored in the distribution conversion data storage unit 106, and stores the converted data in the converted data storage unit 108.

The encryption processing unit 109 encrypts the attribute values of the specific attribute item in the records stored in the converted data storage unit 108, using the key stored in the key storage unit 110 and a predetermined hash function, and stores the result in the encrypted data storage unit 111. The transmission unit 112 transmits the data stored in the encrypted data storage unit 111 to the aggregation server 300.
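The text states only that a key and a predetermined hash function are used. One common way to combine the two is a keyed hash such as HMAC, sketched below in Python. The choice of HMAC-SHA256, the function name `conceal_converted_value`, and the sample key are assumptions made for illustration, not details taken from this description.

```python
import hashlib
import hmac


def conceal_converted_value(converted_value: str, key: bytes) -> str:
    """Keyed, deterministic concealment of a converted attribute value.

    HMAC-SHA256 is an assumed construction; any party holding the same
    key and hash function can reproduce the same token.
    """
    return hmac.new(key, converted_value.encode("utf-8"), hashlib.sha256).hexdigest()


shared_key = b"example-shared-key"  # hypothetical key held in the key storage unit
token_m1 = conceal_converted_value("M1", shared_key)
token_m2 = conceal_converted_value("M2", shared_key)
print(token_m1 != token_m2)
```

Determinism matters here: parties sharing the key and hash function can regenerate the same tokens later when restoring an aggregation result.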

The aggregation request processing unit 121 transmits an aggregation request to the aggregation server 300 in response to an instruction from the user of the information processing apparatus, receives the aggregation result from the aggregation server 300, and stores it in the aggregation result storage unit 122. The restoration processing unit 123, in cooperation with the attribute value generation unit 124, restores the attribute values of those attribute items in the stored aggregation result that are encrypted and whose attribute values have been converted, and stores the result in the restored aggregation result storage unit 126. The attribute value generation unit 124 converts the attribute values of the specific attribute item stored in the distribution data storage unit 103 (or the attribute value storage unit 125) in accordance with the rule data stored in the rule storage unit 105, encrypts them with the key stored in the key storage unit 110 and the predetermined hash function to generate encrypted converted attribute values, and outputs these to the restoration processing unit 123.

In the example of FIG. 5, the information processing apparatus includes both a portion that performs the processing for transmitting provided data to the aggregation server 300 and a portion that performs the processing for receiving the aggregation result from the aggregation server 300 and restoring it. These portions may, however, be provided in separate apparatuses. The key stored in the key storage unit 110 and the hash function are shared among the persons or companies that share the aggregation results. The attribute value storage unit 125 stores, in common among the persons or companies that share the aggregation results, the variations of attribute values that may appear for a specific attribute item. Using it makes it possible to handle attribute values that appear in the aggregation result but are not included in one's own provided data. Similarly, the rule storage unit 105 stores rule data common to the persons or companies that share the aggregation results.

Next, the processing of the information processing apparatus is described with reference to FIGS. 6 to 25. It is assumed that the provided data is already stored in the provided data storage unit 101. First, the distribution detection unit 102 counts the appearance frequency of each attribute value of the specific attribute item to be processed across the records of the provided data stored in the provided data storage unit 101, calculates the average of the appearance frequencies (or, in some cases, another statistic such as the minimum), and stores the results in the distribution data storage unit 103 (FIG. 6: step S1).

For example, when provided data such as that shown in FIG. 7 is stored in the provided data storage unit 101 and the specific attribute item is "gender", data such as that shown in FIG. 8 is obtained. The example of FIG. 8 contains the appearance frequency of the attribute value "M", the appearance frequency of the attribute value "F", and their average. The average is used as a reference value for judging the bias in appearance frequency. The minimum may be used as the reference value instead.
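Step S1 can be sketched in Python as follows. The record layout and the function name `detect_distribution` are assumptions; the frequencies (10 for "M", 3 for "F") follow the values the description gives for the FIG. 8 example.

```python
from collections import Counter
from statistics import mean


def detect_distribution(records, item, use_min=False):
    """Count each attribute value of `item` and derive the reference value.

    The reference value is the average frequency by default, or the
    minimum frequency when `use_min` is True.
    """
    counts = Counter(record[item] for record in records)
    frequencies = counts.values()
    reference = min(frequencies) if use_min else mean(frequencies)
    return dict(counts), reference


# Hypothetical provided data with only the "gender" attribute item shown.
records = [{"gender": "M"}] * 10 + [{"gender": "F"}] * 3
counts, reference = detect_distribution(records, "gender")
print(counts, reference)
```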

Next, the distribution conversion data generation unit 104 performs distribution conversion data generation processing (step S3). This processing is described with reference to FIGS. 9 to 10C.

The distribution conversion data generation unit 104 identifies one unprocessed attribute value in the distribution data stored in the distribution data storage unit 103 (FIG. 9: step S11). It then determines whether the appearance frequency of the identified attribute value is less than the reference value (step S13). Taking the case where the average is used as the reference value, in FIG. 8 the attribute value "M" has an appearance frequency at or above the average, and the attribute value "F" has an appearance frequency below the average.

If the appearance frequency of the identified attribute value is less than the reference value (step S13: Yes route), the distribution conversion data generation unit 104 generates one converted attribute value in accordance with the rule data stored in the rule storage unit 105, generates a record containing that converted attribute value and a conversion probability of "100%", and stores the record in the distribution conversion data storage unit 106 (step S15). Processing then proceeds to step S23.

Suppose the rule data stored in the rule storage unit 105 represents a rule that generates a converted attribute value by appending a serial number to the pre-conversion attribute value. Then, for the attribute value "F", the converted attribute value is "F1". The rule data may instead represent a rule that, for example, combines the pre-conversion attribute value with an integer selected at random from a predetermined range. These rules are only examples; any rule may be used as long as the conversion can be performed in a regular manner.

In the example of FIG. 8, as shown in FIG. 10A, the converted attribute value "F1" and the probability "100" are stored in association with the pre-conversion attribute value "F".

On the other hand, if the appearance frequency of the identified attribute value is at or above the reference value (step S13: No route), the distribution conversion data generation unit 104 calculates an expansion count (also called a division count) (step S17). For example, the appearance frequency may be divided by 2 repeatedly until it falls below the reference value, with the expansion count calculated as 2^N, where N is the number of divisions. Alternatively, the reference value may be subtracted repeatedly from the appearance frequency until the result falls below the reference value, with the number of subtractions used as the expansion count. As a further alternative, the quotient obtained by dividing the appearance frequency by the reference value (plus 1 if there is a remainder) may be used as the expansion count.

In the example of FIG. 8, the pre-conversion attribute value "M" has an appearance frequency of 10 and the average is 6.5, so the expansion count is calculated as 2 whether by dividing by 2, by subtracting the reference value, or by dividing by the reference value.
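Two of the expansion-count calculations can be sketched in Python and checked against the FIG. 8 figures (frequency 10, reference value 6.5). The function names are assumptions, and the repeated-subtraction variant is left out of this sketch.

```python
import math


def expansion_by_halving(frequency, reference):
    """Halve the frequency until it drops below the reference; return 2**N."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    divisions = 0
    while frequency >= reference:
        frequency /= 2
        divisions += 1
    return 2 ** divisions


def expansion_by_division(frequency, reference):
    """Quotient of frequency / reference, plus 1 when there is a remainder."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return math.ceil(frequency / reference)


halving_result = expansion_by_halving(10, 6.5)
division_result = expansion_by_division(10, 6.5)
print(halving_result, division_result)
```

The two methods agree on this example but can diverge for larger ratios, because halving always yields a power of two. For a frequency of 20 against the same reference, halving gives 4 while division gives 4 as well, yet for a frequency of 14 halving gives 4 and division gives 3.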

そして、分布変換データ生成部１０４は、ルール格納部１０５に格納されているルールデータに従って算出された展開数に応じた個数の変換後属性値を生成し、当該変換後属性値を含むレコードを展開数だけ生成し、分布変換データ格納部１０６に格納する（ステップＳ１９）。図８の例では、展開数が「２」であるので、変換後属性値「Ｍ１」「Ｍ２」が生成され、それらについてレコードが生成され、分布変換データ格納部１０６に格納する。さらに、分布変換データ生成部１０４は、生成したレコードに、変換確率を設定する（ステップＳ２１）。変換確率は、「１００」を展開数で除することで均一な値を設定するようにしても良い。図１０Ｂに示すように、展開数が「２」であれば「５０」が設定される。なお、変換確率は全て加算すると１００になればよく、５１と４９といったように揺らぎを持たせるようにしても良い。不自然に均一な分布にならないようにするためである。 The distribution conversion data generation unit 104 generates a number of converted attribute values corresponding to the number of expansions calculated according to the rule data stored in the rule storage unit 105, and expands the record including the converted attribute values. The number is generated and stored in the distribution conversion data storage unit 106 (step S19). In the example of FIG. 8, since the number of expansions is “2”, post-conversion attribute values “M1” and “M2” are generated, and records are generated for them and stored in the distribution conversion data storage unit 106. Furthermore, the distribution conversion data generation unit 104 sets a conversion probability in the generated record (step S21). The conversion probability may be set to a uniform value by dividing “100” by the number of expansions. As shown in FIG. 10B, if the number of expansions is “2”, “50” is set. Note that the conversion probabilities need only add to 100, and may have fluctuations such as 51 and 49. This is to prevent an unnatural distribution.

With a conversion probability of “50”%, the attribute value “M1” has an appearance frequency of “5”, the attribute value “M2” has an appearance frequency of “5”, and the attribute value “F” has an appearance frequency of “3”. As shown in FIG. 10C, the distribution that was biased toward the attribute value “M” has been corrected. That is, before conversion the appearance frequency of the attribute value “M” stands out, as shown on the left side of FIG. 10C, whereas after conversion the appearance frequencies of all attribute values are below the average value and differ less from one another, as shown on the right side of FIG. 10C.

Thereafter, the distribution conversion data generation unit 104 determines whether any unprocessed attribute value remains (step S23). If an unprocessed attribute value remains, the processing returns to step S11. If none remains, the processing returns to the calling process.

By performing this processing, distribution conversion data such as that shown in FIG. 10B is generated. With this distribution conversion data, the distribution of attribute-value appearance frequencies for a specific attribute item can be converted appropriately for the purpose of concealment.
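The generation of distribution conversion data (steps S17 to S21) can be sketched as follows. This is a sketch under stated assumptions rather than the implementation of the embodiment: the function name and the table layout are hypothetical, the number of expansions is taken by the division method, post-conversion attribute values are formed by appending a serial number as in the “M1”, “M2” examples, and any remainder of 100 is given to the first entries so that the probabilities sum to 100.

```python
import math


def build_conversion_table(freqs: dict[str, int], ref: float) -> dict[str, list[tuple[str, int]]]:
    """Map each attribute value to (post-conversion value, probability in %) pairs."""
    table = {}
    for value, freq in freqs.items():
        # Values below the reference are mapped to a single post-conversion value.
        n = math.ceil(freq / ref) if freq >= ref else 1
        # Split 100 as evenly as possible (assumption: remainder goes to the first entries).
        base, extra = divmod(100, n)
        table[value] = [
            (f"{value}{i + 1}", base + (1 if i < extra else 0)) for i in range(n)
        ]
    return table


# FIG. 8 example: "M" appears 10 times, "F" 3 times, average value 6.5.
print(build_conversion_table({"M": 10, "F": 3}, 6.5))
```

For the FIG. 8 frequencies this yields “M1” and “M2” at 50% each and “F1” at 100%.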

Returning to the description of the processing in FIG. 6, the conversion processing unit 107 and the encryption processing unit 109 perform the conversion process (step S5). The conversion process is described with reference to FIGS. 11 to 13B.

First, the conversion processing unit 107 identifies one unprocessed record among the plurality of records included in the provided data stored in the provided data storage unit 101 (FIG. 11: step S31). The conversion processing unit 107 then identifies the relevant portion of the distribution conversion data from the attribute value of the specific attribute item to be processed in the identified record (step S33). For the attribute value “M” of the attribute item “gender” in the first record of data such as that shown in FIG. 7, the row for the pre-conversion attribute value “M” is identified in the distribution conversion data of FIG. 10B.

The conversion processing unit 107 then converts the attribute value of the specific attribute item in the identified record into a post-conversion attribute value according to the probabilities defined in the relevant portion of the distribution conversion data, and stores the processed record in the converted data storage unit 108 (step S35). When there are a plurality of post-conversion attribute values with uniform probabilities, they may be adopted in turn by round robin. Alternatively, a random number may be generated and a post-conversion attribute value selected each time according to the probability values.

For example, for provided data such as that in FIG. 7 and the distribution conversion data of FIG. 10B, performing step S35 yields data such as that shown in FIG. 12. The gender attribute value “M” is converted so that “M1” and “M2” occur evenly, and the gender attribute value “F” is converted to “F1”.
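The two selection strategies of step S35 can be sketched as follows. The record layout (a list of dictionaries with a “gender” key), the table layout, and the function names are assumptions made for illustration only.

```python
import itertools
import random

Table = dict[str, list[tuple[str, int]]]


def convert_round_robin(records: list[dict], item: str, table: Table) -> list[dict]:
    """Assign post-conversion values in turn (suitable when probabilities are uniform)."""
    cyclers = {
        value: itertools.cycle([name for name, _ in targets])
        for value, targets in table.items()
    }
    return [{**rec, item: next(cyclers[rec[item]])} for rec in records]


def convert_random(records: list[dict], item: str, table: Table, rng: random.Random) -> list[dict]:
    """Draw a post-conversion value for each record, weighted by the conversion probability."""
    converted = []
    for rec in records:
        names, weights = zip(*table[rec[item]])
        converted.append({**rec, item: rng.choices(names, weights=weights, k=1)[0]})
    return converted


table = {"M": [("M1", 50), ("M2", 50)], "F": [("F1", 100)]}
records = [{"gender": "M", "disease": 0}] * 10 + [{"gender": "F", "disease": 1}] * 3
converted = convert_round_robin(records, "gender", table)
```

With round robin, ten “M” records split into five “M1” and five “M2”. With random selection the split varies from run to run, which corresponds to the fluctuation the text allows.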

Thereafter, the encryption processing unit 109 encrypts the post-conversion attribute value of the specific attribute item in each record stored in the converted data storage unit 108, using the key k stored in the key storage unit 110 and a predetermined hash function H, and stores the processed record in the encrypted data storage unit 111 (step S37). For example, if the post-conversion attribute value is “M1”, H(M1, k) is computed. When data such as that in FIG. 12 has been obtained, the operations shown schematically in FIG. 13A are performed to generate encrypted data such as that shown in FIG. 13B. As FIG. 13B shows, “abc25432” is generated for “M1”. Because “abc25432” is generated for every record in which “M1” appears, a cross tabulation can still count, for “M1”, the number of records with disease “1” and the number of records with disease “0”. However, the relationship to “awe34565”, which corresponds to “M2”, cannot be determined from the encrypted data, so “abc25432” and “awe34565” are never treated as the same value.
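The keyed computation H(M1, k) can be illustrated with a short sketch. The text only specifies a key k and a predetermined hash function H, so the use of HMAC with SHA-256 here is an assumption, and the function name and key value are hypothetical. The point of the sketch is that equal inputs give equal outputs, so counting still works, while outputs for “M1” and “M2” show no visible relationship.

```python
import hashlib
import hmac


def keyed_hash(value: str, key: bytes) -> str:
    """Compute H(value, k); HMAC-SHA256 is an assumed stand-in for H."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"example-key"  # hypothetical key k
m1 = keyed_hash("M1", key)
m2 = keyed_hash("M2", key)
print(m1 == keyed_hash("M1", key), m1 == m2)
```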

The conversion processing unit 107 then determines whether any unprocessed record remains (step S39). If an unprocessed record remains, the processing returns to step S31. If none remains, the processing returns to the calling process.

By performing this processing, the distribution of attribute-value appearance frequencies for the specific attribute item is converted into a different distribution, and concealed data is generated.

Returning to the description of the processing in FIG. 6, the transmission unit 112 transmits the data stored in the encrypted data storage unit 111 to the aggregation server 300 (step S7).

As described above, the aggregation server 300 receives the encrypted data, stores it in the data storage unit 320, and performs a predetermined aggregation process (for example, cross tabulation) either on request or automatically.

For example, suppose that for data (in part) such as that in FIG. 1A, distribution conversion data is generated that expands gender “M” into “M1” to “M6” and converts gender “F” to “F1”. Data such as that shown in FIG. 14A is then obtained. Suppose further that for data (in part) such as that in FIG. 1B, distribution conversion data is generated that converts gender “M” to “M1” and expands gender “F” into “F1” to “F5”. Data such as that shown in FIG. 14B is then obtained. Suppose further that for data (in part) such as that in FIG. 1C, distribution conversion data is generated that expands gender “M” into “M1” to “M3” and gender “F” into “F1” to “F3”. Data such as that shown in FIG. 14C is then obtained. When encryption is then performed with the key k and the hash function H, the data of FIG. 14A becomes that of FIG. 15A, the data of FIG. 14B becomes that of FIG. 15B, and the data of FIG. 14C becomes that of FIG. 15C. The data of FIGS. 15A, 15B, and 15C is transmitted to the aggregation server 300 and stored in the data storage unit 320 by the data reception unit 310.

Suppose the aggregation processing unit 330 then performs, automatically or on request, a cross tabulation on the data stored in the data storage unit 320, for example tallying disease occurrence by gender. Aggregation results such as those shown in FIGS. 16A to 16D are then obtained. The example of FIG. 16A shows, for each faculty, the appearance frequencies of the disease attribute values “0” and “1” counted for each gender. The example of FIG. 16B shows, for each faculty, the appearance frequency of the disease attribute value “1” counted for each gender. The example of FIG. 16C shows, across all faculties regardless of faculty, the appearance frequencies of the disease attribute values “0” and “1” counted for each gender. The example of FIG. 16D shows, across all faculties regardless of faculty, the appearance frequency of the attribute value “1” counted for each gender.

As noted above, the minimum appearance frequency may be used as the reference value when calculating the number of expansions. In the case of FIG. 8, for example, with the minimum value “3” as the reference value, calculating the number of expansions for the attribute value “M” from its appearance frequency “10” by the methods described above yields “4”, as shown in FIG. 17. For the attribute value “F”, calculating the number of expansions from its appearance frequency “3” in the same manner yields “0”. With these numbers of expansions, the attribute value “M” is expanded into the attribute values “M1” to “M4”, and the attribute value “F” is converted to the attribute value “F1”. That is, distribution conversion data such as that shown in FIG. 18 is generated. Even with an appearance frequency of “10” and a conversion probability of “25”%, each post-conversion attribute value appears an integer number of times; for example, “M1” appears three times, “M2” twice, “M3” three times, and “M4” twice.

That is, as shown in FIG. 19, the distribution of appearance frequencies has been converted so that the bias is reduced. The distribution on the left side of FIG. 19 is converted, as shown on the right side of FIG. 19, so that every appearance frequency is at or below the minimum value used as the reference value.

The examples above convert the distribution so that the appearance frequencies become flat, but it suffices that the pre-conversion distribution cannot be estimated from the post-conversion distribution. For example, when a distribution that is not a normal distribution is obtained before conversion, the conversion may be performed so that a normal distribution results.

In the examples above, the attribute item to be processed is designated in advance, but it may instead be identified from the distribution of attribute-value appearance frequencies. For example, the distribution detection unit 102 calculates the standard deviation σ in addition to the average value. The distribution conversion data generation unit 104 then checks whether any attribute value has an appearance frequency outside the range of the average value plus or minus 3σ, and if such an attribute value exists, selects the attribute item of that attribute value as a target of processing.
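The 3σ check can be sketched as follows. The function name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator is intended. Note that under this assumption a single frequency can lie at most sqrt(n − 1) standard deviations from the mean of n values, so the check can only fire for attribute items with at least eleven distinct attribute values.

```python
import statistics


def has_outlier_frequency(freqs: dict[str, int], k: float = 3.0) -> bool:
    """Return True if any appearance frequency lies outside mean ± k·σ."""
    values = list(freqs.values())
    mean = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (assumption)
    return any(abs(v - mean) > k * sigma for v in values)


# Hypothetical item with twenty attribute values, one of which dominates.
skewed = {f"v{i}": 5 for i in range(19)}
skewed["dominant"] = 100
print(has_outlier_frequency(skewed))
```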

Next, the processing performed when an aggregation result is obtained from the aggregation server 300 is described with reference to FIGS. 20 to 24. The aggregation request processing unit 121 transmits to the aggregation server 300, for example in response to a request from a user, an aggregation request for a specific aggregation process. The aggregation processing unit 330 of the aggregation server 300 performs the aggregation process according to the aggregation request and returns the aggregation result to the requesting information processing apparatus.

The aggregation request processing unit 121 then receives the aggregation result from the aggregation server 300 and stores it in the aggregation result storage unit 122 (FIG. 20: step S41). Next, in response to an instruction from the aggregation request processing unit 121, the attribute value generation unit 124 identifies one unprocessed attribute value of the specific attribute item to be processed, stored in the distribution data storage unit 103 or the attribute value storage unit 125 (step S43). The attribute value generation unit 124 also initializes a counter N to 1 (step S45).

The attribute value generation unit 124 determines the N-th post-conversion attribute value for the identified attribute value from the rule data stored in the rule storage unit 105 (step S47). For example, if the attribute item is gender and the identified attribute value is “M”, “M1” is determined first according to the rule data.

Thereafter, the attribute value generation unit 124 calculates the encrypted post-conversion attribute value from the N-th post-conversion attribute value, using the key k stored in the key storage unit 110 and the predetermined hash function H (step S49). For example, as shown in FIG. 21A, when “M1” is generated for the attribute value “M”, H(M1, k) is computed to obtain the encrypted post-conversion attribute value. Similarly, for the attribute value “F”, when “F1” is generated as shown in FIG. 21B, H(F1, k) is computed to obtain the encrypted post-conversion attribute value. The attribute value generation unit 124 outputs the encrypted post-conversion attribute value to the restoration processing unit 123.

The restoration processing unit 123 searches the aggregation result stored in the aggregation result storage unit 122 for the encrypted post-conversion attribute value (step S51). For example, when an aggregation result such as that shown in FIG. 16C has been obtained, it is searched for “abc25432”, which corresponds to “M1”. The first entry is then hit.

The restoration processing unit 123 then determines whether the encrypted post-conversion attribute value exists in the aggregation result (step S53). If it exists, the count for the attribute value identified in step S43 is incremented by the aggregated value (step S55). The calculation result is stored in the restored aggregation result storage unit 126. For example, when the first record of FIG. 16C is identified, “2” is added to the count for disease “0” and “1” is added to the count for disease “1” for the attribute value “M”. Because the counts are initially “0”, the count for disease “0” becomes “2” and the count for disease “1” becomes “1”. The attribute value generation unit 124 then increments N by 1 (step S57), and the processing returns to step S47.

For example, for the attribute value “M”, “M2” is determined next as the post-conversion attribute value, and the encrypted post-conversion attribute value “awe34565” is calculated as shown in FIG. 21A. The aggregation result of FIG. 16C is then searched for “awe34565”. The record in the second row is identified, “2” is added to the count for disease “0” to give “4”, and “1” is added to the count for disease “1” to give “2”.

On the other hand, if the encrypted post-conversion attribute value does not exist in the aggregation result, the processing proceeds to the processing of FIG. 22 via terminal A.

Turning to the processing of FIG. 22, the attribute value generation unit 124 determines whether N is equal to or greater than a predetermined maximum value of N (step S59). If N is less than the maximum value, the attribute value generation unit 124 increments N by 1 (step S61), and the processing returns to step S47 of FIG. 20 via terminal B. If N is equal to or greater than the maximum value, the attribute value generation unit 124 determines whether any unprocessed attribute value remains in the distribution data storage unit 103 or the attribute value storage unit 125 (step S63). If an unprocessed attribute value remains, the processing returns to step S43 of FIG. 20 via terminal C.

On the other hand, if no unprocessed attribute value remains, the restoration processing unit 123, for example, transmits the restored aggregation result stored in the restored aggregation result storage unit 126 to a terminal of the user of the information processing apparatus or the like (step S65).

By performing this processing on FIG. 16C, a result such as that shown in FIG. 23 is obtained. As shown in FIG. 23, the count for disease “0” and the count for disease “1” are obtained for the attribute values “M” and “F” of the attribute item “gender”. Similarly, performing the processing on FIG. 16B yields a result such as that shown in FIG. 24. In the example of FIG. 24, the count for disease “1” is obtained for each faculty, for gender “M” and gender “F”.
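The restoration flow of FIGS. 20 and 22 can be sketched as follows. This is a sketch under several assumptions: the aggregation result is modeled as a dictionary from encrypted attribute value to per-disease counts, HMAC-SHA256 stands in for the hash function H, post-conversion attribute values are formed by appending a serial number, and all function names, the key, and the counts for “F1” are hypothetical. The counts for “M1” and “M2” follow the example in the text.

```python
import hashlib
import hmac
from collections import Counter


def keyed_hash(value: str, key: bytes) -> str:
    """Assumed stand-in for H(value, k)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def restore(aggregated: dict[str, dict[int, int]], originals: list[str],
            key: bytes, max_n: int = 8) -> dict[str, Counter]:
    """Regenerate each encrypted post-conversion value and sum the matching counts."""
    restored = {value: Counter() for value in originals}
    for value in originals:                      # step S43
        for n in range(1, max_n + 1):            # steps S45, S57, S59, S61
            encrypted = keyed_hash(f"{value}{n}", key)  # steps S47, S49
            if encrypted in aggregated:          # steps S51, S53
                restored[value].update(aggregated[encrypted])  # step S55
    return restored


key = b"example-key"  # hypothetical key k
aggregated = {
    keyed_hash("M1", key): {0: 2, 1: 1},
    keyed_hash("M2", key): {0: 2, 1: 1},
    keyed_hash("F1", key): {0: 1, 1: 2},  # made-up counts for illustration
}
result = restore(aggregated, ["M", "F"], key)
```

For “M” this gives a count of 4 for disease “0” and 2 for disease “1”, matching the running totals described above.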

With the processing described above, the original data cannot be estimated at the aggregation server 300, because the data is encrypted and its distribution has been changed, yet the correct aggregation result can be restored at the information processing apparatus. Consequently, even sensitive information can be registered in the aggregation server 300 and subjected to aggregation processing such as cross tabulation, and the information processing apparatus can restore the aggregation result and put it to effective use.

Embodiments of the present technology have been described above, but the present technology is not limited to them. For example, the functional block diagram of FIG. 5 may not match the actual program module configuration. As for the processing flows, the order of processing may be changed or steps may be executed in parallel as long as the processing result does not change. For example, splitting or merging loops poses no problem if the processing result is unchanged. In the processing flows of FIGS. 20 and 22, for example, all encrypted post-conversion attribute values may be calculated first and the aggregation result searched afterward. Similarly, in the processing of FIG. 11, the encryption process may be moved out of the loop.

When post-conversion attribute values are generated by appending serial numbers to the attribute value, the processing flow may omit steps S59 and S61.

The encrypted post-conversion attribute values may also be registered in the distribution conversion data storage unit 106 so that an encrypted post-conversion attribute value can be obtained directly from the pre-conversion attribute value. For example, distribution conversion data such as that in FIG. 18 may be registered together with the encrypted post-conversion attribute values, as shown in FIG. 25, and the conversion processing unit 107 may assign the encrypted post-conversion attribute values directly to the records in the provided data.

The examples above describe the case where all of the information processing apparatuses A to C convert the distribution for the same attribute item, but in some cases only some of the information processing apparatuses convert the distribution for a specific attribute item. Accordingly, when records that have not been processed remain in the aggregation result storage unit 122, the restoration processing unit 123 stores those records as they are in the restored aggregation result storage unit 126.

It is also possible that the same set of attribute values does not appear for a specific attribute item in all of the information processing apparatuses A to C. In that case, if the attribute value generation unit 124 performs its processing based on the distribution data storage unit 103, it cannot generate encrypted post-conversion attribute values for unknown attribute values. In this case as well, records that have not been processed remain in the aggregation result storage unit 122, and those records are stored as they are in the restored aggregation result storage unit 126. However, this problem does not arise if all attribute values that may be used in the information processing apparatuses A to C are stored in the attribute value storage unit 125 and the attribute value generation unit 124 uses that data.

The information processing apparatus and the aggregation server 300 described above are computer apparatuses in which, as shown in FIG. 26, a memory 2501, a CPU (Central Processing Unit) 2503, a hard disk drive (HDD) 2505, a display control unit 2507 connected to a display device 2509, a drive device 2513 for a removable disk 2511, an input device 2515, and a communication control unit 2517 for connecting to a network are connected by a bus 2519. The operating system (OS) and the application program for performing the processing of this embodiment are stored in the HDD 2505 and are read from the HDD 2505 into the memory 2501 when executed by the CPU 2503. The CPU 2503 controls the display control unit 2507, the communication control unit 2517, and the drive device 2513 according to the processing of the application program so that they perform predetermined operations. Data being processed is stored mainly in the memory 2501 but may also be stored in the HDD 2505. In an embodiment of the present technology, the application program for performing the processing described above is stored on the computer-readable removable disk 2511, distributed, and installed from the drive device 2513 onto the HDD 2505. The program may also be installed onto the HDD 2505 via a network such as the Internet and the communication control unit 2517. Such a computer apparatus realizes the various functions described above through the cooperation of hardware such as the CPU 2503 and the memory 2501 with programs such as the OS and the application program.

The embodiment described above can be summarized as follows.

A conversion processing method according to a first aspect of the present embodiment includes: (A) a first process of detecting, for a plurality of records that are stored in a first data storage unit and each have attribute values for a plurality of attribute items, a first distribution of the appearance frequencies of the attribute values that appear for a specific attribute item among the plurality of attribute items; and (B) a second process of converting each of at least some of the attribute values into one of a plurality of second attribute values identified according to a predetermined first rule, so that the first distribution becomes a second distribution different from the first distribution.

This makes it difficult to estimate the original data from the distribution even when the processed data is inspected, so data leakage can be prevented or suppressed.

The second process described above may include: (b1) a third process of expanding, according to a predetermined second rule, an attribute value that has a bias equal to or greater than a predetermined criterion in the first distribution of appearance frequencies into a plurality of second attribute values, and storing the correspondence between that attribute value and the plurality of second attribute values in a second data storage unit; and (b2) a fourth process of converting, according to the correspondence stored in the second data storage unit, the attribute value for the specific attribute item in the plurality of records into one of the plurality of second attribute values. Converting an attribute value whose appearance frequency is biased into one of a plurality of attribute values in this way corrects the bias and removes the distinguishing features of the distribution. That is, data leakage from the distribution can be prevented or suppressed.

The second process described above may further include a fifth process of encrypting the second attribute value for the specific attribute item in the plurality of records. In some cases the encrypted second attribute values are also associated in the correspondence described above; in others, the encryption may be performed on the records in this way. The encryption may use a key and a hash function, or another encryption technique may be used.

Furthermore, the third process described above may calculate, for an attribute value whose appearance frequency exceeds a reference value among the attribute values that appear for the specific attribute item, the number of expansions based on the appearance frequency of that attribute value and the reference value, and may generate that number of second attribute values for the attribute value according to the second rule. For example, when the distribution is converted so that the second attribute values appear uniformly, the number of expansions may be calculated so that the appearance frequency falls below the reference value. In this way, attribute values whose appearance frequencies stand out can be made to disappear in appearance.

The third process described above may also include a process of calculating an appearance probability for each of the second attribute values according to the number of expansions and storing it in the second data storage unit in association with the correspondence. In that case, the fourth process described above may convert the attribute value for the specific attribute item in the plurality of records into one of the plurality of second attribute values in accordance with the correspondence and the appearance probabilities. Setting appearance probabilities allows the second attribute value to be selected flexibly. For example, varying the appearance probabilities so that the appearance frequencies fluctuate in a natural-looking way makes the conversion of the distribution harder for a third party to detect.
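
A probability-driven selection might look like the following sketch. The jitter scheme, the function names, and the sample values are assumptions for illustration only; the text does not specify how the probabilities are perturbed.

```python
import random

def appearance_probabilities(expansion_count, jitter, rng):
    """Near-uniform probabilities, perturbed so that the resulting
    frequencies do not look artificially flat. Requires 0 <= jitter < 1."""
    weights = [1.0 + rng.uniform(-jitter, jitter) for _ in range(expansion_count)]
    total = sum(weights)
    return [w / total for w in weights]

def choose_second_value(second_values, probabilities, rng):
    """Fourth process (sketch): pick one second attribute value at random
    according to the stored appearance probabilities."""
    return rng.choices(second_values, weights=probabilities, k=1)[0]

rng = random.Random(0)  # seeded only to make the sketch reproducible
second_values = ["engineering#0", "engineering#1", "engineering#2"]
probs = appearance_probabilities(len(second_values), jitter=0.2, rng=rng)
converted = [choose_second_value(second_values, probs, rng) for _ in range(6)]
```

Because the choice is random, the converted frequencies only approximate the stored probabilities, which is the kind of natural-looking fluctuation the paragraph above describes.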

Furthermore, the first aspect of the present embodiment may further include a process of transmitting the plurality of records, after the first process has been performed, to a computer that performs aggregation processing. The computer that performs the aggregation processing can then carry out combined aggregation over data obtained from various devices.

The restoration processing method according to the second aspect of the present embodiment includes: (A) a first process of generating a plurality of second attribute values, in accordance with a predetermined rule, from a first attribute value that appears for a first attribute item in a plurality of records to be aggregated; and (B) a second process performed on a first data storage unit that stores an aggregation result containing, for each third attribute value that appeared for the first attribute item, a count for each attribute value of a second attribute item or a total of the attribute values. In the second process, for each of the plurality of second attribute values, it is determined whether a third attribute value matching that second attribute value exists; if so, the count for each attribute value of the second attribute item, or the total of the attribute values, associated with that third attribute value is read from the first data storage unit and accumulated for each attribute value of the second attribute item, or the attribute values of the second attribute item are accumulated; and the accumulated result is stored in a second data storage unit in association with the second attribute value. In this way, even when the attribute value of a specific attribute item has been split into a plurality of attribute values, the original aggregation result can be restored by consolidating the results.
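
The restoration steps can be sketched as follows. The `value#i` naming rule, the fixed `fanout`, the dictionary layout of the aggregation result, and the sample counts are all hypothetical; the sketch only shows regenerating the second attribute values, checking which of them appear in the aggregation result, and accumulating the matching counts.

```python
from collections import Counter

def generate_second_values(first_value, fanout):
    """First process (sketch): regenerate the second attribute values from
    the original value with the same hypothetical naming rule."""
    return [f"{first_value}#{i}" for i in range(fanout)]

def restore(first_value, fanout, aggregated):
    """Second process (sketch): sum the per-value counts of every generated
    second attribute value that appears in the aggregation result."""
    total = Counter()
    for candidate in generate_second_values(first_value, fanout):
        if candidate in aggregated:  # skip values that never appeared
            total.update(aggregated[candidate])
    return total

# Hypothetical cross-tabulation of department (converted) against gender.
aggregated = {
    "engineering#0": {"male": 2, "female": 0},
    "engineering#1": {"male": 1, "female": 1},
    "pharmacy": {"male": 1, "female": 1},
}
restored = restore("engineering", 3, aggregated)
```

Here "engineering#2" never appears in the aggregation result and is simply skipped, while the counts for the other two generated values are added together to give the totals for the original value.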

The first process described above may include a process of expanding the first attribute value into a plurality of fourth attribute values in accordance with a predetermined second rule, and a process of generating the plurality of second attribute values by encrypting each of the plurality of fourth attribute values.

A program for causing a computer to carry out the processing described above can be created, and the program is stored in a computer-readable storage medium or storage device such as a flexible disk, an optical disk such as a CD-ROM, a magneto-optical disk, a semiconductor memory (for example, ROM), or a hard disk. Data being processed is temporarily held in a storage device such as RAM.

The following appendices are further disclosed with respect to the embodiments, including the examples above.

(Appendix 1)
A conversion processing program for causing a computer to execute:
a first process of detecting, for a plurality of records that are stored in a first data storage unit and that each have attribute values for a plurality of attribute items, a first distribution of the appearance frequencies of the attribute values that appear for a specific attribute item among the plurality of attribute items; and
a second process of converting each of at least some of the attribute values into one of a plurality of second attribute values specified in accordance with a predetermined first rule, so that the first distribution becomes a second distribution different from the first distribution.

(Appendix 2)
The conversion processing program according to Appendix 1, wherein the second process includes:
a third process of expanding, in accordance with a predetermined second rule, an attribute value whose bias in the first distribution of the appearance frequencies of the attribute values is equal to or greater than a predetermined criterion into a plurality of second attribute values, and storing a correspondence between that attribute value and the plurality of second attribute values in a second data storage unit; and
a fourth process of converting, in accordance with the correspondence stored in the second data storage unit, the attribute value for the specific attribute item in the plurality of records into one of the plurality of second attribute values.

(Appendix 3)
The conversion processing program according to Appendix 2, wherein the second process further includes a fifth process of encrypting the second attribute value for the specific attribute item in the plurality of records.

(Appendix 4)
The conversion processing program according to Appendix 2 or 3, wherein the third process includes a process of:
calculating, for each attribute value that appears for the specific attribute item and whose appearance frequency exceeds a reference value, a number of expansions based on the appearance frequency of that attribute value and the reference value; and
generating, in accordance with the second rule, as many second attribute values as the number of expansions for the attribute value whose appearance frequency exceeds the reference value.

(Appendix 5)
The conversion processing program according to Appendix 4, wherein
the third process includes a process of calculating an appearance probability for each of the second attribute values according to the number of expansions and storing it in the second data storage unit in association with the correspondence, and
in the fourth process, the attribute value for the specific attribute item in the plurality of records is converted into one of the plurality of second attribute values in accordance with the correspondence and the appearance probabilities.

(Appendix 6)
The conversion processing program according to any one of Appendices 1 to 5, further causing the computer to execute a process of transmitting the plurality of records, after the first process has been performed, to a computer that performs aggregation processing.

(Appendix 7)
A restoration processing program for causing a computer to execute:
a first process of generating a plurality of second attribute values, in accordance with a predetermined rule, from a first attribute value that appears for a first attribute item in a plurality of records to be aggregated; and
a second process of, in a first data storage unit that stores an aggregation result containing, for each third attribute value that appeared for the first attribute item, a count for each attribute value of a second attribute item or a total of the attribute values, determining for each of the plurality of second attribute values whether a third attribute value matching that second attribute value exists, and, if so, reading from the first data storage unit the count for each attribute value of the second attribute item or the total of the attribute values associated with that third attribute value, accumulating the counts for each attribute value of the second attribute item or accumulating the attribute values of the second attribute item, and storing the accumulated result in a second data storage unit in association with the second attribute value.

(Appendix 8)
The restoration processing program according to Appendix 7, wherein the first process includes:
a process of expanding the first attribute value into a plurality of fourth attribute values in accordance with a predetermined second rule; and
a process of generating the plurality of second attribute values by encrypting each of the plurality of fourth attribute values.

(Appendix 9)
A conversion processing method in which a computer executes:
a first process of detecting, for a plurality of records that are stored in a first data storage unit and that each have attribute values for a plurality of attribute items, a first distribution of the appearance frequencies of the attribute values that appear for a specific attribute item among the plurality of attribute items; and
a second process of converting each of at least some of the attribute values into one of a plurality of second attribute values specified in accordance with a predetermined first rule, so that the first distribution becomes a second distribution different from the first distribution.

(Appendix 10)
A restoration processing method in which a computer executes:
a first process of generating a plurality of second attribute values, in accordance with a predetermined rule, from a first attribute value that appears for a first attribute item in a plurality of records to be aggregated; and
a second process of, in a first data storage unit that stores an aggregation result containing, for each third attribute value that appeared for the first attribute item, a count for each attribute value of a second attribute item or a total of the attribute values, determining for each of the plurality of second attribute values whether a third attribute value matching that second attribute value exists, and, if so, reading from the first data storage unit the count for each attribute value of the second attribute item or the total of the attribute values associated with that third attribute value, accumulating the counts for each attribute value of the second attribute item or accumulating the attribute values of the second attribute item, and storing the accumulated result in a second data storage unit in association with the second attribute value.

(Appendix 11)
An information processing apparatus comprising:
a distribution detection unit that detects, for a plurality of records that are stored in a first data storage unit and that each have attribute values for a plurality of attribute items, a first distribution of the appearance frequencies of the attribute values that appear for a specific attribute item among the plurality of attribute items; and
a conversion processing unit that converts each of at least some of the attribute values into one of a plurality of second attribute values specified in accordance with a predetermined first rule, so that the first distribution becomes a second distribution different from the first distribution.

(Appendix 12)
An information processing apparatus comprising:
an attribute value generation unit that generates a plurality of second attribute values, in accordance with a predetermined rule, from a first attribute value that appears for a first attribute item in a plurality of records to be aggregated; and
a restoration processing unit that, in a first data storage unit that stores an aggregation result containing, for each third attribute value that appeared for the first attribute item, a count for each attribute value of a second attribute item or a total of the attribute values, determines for each of the plurality of second attribute values whether a third attribute value matching that second attribute value exists, and, if so, reads from the first data storage unit the count for each attribute value of the second attribute item or the total of the attribute values associated with that third attribute value, accumulates the counts for each attribute value of the second attribute item or accumulates the attribute values of the second attribute item, and stores the accumulated result in a second data storage unit in association with the second attribute value.

101 Provided data storage unit
102 Distribution detection unit
103 Distribution data storage unit
104 Distribution conversion data generation unit
105 Rule storage unit
106 Distribution conversion data storage unit
107 Conversion processing unit
108 Converted data storage unit
109 Encryption processing unit
110 Key storage unit
111 Encrypted data storage unit
112 Transmission unit
121 Aggregation request processing unit
122 Aggregation result storage unit
123 Restoration processing unit
124 Attribute value generation unit
125 Attribute value storage unit
126 Restored aggregation result storage unit

Claims

A conversion processing program for causing a computer to execute:
a first process of detecting, for a plurality of records that are stored in a first data storage unit and that each have attribute values for a plurality of attribute items, a first distribution of the appearance frequencies of the attribute values that appear for a specific attribute item among the plurality of attribute items; and
a second process of converting each of at least some of the attribute values into one of a plurality of second attribute values specified in accordance with a predetermined first rule, so that the first distribution becomes a second distribution different from the first distribution.

The conversion processing program according to claim 1, wherein the second process includes:
a third process of expanding, in accordance with a predetermined second rule, an attribute value whose bias in the first distribution of the appearance frequencies of the attribute values is equal to or greater than a predetermined criterion into a plurality of second attribute values, and storing a correspondence between that attribute value and the plurality of second attribute values in a second data storage unit; and
a fourth process of converting, in accordance with the correspondence stored in the second data storage unit, the attribute value for the specific attribute item in the plurality of records into one of the plurality of second attribute values.

The conversion processing program according to claim 2, wherein the second process further includes a fifth process of encrypting the second attribute value for the specific attribute item in the plurality of records.

The conversion processing program according to claim 2, wherein the third process includes a process of:
calculating, for each attribute value that appears for the specific attribute item and whose appearance frequency exceeds a reference value, a number of expansions based on the appearance frequency of that attribute value and the reference value; and
generating, in accordance with the second rule, as many second attribute values as the number of expansions for the attribute value whose appearance frequency exceeds the reference value.

The conversion processing program according to claim 4, wherein
the third process includes a process of calculating an appearance probability for each of the second attribute values according to the number of expansions and storing it in the second data storage unit in association with the correspondence, and
in the fourth process, the attribute value for the specific attribute item in the plurality of records is converted into one of the plurality of second attribute values in accordance with the correspondence and the appearance probabilities.

The conversion processing program according to any one of claims 1 to 5, further causing the computer to execute a process of transmitting the plurality of records after the first process to a computer that performs an aggregation process.

A restoration processing program for causing a computer to execute:
a first process of generating a plurality of second attribute values, in accordance with a predetermined rule, from a first attribute value that appears for a first attribute item in a plurality of records to be aggregated; and
a second process of, in a first data storage unit that stores an aggregation result containing, for each third attribute value that appeared for the first attribute item, a count for each attribute value of a second attribute item or a total of the attribute values, determining for each of the plurality of second attribute values whether a third attribute value matching that second attribute value exists, and, if so, reading from the first data storage unit the count for each attribute value of the second attribute item or the total of the attribute values associated with that third attribute value, accumulating the counts for each attribute value of the second attribute item or accumulating the attribute values of the second attribute item, and storing the accumulated result in a second data storage unit in association with the second attribute value.

The restoration processing program according to claim 7, wherein the first process includes:
a process of expanding the first attribute value into a plurality of fourth attribute values in accordance with a predetermined second rule; and
a process of generating the plurality of second attribute values by encrypting each of the plurality of fourth attribute values.

A conversion processing method in which a computer executes:
a first process of detecting, for a plurality of records that are stored in a first data storage unit and that each have attribute values for a plurality of attribute items, a first distribution of the appearance frequencies of the attribute values that appear for a specific attribute item among the plurality of attribute items; and
a second process of converting each of at least some of the attribute values into one of a plurality of second attribute values specified in accordance with a predetermined first rule, so that the first distribution becomes a second distribution different from the first distribution.

A restoration processing method in which a computer executes:
a first process of generating a plurality of second attribute values, in accordance with a predetermined rule, from a first attribute value that appears for a first attribute item in a plurality of records to be aggregated; and
a second process of, in a first data storage unit that stores an aggregation result containing, for each third attribute value that appeared for the first attribute item, a count for each attribute value of a second attribute item or a total of the attribute values, determining for each of the plurality of second attribute values whether a third attribute value matching that second attribute value exists, and, if so, reading from the first data storage unit the count for each attribute value of the second attribute item or the total of the attribute values associated with that third attribute value, accumulating the counts for each attribute value of the second attribute item or accumulating the attribute values of the second attribute item, and storing the accumulated result in a second data storage unit in association with the second attribute value.

An information processing apparatus comprising:
a distribution detection unit that detects, for a plurality of records that are stored in a first data storage unit and that each have attribute values for a plurality of attribute items, a first distribution of the appearance frequencies of the attribute values that appear for a specific attribute item among the plurality of attribute items; and
a conversion processing unit that converts each of at least some of the attribute values into one of a plurality of second attribute values specified in accordance with a predetermined first rule, so that the first distribution becomes a second distribution different from the first distribution.

An information processing apparatus comprising:
an attribute value generation unit that generates a plurality of second attribute values, in accordance with a predetermined rule, from a first attribute value that appears for a first attribute item in a plurality of records to be aggregated; and
a restoration processing unit that, in a first data storage unit that stores an aggregation result containing, for each third attribute value that appeared for the first attribute item, a count for each attribute value of a second attribute item or a total of the attribute values, determines for each of the plurality of second attribute values whether a third attribute value matching that second attribute value exists, and, if so, reads from the first data storage unit the count for each attribute value of the second attribute item or the total of the attribute values associated with that third attribute value, accumulates the counts for each attribute value of the second attribute item or accumulates the attribute values of the second attribute item, and stores the accumulated result in a second data storage unit in association with the second attribute value.