JP5835745B2

JP5835745B2 - Information distribution system and information distribution method

Info

Publication number: JP5835745B2
Application number: JP2013014395A
Authority: JP
Inventors: 石井 淳; 前大道 浩之; 依田 育生
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2012-02-08
Filing date: 2013-01-29
Publication date: 2015-12-24
Anticipated expiration: 2033-01-29
Also published as: JP2013178756A

Description

The present invention relates to a technique for efficiently performing statistical processing on data containing individuals' privacy information while keeping that privacy information secret, and to the processing performed within the servers.

Statistical processing such as computing means, variances, and correlations requires that the data of individuals (or of the mobile terminals they own, hereinafter called nodes) first be gathered at a statistical analyzer (statistical processing server) and then processed. People are therefore reluctant to provide data containing private information unless the recipient is a suitably trustworthy research institution, and the party conducting a survey has had to take various precautions every time to handle the data securely.

Methods have therefore been considered that exchange data while keeping each node's privacy information concealed, so that data containing private information can still be processed statistically.

In the prior art, the statistical processing function is split across three servers, among which the data is distributed and stored so that privacy is preserved (see, for example, Non-Patent Documents 1 and 2). This approach uses a lightweight algorithm that supports basic operations such as addition, multiplication by a constant, and multiplication while the data itself remains secret.
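As a rough illustration of the kind of primitive this prior art builds on, the sketch below uses plain additive secret sharing modulo an assumed prime `MODULUS`. It is not the scheme of Non-Patent Documents 1 and 2, which uses its own three-party construction and also supports multiplication of two secrets through interaction between servers; the sketch only shows why addition and multiplication by a public constant can be done by each server on its own share without communication. All names are hypothetical.

```python
import secrets

MODULUS = 2**61 - 1  # assumption: shares are integers modulo a large prime


def share(x: int, parties: int = 3) -> list[int]:
    """Split x into `parties` additive shares that sum to x modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((x - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares: list[int]) -> int:
    """Recover the secret; requires every server's share (i.e. collusion)."""
    return sum(shares) % MODULUS


def add(a: list[int], b: list[int]) -> list[int]:
    # Each server adds the two shares it holds, locally.
    return [(x + y) % MODULUS for x, y in zip(a, b)]


def scale(a: list[int], c: int) -> list[int]:
    # Multiplying by a public constant is also a purely local operation.
    return [(c * x) % MODULUS for x in a]
```

Any single share is uniformly random on its own, which is why reconstruction needs the servers to collude.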

This method guarantees that the data cannot be reconstructed unless the statistical processing servers collude, but it does not address anonymizing who sent the data. As a result, the statistical processing servers, or the party analyzing the data, can mount several kinds of attack on the nodes.

For example, if the statistical processing server sends the nodes a query containing a condition that specifies who should respond, the server can learn, without any collusion, whether a respondent satisfies the query's condition simply from whether that respondent answers. Likewise, if the data analyst narrows the query condition until only a single targeted respondent qualifies, the target's data passes through a sham statistical process and is output unchanged.

Given these problems, respondents' identities should be kept secret. Communication techniques that conceal addresses almost completely and send data anonymously are available, such as Tor, which uses p2p (see, for example, Non-Patent Document 3). However, if senders are anonymized to the point where they cannot be identified at all, it becomes difficult to pay the data senders, that is, the respondents to the query, and respondents have little motivation to provide data. As a result, the survey itself may fail.

Another problem arises when data is divided and stored while remaining completely anonymous. If part of the divided data is missing and statistical processing is performed anyway, the output statistics may be incorrect. Even if a difference in the number of data items held by the statistical processing servers reveals that data is missing, the pieces are anonymous and not linked to one another, so there is no way to tell which data to discard, and none of the data can be used.

Non-Patent Document 1: Koji Chida, Dai Ikarashi, Koki Hamada, Katsumi Takahashi, "Proposal and implementation evaluation of lightweight three-party secure function computation with error detection," IPSJ Journal, Vol. 52, No. 9, pp. 2674-2685 (2011).
Non-Patent Document 2: Koji Chida, Dai Ikarashi, Katsumi Takahashi, "Proposal of efficient three-party secure function computation and consideration of its operation model," DPS-142-1, pp. 1-7 (2010).
Non-Patent Document 3: https://www.torproject.org/

Accordingly, an object of the present invention is to provide an information distribution system and an information distribution method that allow data containing privacy information to circulate in an anonymized state while being split into multiple pieces.

To achieve this object, in the information distribution system and information distribution method of the present invention, the data owner divides its data and submits each piece (divided data) to the collection server under a different pseudonym. The pseudonym server stores each data owner's pseudonym set and, in response to inquiries from the statistical processing server, answers whether a pseudonym set is complete. The statistical processing server notifies the pseudonym server of the pseudonyms of the divided data it receives and performs statistical processing once a pseudonym set is complete.

Specifically, the information distribution system of the present invention comprises: a pseudonym server that, on receiving a pseudonym issuance request from a node, issues to that node a pseudonym set consisting of a plurality of pseudonyms, and that, on being notified of pseudonyms by a statistical processing server, checks whether each notified pseudonym belongs to a pseudonym set and, once all pseudonyms in a set have been notified, informs the statistical processing server that those pseudonyms are confirmed; and a statistical processing server that accumulates data whose sender is a pseudonym issued by the pseudonym server, notifies the pseudonym server of the accumulated pseudonyms, and, on receiving the confirmation, performs statistical processing using the accumulated data whose senders are confirmed pseudonyms.

In the information distribution system of the present invention, the statistical processing server may determine an operation for restoring the original data held by the nodes, generate a query requesting data that can be restored by that operation, and restore the nodes' original data by applying the operation to the accumulated data.

In the information distribution system of the present invention, the pseudonym server may convey the confirmation to the statistical processing server by notifying it of the pseudonyms whose sets have not yet been fully notified.

In the information distribution system of the present invention, the pseudonym server may withhold pseudonym sets until the number of nodes that have requested pseudonyms reaches a predetermined number, and issue the pseudonym sets once that number is reached.

Specifically, the information distribution method of the present invention comprises, in order: a pseudonym issuance procedure in which a pseudonym server, on receiving a pseudonym issuance request from a node, issues to that node a pseudonym set consisting of a plurality of pseudonyms; and a statistical processing procedure in which a statistical processing server, on acquiring data whose sender is a pseudonym, accumulates the pseudonym-data pairs, notifies the pseudonym server of the accumulated pseudonyms, and, on being informed by the pseudonym server that the pseudonyms are confirmed, performs statistical processing using the accumulated data whose senders are confirmed pseudonyms.

In the information distribution method of the present invention, in the pseudonym issuance procedure the statistical processing server may determine an operation for restoring the original data held by the nodes and generate a query requesting data that can be restored by that operation, and in the statistical processing procedure the statistical processing server may restore the nodes' original data by applying the operation to the accumulated data.

In the information distribution method of the present invention, in the statistical processing procedure the pseudonym server may convey the confirmation to the statistical processing server by notifying it of the pseudonyms whose sets have not yet been fully notified.

In the information distribution method of the present invention, in the pseudonym issuance procedure the pseudonym server may withhold pseudonym sets until the number of nodes that have requested pseudonyms reaches a predetermined number, and issue the pseudonym sets once that number is reached.

The above inventions can be combined wherever possible.

According to the present invention, the pseudonym server issues multiple pseudonyms to a single node, and the statistical processing server statistically processes the data sent under those pseudonyms. A node can therefore send data that differs from its original data, and the statistical processing restores the contribution of that original data. This allows data containing privacy information to circulate in an anonymized state while split into multiple pieces.

FIG. 1 shows an example of a functional block diagram of the system according to the present embodiment.
FIG. 2 shows a sequence diagram of the information distribution system according to the present embodiment.
FIG. 3 shows a sequence diagram for the case with multiple statistical processing servers.
FIG. 4 is an explanatory diagram of Embodiment 1, from query generation to issuance of pseudonym sets.
FIG. 5 is an explanatory diagram of Embodiment 1, from data division to statistical processing.
FIG. 6 is an explanatory diagram of Embodiment 2, from query generation to issuance of pseudonym sets.
FIG. 7 is an explanatory diagram of Embodiment 2, from data division to statistical processing.

Embodiments of the present invention are described with reference to the accompanying drawings. The embodiments below are examples of the present invention, which is not limited to them. Components with the same reference numerals in this specification and the drawings are identical.

(Embodiment 1)
The information distribution system and information distribution method of this embodiment separate the server that anonymizes senders (the pseudonym server) from the statistical processing server, so that data distributed in pieces is stored in a state where neither the values nor the senders can be reconstructed, and they assign a different pseudonym to each piece of divided data. Splitting the servers by role, pseudonym issuance versus statistical processing, also allows the two roles to be run by separate organizations; because the data cannot be reconstructed unless both collude, the confidentiality of the privacy information is increased.

In addition, to prevent attacks by the statistical processing server acting alone, the pseudonym server also manages the number of respondents. Specifically, it does not allow any node to respond, that is, it issues no pseudonyms, until the expected number of responses is large enough to be statistically adequate.

Furthermore, to prevent incorrect statistics caused by missing pieces of divided data, the pseudonym server in this embodiment also provides a function for checking the pseudonym sets collected at the statistical processing server.

FIG. 1 shows a functional block diagram of the information distribution system according to this embodiment. The system comprises a plurality of nodes 10 that hold data containing individual privacy information, a query distribution server 40, a pseudonym server 30, and a statistical processing server 20. Each node 10 includes a query processing unit 11, a data division unit 12, and a data storage unit 13. The statistical processing server 20 includes a query generation unit 21 and a response collection/processing unit 22. The pseudonym server 30 includes a pseudonym issuing unit 31, a pseudonym generation/management unit 32, and a pseudonym confirmation unit 33. The query distribution server 40 includes a query storage unit 41.

FIG. 2 shows a sequence diagram of the information distribution method according to this embodiment. The method comprises, in order, a pseudonym issuance procedure, a query processing procedure, and a statistical processing procedure.
In the pseudonym issuance procedure, the pseudonym server 30 receives a pseudonym issuance request from a node 10 and issues to that node 10 a pseudonym set consisting of a plurality of pseudonyms. Specifically, steps S201 to S206 are performed.
In the query processing procedure, the node 10 processes the query. Specifically, steps S207 to S208 are performed.
In the statistical processing procedure, when the statistical processing server 20 acquires data whose sender is a pseudonym, it accumulates the pseudonym-data pairs, notifies the pseudonym server 30 of the accumulated pseudonyms, and, on being informed that they are confirmed, performs statistical processing using the accumulated data whose senders are confirmed pseudonyms. Specifically, steps S209 to S213 are performed.

First, in step S201, the query generation unit 21 of the statistical processing server 20 generates a query for obtaining the requested statistics. The statistical processing server 20 determines an operation for restoring the nodes' original data and generates a query requesting data that can be restored by that operation. The operation is, for example, summation.

To prevent direct exchanges between the nodes 10 and the statistical processing server 20, the generated query is not sent to the nodes 10 directly; instead it is placed in the query storage unit 41 of the query distribution server 40, where each node 10 can retrieve it. The query is thereby distributed to the nodes 10 (S202).
The query generation unit 21 notifies the pseudonym server 30 that the query has been placed on the query distribution server 40 (S204). From then on, the pseudonym issuing unit 31 counts the number of nodes that apply for pseudonyms for that query.

The query processing unit 11 of each node 10 periodically polls the query distribution server 40 and, if a new query is available, sends a query distribution request to obtain it (S203). If the node 10 decides, based on the query's conditions and the like, that it wants to respond, it applies to the pseudonym server 30 for pseudonyms (S205) and waits until they are issued.

The pseudonym issuing unit 31 of the pseudonym server 30 withholds pseudonyms until at least a fixed number N (for example, 1000) of issuance requests for the same query ID have arrived. This lets the pseudonym server 30 guarantee the statistical sample size as a third party, and it prevents nodes 10 from answering the malicious queries described above that only a single respondent would satisfy. When the number of nodes that have applied for pseudonyms reaches the predetermined N, the pseudonym issuing unit 31 generates and issues to each node 10 a pseudonym set consisting of n pseudonyms (S206). The pseudonym generation/management unit 32 of the pseudonym server 30 stores each pairing of pseudonym set and node 10.
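The threshold rule can be sketched as a small in-memory class. This is a hypothetical illustration, not the patent's implementation: `PseudonymIssuer`, random hex tokens as pseudonyms, and the choice to serve late requests immediately once a query has reached N are all assumptions.

```python
import secrets
from collections import defaultdict


class PseudonymIssuer:
    """Hypothetical sketch of pseudonym issuing unit 31."""

    def __init__(self, threshold: int, set_size: int):
        self.threshold = threshold          # N: requests needed before any set is issued
        self.set_size = set_size            # n: pseudonyms per node
        self.waiting = defaultdict(list)    # query_id -> node ids being held back
        self.opened = set()                 # query ids that have already reached N
        self.issued = {}                    # (query_id, node_id) -> pseudonym set (unit 32)

    def request(self, query_id: str, node_id: str) -> dict:
        """Record a request; return {node_id: pseudonym set} for every set released now."""
        queue = self.waiting[query_id]
        if (query_id, node_id) not in self.issued and node_id not in queue:
            queue.append(node_id)
        if query_id not in self.opened and len(queue) < self.threshold:
            return {}  # fewer than N requesters so far: issue nothing
        self.opened.add(query_id)
        released = {}
        for nid in queue:
            pset = [secrets.token_hex(8) for _ in range(self.set_size)]
            self.issued[(query_id, nid)] = pset
            released[nid] = pset
        queue.clear()
        return released
```

With `threshold=3`, the first two requesters receive nothing; the third request releases sets to all three at once, so no set can be tied to a query with a single respondent.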

According to the number n of pseudonyms in the issued pseudonym set, the data division unit 12 of each node 10 divides its own data value $D_1$ arbitrarily (S207). The division method, however, depends on which statistics the statistical processing server 20 has requested. If the statistic to be computed is the mean, any division method may be used as long as the divided values $d_{11}, \dots, d_{n1}$ sum to $D_1$. For example, when the pseudonym set contains $n = 3$ pseudonyms, the node can generate random numbers $r_1, r_2$ and split the original value $x$ as
$$x_1 = r_1, \quad x_2 = -r_2, \quad x_3 = x - r_1 + r_2,$$
so that $x_1 + x_2 + x_3 = x$. The query processing unit 11 then sends $x_1$, $x_2$, and $x_3$ to the statistical processing server 20 as the divided values $d_{11}$ to $d_{n1}$, each under a different pseudonym (S208). In this way the query processing unit 11 transmits its answer to the query.
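A minimal sketch of the split, assuming integer data and random shares drawn uniformly from a bounded range; `split_three`, `split_additive`, and `spread` are hypothetical names. `split_three` follows the $n = 3$ formula above exactly, and `split_additive` generalizes it to any n by drawing n - 1 random shares and adding one balancing share.

```python
import secrets


def _rand(spread: int) -> int:
    # Uniform integer in [-spread, spread].
    return secrets.randbelow(2 * spread + 1) - spread


def split_three(x: int, spread: int = 10**6) -> tuple[int, int, int]:
    """The n = 3 example: x1 = r1, x2 = -r2, x3 = x - r1 + r2."""
    r1, r2 = _rand(spread), _rand(spread)
    return r1, -r2, x - r1 + r2


def split_additive(x: int, n: int, spread: int = 10**6) -> list[int]:
    """General n-way split: n - 1 random shares plus one share that balances the sum."""
    if n < 1:
        raise ValueError("n must be at least 1")
    shares = [_rand(spread) for _ in range(n - 1)]
    shares.append(x - sum(shares))
    return shares
```

Drawing shares from a bounded integer range is a simplification: if `spread` is not much larger than the data, a share can still hint at the value. Doing the arithmetic modulo a large number would avoid that.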

Instead of sending all the divided data at once, the query processing unit 11 of the node 10 may send each piece to the statistical processing server 20 after waiting a random time (for example, drawn uniformly from 10 minutes to 10 hours). This makes it even harder to match pieces of divided data to one another.
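The randomized sending can be sketched as a scheduling step that gives each (pseudonym, share) pair its own independent delay. `schedule_sends` is a hypothetical helper; actual transmission is left out, and the 10-minute to 10-hour bounds are the example values from the text, expressed in seconds.

```python
import random


def schedule_sends(shares, min_delay_s=600.0, max_delay_s=36000.0, rng=None):
    """Give each (pseudonym, value) share an independent uniform delay; sort by send time."""
    rng = rng or random.Random()
    plan = [(rng.uniform(min_delay_s, max_delay_s), pseudonym, value)
            for pseudonym, value in shares]
    plan.sort(key=lambda item: item[0])  # earliest send first
    return plan
```

Because each delay is drawn independently, arrival order and arrival times at the server carry no information about which shares come from the same node.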

When the statistical processing at the statistical processing server 20 is a variance or a correlation, the node can separately split each value needed for the various statistics, such as the value itself, its square, and the product of the values of multiple elements, and send the resulting shares together in the same packet. Even when pseudonyms are used, as long as the data itself is sent with an ordinary packet transmission protocol, the node 10 can detect that a transmission has failed because no ACK response comes back.
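For a variance or correlation query, the server needs the sums of x, y, x^2, y^2, and xy. The hypothetical sketch below splits each of those quantities independently and packs the i-th share of every quantity into packet i, so each packet (one per pseudonym) carries one share of everything. The packet layout and names are assumptions.

```python
import secrets


def split(v: int, n: int, spread: int = 10**6) -> list[int]:
    """n-way additive split of an integer: n - 1 random shares plus a balancing share."""
    parts = [secrets.randbelow(2 * spread + 1) - spread for _ in range(n - 1)]
    parts.append(v - sum(parts))
    return parts


def build_packets(x: int, y: int, n: int) -> list[dict[str, int]]:
    """Packet i carries share i of each quantity a variance/correlation query needs."""
    quantities = {"x": x, "y": y, "x2": x * x, "y2": y * y, "xy": x * y}
    columns = {k: split(v, n) for k, v in quantities.items()}
    return [{k: columns[k][i] for k in columns} for i in range(n)]
```

Summing a given field over every packet from every node gives the corresponding total, for example the sum of x^2, from which the variance and covariance follow.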

On receiving the results, the response collection/processing unit 22 of the statistical processing server 20 does not process them immediately. Instead, it sends the sender pseudonyms to the pseudonym confirmation unit 33 of the pseudonym server 30 as a pseudonym inquiry (S209). This prevents a node 10's pieces from being included in the statistics before all of its divided data has arrived, which would make the statistics incorrect.

On receiving pseudonym information from the statistical processing server 20, the pseudonym confirmation unit 33 of the pseudonym server 30 registers with the pseudonym generation/management unit 32 that each received pseudonym has been confirmed. It then checks whether all n pseudonyms of the pseudonym set issued to a given node 10 have been received (S210) and withholds its response to the statistical processing server 20 until the set is complete. Once it confirms that a node 10's pseudonym set is complete, the pseudonym confirmation unit 33 returns an acknowledgment for the authenticated pseudonyms to the statistical processing server 20 (S211).
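The completeness check can be sketched as follows. `PseudonymConfirmer` is a hypothetical name, and this simplified version acknowledges a set as soon as it becomes complete and silently ignores unknown pseudonyms.

```python
class PseudonymConfirmer:
    """Hypothetical sketch of pseudonym confirmation unit 33."""

    def __init__(self, issued_sets):
        # issued_sets: node id -> list of pseudonyms, as stored by unit 32
        self.owner = {p: node for node, pset in issued_sets.items() for p in pset}
        self.sets = {node: set(pset) for node, pset in issued_sets.items()}
        self.seen = {node: set() for node in issued_sets}
        self.completed = set()

    def notify(self, pseudonyms):
        """Register pseudonyms reported by the statistical processing server (S209, S210).

        Returns the pseudonyms of every set that became complete during this call (S211).
        """
        acked = []
        for p in pseudonyms:
            node = self.owner.get(p)
            if node is None or node in self.completed:
                continue  # unknown pseudonym, or set already acknowledged
            self.seen[node].add(p)
            if self.seen[node] == self.sets[node]:
                self.completed.add(node)
                acked.extend(sorted(self.sets[node]))
        return acked
```

A set with a missing share is never acknowledged, so its other shares never reach the statistics.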

The acknowledgment may take any form that does not let the statistical processing server 20 uniquely identify pseudonym sets. For example, the pseudonym server 30 may notify the statistical processing server 20 of all the pseudonyms in the complete sets only after the statistical processing server 20 has received enough pseudonyms to cover multiple nodes 10. Alternatively, instead of notifying the confirmed pseudonyms, it may respond with the incomplete pseudonyms, which should be left out of the statistics. If the nodes 10 that send data are to be paid, the statistical processing server 20 may deposit the payment with the pseudonym server 30 in advance, and the pseudonym server 30 may then pay the nodes 10 in the order in which their pseudonym sets are completed while it waits to respond.
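Both acknowledgment forms can be sketched as below; the function names and the `min_sets` batching parameter are hypothetical. `batched_release` hides set boundaries by releasing confirmed pseudonyms only in shuffled batches that cover several sets. `incomplete_pseudonyms` implements the alternative response, which lists the notified pseudonyms whose sets are still incomplete and should therefore be excluded.

```python
import random


def batched_release(completed_sets, min_sets, rng=random):
    """Release confirmed pseudonyms only once at least `min_sets` sets are complete, shuffled."""
    if len(completed_sets) < min_sets:
        return None  # keep waiting
    batch = [p for pset in completed_sets for p in pset]
    rng.shuffle(batch)  # the order no longer reveals which pseudonyms belong together
    return batch


def incomplete_pseudonyms(notified, owner, sets):
    """Alternative response: notified pseudonyms whose set is not yet fully notified."""
    seen_by_node = {}
    for p in notified:
        if p in owner:
            seen_by_node.setdefault(owner[p], set()).add(p)
    return {p for node, ps in seen_by_node.items() if ps != sets[node] for p in ps}
```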

The response collection/processing unit 22 of the statistical processing server 20 performs statistical processing in the order in which it receives acknowledgments from the pseudonym server 30 (S213). Because of the acknowledgments, statistical processing uses only data for which every pseudonym in the pseudonym set has arrived. For example, when the response collection/processing unit 22 receives an acknowledgment for data $d_{35}$, $d_{n6}$, $d_{11}$, $d_{52}$ sent under pseudonyms $f_{35}$, $f_{n6}$, $f_{11}$, $f_{52}$, it performs statistical processing on $d_{35}$, $d_{n6}$, $d_{11}$, $d_{52}$ and calculates provisional statistics $S$. Next, when it receives an acknowledgment for data $d_{51}$, $d_{17}$, $d_{77}$, $d_{21}$ sent under pseudonyms $f_{51}$, $f_{17}$, $f_{77}$, $f_{21}$, it performs statistical processing using the provisional statistics $S$ together with $d_{51}$, $d_{17}$, $d_{77}$, $d_{21}$, and calculates updated provisional statistics $S$. This is repeated. The statistical processing here is the one determined by the query generation unit 21 when the query was generated.

When the number of confirmed pseudonym sets finally reaches the N nodes 10 determined when the query was generated, the pseudonym confirmation unit 33 notifies the statistical processing server 20 that the sample size N has been reached (S212). The response collection/processing unit 22 receives this notification from the pseudonym server 30 once the required number of responses has been reached, finishes the statistical processing, and obtains the statistics. It then sends the statistics to the party that requested them (S213).
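For a mean query, the incremental processing and termination described above can be sketched as a running sum that absorbs shares only when they are acknowledged and divides by N when the sample-size notification arrives. `ResponseAggregator` and its methods are hypothetical, and acknowledgments for unknown pseudonyms are not handled.

```python
class ResponseAggregator:
    """Hypothetical sketch of response collection/processing unit 22 for a mean query."""

    def __init__(self):
        self.pending = {}        # pseudonym -> share, held until acknowledged (S209)
        self.running_sum = 0.0   # provisional statistic S: sum of acknowledged shares
        self.done = False

    def receive(self, pseudonym, share):
        self.pending[pseudonym] = share  # not added to S yet

    def on_ack(self, pseudonyms):
        """Fold acknowledged shares into S in acknowledgment order (S211, S213)."""
        for p in pseudonyms:
            self.running_sum += self.pending.pop(p)
        return self.running_sum

    def on_sample_reached(self, n_nodes):
        """Sample-size notification (S212): finish and return the mean."""
        self.done = True
        return self.running_sum / n_nodes
```

Each acknowledged batch is a complete set of shares for some node, so every provisional S is the exact sum of whole original values, even though no individual value is ever visible.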

For simplicity, the sequence diagram describes computing a mean, but the variance, and the covariance used to compute correlation, can be calculated in the same way.

If the nodes 10 are treated as wireless terminals, the scheme can also be deployed on smartphones. A node 10 can return its answers to the statistical processing server 20 using the different IP addresses assigned when the terminal joins different wireless networks. This prevents the pieces from being matched by the IP address used to send them.

また、統計処理サーバ２０が１台の場合について記述したが、複数の統計処理サーバ２０に分割することも可能である。その場合のシーケンス図を図３に示す。
複数台の統計処理サーバ２０のうちの１台が統計処理主体（親サーバ）となり、統計処理主体（親サーバ）がクエリ生成、クエリ配布サーバとの通信や仮名サーバへのクエリ配布通知、最終的な統計処理を行う。そこで、クエリ生成ステップＳ２０１の前に、統計処理主体（親サーバ）を決定する（Ｓ３０１）。例えば、第１の統計処理サーバ２０及び第２の統計処理サーバ２０のうちの第１の統計処理サーバ２０を統計処理主体に決定する。なお、統計処理主体（親サーバ）の決定の仕方は任意である。 Moreover, although the case where there was one statistical processing server 20 was described, it is also possible to divide into a plurality of statistical processing servers 20. A sequence diagram in that case is shown in FIG.
One of the plurality of statistical processing servers 20 becomes a statistical processing entity (parent server), and the statistical processing entity (parent server) generates a query, communicates with the query distribution server, sends a query distribution notification to the pseudonym server, and finally. Perform statistical processing. Therefore, the statistical processing entity (parent server) is determined before the query generation step S201 (S301). For example, the first statistical processing server 20 out of the first statistical processing server 20 and the second statistical processing server 20 is determined as a statistical processing entity. The method of determining the statistical processing subject (parent server) is arbitrary.

In the sequence shown in FIG. 3, in step S208 the node 10 may send its divided data to any of the statistical processing servers 20. Each statistical processing server 20 hands the sums or products it has aggregated to the statistical processing entity. For example, when the first statistical processing server 20 is the statistical processing entity, in step S211 the first statistical processing server 20 calculates provisional statistical data S_1 from data d_35, d_n6, d_11 and d_52. The second statistical processing server 20 calculates provisional statistical data S_2 from data d_21, d_17 and d_77.
When the first statistical processing server 20 receives the sample-count-reached notification from the pseudonym server 30 (S212), it sends an end notification and a request for the statistics to the second statistical processing server 20 (S302). The second statistical processing server 20 then sends the provisional statistical data S_2 it calculated to the first statistical processing server 20 (S303).
Step S213 is then executed. The first statistical processing server 20 obtains the final statistical data by performing statistical processing on the provisional statistical data S_1 and S_2.
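The hand-off of provisional statistics in steps S211 to S213 can be illustrated with a small sketch. It makes two simplifying assumptions: the statistic being collected is a plain sum of additive shares, and all networking is omitted. The class StatServer and the function parent_combine are hypothetical names, not part of the described system.

```python
from dataclasses import dataclass, field


@dataclass
class StatServer:
    """Hypothetical stand-in for one statistical processing server 20."""
    name: str
    shares: list[float] = field(default_factory=list)

    def receive(self, share: float) -> None:
        # A node may send any of its shares to any server (step S208).
        self.shares.append(share)

    def provisional_sum(self) -> float:
        # Provisional statistic S_i: the sum of the shares this server holds.
        return sum(self.shares)


def parent_combine(servers: list[StatServer]) -> float:
    """Parent server: collect every S_i (S302/S303) and add them (S213)."""
    return sum(s.provisional_sum() for s in servers)


# Height shares 77/83 (node A) and 90/98 (node B) scattered over two servers.
s1, s2 = StatServer("first"), StatServer("second")
for share, server in [(77, s1), (90, s2), (83, s2), (98, s1)]:
    server.receive(share)
total = parent_combine([s1, s2])
print(total, total / 2)  # 348 174.0
```

Because addition is associative, the result does not depend on which server each share was sent to. Only the combined total is meaningful.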

FIGS. 4 and 5 show an example in which divided data are actually received from a plurality of nodes and the statistical processing is performed correctly. Suppose a height of 160 cm is stored in the data storage unit 13 of node A and a height of 188 cm in the data storage unit 13 of node B. The goal is to obtain the average height of the two and its variance without learning either person's height. In this case, the requested statistics are the average and the variance.

The processing from S201 to S206 shown in FIG. 4 is as described in detail for the sequence diagram of FIG. 2.
The query distributed in step S202 states, for example, the following:
- the sender of the query is M;
- the query ID is M-1;
- the condition is all nodes;
- the content of the query is the average ave_h and variance var_h of the height h;
- the required data are the height h and the square of the height h;
- the calculation performed in statistical processing is summation.
In step S205, when node A requests pseudonym issuance, it sends the pseudonym server 30 its sender name A and the query ID M-1. Likewise, when node B requests pseudonym issuance, it sends the pseudonym server 30 its sender name B and the query ID M-1. When the pseudonym sets are generated in step S205, the pseudonym server 30 holds back their issuance.
In step S206, the pseudonym server 30 issues to node A a pseudonym set consisting of pseudonyms W and Y, and to node B a pseudonym set consisting of pseudonyms X and Z.

The processing from S207 to S213 shown in FIG. 5 is as described in detail for the sequence diagram of FIG. 2.
In the answer division of step S207, three facts determine how the data are split: there are two pseudonyms, the query asks for the average ave_h and the variance var_h, and the original data are restored by summation. Nodes A and B therefore each divide the height h and the square of the height h into two arbitrary values whose sum restores the original. For a height of 160 cm, h is divided into 77 and 83, and the squared height 25600 is divided into 15600 and 10000. For a height of 188 cm, h is divided into 90 and 98, and the squared height 35344 is divided into -15331 and 50675.
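The division in step S207 is additive secret sharing. The text only requires the shares to be "arbitrary values" that sum to the original. This sketch makes two choices of its own: it draws k-1 shares uniformly from a bounded integer range (spread is an assumed parameter), and it lets the last share absorb the difference. The function name split_additive is hypothetical.

```python
import secrets


def split_additive(value: int, k: int, spread: int = 100_000) -> list[int]:
    """Split `value` into k integer shares that sum back to `value`.

    The first k-1 shares are uniform random integers in [-spread, spread]
    (an assumed range); the last share makes the total come out right.
    """
    if k < 1:
        raise ValueError("need at least one share")
    shares = [secrets.randbelow(2 * spread + 1) - spread for _ in range(k - 1)]
    shares.append(value - sum(shares))
    return shares


# Node A (160 cm) holds two pseudonyms, so h and h^2 are each split in two.
h = 160
h_shares = split_additive(h, 2)
h2_shares = split_additive(h * h, 2)
print(h_shares, h2_shares)  # random shares; they sum to 160 and 25600
```

A bounded integer range hides values only statistically. Splitting modulo a large number is a common alternative when stronger hiding is needed.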

Next, in the answer transmission of step S208, node A sends the statistical processing server 20 one answer with sender W and one with sender Y. Node B sends one answer with sender X and one with sender Z. The answers to query ID M-1 are:
- sender X: h = 90 and h^2 = -15331;
- sender Z: h = 98 and h^2 = 50675;
- sender W: h = 77 and h^2 = 15600;
- sender Y: h = 83 and h^2 = 10000.
Each pseudonym carries its own share of the divided data. Here the shares of h and of h^2 are sent in the same packet, since shares of different data types may be included in a single packet.

Step S211 is the confirmed-pseudonym notification. The pseudonym server 30 has known since pseudonym issuance that the total number of samples for this query, i.e., the number N of nodes to which the query was distributed, is 2. It therefore sends the end notification and the total number of samples (S212) to the statistical processing server 20 together with the notification about pseudonyms X, Z, W and Y (S211).
Finally, the statistical processing of step S213 computes the average ave_h and the variance var_h.
- Average: ave_h = (77 + 90 + 83 + 98) / 2 = 174, the mean of the heights 160 cm and 188 cm.
- Variance: var_h = (15600 - 15331 + 10000 + 50675) / 2 - 174^2 = 30472 - 30276 = 196, the variance of the heights 160 cm and 188 cm.
The statistical processing server 20 thus never learns the true values held by nodes A and B, yet it computes the statistics correctly.
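The arithmetic of step S213 can be reproduced in a few lines. The sketch applies the same population-variance identity as the example: var_h = (sum of h^2) / N - ave_h^2. The function name mean_and_variance is hypothetical.

```python
def mean_and_variance(h_shares: list[float], h2_shares: list[float], n: int):
    """Population mean and variance from additive shares of h and h^2.

    ave_h = sum(h) / n ;  var_h = sum(h^2) / n - ave_h^2
    """
    ave = sum(h_shares) / n
    var = sum(h2_shares) / n - ave ** 2
    return ave, var


# Shares received under pseudonyms W, X, Y, Z in the example above.
h_shares = [77, 90, 83, 98]
h2_shares = [15600, -15331, 10000, 50675]
ave_h, var_h = mean_and_variance(h_shares, h2_shares, n=2)
print(ave_h, var_h)  # 174.0 196.0
```

Only the sums of the shares enter the calculation, so the individual heights are never reconstructed.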

Data processing involves exchanges among several servers. In this embodiment, however, the original data containing private information cannot be restored unless the pseudonym server 30 and all the statistical processing servers 20 collude. Consequently, as long as the pseudonym server 30 is run by a trustworthy organization, the party analyzing the data can easily issue its own queries.

Moreover, the mere presence or absence of an answer can leak the private fact that a node does or does not satisfy the query condition. Because the pseudonym server 30 conceals the respondents, declining to answer no longer reveals that information. As a concrete example, suppose three queries are prepared with the conditions "in their 20s", "male" and "height 170 cm or more". In the prior art, even if the answers are distributed, the address of the node 10 is the same every time. Each query therefore gradually increases the amount of private information revealed, and an individual may be identified even without collusion.

In this embodiment, however, data are sent under a different pseudonym each time. Private information therefore does not leak unless a single query alone identifies the respondent, i.e., unless the situation amounts to an attack in which there is only one respondent. Such situations can also be handled by managing the number of issued queries.

Furthermore, even if an attack that leaves only one respondent is attempted, it can be prevented by having the pseudonym server 30 hold back issuance of pseudonyms for the query.
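Withholding pseudonym issuance, stated in claim 4 as waiting until the number of requesting nodes reaches a predetermined number, can be sketched as follows. Several details are assumptions of this sketch:
- the threshold value;
- the random hex token format for pseudonyms;
- the rule that later requests are served immediately once the threshold has been met.
The class name ThresholdIssuer is hypothetical.

```python
import secrets


class ThresholdIssuer:
    """Hypothetical pseudonym issuer that withholds pseudonym sets for a query
    until at least `threshold` distinct nodes have requested them."""

    def __init__(self, threshold: int, set_size: int):
        self.threshold = threshold
        self.set_size = set_size
        self.pending: list[str] = []            # node IDs still waiting
        self.issued: dict[str, list[str]] = {}  # node ID -> its pseudonym set

    def request(self, node_id: str) -> dict[str, list[str]]:
        """Register a request; return the sets issued now (empty while waiting)."""
        if node_id in self.issued or node_id in self.pending:
            return {}  # ignore duplicate requests
        self.pending.append(node_id)
        if len(self.pending) + len(self.issued) < self.threshold:
            return {}  # too few requesters: a lone respondent stays hidden
        # 64-bit random tokens; collisions are possible in principle but negligible.
        new_sets = {
            node: [secrets.token_hex(8) for _ in range(self.set_size)]
            for node in self.pending
        }
        self.issued.update(new_sets)
        self.pending.clear()
        return new_sets


issuer = ThresholdIssuer(threshold=3, set_size=2)
print(issuer.request("A"), issuer.request("B"))  # {} {}
print(sorted(issuer.request("C")))                # ['A', 'B', 'C']
```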

The pseudonym server 30 manages the number of issued queries and the number of answers, and queues pseudonym issuance. This guarantees a sufficient number of respondents and also makes the system tolerant of lost pieces of distributed data. Suppose, for example, that the statistical processing server 20 inquires only about (a, c) out of the pseudonym set (a, b, c) issued to a single node. Since b has not been received, the pseudonym server can either instruct that this node's pseudonym set be left out of the computation, or wait until b arrives.
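The completeness check on a pseudonym set such as (a, b, c) can be sketched as simple bookkeeping inside the pseudonym server. How the result is passed back to the statistical processing server is left open here. The class name PseudonymChecker and the node labels are hypothetical.

```python
class PseudonymChecker:
    """Hypothetical bookkeeping inside the pseudonym server: a node's pseudonym
    set counts as confirmed only once every pseudonym in it has been reported."""

    def __init__(self, issued_sets: dict[str, set[str]]):
        self.issued = issued_sets
        # Reverse index pseudonym -> node; it never leaves the pseudonym server.
        self.owner = {p: node for node, ps in issued_sets.items() for p in ps}
        self.seen: dict[str, set[str]] = {node: set() for node in issued_sets}

    def report(self, pseudonym: str) -> bool:
        """Record a pseudonym reported by the statistical processing server.

        Returns True once the owning node's set is complete, False while
        shares are still missing. Unknown pseudonyms raise KeyError.
        """
        node = self.owner[pseudonym]
        self.seen[node].add(pseudonym)
        return self.seen[node] == self.issued[node]

    def incomplete(self) -> set[str]:
        """Reported pseudonyms whose set is still incomplete (exclude or wait)."""
        return {p for node, ps in self.seen.items()
                if ps != self.issued[node] for p in ps}


# Set (a, b, c) went to one node; the statistical server reported only a and c.
checker = PseudonymChecker({"node1": {"a", "b", "c"}, "node2": {"x", "y"}})
for p in ["a", "c", "x", "y"]:
    checker.report(p)
print(sorted(checker.incomplete()))  # ['a', 'c']: b is missing, hold node1 back
```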

The pseudonym server conceals both the links between the distributed pieces of data and the identity of their owners. This keeps private information secret and makes statistical analysis possible even when the analyzing party or the statistical processing server is not trusted. The resulting system is also robust with respect to the accuracy and reliability of the statistical information.

(Embodiment 2)
The secret statistical processing technique described in Embodiment 1 can compute statistics such as the average and variance. In statistical surveys, however, some findings come only from the overall shape of the collected data. A frequency distribution, or the histogram that plots it, is useful for two purposes:
- discovering features that the average and variance cannot reveal;
- finding the mode and the median.
A histogram also shows statistical tendencies in data that cannot be meaningfully averaged, such as the prefecture of residence. In this embodiment, such data are anonymized and aggregated.

FIG. 1 shows the functional block diagram of the system. The functional blocks and configuration are the same as in Embodiment 1. The system consists of a plurality of nodes 10 holding data that include individual private information, a query distribution server 40, a pseudonym server 30 and a statistical processing server 20. As in Embodiment 1, a different pseudonym is assigned to each piece of divided data. In this embodiment, each datum is converted into a binary (0/1) matrix so that frequency distributions can be computed. The statistical processing server 20 restores the frequency distribution from the received data and obtains the mode and the median.

In this embodiment, in step S207, each node arbitrarily divides its own data value according to the number of pseudonyms issued to it. The division method, however, depends on the statistic requested. Take gender as an example of counting a frequency distribution. The categories [male, female] form a matrix, the node's answer is converted into [1, 0] or [0, 1], and each element is divided using random numbers. For example, the matrix [1, 0] is divided into [-2, 1] and [3, -1], which restore it by addition. The query specifies which matrix element corresponds to which category, so that all nodes encode their answers consistently.
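A minimal sketch of the matrix encoding and element-wise splitting for a categorical answer is shown below. It assumes that the category order is fixed by the query and that the share range spread is arbitrary. The names one_hot and split_vector are hypothetical helpers.

```python
import secrets


def one_hot(value: str, categories: list[str]) -> list[int]:
    """Encode a categorical answer as a 0/1 row matrix, e.g. male -> [1, 0]."""
    return [1 if c == value else 0 for c in categories]


def split_vector(vec: list[int], k: int, spread: int = 10) -> list[list[int]]:
    """Split each element into k random integer shares that sum to it."""
    columns = []
    for x in vec:
        shares = [secrets.randbelow(2 * spread + 1) - spread for _ in range(k - 1)]
        shares.append(x - sum(shares))
        columns.append(shares)
    # Transpose: share vector j holds the j-th share of every element.
    return [list(row) for row in zip(*columns)]


CATEGORIES = ["male", "female"]  # element order fixed by the query
parts = split_vector(one_hot("male", CATEGORIES), k=2)
print(parts)  # two share vectors whose element-wise sum is [1, 0]
```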

In step S208, the node sends its data to the statistical processing server 20 under each pseudonym. The node 10 need not send the divided data all at once. Instead, it may wait a random time before sending each piece, for example a time drawn from a uniform distribution between 10 minutes and 10 hours. These measures make it even harder to match up the pieces of divided data. Pseudonyms do not prevent failure detection: if the transmission protocol follows ordinary packet transmission, the node 10 can tell that a transmission failed because no ACK response is returned.
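The randomized send timing can be sketched as drawing one independent delay per share from the uniform distribution the text gives as an example (10 minutes to 10 hours). Actual sending, timers and ACK-based retransmission are omitted. The function name schedule_sends is hypothetical.

```python
import random


def schedule_sends(pseudonyms: list[str], rng: random.Random,
                   low_s: float = 10 * 60,
                   high_s: float = 10 * 60 * 60) -> list[tuple[float, str]]:
    """Draw an independent delay from U[low_s, high_s] (default 10 min to 10 h)
    for each share, identified by its pseudonym, and return the plan sorted
    by delay as (delay_seconds, pseudonym) pairs."""
    plan = [(rng.uniform(low_s, high_s), p) for p in pseudonyms]
    return sorted(plan)


# SystemRandom draws from the OS entropy source, so delays are not reproducible.
for delay, p in schedule_sends(["W", "Y"], random.SystemRandom()):
    print(f"send the share for pseudonym {p} after {delay / 60:.1f} min")
```

Drawing each delay independently, rather than spacing the shares by a fixed gap, avoids giving an observer a timing pattern that links the shares of one node.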

On receiving the results, the statistical processing server 20 does not process them immediately. It first sends the pseudonyms of the senders to the pseudonym server 30 (step S209). This prevents incorrect statistics, which would arise if data were added before all divided pieces from a node 10 had arrived.

Having received the pseudonym information from the statistical processing server 20, the pseudonym server 30 withholds its response until the full set of pseudonyms issued to a given node 10 has been reported. Once it confirms that a node's set is complete, the pseudonym server 30 returns an acknowledgment for the authenticated pseudonyms to the statistical processing server 20. Any response works as long as the statistical processing server 20 cannot use it to identify the pseudonym set uniquely. For example, after the statistical processing server 20 has received a sufficient number of sample sets, the pseudonym server 30 may instead reply with the incomplete pseudonyms that should not be added to the statistics. If senders are paid for their answers, the statistical processing server 20 can deposit the payment with the pseudonym server 30 in advance. The pseudonym server 30 then pays the nodes 10 in the order in which their pseudonym sets become complete while it waits to respond.

The statistical processing server 20 performs statistical processing in the order in which it receives acknowledgments from the pseudonym server 30. It stops once the number of samples fixed at query generation is reached, and obtains the statistical data.

FIGS. 6 and 7 illustrate that histogram processing is performed correctly when divided data are actually received from a plurality of nodes. Consider obtaining the gender distribution of a male node A and a female node B without learning either node's gender. The processing from S201 to S206 shown in FIG. 6 is the same as in Embodiment 1.

Step S207 of FIG. 7 is the answer division. Because there are two pseudonyms and the query asks for a gender frequency-distribution matrix, nodes A and B each divide their gender matrix into two arbitrary values that restore it by summation. In the confirmed-pseudonym notification of step S211, the pseudonym server 30 has known since pseudonym issuance that the total number of samples is 2. It therefore sends the end notification and the total number of samples to the statistical processing server 20 at the same time. Finally, the statistical processing of step S213 sums each element of the matrices. The statistical processing server does not learn the genders of nodes A and B, yet it obtains the frequency distribution correctly.
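On the receiving side, the frequency distribution is simply the element-wise sum of every share vector, regardless of which pseudonym sent it. In this sketch, node A's shares are the [-2, 1] and [3, -1] from the earlier example. Node B's shares are made-up illustrative values, since the figure's actual numbers are not reproduced in the text. The function name aggregate_histogram is hypothetical.

```python
def aggregate_histogram(share_vectors: list[list[int]]) -> list[int]:
    """Element-wise sum of all received share vectors = frequency distribution."""
    return [sum(col) for col in zip(*share_vectors)]


CATEGORIES = ["male", "female"]
# Node A (male, [1, 0]): [-2, 1] and [3, -1].
# Node B (female, [0, 1]): [4, -3] and [-4, 4] (illustrative values).
received = [[-2, 1], [3, -1], [4, -3], [-4, 4]]
print(dict(zip(CATEGORIES, aggregate_histogram(received))))  # {'male': 1, 'female': 1}
```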

For a continuous variable, the mode can be obtained by classifying the values into groups (bins) of a chosen width. Likewise, the group that contains the median gives its approximate value. For a discrete variable, making each possible value one element of the matrix yields the exact mode and median.
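For a binned continuous variable, the modal bin and the bin containing the median can be read directly off the aggregated counts. The sketch makes these assumptions:
- bins are described by edges with one more entry than counts;
- for an even total, the bin of the lower median is reported.
The function name and the sample counts are hypothetical.

```python
def mode_and_median_bin(counts: list[int], edges: list[float]):
    """Given bin counts and bin edges (len(edges) == len(counts) + 1), return the
    modal bin and the bin containing the median, each as (low, high)."""
    if len(edges) != len(counts) + 1:
        raise ValueError("edges must have one more entry than counts")
    total = sum(counts)
    if total == 0:
        raise ValueError("empty histogram")
    mode_i = max(range(len(counts)), key=counts.__getitem__)
    # Median bin: first bin where the cumulative count reaches half the total.
    median_i = len(counts) - 1
    cumulative = 0
    for i, c in enumerate(counts):
        cumulative += c
        if 2 * cumulative >= total:
            median_i = i
            break
    return (edges[mode_i], edges[mode_i + 1]), (edges[median_i], edges[median_i + 1])


# Hypothetical aggregated height counts in 10 cm bins from 150 cm to 190 cm.
print(mode_and_median_bin([4, 3, 2, 5], [150, 160, 170, 180, 190]))
# ((180, 190), (160, 170))
```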

Queries such as "average height of men and women in each prefecture" can also be realized by combining the averaging of variables described so far with histograms. The prefecture and gender combinations are represented as a 47 × 2 matrix. Each node enters its height in the matrix element that applies to it and sets all other elements to 0, which expresses each individual's information. Because histograms can be processed in this way, statistical analysis has more freedom than under the earlier application.
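One way to realize the 47 × 2 query is sketched below. It assumes each node submits two matrices:
- a height matrix holding its height in its own (prefecture, gender) cell;
- a count matrix holding a 1 in the same cell (the histogram part).
In the actual protocol each matrix would also be split into additive shares, which leaves the sums unchanged. The 0-based prefecture indexing and all function names are hypothetical.

```python
PREFECTURES = 47
GENDERS = ["male", "female"]


def encode(pref: int, gender: str, height: float):
    """Return (height matrix, count matrix), each 47 x 2, for one node.

    Only the node's own (prefecture, gender) cell is non-zero.
    """
    g = GENDERS.index(gender)
    heights = [[0.0] * len(GENDERS) for _ in range(PREFECTURES)]
    counts = [[0] * len(GENDERS) for _ in range(PREFECTURES)]
    heights[pref][g] = height
    counts[pref][g] = 1
    return heights, counts


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def cell_averages(submissions):
    """Sum all height and count matrices, then divide cell by cell (None if empty)."""
    total_h, total_n = submissions[0]
    for h, n in submissions[1:]:
        total_h, total_n = add(total_h, h), add(total_n, n)
    return [[th / tn if tn else None for th, tn in zip(rh, rn)]
            for rh, rn in zip(total_h, total_n)]


nodes = [encode(12, "male", 170), encode(12, "male", 176), encode(12, "female", 158)]
avg = cell_averages(nodes)
print(avg[12])  # [173.0, 158.0]
```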

According to this embodiment, several parties cooperate so that individual data containing private information cannot be restored and the owners of the data remain anonymous. Under these conditions, frequency distributions can be measured in addition to the average and variance.

The present invention can be applied to the information and communication industry.

10: Node
11: Query processing unit
12: Data division unit
13: Data storage unit
20: Statistical processing server
21: Query generation unit
22: Answer collection/processing unit
30: Pseudonym server
31: Pseudonym issuing unit
32: Pseudonym generation/management unit
33: Pseudonym confirmation unit
40: Query distribution server
41: Query storage unit

Claims

A pseudonym server that, in response to a pseudonym issue request from a node, issues to the node a pseudonym set consisting of a plurality of pseudonyms, and that, upon receiving a pseudonym notification from a statistical processing server, checks whether the notified pseudonym is included in the pseudonym set and, when all pseudonyms in the pseudonym set have been notified, notifies the statistical processing server that the pseudonyms have been confirmed; and
a statistical processing server that accumulates data whose transmission source is a pseudonym issued by the pseudonym server, notifies the pseudonym server of the pseudonyms of the accumulated data, and, upon being notified that the pseudonyms have been confirmed, performs statistical processing using those of the accumulated data whose transmission source is a confirmed pseudonym;
an information distribution system comprising the above pseudonym server and statistical processing server.

The information distribution system according to claim 1, wherein the statistical processing server determines a calculation process for restoring the original data of the node, generates a query requesting data that can be restored by the calculation process, and restores the original data of the node by applying the calculation process to the accumulated data.

The information distribution system according to claim 1 or 2, wherein the pseudonym server notifies the statistical processing server that the pseudonyms have been confirmed by notifying it of the pseudonyms belonging to pseudonym sets of which not all pseudonyms have been notified.

The information distribution system according to any one of claims 1 to 3, wherein the pseudonym server does not issue pseudonym sets until the number of nodes that have made pseudonym issue requests reaches a predetermined number, and issues the pseudonym sets when that number reaches the predetermined number.

A pseudonym issuing procedure in which a pseudonym server receives a pseudonym issue request from a node and issues to the node a pseudonym set consisting of a plurality of pseudonyms; and
a statistical processing procedure in which, when a statistical processing server obtains data whose transmission source is a pseudonym, it accumulates the pair of the pseudonym and the data and notifies the pseudonym server of the pseudonym; the pseudonym server notifies the statistical processing server that the pseudonyms have been confirmed; and the statistical processing server performs statistical processing using those of the accumulated data whose transmission source is a confirmed pseudonym;
an information distribution method comprising the above procedures in this order.

The information distribution method according to claim 5, wherein, in the pseudonym issuing procedure, the statistical processing server determines a calculation process for restoring the original data of the node and generates a query requesting data that can be restored by the calculation process, and
in the statistical processing procedure, the statistical processing server restores the original data of the node by applying the calculation process to the accumulated data.

The information distribution method according to claim 5 or 6, wherein, in the statistical processing procedure, the pseudonym server notifies the statistical processing server that the pseudonyms have been confirmed by notifying it of the pseudonyms belonging to pseudonym sets of which not all pseudonyms have been notified.

The information distribution method according to any one of claims 5 to 7, wherein, in the pseudonym issuing procedure, the pseudonym server does not issue pseudonym sets until the number of nodes that have made pseudonym issue requests reaches a predetermined number, and issues the pseudonym sets when that number reaches the predetermined number.