JP2010108488A

JP2010108488A - Tabulation system, tabulation processor, information provider terminal, tabulation method and program

Info

Publication number: JP2010108488A
Application number: JP2009224212A
Authority: JP
Inventors: Masaru Igarashi; 大五十嵐; Koji Senda; 浩司千田; Katsumi Takahashi; 克巳高橋; Akira Nagai; 彰永井
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-10-03
Filing date: 2009-09-29
Publication date: 2010-05-13
Anticipated expiration: 2029-09-29
Also published as: JP5307678B2

Abstract

PROBLEM TO BE SOLVED: To reduce the amount of memory used when a cross tabulation is computed while keeping the data secret. SOLUTION: A tabulation device 1 transmits a maintenance probability predetermined for each attribute to each of terminals 2-0 to 2-(N-1). Each terminal determines the attribute data it transmits on the basis of the maintenance probability and a random number generated for the attribute. The tabulation device 1 computes a perturbed cross tabulation from the data transmitted by the terminals and also generates a transition probability matrix. It then computes an estimated cross tabulation, which is an estimate of the true cross tabulation, from the perturbed cross tabulation and the generated transition probability matrix. COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a tabulation system, a tabulation processing device, an information provider terminal, a tabulation method, and a program.

In recent years, techniques have been studied that obtain only a cross tabulation result while keeping the individual data in a database secret by statistical means (for example, questionnaire responses collected from information provider terminals). These techniques guarantee that the valid data of an information provider terminal cannot be extracted (see, for example, Non-Patent Documents 1 and 2).

The terms "cross tabulation" and "simple tabulation" are explained below.

"Simple tabulation" is a tabulation method that focuses on one attribute contained in a table and tabulates the records with respect to that attribute alone.

More specifically, the tabulation result shown in FIG. 10 is an example of a result obtained by simple tabulation. In the example of FIG. 10, the values in the "number of people" column are the tabulated counts.
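As a minimal illustration of simple tabulation, the Python sketch below counts a few hypothetical records by the value of one attribute. These records are not the data of FIG. 10. The record layout and the function name simple_tabulation are assumptions made for this example.

```python
from collections import Counter

# Hypothetical respondent records (illustrative only, not the data of FIG. 10).
records = [
    {"age": "teens", "gender": "male"},
    {"age": "teens", "gender": "female"},
    {"age": "20s", "gender": "male"},
    {"age": "teens", "gender": "male"},
]


def simple_tabulation(rows, attribute):
    """Count records by the value of a single attribute."""
    return Counter(row[attribute] for row in rows)


print(simple_tabulation(records, "age"))
```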

"Cross tabulation", on the other hand, is a tabulation method that focuses on several attributes contained in a table and tabulates the records that have equal values for all of the focused attributes.

More specifically, the tabulation result shown in FIG. 11 is an example of a result obtained by cross tabulation. In the example of FIG. 11, the "number of people" counts are computed with attention to two attributes, "age group" and "gender". For example, the number of records whose age group is "teens" and whose gender is "male" is 4.

In the following, the values of several attributes (for example, "teens" and "male") treated together as a single value are referred to as a "cross value".
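The sketch below shows cross tabulation with the same kind of hypothetical records. Each cross value is represented as the tuple of a record's values for the focused attributes, so records sharing all of those values fall into the same count. The record layout and the function name cross_tabulation are illustrative assumptions.

```python
from collections import Counter

# Hypothetical respondent records (illustrative only, not the data of FIG. 11).
records = [
    {"age": "teens", "gender": "male"},
    {"age": "teens", "gender": "female"},
    {"age": "20s", "gender": "male"},
    {"age": "teens", "gender": "male"},
]


def cross_tabulation(rows, attributes):
    """Count records by cross value: the tuple of their values for all focused attributes."""
    return Counter(tuple(row[a] for a in attributes) for row in rows)


table = cross_tabulation(records, ("age", "gender"))
print(table[("teens", "male")])
```

A Counter returns 0 for a cross value that never occurs, which matches an empty cell of a cross tabulation.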

Non-Patent Document 1: R. Agrawal, R. Srikant, and D. Thomas, "Privacy Preserving OLAP", ACM SIGMOD Conference, pp. 251-262, 2005.
Non-Patent Document 2: Hidehisa Takamizawa and Masayoshi Ariji, "Application of Privacy-Preserving Count Operations to Multi-Valued Attribute Classification", DEWS 2007, 2007.
Non-Patent Document 3: Dai Igarashi, Koji Senda, and Katsumi Takahashi, "Efficient Privacy-Preserving Cross Tabulation Applicable to Multi-Valued Attributes", Computer Security Symposium 2008 (October 8-10, 2008).

However, the general techniques disclosed in Non-Patent Documents 1 and 2 have a limitation when an estimate of the true cross tabulation is computed while keeping individual data secret. If the attributes specified as the cross value can take many values, a tabulation processing device built on these techniques takes a very long time to compute the cross tabulation result.

Furthermore, with the same techniques and the same condition of many possible values per attribute, the memory used by the tabulation processing device to compute the cross tabulation result may reach several gigabytes or more. A tabulation processing device allocates only part of its installed memory to this computation. If the required memory exceeds that allocatable amount, the device cannot compute the cross tabulation result at all.
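The following calculation shows how a dense transition probability matrix can reach several gigabytes. The sketch computes the storage for a dense square float64 matrix whose order equals the number of possible cross values. The attribute domain sizes are hypothetical. The assumption that the general techniques hold such a matrix densely in memory is ours, not a statement taken from Non-Patent Documents 1 and 2.

```python
from math import prod


def dense_matrix_bytes(domain_sizes, itemsize=8):
    """Bytes for a dense square matrix whose order is the product of the
    attribute domain sizes (itemsize=8 corresponds to float64 elements)."""
    order = prod(domain_sizes)
    return order * order * itemsize


# Hypothetical attributes with 10, 20 and 100 possible values -> 20,000 cross values.
print(dense_matrix_bytes([10, 20, 100]) / 1e9, "GB")
```

With only three such attributes, the matrix already needs about 3.2 GB, because the storage grows with the square of the product of the domain sizes.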

An object of the present invention is to provide a tabulation system, a tabulation processing device, an information provider terminal, a tabulation method, and a program that solve the problems described above.

To solve the above problems, the tabulation system of the present invention comprises two components. The first is an information provider terminal that transmits transmission data determined from among the data whose input it has accepted. The second is a tabulation processing device that tabulates the transmission data transmitted from the information provider terminal.

The tabulation processing device includes:
an information transmission unit that transmits, to the information provider terminal, an attribute to which the data belongs together with a maintenance probability predetermined for that attribute;
an element calculation unit that calculates each element A_pq expressed by [formula not reproduced] and generates a transition probability matrix having the calculated elements A_pq;
a tabulation unit that generates a cross tabulation of the transmission data transmitted from the information provider terminal; and
an estimation unit that calculates an estimated cross tabulation, which is an estimate of the true cross tabulation, based on the cross tabulation generated by the tabulation unit and the transition probability matrix generated by the element calculation unit.

The information provider terminal includes:
a storage unit that stores the data;
a probability changing unit that generates a random number when the maintenance probability and the attribute are transmitted from the tabulation processing device;
a transmission data determination unit that determines, from among the stored data, the transmission data to be transmitted to the tabulation processing device, based on the maintenance probability and attribute transmitted from the tabulation processing device and on the random number generated by the probability changing unit; and
a data transmission unit that transmits the transmission data determined by the transmission data determination unit to the tabulation processing device.

In the tabulation system of the present invention, the element calculation unit may calculate the elements A_pq for each attribute and generate a plurality of attribute-specific transition probability matrices, each having the elements A_pq calculated for one attribute. The estimation unit may then calculate the estimated cross tabulation based on the cross tabulation generated by the tabulation unit and the plurality of attribute-specific transition probability matrices generated by the element calculation unit.
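The sketch below shows one possible way to use attribute-specific matrices without materializing the full matrix. It rests on an illustrative assumption: the transition probability matrix for the entire record is the Kronecker product of the attribute-specific matrices A^a, with attribute 0 as the least significant digit of the cross-value index. The function name apply_attribute_matrices is hypothetical. Each factor is applied along its own axis, so memory is needed only for the M_a x M_a factors and one vector of length M_0 x ... x M_(n-1).

```python
from functools import reduce

import numpy as np


def apply_attribute_matrices(mats, x):
    """Return (A^(n-1) kron ... kron A^0) @ x without forming the Kronecker product.

    mats[a] is the M_a x M_a matrix of attribute a; x has length M_0 * ... * M_(n-1)
    and is indexed with attribute 0 as the least significant mixed-radix digit.
    """
    sizes = [m.shape[0] for m in mats]
    n = len(mats)
    # C-order reshape: tensor axis n-1-a holds the digit of attribute a.
    t = np.asarray(x, dtype=float).reshape(sizes[::-1])
    for a, m in enumerate(mats):
        axis = n - 1 - a
        # Contract the column index of m with the axis of attribute a ...
        t = np.tensordot(m, t, axes=([1], [axis]))
        # ... then move the resulting row axis back to that position.
        t = np.moveaxis(t, 0, axis)
    return t.reshape(-1)


# Check against the explicit Kronecker product on small random matrices.
rng = np.random.default_rng(0)
mats = [rng.random((2, 2)), rng.random((3, 3)), rng.random((4, 4))]
x = rng.random(2 * 3 * 4)
full = reduce(np.kron, mats[::-1])  # A^2 kron A^1 kron A^0, a 24 x 24 matrix
print(np.allclose(full @ x, apply_attribute_matrices(mats, x)))
```

Under this assumption, the factors need sum(M_a^2) stored entries instead of (prod M_a)^2.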

In the tabulation system of the present invention, the element calculation unit may generate a plurality of split transition probability matrices, each containing at least one attribute-specific transition probability matrix. The grouping is based on the size each attribute-specific transition probability matrix occupies when loaded into memory. The estimation unit may then calculate the estimated cross tabulation based on the cross tabulation generated by the tabulation unit and the plurality of split transition probability matrices generated by the element calculation unit.
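This passage does not spell out the grouping rule for split transition probability matrices. The sketch below shows one hypothetical rule. It walks the attributes in order and merges consecutive attribute-specific matrices into one split matrix (their Kronecker product) as long as that matrix, stored densely, fits within a memory budget. An attribute whose own matrix exceeds the budget still forms a split matrix by itself. The function name and the greedy strategy are assumptions.

```python
def group_by_memory_budget(domain_sizes, budget_bytes, itemsize=8):
    """Greedily group consecutive attributes so that the dense Kronecker product
    of each group's M_a x M_a matrices fits in budget_bytes.
    Returns a list of groups, each a list of attribute indices."""
    groups, current, order = [], [], 1
    for a, m in enumerate(domain_sizes):
        new_order = order * m
        if current and new_order * new_order * itemsize > budget_bytes:
            groups.append(current)   # adding attribute a would exceed the budget
            current, order = [a], m  # start a new group with attribute a
        else:
            current.append(a)
            order = new_order
    if current:
        groups.append(current)
    return groups


# Budget for a dense float64 matrix of order at most 100.
print(group_by_memory_budget([10, 2, 50, 4], budget_bytes=8 * 100 * 100))
```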

In the tabulation system of the present invention, the estimation unit may calculate the estimated cross tabulation by applying a predetermined posterior probability calculation condition to the cross tabulation generated by the tabulation unit. This condition is used to obtain the posterior probability given that this cross tabulation has been observed.

In the tabulation system of the present invention, the estimation unit may calculate the estimated cross tabulation based on the cross tabulation generated by the tabulation unit and the inverse of the transition probability matrix calculated by the element calculation unit.
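Below is a minimal sketch of the inverse-matrix estimate. It assumes that the expected perturbed counts equal A x, where A[p, q] is the probability that true cross value q is reported as p. Solving the linear system gives the same result as multiplying by A^(-1) and is numerically preferable to forming the inverse explicitly. The 3 x 3 matrix and the counts are hypothetical. The matrix keeps a value with probability 0.8 and otherwise moves it uniformly to another value.

```python
import numpy as np


def estimate_by_inversion(A, y):
    """Return x_hat with A @ x_hat = y, i.e. x_hat = A^{-1} y."""
    return np.linalg.solve(A, np.asarray(y, dtype=float))


# Hypothetical single-attribute example: M = 3 values, keep probability 0.8,
# otherwise move to one of the other two values with probability 0.1 each.
A = np.full((3, 3), 0.1)
np.fill_diagonal(A, 0.8)
x_true = np.array([50.0, 30.0, 20.0])
y = A @ x_true  # expected perturbed counts, [45, 31, 24]
print(estimate_by_inversion(A, y))
```

With real, randomly perturbed counts, this estimate is generally not exact and can contain negative entries.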

To solve the above problems, the tabulation processing device of the present invention tabulates transmission data from an information provider terminal to which it is connected. That terminal transmits transmission data determined from among the data whose input it has accepted.

The tabulation processing device includes:
an information transmission unit that transmits, to the information provider terminal, an attribute to which the data belongs together with a maintenance probability predetermined for that attribute;
an element calculation unit that calculates each element A_pq expressed by [formula not reproduced] and generates a transition probability matrix having the calculated elements A_pq;
a tabulation unit that generates a cross tabulation of the transmission data transmitted from the information provider terminal, where the terminal determined that data based on the maintenance probability, the attribute, and a random number it generated; and
an estimation unit that calculates an estimated cross tabulation, which is an estimate of the true cross tabulation, based on the cross tabulation generated by the tabulation unit and the transition probability matrix generated by the element calculation unit.

In the tabulation processing device of the present invention, the element calculation unit may calculate the elements A_pq for each attribute and generate a plurality of attribute-specific transition probability matrices, each having the elements A_pq calculated for one attribute. The estimation unit may then calculate the estimated cross tabulation based on the cross tabulation generated by the tabulation unit and the plurality of attribute-specific transition probability matrices generated by the element calculation unit.

In the tabulation processing device of the present invention, the element calculation unit may generate a plurality of split transition probability matrices, each containing at least one attribute-specific transition probability matrix. The grouping is based on the size each attribute-specific transition probability matrix occupies when loaded into memory. The estimation unit may then calculate the estimated cross tabulation based on the cross tabulation generated by the tabulation unit and the plurality of split transition probability matrices generated by the element calculation unit.

In the tabulation processing device of the present invention, the estimation unit may calculate the estimated cross tabulation by applying a predetermined posterior probability calculation condition to the cross tabulation generated by the tabulation unit. This condition is used to obtain the posterior probability given that this cross tabulation has been observed.

In the tabulation processing device of the present invention, the estimation unit may calculate the estimated cross tabulation based on the cross tabulation generated by the tabulation unit and the inverse of the transition probability matrix calculated by the element calculation unit.

To solve the above problems, the information provider terminal of the present invention is connected to a tabulation processing device.

The information provider terminal includes:
a storage unit that stores data whose input was accepted at the information provider terminal;
a probability changing unit that generates a random number when the tabulation processing device transmits an attribute to which the data belongs together with a maintenance probability predetermined for that attribute;
a transmission data determination unit that determines, from among the stored data, the transmission data to be transmitted to the tabulation processing device, based on the maintenance probability and attribute transmitted from the tabulation processing device and on the random number generated by the probability changing unit; and
a data transmission unit that transmits the transmission data determined by the transmission data determination unit to the tabulation processing device.

To solve the above problems, the tabulation method of the present invention tabulates transmission data in a tabulation system with two components. The first is an information provider terminal that transmits transmission data determined from among the data whose input it has accepted. The second is a tabulation processing device that tabulates the transmission data transmitted from that terminal.

The method includes:
a storage process in which the information provider terminal stores the data;
an information transmission process in which the tabulation processing device transmits, to the information provider terminal, an attribute to which the data belongs together with a maintenance probability predetermined for that attribute;
a probability changing process in which the information provider terminal generates a random number when the maintenance probability and the attribute are transmitted from the tabulation processing device;
a transmission data determination process in which the information provider terminal determines, from among the stored data, the transmission data to be transmitted to the tabulation processing device, based on the transmitted maintenance probability and attribute and on the generated random number;
a data transmission process in which the information provider terminal transmits the determined transmission data to the tabulation processing device;
an element calculation process in which the tabulation processing device calculates each element A_pq expressed by [formula not reproduced] and generates a transition probability matrix having the calculated elements A_pq;
a tabulation process in which the tabulation processing device generates a cross tabulation of the transmission data transmitted from the information provider terminal; and
an estimation process in which the tabulation processing device calculates an estimated cross tabulation, which is an estimate of the true cross tabulation, based on the generated cross tabulation and the generated transition probability matrix.

In the tabulation method of the present invention, the element calculation process may be a process in which the tabulation processing device calculates the elements A_pq for each attribute and generates a plurality of attribute-specific transition probability matrices, each having the elements A_pq calculated for one attribute. The estimation process may then be a process in which the tabulation processing device calculates the estimated cross tabulation based on the generated cross tabulation and the generated attribute-specific transition probability matrices.

In the tabulation method of the present invention, the element calculation process may be a process in which the tabulation processing device generates a plurality of split transition probability matrices, each containing at least one attribute-specific transition probability matrix. The grouping is based on the size each attribute-specific transition probability matrix occupies when loaded into memory. The estimation process may then be a process in which the tabulation processing device calculates the estimated cross tabulation based on the generated cross tabulation and the generated split transition probability matrices.

The program of the present invention is executed by a computer serving as a tabulation processing device. That device is connected to an information provider terminal that transmits transmission data determined from among the data whose input it has accepted.

The program causes the device to execute:
an information transmission procedure of transmitting, to the information provider terminal, an attribute to which the data accepted at that terminal belongs together with a maintenance probability predetermined for that attribute;
an element calculation procedure of calculating each element A_pq expressed by [formula not reproduced] and generating a transition probability matrix having the calculated elements A_pq;
a tabulation procedure of generating a cross tabulation of the transmission data transmitted from the information provider terminal, where the terminal determined that data based on the maintenance probability, the attribute, and a random number it generated; and
an estimation procedure of calculating an estimated cross tabulation, which is an estimate of the true cross tabulation, based on the generated cross tabulation and the generated transition probability matrix.

In the program of the present invention, the element calculation procedure may be a procedure of calculating the elements A_pq for each attribute and generating a plurality of attribute-specific transition probability matrices, each having the elements A_pq calculated for one attribute. The estimation procedure may then be a procedure of calculating the estimated cross tabulation based on the generated cross tabulation and the generated attribute-specific transition probability matrices.

In the program of the present invention, the element calculation procedure may be a procedure of generating a plurality of split transition probability matrices, each containing at least one attribute-specific transition probability matrix. The grouping is based on the size each attribute-specific transition probability matrix occupies when loaded into memory. The estimation procedure may then be a procedure of calculating the estimated cross tabulation based on the generated cross tabulation and the generated split transition probability matrices.

According to the present invention, the estimate of the true cross tabulation is still calculated while keeping individual data secret. The order of the transition probability matrix for the entire record, however, is based on the number of values each attribute can take. This order is smaller than the one obtained when a binary representation of those values is used.
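A small calculation illustrates the difference in matrix order. The sketch assumes that the binary-representation approach encodes each attribute with ceil(log2 M_a) bits, which gives 2^ceil(log2 M_a) states per attribute. The approach described here is taken to use M_a states per attribute. The domain sizes are hypothetical.

```python
from math import prod


def order_by_value_count(domain_sizes):
    """Matrix order when each attribute contributes its number of values M_a."""
    return prod(domain_sizes)


def order_by_binary_encoding(domain_sizes):
    """Matrix order when each attribute is encoded with ceil(log2 M_a) bits.
    (m - 1).bit_length() equals ceil(log2 m) for m >= 1, without floating point."""
    return prod(2 ** (m - 1).bit_length() for m in domain_sizes)


sizes = [10, 2, 50]  # hypothetical attributes
print(order_by_value_count(sizes), order_by_binary_encoding(sizes))
```

For these sizes, a dense float64 matrix of order 1000 needs 8,000,000 bytes, whereas one of order 2048 needs 33,554,432 bytes. When every M_a is a power of two, the two orders coincide.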

With this configuration, the tabulation processing device uses less memory to calculate the cross tabulation result than it would with the general techniques.

Furthermore, the reduced memory usage also saves time. The most computation-intensive part of the tabulation process is the operation on the transition probability matrix for the entire record. This operation can be performed in less time than with the general techniques, even when the attributes specified as the cross value can take many values.

FIG. 1 is a diagram showing the configuration of a tabulation system according to an embodiment of the present invention.
FIG. 2 is a diagram showing the configuration of the tabulation processing device shown in FIG. 1.
FIG. 3 is a diagram showing an example of the data structure of the correspondence information shown in FIG. 2.
FIG. 4 is a diagram showing the configuration of the information provider terminal shown in FIG. 1.
FIG. 5 is a schematic diagram in which an order is assigned to the attributes of a cross tabulation result.
FIG. 6 is a diagram showing an example of the data structure of the data determination information shown in FIG. 4.
FIG. 7 is a diagram showing the operation sequence for calculating an estimate of the true cross tabulation in the tabulation system of Embodiment 1.
FIG. 8 is a diagram showing the operation sequence for calculating an estimate of the true cross tabulation in the tabulation system of Embodiment 2.
FIG. 9 is a diagram showing the operation sequence for generating split transition probability matrices in the tabulation system of Embodiment 3.
FIG. 10 is a diagram showing an example of a tabulation result obtained by simple tabulation.
FIG. 11 is a diagram showing an example of a tabulation result obtained by cross tabulation.

(Embodiment 1)
A tabulation system according to Embodiment 1 of the present invention is described below, including the tabulation processing device, information provider terminals, tabulation method, and program.

The tabulation system of Embodiment 1 calculates an "estimated cross tabulation", which is an estimate of the true cross tabulation x. To do so, it uses a "predetermined posterior probability calculation condition", which gives the posterior probability that reflects an obtained result (data).

Embodiment 1 is an example in which Bayes' theorem is applied as the "predetermined posterior probability calculation condition".
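This passage does not give the exact update derived from Bayes' theorem. As an assumption, the sketch below uses a commonly used iterative Bayesian reconstruction, an EM-style update. Each round redistributes every perturbed count y_p over the true cross values q in proportion to their posterior probability A[p, q] x_q / sum_r A[p, r] x_r. Here A[p, q] is taken to be the probability that true cross value q is reported as p. The matrix and the counts are hypothetical.

```python
import numpy as np


def bayes_estimate(A, y, iterations=1000):
    """Iterative Bayesian estimate of true counts x from perturbed counts y.

    A[p, q] = Pr(reported cross value p | true cross value q), so each column
    of A sums to 1 and the expected perturbed counts are A @ x.
    """
    y = np.asarray(y, dtype=float)
    x = np.full(A.shape[1], y.sum() / A.shape[1])  # uniform starting estimate
    for _ in range(iterations):
        predicted = A @ x  # expected perturbed counts under the current estimate
        # Posterior of true value q given report p is A[p, q] * x[q] / predicted[p];
        # summing y[p] times that posterior over p gives the new count for q.
        x = x * (A.T @ (y / predicted))
    return x


A = np.full((3, 3), 0.1)
np.fill_diagonal(A, 0.8)
y = np.array([45.0, 31.0, 24.0])  # hypothetical perturbed counts
print(bayes_estimate(A, y))
```

Unlike a direct inverse, this update keeps every estimated count non-negative and preserves the total number of records.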

First, the configuration of the tabulation system of Embodiment 1 is described.

As shown in FIG. 1, this tabulation system includes a tabulation processing device 1 (hereinafter "tabulation device 1") and information provider terminals 2-0 to 2-(N-1) (hereinafter "terminals 2-0 to 2-(N-1)").

The following description takes the number of terminals 2-0 to 2-(N-1) to be N, but that number may be any natural number of 1 or more.

The tabulation device 1 constructs, for the n attributes α_a (for example, "age group" and "gender"), a transition probability matrix A whose element in row p and column q is the element A_pq given by Formula 1 below. For this purpose, it transmits to each of the terminals 2-0 to 2-(N-1) the maintenance probability ρ_a predetermined for each attribute α_a to which the data to be tabulated belongs, together with that attribute α_a.

The "maintenance probability ρ_a" is the probability that attribute α_a keeps its original value even after a terminal 2-0 to 2-(N-1) perturbs it, that is, changes it probabilistically.
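Formula 1 is not reproduced in this text. The sketch below builds a per-attribute transition probability matrix under an illustrative assumption consistent with the definition of ρ_a above. A value is kept with probability ρ_a and otherwise replaced by one of the other M_a - 1 values chosen uniformly. Each diagonal entry is therefore ρ_a, and each off-diagonal entry is (1 - ρ_a)/(M_a - 1). The function name is hypothetical.

```python
import numpy as np


def attribute_transition_matrix(rho, m):
    """Assumed M x M transition matrix for one attribute: entry [p, q] is the
    probability that true value q is reported as value p."""
    if m < 2:
        raise ValueError("an attribute needs at least two possible values")
    A = np.full((m, m), (1.0 - rho) / (m - 1))  # move to one specific other value
    np.fill_diagonal(A, rho)                     # keep the original value
    return A


A_gender = attribute_transition_matrix(0.7, 2)  # e.g. "male" / "female"
A_age = attribute_transition_matrix(0.7, 3)
print(A_age)
```

Each column sums to 1, because a true value is always reported as some value.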

Here, the row index p and the column index q of the transition probability matrix A are as given by Formula 2 below.

The quantities d_p^a and d_q^a in Formula 1 are defined by Formulas 3 and 4 below, respectively.

In Formulas 3 and 4, "÷" denotes integer division and "%" denotes the remainder operation.
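Formulas 2 to 4 are not reproduced here either. One reading consistent with their use of integer division "÷" and remainder "%" is a mixed-radix decomposition. The row index p of the full matrix is split into one digit d_p^a per attribute, and A_pq is the product of the attribute-specific entries for those digits. The sketch implements that reading as an assumption, with attribute 0 as the least significant digit. The function names are hypothetical.

```python
import numpy as np


def digits(p, sizes):
    """Mixed-radix digits of p: d_p^a = (p // (M_0 * ... * M_(a-1))) % M_a."""
    out, base = [], 1
    for m in sizes:
        out.append((p // base) % m)
        base *= m
    return out


def element(p, q, mats):
    """A_pq as the product over attributes a of mats[a][d_p^a, d_q^a]."""
    sizes = [m.shape[0] for m in mats]
    dp, dq = digits(p, sizes), digits(q, sizes)
    return float(np.prod([m[i, j] for m, i, j in zip(mats, dp, dq)]))


print(digits(7, [2, 3, 4]))  # -> [1, 0, 1], since 7 = 1 + 0*2 + 1*(2*3)
```

Under this reading, the full matrix equals the Kronecker product of the attribute-specific matrices.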

The tabulation device 1 also calculates, for the transition probability matrix A covering the entire record, each element A_pq in row p and column q given by Formula 1.

Here, the "transition probability matrix A^a" for each attribute α_a expresses, in matrix form, the probabilities that the value of attribute α_a changes from one value to another.

The tabulation device 1 also calculates the "perturbed cross tabulation y". This is a cross tabulation of the transmission data T'(0)_a to T'(N-1)_a transmitted from the terminals 2-0 to 2-(N-1).
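The sketch below shows how the perturbed cross tabulation y could be tallied as a vector indexed by cross value. It assumes a mixed-radix ordering in which attribute 0 is the least significant digit. Each transmitted record is represented, hypothetically, as a tuple of value indices, one per attribute. The function name is also hypothetical.

```python
from math import prod


def perturbed_cross_tabulation(reports, sizes):
    """Tally transmitted records into y, a list of length M_0 * ... * M_(n-1).

    Each report is a tuple (d^0, ..., d^(n-1)) of value indices; it is counted
    at index p = d^0 + d^1*M_0 + d^2*M_0*M_1 + ...
    """
    y = [0] * prod(sizes)
    for report in reports:
        p, base = 0, 1
        for d, m in zip(report, sizes):
            p += d * base
            base *= m
        y[p] += 1
    return y


# Hypothetical reports for two attributes with 2 and 3 values.
print(perturbed_cross_tabulation([(0, 0), (1, 0), (1, 2), (1, 2)], [2, 3]))
```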

Here, the transmission data T'(0)_a to T'(N-1)_a are the data that the terminals 2-0 to 2-(N-1) determined based on the maintenance probability ρ_a, the attribute α_a, and a random number.

From the calculated perturbed cross tabulation y, the tabulation device 1 then calculates the estimated cross tabulation, which is an estimate of the "true cross tabulation x".

The terminals 2-0 to 2-(N-1) store the records T(0) to T(N-1), respectively. For example, terminal 2-0 stores record T(0).

Each of the records T(0) to T(N-1) contains data T(0)_a to T(N-1)_a for each attribute α_a. For example, record T(0) contains data T(0)_a for attribute α_a.

Here, each item of data T(0)_a to T(N-1)_a is one of the values Vα_0 to Vα_(M_a-1) that attribute α_a can take. For example, when attribute α_a is "gender", the possible values are "male" and "female", and the data is one of them, such as "female".

In other words, by storing the records T(0) to T(N−1), the terminals 2-0 to 2-(N−1) store the data T(0)_a to T(N−1)_a, respectively.

The number of attributes α_a is n, and the tabulation device 1 and each of the terminals 2-0 to 2-(N−1) store this number n.

Each attribute α_a (with index a < n) can take M_a values, V^a_0 to V^a_{M_a−1}. The tabulation device 1 and each of the terminals 2-0 to 2-(N−1) store, for every attribute α_a, the number M_a of possible values and the values V^a_0 to V^a_{M_a−1} themselves.

Using the maintenance probability ρ_a and the attribute α_a sent from the tabulation device 1, each of the terminals 2-0 to 2-(N−1) stochastically changes ("perturbs") the data T(0)_a to T(N−1)_a that it stores, i.e. the data whose input it accepted.

Here, stochastically changing (perturbing) the data T(0)_a to T(N−1)_a means deciding which value to send to the tabulation device 1 as the transmission data (for the terminal 2-0, T'(0)_a), on the basis of the maintenance probability ρ_a and the attribute α_a sent from the tabulation device 1 and a random number generated at transmission time.

Each of the terminals 2-0 to 2-(N−1) then sends the stochastically changed data (that is, the transmission data it determined from the maintenance probability ρ_a, the attribute α_a, and the random number) to the tabulation device 1.

Next, the configuration of the tabulation device 1 is described in detail.

As shown in FIG. 2, the tabulation device 1 includes a communication unit 11, an element calculation unit 12, a tabulation unit 13, an estimation unit 14, and a storage unit 15.

The communication unit 11 consists of, for example, a communication module and exchanges arbitrary data with each of the terminals 2-0 to 2-(N−1).

The communication unit 11 is also an "information transmission unit": so that the transition probability matrix A for the n attributes α_a can be constructed, it sends each attribute α_a together with its predetermined maintenance probability ρ_a to each of the terminals 2-0 to 2-(N−1).

The communication unit 11 also receives the transmission data T'(0)_a to T'(N−1)_a that the terminals 2-0 to 2-(N−1) have stochastically changed and sent.

The element calculation unit 12 calculates each element A_pq (row p, column q; see Expression 1) of the transition probability matrix A for the whole record, which holds data for all n attributes α_a. By calculating every element A_pq, the element calculation unit 12 generates the transition probability matrix A.

In this example, the element calculation unit 12 generates a square transition probability matrix A whose element A_pq in row p and column q is given by Expression 5, described later.

The tabulation unit 13 calculates the cross tabulation y after perturbation by tabulating the transmission data T'(0)_a to T'(N−1)_a sent from the information provider terminals 2-0 to 2-(N−1), each item of which was determined from the maintenance probability ρ_a, the attribute α_a, and a random number.

From the cross tabulation y after perturbation calculated by the tabulation unit 13, the estimation unit 14 calculates the "estimated cross tabulation x^{i+1}", an estimate of the true cross tabulation x. The estimation unit 14 outputs the estimated cross tabulation x^{i+1} to, for example, an external display device (not shown) connected to the tabulation device 1.

In Embodiment 1, the estimation unit 14 calculates the estimated cross tabulation x^{i+1} from the transition probability matrix A calculated by the element calculation unit 12 and the cross tabulation y after perturbation (the changed cross tabulation) calculated by the tabulation unit 13.

When calculating the estimated cross tabulation x^{i+1}, the estimation unit 14 applies Bayes' theorem as the predetermined condition for computing the posterior probability given that the cross tabulation y after perturbation calculated by the tabulation unit 13 has been observed.

The storage unit 15 stores arbitrary data.

For example, the storage unit 15 stores correspondence information AS.

As shown in FIG. 3, the correspondence information AS associates each attribute α_a to which the data belongs with the maintenance probability ρ_a predetermined for that attribute.

The storage unit 15 also stores the number n of attributes α_a to which the data belongs.

The storage unit 15 further stores, for each attribute α_a, the number M_a of values it can take and the values V^a_0 to V^a_{M_a−1}. In the example shown in FIG. 11, when the attribute α_a is "gender", its value is either "male" or "female".

Next, the configuration of the terminals 2-0 to 2-(N−1) is described. Because the terminals 2-0 to 2-(N−1) share the same configuration, the terminal 2-0 is described below as an example.

As shown in FIG. 4, the terminal 2-0 includes a communication unit 21-0, a probability changing unit 22-0, a transmission data determination unit 23-0, and a storage unit 24-0.

The communication unit 21-0 consists of, for example, a communication module and exchanges arbitrary data with the tabulation device 1.

The communication unit 21-0 receives, from the tabulation device 1, each of the n attributes α_a together with its predetermined maintenance probability ρ_a.

The communication unit 21-0 is also a "data transmission unit" that sends the transmission data T'(0)_a determined by the transmission data determination unit 23-0 to the tabulation device 1.

Likewise, the terminals 2-1 to 2-(N−1) send the transmission data T'(1)_a to T'(N−1)_a, respectively, to the tabulation device 1.

The probability changing unit 22-0 stochastically changes the data T(0)_a stored in the storage unit 24-0, using the maintenance probability ρ_a and the attribute α_a sent from the tabulation device 1.

More specifically, when the maintenance probability ρ_a and the attribute α_a arrive from the tabulation device 1, the probability changing unit 22-0 generates, for each of the n attributes α_a, a real-valued uniform random number r_a (0 ≤ r_a ≤ 1) and an integer-valued uniform random number z_a (0 ≤ z_a ≤ M_a − 1).

The probability changing unit 22-0 then compares, for each attribute α_a, the predetermined maintenance probability ρ_a with the uniform random number r_a generated for that attribute.

If this comparison shows that the uniform random number r_a is less than or equal to the maintenance probability ρ_a (r_a ≤ ρ_a), the transmission data determination unit 23-0 selects the data T(0)_a, i.e. the correct value of the attribute α_a for the terminal 2-0, as the transmission data T'(0)_a.

If instead r_a is greater than ρ_a (r_a > ρ_a), the transmission data determination unit 23-0 consults the data determination information DT stored in the storage unit 24-0 and, among the values V^a_0 to V^a_{M_a−1} that the attribute α_a can take, selects the value whose order number NB equals the integer uniform random number z_a generated by the probability changing unit 22-0 (that is, the z_a-th value) as the transmission data T'(0)_a of the attribute α_a for the terminal 2-0.

For example, in the tabulation result shown in FIG. 11, suppose that the attribute α_a is "gender", as shown in FIG. 5, and that order number NB 0 is assigned to "male" and order number NB 1 to "female". Then, when the uniform random number z_a generated by the probability changing unit 22-0 is 1, the transmission data determination unit 23-0 selects "female", whose order number NB equals z_a (1), as the transmission data T'(0)_a.

Furthermore, when the maintenance probability ρ_a sent from the tabulation device 1 is 0, the transmission data determination unit 23-0 likewise uses the data determination information DT stored in the storage unit 24-0 and selects, among the values V^a_0 to V^a_{M_a−1}, the value whose order number NB equals z_a (the z_a-th value) as the transmission data T'(0)_a of the attribute α_a for the terminal 2-0.
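The terminal-side rules described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the table `DT`, the attribute names and values, and the functions `perturb_attribute` and `perturb_record` are hypothetical, and a value's position in its list stands in for the order number NB. Python's `random.random()` draws r_a from [0, 1) rather than [0, 1], which does not change the resulting probabilities.

```python
import random

# Hypothetical data determination information DT: the position of a value in
# its list stands in for the order number NB (illustrative values only).
DT = {
    "gender": ["male", "female"],            # NB 0 = male, NB 1 = female
    "age": ["teens", "20s", "30s", "40s+"],
}


def perturb_attribute(true_value, values, rho, rng):
    """Return the transmission value T'_a for one attribute alpha_a.

    values: the M_a possible values, indexed by order number NB.
    rho:    maintenance probability rho_a, 0 <= rho <= 1.
    """
    r = rng.random()                  # real-valued uniform random number r_a in [0, 1)
    z = rng.randrange(len(values))    # integer uniform random number z_a in {0, ..., M_a - 1}
    if rho > 0 and r <= rho:
        return true_value             # r_a <= rho_a: send the correct data T_a
    return values[z]                  # otherwise, or rho_a == 0: send the value with NB == z_a


def perturb_record(record, rho_by_attr, table, rng):
    """Perturb each attribute of one record independently."""
    return {attr: perturb_attribute(value, table[attr], rho_by_attr[attr], rng)
            for attr, value in record.items()}
```

For example, with ρ_a = 0.5 and M_a = 2, a true value of "female" is sent unchanged with probability 0.5 + 0.5 × 1/2 = 0.75, because the uniform draw can also land on the original value.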

The storage unit 24-0 stores arbitrary data.

For example, the storage unit 24-0 stores the record T(0) and the attribute data T(0)_a; the record T(0) contains the attribute data T(0)_a.

The storage unit 24-0 also stores the number n of attributes α_a.

The storage unit 24-0 further stores data determination information DT.

As shown in FIG. 6, the data determination information DT stores, in association with each other, the n attributes α_a, the M_a values V^a_0 to V^a_{M_a−1} that each attribute α_a can take, and the order number NB assigned to each of those values. The order numbers NB may be assigned to the values V^a_0 to V^a_{M_a−1} in any order.

Next, with reference to the flowchart in FIG. 7, it is explained how the tabulation device 1 in the tabulation system configured as above calculates the cross tabulation y after perturbation from the transmission data T'(0)_a to T'(N−1)_a sent by the terminals 2-0 to 2-(N−1), and why this is faster and more memory-efficient than the general technique described above.

Because the terminals 2-0 to 2-(N−1) operate identically, the operation of the terminal 2-0 is described below as an example.

As shown in FIG. 7, the tabulation device 1 sends each attribute α_a and its predetermined maintenance probability ρ_a (0 ≤ ρ_a ≤ 1) to each of the terminals 2-0 to 2-(N−1) (step S11).

On receiving the maintenance probability ρ_a and the attribute α_a from the tabulation device 1, the probability changing unit 22-0 of the terminal 2-0 generates, for each attribute α_a, a real-valued uniform random number r_a (0 ≤ r_a ≤ 1) and an integer-valued uniform random number z_a (0 ≤ z_a ≤ M_a − 1) (step S12).

The probability changing unit 22-0 then compares, for each attribute α_a, the maintenance probability ρ_a sent from the tabulation device 1 with the real-valued uniform random number r_a generated for that attribute (step S13).

According to the result of this comparison, the transmission data determination unit 23-0 of the terminal 2-0 determines the transmission data T'(0)_a to send to the tabulation device 1 (step S14). The communication unit 21-0 then sends the transmission data T'(0)_a determined by the transmission data determination unit 23-0 to the tabulation device 1.

If the comparison shows that r_a ≤ ρ_a, the transmission data determination unit 23-0 selects, in step S14, the data T(0)_a (the correct value of the attribute α_a for the terminal 2-0) as the transmission data T'(0)_a.

If the comparison shows that r_a > ρ_a, the transmission data determination unit 23-0 uses, in step S14, the data determination information DT stored in the storage unit 24-0 to select, among the values V^a_0 to V^a_{M_a−1}, the value whose order number NB equals the uniform random number z_a generated in step S12 (the z_a-th value) as the transmission data T'(0)_a of the attribute α_a for the terminal 2-0.

Also in step S14, when the maintenance probability ρ_a sent from the tabulation device 1 is 0, the transmission data determination unit 23-0 uses the data determination information DT stored in the storage unit 24-0 to select the value among V^a_0 to V^a_{M_a−1} with order number z_a as the transmission data T'(0)_a of the attribute α_a for the terminal 2-0.

In the example shown in FIG. 6, where the attribute α_a is gender, order number NB 0 is assigned to "male" and order number NB 1 to "female". If z_a = 1, the transmission data determination unit 23-0 therefore selects "female", whose order number NB equals z_a, as the transmission data T'(0)_a.

As the maintenance probability ρ_a decreases, the transmission data T'(0)_a approaches a uniformly random value. It then becomes difficult for the tabulation device 1 to infer the true data T(0)_a stored in the terminal 2-0 from the transmission data T'(0)_a it receives.

The probability that the k-th value V^a_k of the attribute α_a changes to its l-th value V^a_l can be written as the element A^a_kl of a transition probability matrix. The transition probability matrix A^a for each attribute α_a is the matrix whose elements A^a_kl are given by Expression 5 below; it can be calculated from the maintenance probability ρ_a of the attribute and the number M_a of values the attribute α_a can take. In Expression 5, δ_kl is the Kronecker delta.
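Expression 5 itself is not reproduced in this text. Assuming the perturbation described above (keep the true value with probability ρ_a, otherwise send a value drawn uniformly from all M_a values, possibly the original one), the element should take the following form; treat it as a reconstruction consistent with the surrounding description rather than a quotation of the patent:

```latex
A^{a}_{kl} = \Pr\left(T'_a = V^{a}_{l} \mid T_a = V^{a}_{k}\right)
           = \rho_a\,\delta_{kl} + \frac{1 - \rho_a}{M_a},
\qquad 0 \le k,\, l \le M_a - 1 .
```

Each row then sums to ρ_a + M_a(1 − ρ_a)/M_a = 1, as a transition probability matrix must.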

Consequently, as the maintenance probability ρ_a decreases, each element of the transition probability matrix A^a (the probability of a value changing; cf. Expression 1) approaches monotonically the value 1/M_a, which is the transition probability of a uniformly random choice.

When ρ_a = 0, every element of the transition probability matrix A^a is exactly this uniform transition probability.

For example, when the attribute α_a is gender, it can take M_a = 2 values (male and female), so each transition probability for this attribute is 1/2.

When ρ_a = 1, the transition probability matrix A^a is the identity matrix, so the transmitted data are the true data T(0)_a to T(N−1)_a themselves.
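Assuming the per-attribute form A^a_kl = ρ_a δ_kl + (1 − ρ_a)/M_a (Expression 5 is not reproduced here), a small hypothetical helper can be used to check the limiting cases just described: ρ_a = 0 gives 1/M_a everywhere and ρ_a = 1 gives the identity matrix.

```python
def attribute_transition_matrix(rho, m):
    """Per-attribute matrix A^a as a list of rows.

    Assumes A^a_kl = rho * delta_kl + (1 - rho) / m, where rho is the
    maintenance probability rho_a and m is the number of values M_a.
    """
    off = (1 - rho) / m                      # probability mass spread uniformly
    return [[rho * (k == l) + off for l in range(m)] for k in range(m)]
```

Because every row sums to 1, the matrix is a valid (row-stochastic) transition matrix for any 0 ≤ ρ_a ≤ 1.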

When the transmission data T'(0)_a to T'(N−1)_a arrive from the terminals 2-0 to 2-(N−1), the tabulation unit 13 of the tabulation device 1 begins calculating the cross tabulation y after perturbation, a row vector whose length is the order given by Expression 6 below, by initializing all of its elements to 0 (step S15).

The tabulation unit 13 then, for each item of transmission data T'(0)_a to T'(N−1)_a received from the terminals 2-0 to 2-(N−1), adds 1 to the element of y whose index is given by Expression 7 below (step S16). In Expression 7, I_a is a function that returns the order number NB assigned to the transmission data T'(0)_a to T'(N−1)_a.

Through this addition, the tabulation unit 13 obtains the cross tabulation y after perturbation.
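Steps S15 and S16 might be sketched as below. Expression 7 is not reproduced in this text, so the mixed-radix indexing used here (attribute 0 as the least significant digit, so attribute a has weight M_0 × … × M_{a−1}) is an assumption. Records are also assumed to arrive already converted to order numbers, i.e. after applying I_a. The names `cross_index` and `tabulate` are hypothetical.

```python
def cross_index(orders, sizes):
    """Map per-attribute order numbers (I_0, ..., I_{n-1}) to one cross-value index.

    Assumed mixed-radix encoding: attribute 0 is the least significant digit.
    """
    index, weight = 0, 1
    for nb, m in zip(orders, sizes):
        index += nb * weight
        weight *= m
    return index


def tabulate(records, sizes):
    """Perturbed cross tabulation y: one counter per cross value."""
    total = 1
    for m in sizes:
        total *= m
    y = [0] * total                          # step S15: all elements set to 0
    for orders in records:                   # step S16: add 1 per received record
        y[cross_index(orders, sizes)] += 1
    return y
```

With two attributes of sizes 2 and 3, the records (1, 0), (0, 2), (1, 0) fall into cells 1, 4, 1, giving y = [0, 2, 0, 0, 1, 0].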

The element calculation unit 12 then generates, following the method for generating the transition probability matrix A for the whole record, the matrix A: a square matrix of the order given by Expression 6, whose element A_pq in row p and column q is given by Expression 8 below (step S17). The number of rows p and the number of columns q of the matrix A generated by the tabulation device 1 are the values shown in Expression 2.
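Because each terminal perturbs its attributes independently, a natural reading of Expression 8 (not reproduced here) is that each element of the whole-record matrix is a product of per-attribute elements, A_pq = ∏_a A^a_{p_a q_a}, where p_a and q_a are the order numbers of attribute α_a within cross values p and q. The sketch below assumes that form, the per-attribute form ρ_a δ_kl + (1 − ρ_a)/M_a, and a mixed-radix ordering of cross values with attribute 0 as the least significant digit; the function names are hypothetical.

```python
def decode(index, sizes):
    """Split a cross-value index into per-attribute order numbers.

    Assumed mixed-radix ordering: attribute 0 is the least significant digit.
    """
    digits = []
    for m in sizes:
        digits.append(index % m)
        index //= m
    return digits


def record_transition_matrix(rhos, sizes):
    """Whole-record matrix A with A_pq = prod_a A^a[p_a][q_a].

    Assumes A^a_kl = rho_a * delta_kl + (1 - rho_a) / M_a for each attribute.
    """
    total = 1
    for m in sizes:
        total *= m
    A = [[1.0] * total for _ in range(total)]
    for p in range(total):
        pd = decode(p, sizes)
        for q in range(total):
            qd = decode(q, sizes)
            for rho, m, k, l in zip(rhos, sizes, pd, qd):
                A[p][q] *= rho * (k == l) + (1 - rho) / m
    return A
```

Every row of A sums to 1 because it is a product of row-stochastic per-attribute factors; the matrix has (∏_a M_a)² entries, which is why its order dominates memory use.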

In the general technique disclosed in Non-Patent Document 2, the order of the transition probability matrix is given by Expression 9 below.

In Expression 9, ⌈log₂ M_a⌉ denotes the smallest integer greater than or equal to log₂ M_a.

In the tabulation device 1 of the present invention, by contrast, the order of the transition probability matrix A is the order given by Expression 6.

Because of this difference between the order in Expression 6, which the tabulation device 1 uses to calculate the cross tabulation y after perturbation, and the order in Expression 9, which the general technique uses to calculate the cross tabulation result, the tabulation device 1 needs less memory for the whole-record transition probability matrix A than a general technique such as the one disclosed in Non-Patent Document 2.
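To make the size difference concrete, the sketch below compares the order given by Expression 6, taken here to be the product of the M_a (the number of distinct cross values), with an assumed form of Expression 9, namely 2 raised to the sum of ⌈log₂ M_a⌉, which is what binary-encoding each attribute would give. Expression 9 is not reproduced in this text, so treat that second formula as an assumption. The integer expression `(m - 1).bit_length()` computes ⌈log₂ m⌉ exactly for m ≥ 1.

```python
import math


def order_this_device(sizes):
    """Order of A in the tabulation device 1: product of the M_a (assumed Expression 6)."""
    return math.prod(sizes)


def order_binary_encoding(sizes):
    """Assumed order for the general technique: 2 ** sum(ceil(log2 M_a))."""
    # (m - 1).bit_length() == ceil(log2 m) for integers m >= 1, without float rounding
    return 2 ** sum((m - 1).bit_length() for m in sizes)
```

For M_a = 3, 5, 10 this gives orders 150 and 512, so a dense matrix A would hold 22,500 versus 262,144 entries; when every M_a is a power of two the two orders coincide.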

The estimation unit 14 then calculates the estimated cross tabulation x^{i+1}, i.e. the estimate of the true cross tabulation x, from the transition probability matrix A calculated by the element calculation unit 12 and the cross tabulation y after perturbation calculated by the tabulation unit 13, and outputs it (step S18).

In step S18, the estimation unit 14 starts from the initial values i = 0 and x^0 = y and repeatedly calculates the (i+1)-th estimated cross tabulation x^{i+1}, a row vector of the order given by Expression 6, as $x^{i+1} = x^{i} \cdot \left( \left( y / (x^{i} A) \right) A^{t} \right)$.

Here, "·" denotes component-wise multiplication of two vectors, "/" denotes component-wise division, and A^t denotes the transpose of the matrix A.

When, during the calculation in step S18, $\lVert x^{i+1} - x^{i} \rVert_{L1} \le \varepsilon N$ holds, the estimation unit 14 outputs x^{i+1} as the estimate of the true cross tabulation x.

Here, $\lVert x^{i+1} - x^{i} \rVert_{L1}$ is the L1 norm of x^{i+1} − x^i, i.e. the sum of the absolute values of the component-wise differences; ε is a preset real number, and N is the number of terminals (records).
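Step S18 can be sketched in plain Python as below. The function name and the default values of `eps` and `max_iter` are illustrative assumptions; `y` is the perturbed cross tabulation as a list, `A` is the whole-record transition matrix as a list of rows, and the stopping rule is the L1 criterion above with N = sum(y). The product with x^i is taken component-wise, and a component with (x^i A)_q = 0 is treated as contributing nothing.

```python
def estimate_cross_tabulation(y, A, eps=1e-6, max_iter=100000):
    """Iterate x^{i+1} = x^i * ((y / (x^i A)) A^t), starting from x^0 = y.

    Stops when the L1 distance between successive estimates is <= eps * N.
    """
    n_total = sum(y)                          # N: number of records
    size = len(y)
    x = [float(v) for v in y]                 # x^0 = y
    for _ in range(max_iter):
        # x^i A: perturbed counts predicted by the current estimate
        xa = [sum(x[p] * A[p][q] for p in range(size)) for q in range(size)]
        # y / (x^i A), component-wise (0 where the prediction is 0)
        ratio = [y[q] / xa[q] if xa[q] > 0 else 0.0 for q in range(size)]
        # ((y / (x^i A)) A^t)_p = sum_q ratio_q * A_pq, then multiply by x^i_p
        x_next = [x[p] * sum(ratio[q] * A[p][q] for q in range(size)) for p in range(size)]
        if sum(abs(a - b) for a, b in zip(x_next, x)) <= eps * n_total:
            return x_next
        x = x_next
    return x
```

For example, with ρ_a = 0.5 and M_a = 2 the matrix is [[0.75, 0.25], [0.25, 0.75]]; a true tabulation of [80, 20] yields y = [65, 35], and the iteration returns values close to [80, 20]. The total stays equal to N at every step because each row of A sums to 1.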

The processing in step S18 is based on Bayes' theorem, which gives the posterior probability that reflects an observed result (here, the cross tabulation y after perturbation).

By Bayes' theorem, when the perturbed cross value Q of a record is known to be q (here, as recorded in the cross tabulation y after perturbation), the posterior probability Pr(P = p | Q = q) that the record's true cross value P is p is given by Expression 10 below, where M is the value given by Expression 6 and x is the true cross tabulation.

For the probability Pr(Q = q) that the cross value Q is q, Expression 11 below holds.

Using the relation in Expression 11, the probability Pr(P = p) that the true cross value P is p can therefore be written as Expression 12 below.

Combining Expressions 10 and 12, the true cross tabulation x_p can be written as Expression 13 below.

This yields the equation $x = x \cdot \left( \left( y / (x A) \right) A^{t} \right)$.
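Expressions 10 to 13 are not reproduced in this text. The following derivation is a reconstruction consistent with the equation above, assuming Pr(P = p) = x_p / N and Pr(Q = q | P = p) = A_pq; it is not a quotation of the patent's expressions:

```latex
\begin{aligned}
\Pr(P = p \mid Q = q)
  &= \frac{\Pr(P = p)\,\Pr(Q = q \mid P = p)}{\sum_{p'=0}^{M-1} \Pr(P = p')\,\Pr(Q = q \mid P = p')}
   = \frac{x_p A_{pq}}{\sum_{p'=0}^{M-1} x_{p'} A_{p'q}}
   = \frac{x_p A_{pq}}{(xA)_q}, \\
x_p &= \sum_{q=0}^{M-1} y_q \Pr(P = p \mid Q = q)
     = x_p \sum_{q=0}^{M-1} \frac{y_q}{(xA)_q}\, A_{pq}
     = x_p \left( \left( y / (xA) \right) A^{t} \right)_p .
\end{aligned}
```

Writing the last line for every p, with component-wise multiplication, gives x = x · ((y/(xA))A^t); step S18 turns this fixed-point equation into an iteration by substituting x^i on the right-hand side.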

When calculating the (i+1)-th estimated cross tabulation x^{i+1} in step S18, the tabulation device 1 uses this equation $x = x \cdot \left( \left( y / (x A) \right) A^{t} \right)$, substituting x^i on the right-hand side.

In step S18, the estimation unit 14 performs multiplications by the whole-record transition probability matrix A. The cost of these multiplications grows with the square of the order of A, and they are the most computationally expensive operations performed by the tabulation device 1. However, because the element calculation unit 12 calculates the whole-record matrix A directly, the order of A, for given numbers M_a of values of the attributes α_a, is much smaller than in a general technique such as the one disclosed in Non-Patent Document 2.

Consequently, the tabulation device 1 can calculate the estimated cross tabulation x^{i+1}, the estimate of the true cross tabulation x, in much less time than the general technique, and it also uses less memory while doing so.

(Embodiment 2)

Next, the tabulation device 1 of Embodiment 2 is described.

The configurations of the tabulation device 1 and the terminals 2-0 to 2-(N−1) in Embodiment 2 are the same as in Embodiment 1.

However, the tabulation device 1 of Embodiment 2 differs from that of Embodiment 1 in that it calculates the estimated cross tabulation for individually designated cross values. The following describes, as an example, the case where a vector v is designated as the cross value, i.e. a combination of values of several attributes treated as a single value.

The operation by which the tabulation device 1 of Embodiment 2 calculates the estimated cross tabulation x^{i+1} in the tabulation system described above is explained below with reference to the flowchart in FIG. 8.

As shown in FIG. 8, the tabulation device 1 first accepts, via an input unit (not shown), a user input designating a vector v as the cross value, i.e. the combination of values of several attributes treated as a single value (step S21).

Each component v_a of the vector v designates the value of the corresponding attribute α_a (for example, "teens" and "male" as shown in FIG. 4) for which the estimated cross tabulation x^{i+1} is calculated.

The aggregation device 1 then transmits the maintenance probability $\rho_a$ ($0 \le \rho_a \le 1$) predetermined for each attribute $\alpha_a$ (for example, "age group" and "gender"), together with the attribute $\alpha_a$, to each of the terminals 2-0 to 2-(N-1) (step S22).

When the maintenance probability $\rho_a$ and the attribute $\alpha_a$ are received from the aggregation device 1, the probability changing unit 22-0 of the terminal 2-0 generates, for each attribute $\alpha_a$, a real-valued uniform random number $r_a$ ($0 \le r_a \le 1$) and an integer-valued uniform random number $z_a$ ($0 \le z_a \le M_a - 1$) (step S23).

Next, for each attribute $\alpha_a$ (for example, "age group" and "gender"), the probability changing unit 22-0 compares the maintenance probability $\rho_a$ received from the aggregation device 1 with the real-valued uniform random number $r_a$ generated for that attribute (step S24).

The transmission data determination unit 23-0 then determines, according to the result of this comparison, the transmission data $T'(0)_a$ to be sent to the aggregation device 1 (step S25).

In step S25, if the comparison shows that $r_a \le \rho_a$, the transmission data determination unit 23-0 uses the correct data $T(0)_a$ of the attribute $\alpha_a$ of the terminal 2-0, unchanged, as the transmission data $T'(0)_a$.

Also in step S25, if the comparison shows that $r_a > \rho_a$, or if $\rho_a = 0$, the transmission data determination unit 23-0 refers to the data determination information DT stored in the storage unit 24-0. From the values $V\alpha_0$ to $V\alpha_{M_a-1}$ that the attribute $\alpha_a$ can take, it selects the value whose order number NB equals the integer random number $z_a$ generated in step S23 (that is, the $z_a$-th value). This value becomes the transmission data $T'(0)_a$ for the attribute $\alpha_a$ of the terminal 2-0 that is sent to the aggregation device 1.
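A minimal sketch of steps S23 to S25 for one terminal follows. It assumes that each attribute's values are identified by their order number NB, from 0 to $M_a - 1$. The function names are hypothetical. Python's `random.random()` draws $r_a$ from $[0, 1)$ rather than $[0, 1]$, and the sketch treats this difference as immaterial.

```python
import random
from typing import List, Sequence


def perturb_attribute(true_index: int, rho: float, num_values: int,
                      rng: random.Random) -> int:
    """Index of the value a terminal reports for one attribute alpha_a.

    true_index -- order number NB of the terminal's true value (0 .. M_a - 1)
    rho        -- maintenance probability rho_a, 0 <= rho_a <= 1
    num_values -- M_a, the number of values the attribute can take
    """
    # Step S23: draw both random numbers before comparing.
    r = rng.random()               # real-valued uniform r_a, drawn from [0, 1)
    z = rng.randrange(num_values)  # integer-valued uniform z_a in 0 .. M_a - 1
    # Steps S24-S25: keep the true value when r_a <= rho_a (and rho_a != 0);
    # otherwise report the z_a-th value.
    if rho != 0 and r <= rho:
        return true_index
    return z


def perturb_record(record: Sequence[int], rhos: Sequence[float],
                   sizes: Sequence[int], rng: random.Random) -> List[int]:
    """Perturb every attribute of one record independently."""
    return [perturb_attribute(v, p, m, rng)
            for v, p, m in zip(record, rhos, sizes)]


rng = random.Random(42)
report = perturb_record([1, 0], [0.7, 0.5], [7, 2], rng)
```

Each attribute is perturbed independently. This independence is what allows the overall transition probability matrix to factor attribute by attribute.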

As the maintenance probability $\rho_a$ decreases, the transmission data $T'(0)_a$ approaches a uniformly random value. It is therefore difficult for the aggregation device 1 to infer the true data $T(0)_a$ stored in the terminal 2-0 from the transmission data $T'(0)_a$ that the terminal sends.

The communication unit 21-0 then transmits the transmission data $T'(0)_a$, as determined by the transmission data determination unit 23-0 in step S25, to the aggregation device 1.

Each of the terminals 2-1 to 2-(N-1) likewise transmits its own transmission data, $T'(1)_a$ to $T'(N-1)_a$, determined according to its own comparison results, to the aggregation device 1.

When the transmission data $T'(0)_a$ to $T'(N-1)_a$ have been received from the terminals 2-0 to 2-(N-1), the aggregation unit 13 of the aggregation device 1 executes the processing that calculates the perturbed cross tabulation $y$.

The aggregation unit 13 first initializes the perturbed cross tabulation $y$, a row vector of the order shown in Equation 6, by setting all of its elements to 0 (step S26).

Next, for each of the terminals 2-0 to 2-(N-1), the aggregation unit 13 adds 1 to the element of $y$ at the position given by Equation 3 for that terminal's transmission data (step S27). These additions yield the perturbed cross tabulation $y$.
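Steps S26 and S27 can be sketched as follows. Equation 3 is not reproduced in this excerpt, so the sketch assumes a mixed-radix ordering in which attribute $\alpha_0$ is the most significant digit. It also assumes that the length of $y$ is the product of the $M_a$. The names `cross_index` and `perturbed_cross_tab` are hypothetical.

```python
from math import prod
from typing import Iterable, List, Sequence


def cross_index(values: Sequence[int], sizes: Sequence[int]) -> int:
    """Position of a report in y (assumed mixed-radix stand-in for Equation 3)."""
    index = 0
    for v, m in zip(values, sizes):
        if not 0 <= v < m:
            raise ValueError("attribute value index out of range")
        index = index * m + v
    return index


def perturbed_cross_tab(reports: Iterable[Sequence[int]],
                        sizes: Sequence[int]) -> List[int]:
    """Perturbed cross tabulation y built from the terminals' reports."""
    y = [0] * prod(sizes)        # step S26: every element starts at 0
    for report in reports:       # step S27: add 1 for each terminal's report
        y[cross_index(report, sizes)] += 1
    return y


print(perturbed_cross_tab([(0, 0), (1, 2), (1, 2)], [2, 3]))  # [1, 0, 0, 0, 0, 2]
```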

Further, the estimation unit 14 of the aggregation device 1 calculates, for each attribute $\alpha_a$, the inverse matrix $(A^a)^{-1}$ of the transition probability matrix $A^a$ (step S28).
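For reference, the mechanism of steps S23 to S25 implies a closed form for $A^a$ and its inverse. Under that mechanism, the true value is kept with probability $\rho_a$, and otherwise one of the $M_a$ values is reported uniformly at random. This closed form is derived from the described mechanism and is offered as a sketch; the document's own formula for $A_{pq}$ is not reproduced in this excerpt.

In the formulas below, row $p$ is the true value, column $q$ is the reported value, $[p=q]$ is 1 when $p = q$ and 0 otherwise, and $J$ is the all-ones $M_a \times M_a$ matrix.

$$
\begin{aligned}
A^a_{pq} &= \rho_a\,[p=q] + \frac{1-\rho_a}{M_a}, \qquad 0 \le p, q \le M_a - 1,\\
A^a &= \rho_a I + \frac{1-\rho_a}{M_a}\,J, \qquad J^2 = M_a J,\\
\det A^a &= \rho_a^{\,M_a-1},\\
(A^a)^{-1}_{pq} &= \frac{[p=q]}{\rho_a} - \frac{1-\rho_a}{\rho_a M_a} \qquad (\rho_a > 0).
\end{aligned}
$$

Multiplying $A^a$ by this inverse and using $J^2 = M_a J$ gives the identity. The determinant shows that $A^a$ is invertible exactly when $\rho_a > 0$.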

After calculating the inverse matrices $(A^a)^{-1}$, the estimation unit 14 calculates and outputs the number of records whose value for each attribute $\alpha_a$ is the component $v_a$ of the vector $v$. This number is the cross tabulation for the vector $v$ designated as the cross value, given by Equation 14 below (step S29).

Given the true cross tabulation $x$, the vector $E(Y)$ of expected values of the perturbed cross tabulation can be expressed as $E(Y) = xA$. Therefore, when $|A| \neq 0$, linearity of expectation gives $E(YA^{-1}) = E(Y)A^{-1} = x$. That is, $yA^{-1}$ can be expected to approximate the true cross tabulation $x$.

Using the Kronecker product, the transition probability matrix $A$ for the entire record, whose elements are as shown in Equation 1, can be expressed by Equation 15 below.

From the relationship in Equation 15 and the properties of the Kronecker product, the inverse matrix $A^{-1}$ of the transition probability matrix $A$ can be expressed by Equation 16 below.
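The following is a small numerical check of the Kronecker-product property behind Equation 16, $(A^0 \otimes A^1)^{-1} = (A^0)^{-1} \otimes (A^1)^{-1}$. The check rests on an assumption: the per-attribute matrices are taken to be $A^a_{pq} = \rho_a[p=q] + (1-\rho_a)/M_a$, the transition probabilities implied by steps S23 to S25. For $\rho_a > 0$, their inverse has the closed form $[p=q]/\rho_a - (1-\rho_a)/(\rho_a M_a)$. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def transition_matrix(rho: float, m: int) -> Matrix:
    """Assumed A^a: keep the value with prob. rho, else report a uniform value."""
    off = (1.0 - rho) / m
    return [[rho + off if p == q else off for q in range(m)] for p in range(m)]


def transition_inverse(rho: float, m: int) -> Matrix:
    """Closed-form (A^a)^-1 for the matrix above; requires rho > 0."""
    off = (1.0 - rho) / (rho * m)
    return [[1.0 / rho - off if p == q else -off for q in range(m)]
            for p in range(m)]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product: entry ((i, k), (j, l)) is a[i][j] * b[k][l]."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def is_identity(x: Matrix, tol: float = 1e-9) -> bool:
    return all(abs(x[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(len(x)) for j in range(len(x)))


# Two attributes with M_0 = 3 and M_1 = 2, so A is 6 x 6.
A = kron(transition_matrix(0.6, 3), transition_matrix(0.8, 2))
A_inv = kron(transition_inverse(0.6, 3), transition_inverse(0.8, 2))
print(is_identity(matmul(A, A_inv)))  # True
```

Only the small per-attribute inverses are ever inverted. The full inverse follows from the Kronecker structure.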

Accordingly, as with the transition probability matrix $A$, each component of the inverse matrix $A^{-1}$ can be expressed by Equation 17 below.

The true cross tabulation $x$ can therefore be estimated from the relationship $x = yA^{-1}$.

Further, if $A^{-1}_p$ denotes the column vector formed by the components of the $p$-th column of the inverse matrix $A^{-1}$, then $(A^{-1}_p)_q = A^{-1}_{pq}$.

Therefore, the true cross tabulation $x_p$ can be calculated directly as Equation 18 below. This calculation uses the components of the $p$-th column of $A^{-1}$ and no other components of $A^{-1}$.

Here, $p$ is replaced by the number given by Equation 19 below, which corresponds to the vector $v$ designated as the cross value.

The relationship $I_a(v_a) = d^a_p$ then follows. Therefore, the true cross tabulation $x_p$ can be calculated as the value shown in Equation 14.
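The direct calculation of a single cell can be sketched as follows, under three assumptions. First, it uses the per-attribute matrices implied by steps S23 to S25, whose closed-form inverse is $[p=q]/\rho_a - (1-\rho_a)/(\rho_a M_a)$. Second, it substitutes a mixed-radix ordering of the cells, with attribute $\alpha_0$ as the most significant digit, for Equation 19. Third, all function names are hypothetical.

The sketch computes each entry of column $p$ of $A^{-1}$ as a product of per-attribute entries, so no $M \times M$ matrix is stored. Time grows with $M$ times the number of attributes, and memory grows with the length of $y$.

```python
from itertools import product
from math import prod
from typing import List, Sequence


def transition_inverse(rho: float, m: int) -> List[List[float]]:
    """Closed-form (A^a)^-1 for A^a = rho*I + (1-rho)/m * J; requires rho > 0."""
    off = (1.0 - rho) / (rho * m)
    return [[1.0 / rho - off if p == q else -off for q in range(m)]
            for p in range(m)]


def estimate_cell(y: Sequence[float], v: Sequence[int],
                  rhos: Sequence[float], sizes: Sequence[int]) -> float:
    """Estimate x_p for the cross value v using only column p of A^-1.

    y must be indexed in itertools.product order (attribute 0 most significant).
    """
    invs = [transition_inverse(r, m) for r, m in zip(rhos, sizes)]
    total = 0.0
    # Visit every cell q once; digits holds (d_q^0, ..., d_q^{n-1}).
    for q, digits in enumerate(product(*(range(m) for m in sizes))):
        # (A^-1)_{q p} = prod_a (A^a)^-1 [d_q^a][v_a]
        weight = prod(inv[d][va] for inv, d, va in zip(invs, digits, v))
        total += y[q] * weight
    return total


# One attribute, M_0 = 2, rho_0 = 0.5: expected counts y = [15, 25] arise from x = [10, 30].
print(estimate_cell([15, 25], [0], [0.5], [2]))  # 10.0
```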

When the estimation unit 14 in the second embodiment calculates the estimate of the true cross tabulation $x$, both the time required for the calculation and the memory used during it are linear in the order of the transition probability matrix $A$.

In the end, the number of records whose value for each attribute $\alpha_a$ is $v_a$ is represented by the element of the $(i+1)$-th estimated cross tabulation $x^{i+1}$ at the position given by Equation 19.

As described above, in the first and second embodiments, the estimated cross tabulation is calculated with the order of the transition probability matrix $A$ for the entire record built from the number $M_a$ of values each attribute $\alpha_a$ can take. This number is smaller than the value obtained from a binary representation of those $M_a$ values. As a result, the memory used by the aggregation device 1 when it calculates the estimated cross tabulation, while keeping individual data secret, can be kept low.

Furthermore, reducing memory usage also shortens the time needed for the operations on the transition probability matrix $A$ for the entire record. These operations involve the most computation in the estimated cross tabulation process and are its bottleneck.
(Embodiment 3)
In the first embodiment described above, for given numbers $M_a$ of values that the attributes $\alpha_a$ can take, the order $M$ of the transition probability matrix $A$ for the entire record can be made much smaller than with a general technique such as that disclosed in Non-Patent Document 2.

As a result, the time required to calculate the estimated cross tabulation can be made much shorter than with a general technique, and the memory used by the aggregation device 1 for that calculation can also be reduced. Hereinafter, the transition probability matrix $A$ for the entire record is simply called the transition probability matrix $A$.

As shown in Equation 2 above, the row index $p$ and the column index $q$ of the transition probability matrix $A$ each take $M$ values, so $A$ is a square matrix of order $M$. In general, expanding a square matrix of order $M$ in memory requires (element size) $\times M^2$ bytes. Moreover, as shown in Equation 6 above, the order $M$ of $A$ is expressed in terms of the numbers $M_a$ of values that the attributes $\alpha_a$ can take.

Therefore, as the numbers $M_a$ of values that the attributes $\alpha_a$ can take increase, the memory used when $A$ is expanded in memory increases dramatically.

The present embodiment therefore describes a case in which the estimated cross tabulation is calculated without generating the transition probability matrix $A$.

The configuration of the aggregation system in this embodiment is the same as in the first embodiment, so its description is omitted here.

The operation of calculating the estimated cross tabulation in this embodiment is described below.

First, using the Kronecker product, the transition probability matrix $A$ can be expressed by Equation 15 described above.

Next, matrices $B^j$ are defined by dividing the Kronecker-product expression of the transition probability matrix $A$, as shown in Equation 20 below. In Equation 20, $J$ is the number of divisions and $m_{-1} = 0$.

The division is chosen so that, when the matrices $B^j$ of Equation 20 are expanded in memory, the sum of their sizes fits within a predetermined memory size that can be allocated to calculating the estimated cross tabulation. This sum of sizes is shown in Equation 21 below.

One concrete division method is to set $B^a = A^a$ for every $a$ less than the number $n$ of attributes ($a < n$). This method is hereinafter called division method 1. In division method 1, a separate transition probability matrix is generated for each attribute $\alpha_a$. Each of these per-attribute matrices is hereinafter called an attribute-specific transition probability matrix.

In division method 1, the element calculation unit 12 calculates the elements $A_{pq}$ for each attribute and generates the attribute-specific transition probability matrices from the elements calculated for each attribute.

The estimation unit 14 then executes processing similar to step S18 of the first embodiment, based on the attribute-specific transition probability matrices generated by the element calculation unit 12 and the perturbed cross tabulation $y$ calculated by the aggregation unit 13. This yields the estimated cross tabulation.
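Step S18 of Embodiment 1 is not reproduced in this excerpt. Whatever its exact update, the operation it repeats on the matrix is a product of a length-$M$ row vector with $A$. The sketch below shows one standard way to compute $y\,(F_0 \otimes \cdots \otimes F_{J-1})$ when the square factors are held separately, as in division method 1. The method applies one factor at a time, so memory holds only the vector and the small factors. The function name is hypothetical, and this is an illustration rather than the document's step S18.

```python
from math import prod
from typing import List, Sequence

Matrix = List[List[float]]


def vec_kron_matmul(y: Sequence[float], factors: Sequence[Matrix]) -> List[float]:
    """Return y @ (F_0 (x) F_1 (x) ... (x) F_{J-1}) without forming the product.

    Factors must be square. y uses mixed-radix order with factor 0 as the
    most significant digit, which is the order the Kronecker product produces.
    """
    sizes = [len(f) for f in factors]
    if len(y) != prod(sizes):
        raise ValueError("length of y must equal the product of factor orders")
    z = [float(t) for t in y]
    stride = 1
    # Contract one digit at a time, starting from the least significant factor.
    for f, m in zip(reversed(factors), reversed(sizes)):
        block = m * stride
        out = [0.0] * len(z)
        for base in range(0, len(z), block):   # higher digits fixed
            for low in range(stride):          # lower digits fixed
                for q in range(m):             # new value of this digit
                    acc = 0.0
                    for p in range(m):         # sum over the old value
                        acc += z[base + p * stride + low] * f[p][q]
                    out[base + q * stride + low] = acc
        z = out
        stride = block
    return z


F0 = [[0.7, 0.3], [0.2, 0.8]]
F1 = [[0.5, 0.25, 0.25], [0.1, 0.6, 0.3], [0.0, 0.4, 0.6]]
z = vec_kron_matmul([1, 2, 3, 4, 5, 6], [F0, F1])
```

Each factor costs $M \cdot m_j$ multiply-adds. The full $M \times M$ matrix never has to fit in memory.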

Besides division method 1, another method generates a plurality of divided transition probability matrices, based on the size of each attribute-specific transition probability matrix when expanded in memory. Each divided transition probability matrix is a transition probability matrix containing at least one attribute-specific transition probability matrix. This method is hereinafter called division method 2.

The operation of generating the divided transition probability matrices in division method 2 of this embodiment is described below. In the following, $L$ denotes the predetermined memory size that can be allocated to calculating the estimated cross tabulation, and $\mathrm{size}X$ denotes the size of a matrix $X$ when expanded in memory.

FIG. 9 is a diagram illustrating the operation sequence for generating the divided transition probability matrices in the aggregation system of the third embodiment.

First, the element calculation unit 12 defines a matrix $B$ expressed by the same Kronecker product as the transition probability matrix $A$. For values of $a$ less than the number $n$ of attributes ($a < n$), it sets the initial value of each $B^j$ constituting $B$ as shown in Equation 22 below (step S31). At this stage, therefore, each $B^j$ is one of the attribute-specific transition probability matrices described above.

Next, the element calculation unit 12 calculates the sum $S_i$ of the sizes of the matrices $B^j$, as shown in Equation 23 below (step S32). In Equation 23, $i$ is an iteration counter starting from 0.

Next, for $j$ in the range shown in Equation 24 below, the element calculation unit 12 calculates, as shown in Equation 25 below, the increase in size that results from replacing two matrices $B^j$ with their Kronecker product (step S33). This quantity is hereinafter simply called the increase. Because Equation 25 is evaluated for every $j$ in the range of Equation 24, a plurality of increases is obtained, and each increase is associated with the corresponding Kronecker product.

Next, the element calculation unit 12 determines whether the minimum increase, the smallest of the calculated increases, is greater than the difference between $L$ and $S_i$ (step S34).

If step S34 finds that the minimum increase is greater than the difference between $L$ and $S_i$, the element calculation unit 12 takes each $B^j$ constituting the matrix $B$ as a divided transition probability matrix.

The element calculation unit 12 then generates the divided transition probability matrices by calculating the elements $A_{pq}$ of the attribute-specific transition probability matrices contained in each of them (step S35), and the process ends.

If, on the other hand, step S34 finds that the minimum increase is less than or equal to the difference between $L$ and $S_i$, the element calculation unit 12 calculates the Kronecker product corresponding to the minimum increase (step S36).

The element calculation unit 12 then redefines the matrix $B$ as a matrix composed of the matrices $B^{j'}$, as shown in Equation 27 below (step S37). The range of $j'$ is given by Equation 26 below and depends on the value of $j$ for which the minimum increase was obtained.

The process then returns to step S32. The element calculation unit 12 increments the iteration counter $i$ by 1, renames $j'$ as $j$, and calculates the sum of the sizes of the matrices $B^j$ constituting the matrix $B$ redefined in step S37.

Steps S32 to S37 are repeated until the minimum increase becomes greater than the difference between $L$ and $S_i$. At that point, the divided transition probability matrices are generated as described for step S35.
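Steps S31 to S37 can be sketched using matrix sizes alone. Equations 22 to 27 are not reproduced in this excerpt, so the sketch makes four assumptions:

- Only adjacent factors $B^j$ and $B^{j+1}$ are merged, which keeps the Kronecker-product order intact.
- The increase of Equation 25 is $\mathrm{size}(B^j \otimes B^{j+1}) - \mathrm{size}B^j - \mathrm{size}B^{j+1}$.
- A square matrix of order $k$ occupies (element size) $\times k^2$ bytes.
- The function name `plan_division` is hypothetical.

```python
from typing import List


def plan_division(orders: List[int], budget: int, elem_size: int = 8) -> List[List[int]]:
    """Greedily group consecutive attributes into divided matrices (method 2).

    orders    -- M_a for each attribute, in Kronecker-product order
    budget    -- L, the bytes available for the divided matrices
    elem_size -- bytes per matrix element
    Returns lists of attribute indices; each list corresponds to one B^j.
    """
    def size(order: int) -> int:
        return elem_size * order * order

    groups = [[a] for a in range(len(orders))]      # step S31: B^j = A^j
    group_orders = list(orders)
    while len(groups) > 1:
        total = sum(size(m) for m in group_orders)  # step S32: S_i
        # Step S33: size increase for merging each adjacent pair (j, j + 1).
        increases = [size(group_orders[j] * group_orders[j + 1])
                     - size(group_orders[j]) - size(group_orders[j + 1])
                     for j in range(len(groups) - 1)]
        j = min(range(len(increases)), key=lambda k: increases[k])
        if increases[j] > budget - total:           # step S34 -> step S35
            break
        # Steps S36-S37: replace B^j and B^{j+1} with their Kronecker product.
        groups[j:j + 2] = [groups[j] + groups[j + 1]]
        group_orders[j:j + 2] = [group_orders[j] * group_orders[j + 1]]
    return groups


print(plan_division([10] * 5, 100_000))  # [[0, 1], [2], [3], [4]]
```

With five 10-valued attributes and 8-byte elements, a 4,000-byte budget keeps all five matrices separate. A 100,000-byte budget allows exactly one merge.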

The estimation unit 14 then executes processing similar to step S18 of the first embodiment, based on the divided transition probability matrices generated by the element calculation unit 12 and the perturbed cross tabulation $y$ calculated by the aggregation unit 13. This yields the estimated cross tabulation.

Thus, in division method 1 of this embodiment, the elements $A_{pq}$ are calculated for each attribute. From them, a plurality of attribute-specific transition probability matrices is generated, each having the elements $A_{pq}$ calculated for its attribute. The estimated cross tabulation is then calculated from the cross tabulation and these attribute-specific transition probability matrices.

Consequently, the memory used by the aggregation device 1 when calculating the estimated cross tabulation can be made even smaller than in the first embodiment.

For example, suppose there are 5 attributes and each attribute can take 10 values, so that $M_a = 10$ for every attribute. If each element $A_{pq}$ occupies 8 B (bytes), one attribute-specific transition probability matrix occupies $8\,\mathrm{B} \times 10^2 = 800\,\mathrm{B}$ when expanded in memory. Since there are 5 attributes, expanding all of the attribute-specific transition probability matrices in memory requires only about 4 KB ($800\,\mathrm{B} \times 5$).
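The figures in this example can be reproduced with a few lines of arithmetic. For comparison, the sketch also computes the size of the undivided matrix $A$. It assumes that the order $M$ of $A$ is the product of the $M_a$, as its Kronecker-product form implies, and that elements occupy the same 8 bytes. The function names are hypothetical.

```python
from math import prod
from typing import Sequence


def dense_size(orders: Sequence[int], elem_size: int = 8) -> int:
    """Bytes for the full M x M matrix A, with M = product of the M_a."""
    m = prod(orders)
    return elem_size * m * m


def per_attribute_size(orders: Sequence[int], elem_size: int = 8) -> int:
    """Bytes for all attribute-specific matrices A^a (division method 1)."""
    return sum(elem_size * m * m for m in orders)


orders = [10] * 5
print(per_attribute_size(orders))  # 4000 (about 4 KB)
print(dense_size(orders))          # 80000000000 (80 GB)
```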

When division method 1 of this embodiment is used, the estimated cross tabulation is calculated from a plurality of attribute-specific transition probability matrices. As a result, step S18 requires more multiplications than in the first embodiment. However, consider a case where, for example, the transition probability matrix $A$ does not fit in fast memory and data must be transferred to slower memory. Compared with such a case, dividing $A$ as in this embodiment gives faster processing even if the number of multiplications increases several-fold.

In division method 2 of this embodiment, a plurality of divided transition probability matrices is generated based on the size of each attribute-specific transition probability matrix when expanded in memory. Each divided transition probability matrix contains at least one attribute-specific transition probability matrix. The estimated cross tabulation is then calculated from the cross tabulation and these divided transition probability matrices.

That is, division method 2 divides the transition probability matrix $A$ into fewer parts than division method 1 does. The increase in the number of multiplications in step S18 of the first embodiment can therefore be kept smaller than with division method 1, and processing can be made faster still.

In the present invention, the processing in the aggregation processing device 1 may be realized by the dedicated hardware described above. Alternatively, a program that realizes its functions may be recorded on a recording medium readable by the aggregation processing device 1, and the device may read and execute that program. The recording medium readable by the aggregation processing device 1 may be a removable medium, such as a floppy disk (registered trademark), a magneto-optical disk, a DVD, or a CD, or it may be an HDD or similar device built into the aggregation processing device 1. The program recorded on the medium is read by, for example, the element calculation unit 12, the aggregation unit 13, and the estimation unit 14 of the aggregation processing device 1, which then perform processing similar to that described above.

Here, the element calculation unit 12, the aggregation unit 13, and the estimation unit 14 of the aggregation processing device 1 operate as a computer that executes the program read from the recording medium.

The program described above applies equally to the information provider terminals 2-0 to 2-(N-1).

Although the present invention has been described above with reference to Embodiments 1 to 3, it is not limited to those embodiments. Various modifications that those skilled in the art can understand may be made to the configuration and details of the present invention without departing from its gist.

DESCRIPTION OF SYMBOLS
1 Aggregation processing device
11 Communication unit
12 Element calculation unit
13 Aggregation unit
14 Estimation unit
15 Storage unit
2-0 to 2-(N-1) Information provider terminals
21-0 Communication unit
22-0 Probability changing unit
23-0 Transmission data determination unit
24-0 Storage unit

Claims

An aggregation system comprising an information provider terminal that transmits transmission data determined from data it has received, and an aggregation processing device that aggregates the transmission data transmitted from the information provider terminal, wherein
the aggregation processing device comprises:
An information transmission unit that transmits, to the information provider terminal, the attribute to which the data belongs and a maintenance probability predetermined for that attribute;

An element calculation unit that calculates each element $A_{pq}$ represented by [formula] and generates a transition probability matrix having the calculated elements $A_{pq}$;
An aggregation unit that generates a cross tabulation aggregating the transmission data transmitted from the information provider terminal; and
An estimation unit that calculates an estimated cross tabulation, which is an estimate of the true cross tabulation, based on the cross tabulation generated by the aggregation unit and the transition probability matrix generated by the element calculation unit; and
the information provider terminal comprises:
A storage unit that stores the data;
A probability changing unit that generates a random number when the maintenance probability and the attribute are transmitted from the aggregation processing device;
A transmission data determination unit that determines, from the stored data, the transmission data to be transmitted to the aggregation processing device, based on the maintenance probability and attribute transmitted from the aggregation processing device and the random number generated by the probability changing unit; and
A data transmission unit that transmits the transmission data determined by the transmission data determination unit to the aggregation processing device.

The aggregation system according to claim 1,
The element calculation unit calculates the elements $A_{pq}$ for each of the attributes and generates a plurality of attribute-specific transition probability matrices, each having the elements $A_{pq}$ calculated for the corresponding attribute, and
The estimation unit calculates the estimated cross tabulation based on the cross tabulation generated by the aggregation unit and the plurality of attribute-specific transition probability matrices generated by the element calculation unit.

The aggregation system according to claim 2,
The element calculation unit generates, based on the size of each attribute-specific transition probability matrix when expanded in memory, a plurality of divided transition probability matrices, each containing at least one attribute-specific transition probability matrix, and
The estimation unit calculates the estimated cross tabulation based on the cross tabulation generated by the aggregation unit and the plurality of divided transition probability matrices generated by the element calculation unit.

The aggregation system according to any one of claims 1 to 3,
The estimation unit calculates the estimated cross tabulation by applying, to the cross tabulation generated by the aggregation unit, a predetermined posterior probability calculation condition, which is a condition for obtaining the posterior probability given that the cross tabulation has been obtained.

The aggregation system according to claim 1,
The estimation unit calculates the estimated cross tabulation based on the cross tabulation generated by the aggregation unit and the inverse matrix of the transition probability matrix calculated by the element calculation unit.

An aggregation processing device that is connected to an information provider terminal, which transmits transmission data determined from data it has received, and that aggregates the transmission data transmitted from the information provider terminal, the device comprising:
An information transmission unit that transmits, to the information provider terminal, the attribute to which the data belongs and a maintenance probability predetermined for that attribute;

An element calculation unit that calculates each element $A_{pq}$ represented by [formula] and generates a transition probability matrix having the calculated elements $A_{pq}$;
An aggregation unit that generates a cross tabulation aggregating the transmission data transmitted from the information provider terminal, the transmission data having been determined by the information provider terminal based on the maintenance probability, the attribute, and a random number generated by the information provider terminal; and
An estimation unit that calculates an estimated cross tabulation, which is an estimate of the true cross tabulation, based on the cross tabulation generated by the aggregation unit and the transition probability matrix generated by the element calculation unit.

The aggregation processing device according to claim 6,
The element calculation unit calculates the elements A_pq for each of the attributes and generates a plurality of attribute-specific transition probability matrices, each having the elements A_pq calculated for the corresponding attribute, and
The estimation unit calculates the estimated cross tabulation based on the cross tabulation generated by the tabulation unit and the plurality of attribute-specific transition probability matrices generated by the element calculation unit.
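Since (A1 ⊗ A2)⁻¹ = A1⁻¹ ⊗ A2⁻¹, one way to use attribute-specific matrices is to apply each inverse along its own axis of the cross tabulation, storing only the sum of K_i² entries instead of (Π K_i)². A sketch under the same hypothetical per-attribute form of A_pq, with illustrative names; the full Kronecker matrix is built only as a reference for comparison.

```python
from functools import reduce

import numpy as np


def transition_matrix(k: int, rho: float) -> np.ndarray:
    """Hypothetical per-attribute A[p, q] = P(reported p | true q)."""
    return rho * np.eye(k) + (1.0 - rho) / k * np.ones((k, k))


def full_transition_matrix(domain_sizes, rhos) -> np.ndarray:
    """Reference only: the (prod K)^2 Kronecker matrix this approach avoids storing."""
    return reduce(np.kron, [transition_matrix(k, r) for k, r in zip(domain_sizes, rhos)])


def estimate_per_attribute(table, rhos) -> np.ndarray:
    """Apply each attribute's inverse matrix along that attribute's axis."""
    est = np.asarray(table, dtype=float)
    for axis, rho in enumerate(rhos):
        inv_a = np.linalg.inv(transition_matrix(est.shape[axis], rho))  # K_i x K_i only
        # Contract inv_a[p, q] with index q on this axis, then move p back into place.
        est = np.moveaxis(np.tensordot(inv_a, est, axes=([1], [axis])), 0, axis)
    return est


sizes, rhos = (2, 3, 4), (0.8, 0.6, 0.7)
true_table = np.arange(1, 25, dtype=float).reshape(sizes)
observed = (full_transition_matrix(sizes, rhos) @ true_table.ravel()).reshape(sizes)
estimate = estimate_per_attribute(observed, rhos)  # matches true_table up to rounding
```

In this example the attribute-specific matrices hold 4 + 9 + 16 = 29 entries, versus 24² = 576 for the full matrix.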

The aggregation processing device according to claim 7,
The element calculation unit generates, based on the size each attribute-specific transition probability matrix occupies when expanded in memory, a plurality of divided transition probability matrices, each including at least one attribute-specific transition probability matrix, and
The estimation unit calculates the estimated cross tabulation based on the cross tabulation generated by the tabulation unit and the plurality of divided transition probability matrices generated by the element calculation unit.
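The claim leaves the grouping rule open. One hypothetical rule: merge consecutive attributes into a group as long as the Kronecker product of the group's matrices, stored as 8-byte floats, stays within a memory budget. Each group then acts as one divided transition probability matrix applied along the group's merged axes. The budget, the greedy rule, the per-attribute form of A_pq, and the names are all assumptions.

```python
from functools import reduce

import numpy as np


def transition_matrix(k: int, rho: float) -> np.ndarray:
    """Hypothetical per-attribute A[p, q] = P(reported p | true q)."""
    return rho * np.eye(k) + (1.0 - rho) / k * np.ones((k, k))


def split_by_memory(domain_sizes, budget_bytes, itemsize=8):
    """Greedily group consecutive attributes while the group's Kronecker matrix fits."""
    groups, current, dim = [], [], 1
    for i, k in enumerate(domain_sizes):
        # Close the current group if adding attribute i would exceed the budget.
        if current and (dim * k) ** 2 * itemsize > budget_bytes:
            groups.append(current)
            current, dim = [], 1
        current.append(i)
        dim *= k
    groups.append(current)
    return groups


def estimate_divided(table, rhos, budget_bytes):
    """Estimate using one divided matrix (Kronecker product) per attribute group."""
    table = np.asarray(table, dtype=float)
    sizes = table.shape
    divided = [
        reduce(np.kron, [transition_matrix(sizes[i], rhos[i]) for i in group])
        for group in split_by_memory(sizes, budget_bytes)
    ]
    # Merge each group's consecutive axes into one axis (C order matches np.kron).
    est = table.reshape([m.shape[0] for m in divided])
    for axis, a in enumerate(divided):
        inv_a = np.linalg.inv(a)
        est = np.moveaxis(np.tensordot(inv_a, est, axes=([1], [axis])), 0, axis)
    return est.reshape(sizes)


sizes, rhos = (2, 3, 4, 2), (0.8, 0.6, 0.7, 0.9)
print(split_by_memory(sizes, budget_bytes=288))  # [[0, 1], [2], [3]]
```

In this sketch a single attribute whose own matrix exceeds the budget still forms a group by itself, since it cannot be split further.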

The aggregation processing device according to any one of claims 6 to 8,
The estimation unit calculates the estimated cross tabulation by applying, to the cross tabulation generated by the tabulation unit, a predetermined posterior probability calculation condition, which is a condition for obtaining a posterior probability when the cross tabulation is obtained.

The aggregation processing device according to claim 6,
The estimation unit calculates the estimated cross tabulation based on the cross tabulation generated by the tabulation unit and the inverse matrix of the transition probability matrix calculated by the element calculation unit.

An information provider terminal connected to the aggregation processing device, comprising:
A storage unit that stores data whose input has been accepted by the information provider terminal;
A probability changing unit that generates a random number when the aggregation processing device has transmitted the attribute to which the data belongs and the maintenance probability predetermined for that attribute;
A transmission data determination unit that determines, from the stored data, the transmission data to be transmitted to the aggregation processing device, based on the maintenance probability and the attribute transmitted from the aggregation processing device and the random number generated by the probability changing unit; and
A data transmission unit that transmits the transmission data determined by the transmission data determination unit to the aggregation processing device.
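A minimal sketch of the terminal side, assuming a retention-replacement rule: for each attribute, a random number is drawn and the stored value is kept if the draw falls below the maintenance probability; otherwise a value drawn uniformly from the attribute's domain is sent. Values are encoded as indices 0 to K − 1, and the function names are hypothetical.

```python
import random


def perturb_value(value, domain_size, rho, rng):
    """Keep value w.p. rho; otherwise send a value drawn uniformly from the domain."""
    if rng.random() < rho:  # random number compared with the maintenance probability
        return value
    return rng.randrange(domain_size)  # the replacement may coincide with the true value


def determine_transmission_data(record, domain_sizes, rhos, rng=None):
    """Perturb each attribute of a stored record independently (assumed)."""
    if rng is None:
        rng = random.Random()
    return tuple(
        perturb_value(v, k, rho, rng) for v, k, rho in zip(record, domain_sizes, rhos)
    )


rng = random.Random(0)  # seeded only to make the example repeatable
sent = determine_transmission_data((1, 2), domain_sizes=(2, 3), rhos=(0.8, 0.6), rng=rng)
```

Under this rule a reported value equals the stored one with probability rho + (1 − rho)/K, which is the diagonal entry of the transition probability matrix implied by the assumed rule.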

An aggregation method for aggregating transmission data in an aggregation system comprising an information provider terminal that transmits transmission data determined from data whose input it has accepted, and an aggregation processing device that aggregates the transmission data transmitted from the information provider terminal, the method comprising:
A storage process in which the information provider terminal stores the data;
An information transmission process in which the aggregation processing device transmits, to the information provider terminal, the attribute to which the data belongs and a maintenance probability predetermined for that attribute;
A probability changing process in which the information provider terminal generates a random number when the aggregation processing device has transmitted the maintenance probability and the attribute;
A transmission data determination process in which the information provider terminal determines, from the stored data, the transmission data to be transmitted to the aggregation processing device, based on the maintenance probability and the attribute transmitted from the aggregation processing device and the generated random number;
A data transmission process in which the information provider terminal transmits the determined transmission data to the aggregation processing device;

An element calculation process in which the aggregation processing device calculates each element A_pq represented by the above formula and generates a transition probability matrix having the calculated elements A_pq;
A tabulation process in which the aggregation processing device generates a cross tabulation of the transmission data transmitted from the information provider terminal; and
An estimation process in which the aggregation processing device calculates an estimated cross tabulation, which is an estimated value of the true cross tabulation, based on the generated cross tabulation and the generated transition probability matrix.

The aggregation method according to claim 12,
The element calculation process is a process in which the aggregation processing device calculates the elements A_pq for each of the attributes and generates a plurality of attribute-specific transition probability matrices, each having the elements A_pq calculated for the corresponding attribute, and
The estimation process is a process in which the aggregation processing device calculates the estimated cross tabulation based on the generated cross tabulation and the generated plurality of attribute-specific transition probability matrices.

The aggregation method according to claim 13,
The element calculation process is a process in which the aggregation processing device generates, based on the size each attribute-specific transition probability matrix occupies when expanded in memory, a plurality of divided transition probability matrices, each including at least one attribute-specific transition probability matrix, and
The estimation process is a process in which the aggregation processing device calculates the estimated cross tabulation based on the generated cross tabulation and the generated plurality of divided transition probability matrices.

A program causing an aggregation processing device, connected to an information provider terminal that transmits transmission data determined from data whose input it has accepted, to execute:
An information transmission procedure for transmitting, to the information provider terminal, the attribute to which the data accepted by the information provider terminal belongs and a maintenance probability predetermined for that attribute;

An element calculation procedure for calculating each element A_pq represented by the above formula and generating a transition probability matrix having the calculated elements A_pq;
A tabulation procedure for generating a cross tabulation of the transmission data transmitted from the information provider terminal, the transmission data having been determined by the information provider terminal based on the maintenance probability, the attribute, and a random number generated by the information provider terminal; and
An estimation procedure for calculating an estimated cross tabulation, which is an estimated value of the true cross tabulation, based on the generated cross tabulation and the generated transition probability matrix.

The program according to claim 15, wherein
The element calculation procedure is a procedure for calculating the elements A_pq for each of the attributes and generating a plurality of attribute-specific transition probability matrices, each having the elements A_pq calculated for the corresponding attribute, and
The estimation procedure is a procedure for calculating the estimated cross tabulation based on the generated cross tabulation and the generated plurality of attribute-specific transition probability matrices.

The program according to claim 16, wherein
The element calculation procedure is a procedure for generating, based on the size each attribute-specific transition probability matrix occupies when expanded in memory, a plurality of divided transition probability matrices, each including at least one attribute-specific transition probability matrix, and
The estimation procedure is a procedure for calculating the estimated cross tabulation based on the generated cross tabulation and the generated plurality of divided transition probability matrices.