JP2021033382A5 - Google Patents
- Publication number
- JP2021033382A5 (application JP2019149284A)
- Authority
- JP
- Japan
- Prior art keywords
- program
- frequency
- attribute
- frequency table
- data
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Description
6. Anonymization processing program
The anonymization processing program, which applies concealment processing to individual record data, consists of the programs that make up (1) the multidimensional frequency table aggregation processing unit and (2) the concealment conversion processing unit (Fig. 2).
The multidimensional frequency table aggregation processing unit takes individual record data as input and consists of (1) the multidimensional cross frequency table aggregation program (Fig. 3). The concealment conversion processing unit takes the multidimensional cross frequency table as input and consists of the programs for (2) erasure concealment, (3) rounding concealment, (4) difference calculation, and (5) individual-record format conversion (Fig. 4).
(1) Because the multidimensional frequency table aggregation processing unit tabulates the multidimensional frequency table from individual record data, it is assumed to run in a closed environment that is cut off from external access. (2) The concealment conversion processing unit provides two functions: concealment processing, and conversion of a frequency table into individual-record format. Concealment processing of the multidimensional frequency table must be performed in the closed environment. However, the conversion into individual-record format, which takes the concealed frequency table as input, is designed so that it can also be run in an open environment where external access is possible. The programs may be written in any programming language.
6.1 Multidimensional cross frequency table aggregation program
The multidimensional cross frequency table aggregation program calculates breakdown totals for every combination of attribute fields. Records whose attribute has been replaced with the total key are combined with the base data and sorted by attribute key, and the frequency and weighted frequency are then summed for each identical attribute combination. Repeating this recursively, one field at a time, yields the totals for all combinations (Fig. 5).
(1) Preprocessing
Perform the following preprocessing.
・Select the attribute item fields to be processed
・Remove unnecessary fields
・If there is no weighted frequency (Weight) field, add one and set its value to 1 in each record
・Add a frequency (Frequency) field and set its value to 1 in each record
(2) Field number j = 0
Initialize the counter j for the specified fields.
(3) Processing field j = j + 1
Counter j counts the specified field being processed.
(4) Copy all records to D1
Copy all records of the input file and output them as file D1.
(5) Record number i = 0
Initialize the input record counter i.
(6) Read record
Read one record of the individual record data.
(7) i = i + 1
Counter i counts the record being processed.
(8) j-th attribute classification = "~"
In the record being processed, replace the value of the j-th attribute with the total sign "~".
(9) Last record?
Determine whether all records have been processed.
Yes ⇒ Process (10)
No ⇒ Process (6)
(10) Combine D1 + all records
Combine all records processed for the j-th attribute with file D1.
(11) Sort by cross attributes
Sort the combined file by attribute.
(12) Sum frequencies by attribute
Sum the frequency (Frequency) for each identical combination of attribute items.
(13) Sum weighted frequencies by attribute
Sum the weighted frequency (Weight) for each identical combination of attribute items.
(14) Replace the data to be processed
To continue the recursive calculation, replace the data to be processed with the processed data.
(15) Last field?
Determine whether all fields have been processed.
Yes ⇒ End
No ⇒ Process (3)
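The loop in steps (1) to (15) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `cross_tabulate`, the in-memory record layout (a tuple of attribute values plus an optional weight), and the use of ASCII `~` for the total sign are all assumptions made for the example.

```python
from collections import defaultdict

TOTAL = "~"  # assumed ASCII stand-in for the total sign used in the text


def cross_tabulate(records, n_fields):
    """Return {attribute tuple: (frequency, weight)} for all combinations.

    records: iterable of (attrs, weight); weight may be None (hypothetical layout).
    """
    # (1) Preprocessing: Frequency = 1, Weight = 1 when no weight is given.
    current = [
        (tuple(attrs), 1, 1 if weight is None else weight)
        for attrs, weight in records
    ]
    # (2)-(3), (15): process one attribute field per pass.
    for j in range(n_fields):
        d1 = list(current)  # (4) copy of all records
        # (5)-(9): replace the j-th attribute with the total sign in every record.
        totals = [
            (key[:j] + (TOTAL,) + key[j + 1:], freq, weight)
            for key, freq, weight in current
        ]
        # (10)-(13): combine, then sum frequency and weight per identical key.
        merged = defaultdict(lambda: [0, 0])
        for key, freq, weight in d1 + totals:
            merged[key][0] += freq
            merged[key][1] += weight
        # (11), (14): sorted output becomes the input of the next pass.
        current = [(key, f, w) for key, (f, w) in sorted(merged.items())]
    return {key: (f, w) for key, f, w in current}


sample = [(("M", "20s"), None), (("F", "20s"), None), (("F", "30s"), 2)]
table = cross_tabulate(sample, n_fields=2)
```

Grouping with a dictionary replaces the explicit sort-then-sum of steps (11) to (13); the sort is kept only so that the intermediate data stays ordered by attribute key, as in the description.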
6.2 Erasure concealment program
The erasure concealment program assigns a classification code to each combination of attribute items and checks the minimum frequency for each classification code. If the minimum frequency does not meet the safety threshold K, the program replaces the breakdown values in the records of that attribute-item combination with zero, thereby applying erasure concealment to the multidimensional cross frequency table (Figs. 6, 7, and 8).
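A rough Python sketch of this check is shown below. The section does not define the classification code in detail, so the sketch assumes it identifies which attribute fields are broken down (not equal to the total sign) and that the all-total row is never erased. The names `classification_code` and `erase_small_cells` and the table layout are hypothetical.

```python
TOTAL = "~"  # assumed ASCII stand-in for the total sign used in the text


def classification_code(key):
    """Assumed code: which attribute fields carry a breakdown value."""
    return tuple(value != TOTAL for value in key)


def erase_small_cells(table, k):
    """Zero every breakdown cell of a combination whose minimum frequency < k.

    table: {attribute tuple: (frequency, weight)} (hypothetical layout).
    """
    # Find the minimum frequency for each classification code.
    min_freq = {}
    for key, (freq, _weight) in table.items():
        code = classification_code(key)
        min_freq[code] = min(freq, min_freq.get(code, freq))

    result = {}
    for key, (freq, weight) in table.items():
        code = classification_code(key)
        # Assumption: the grand total (no breakdown field) is left untouched.
        if any(code) and min_freq[code] < k:
            result[key] = (0, 0)
        else:
            result[key] = (freq, weight)
    return result


freq_table = {
    ("M", "20s"): (1, 1), ("F", "20s"): (5, 5),
    ("M", "~"): (4, 4), ("F", "~"): (9, 9),
    ("~", "20s"): (6, 6), ("~", "~"): (13, 13),
}
erased = erase_small_cells(freq_table, k=3)
```

Here the sex-by-age combination contains a cell with frequency 1, which is below K = 3, so both cells of that combination are zeroed while the one-way totals are kept.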
6.3 Rounding concealment program
The rounding concealment program takes the multidimensional cross frequency table that has already undergone erasure concealment, divides each frequency and weighted frequency by the rounding base B, and rounds the result half up to the nearest integer. This rounding reinforces the concealment of the frequency table (Fig. 9).
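The rounding step can be illustrated with a short Python sketch. Python's built-in `round` rounds halves to the nearest even integer, so the sketch uses `decimal` with `ROUND_HALF_UP` to get the half-up behaviour described here. The section does not say whether the rounded quotient is multiplied back by B, so that is exposed as an optional `rescale` flag; the function names and the flag are assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_by_base(value, base, rescale=False):
    """Divide value by base and round half up to an integer.

    rescale=True multiplies the result back by base (an assumption,
    not stated in the description).
    """
    quotient = Decimal(str(value)) / Decimal(str(base))
    rounded = int(quotient.quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    return rounded * base if rescale else rounded


def round_table(table, base, rescale=False):
    """Apply the rounding to frequency and weight of every cell."""
    return {
        key: (round_by_base(freq, base, rescale),
              round_by_base(weight, base, rescale))
        for key, (freq, weight) in table.items()
    }


rounded = round_table({("F", "20s"): (25, 12.5), ("M", "20s"): (7, 7)}, base=5)
```

With B = 5, a frequency of 25 becomes 5 and a weighted frequency of 12.5 becomes 3 (2.5 rounded half up), while 7 becomes 1.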
6.4 Difference calculation program
The difference calculation program calculates the difference between the sum of the breakdown values and the total. The calculation is the inverse of the frequency table aggregation: for each combination of attribute items, the sign of the breakdown is inverted and the result is added to the total, and this is repeated recursively to obtain the difference for every combination of attribute items. The sign-inverted breakdown records have the attribute field replaced with the total sign "~", are combined with the original frequency table data, and are sorted by attribute item. The frequencies and weighted frequencies are then summed for each identical attribute key to obtain the difference (Fig. 10).
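One way to read this procedure is sketched below in Python, processing one attribute field per pass in the same style as the aggregation. It is an interpretation under stated assumptions rather than the patented program: the name `difference_table`, the dictionary layout, and the ASCII `~` total sign are hypothetical.

```python
from collections import defaultdict

TOTAL = "~"  # assumed ASCII stand-in for the total sign used in the text


def difference_table(table, n_fields):
    """Replace each total with (total - sum of its breakdown cells).

    table: {attribute tuple: (frequency, weight)} (hypothetical layout).
    """
    current = dict(table)
    for j in range(n_fields):
        merged = defaultdict(lambda: [0, 0])
        for key, (freq, weight) in current.items():
            # Keep the original record.
            merged[key][0] += freq
            merged[key][1] += weight
            if key[j] != TOTAL:
                # Sign-inverted breakdown, re-keyed to the total of field j.
                total_key = key[:j] + (TOTAL,) + key[j + 1:]
                merged[total_key][0] -= freq
                merged[total_key][1] -= weight
        # Sorted by attribute key, then used as input for the next field.
        current = {key: tuple(v) for key, v in sorted(merged.items())}
    return current


concealed = {("M",): (0, 0), ("F",): (10, 10), ("~",): (13, 13)}
diff = difference_table(concealed, n_fields=1)
```

In this one-field example the cell for M was erased (originally 3), so the total row ends up holding 13 - (0 + 10) = 3, the part of the total that the remaining breakdown no longer accounts for.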
6.5 Individual-record format conversion program
The individual-record format conversion program processes the breakdown and difference rows of the frequency table after difference calculation. For each attribute combination, it divides the frequency or weighted frequency by the scale R and converts the result to an integer to obtain the number of output records. It then divides the frequency or weighted frequency by that number of output records to obtain the weight, and outputs the records for each attribute combination (Fig. 11).
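The conversion can be sketched in Python as follows. The description does not specify how the quotient is converted to an integer or what happens when the result is zero, so this sketch assumes truncation with `int()` and emits no records in the zero case. The function name `to_individual_records` and the output record layout are hypothetical.

```python
def to_individual_records(key, value, scale):
    """Expand one frequency-table cell into weighted individual records.

    key:   attribute tuple of the cell
    value: frequency or weighted frequency of the cell
    scale: the scale R
    """
    # Number of output records: value / R, truncated (assumed rounding mode).
    n_records = int(value / scale)
    if n_records <= 0:
        # Assumption: cells too small for a single record produce no output.
        return []
    # Weight per record, so that the weights add back up to the cell value.
    weight = value / n_records
    return [{"attributes": key, "weight": weight} for _ in range(n_records)]


records = to_individual_records(("F", "20s"), 10, 4)
```

For a cell value of 10 and R = 4, this yields two records with weight 5.0 each, so the weights still sum to the original cell value.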
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2019149284A JP7442995B2 (en) | 2019-08-16 | 2019-08-16 | Anonymization device for individual data using secret conversion processing of multidimensional cross frequency table |
Publications (3)
Publication Number | Publication Date |
---|---|
JP2021033382A (en) | 2021-03-01 |
JP2021033382A5 (en) | 2022-08-29 |
JP7442995B2 (en) | 2024-03-05 |
Family ID: 74676517
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP2019149284A (Active, JP7442995B2 (en)) | Anonymization device for individual data using secret conversion processing of multidimensional cross frequency table | 2019-08-16 | 2019-08-16 |
Country Status (1)
Country | Link |
---|---|
JP (1) | JP7442995B2 (en) |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2017126112A (en) | 2016-01-12 | 2017-07-20 | 株式会社リコー | Server, distributed server system, and information processing method |
JP6711689B2 (en) | 2016-05-12 | 2020-06-17 | 株式会社Nttドコモ | Privacy protector |