JP5860116B2

JP5860116B2 - Reduction coefficient calculation device, anonymous processing device, method and program using the same

Info

Publication number: JP5860116B2
Application number: JP2014202232A
Authority: JP
Inventors: 秀暢小栗
Original assignee: ニフティ株式会社
Priority date: 2014-06-13
Filing date: 2014-09-30
Publication date: 2016-02-16
Anticipated expiration: 2034-09-30
Also published as: JP2016015110A

Description

The present invention relates to information processing technology for anonymizing or diversifying personal information.

With the development of information processing technology, information is collected in many everyday situations and processed. For example, when a consumer becomes a member of a store and purchases products, the consumer's name, age, gender, address, e-mail address, and the like are often registered at the time of membership registration. When the consumer purchases a product, the store-side system records the consumer in association with information on the purchased product. Accumulating and analyzing this purchase information makes it possible to estimate the consumer's preferences and to offer services such as sending direct mail when a new product the consumer is likely to prefer is released. Furthermore, analyzing the information of many consumers yields information such as the products preferred by women in their 20s or the products preferred in the Kanto area, which is used for marketing and similar purposes.

This information can also be valuable beyond the store itself: manufacturers of the products and other companies can use it for developing new products, improving safety, and so on.

However, a store cannot provide the personal information it holds about consumers to others without obtaining each consumer's consent. For this reason, when information about consumers is provided to others, it must be anonymized so that individuals cannot be identified.

For example, if a member list that records ages contains only one 25-year-old, that person can be identified as soon as someone learns that a 25-year-old acquaintance is a member. In other words, when only one person has the attribute "25-year-old member", there is a high possibility that the individual can be identified by cross-referencing other information.

Therefore, if the ages in the member list are abstracted into 10-year bands so that several people share the same attribute, for example three people in their 20s, it is no longer possible to tell which of the three a given person is. A state in which k or more people share the same attributes is said to satisfy "k-anonymity", and processing data so that it reaches this state is called "k-anonymization".

Various anonymization criteria and methods have been proposed. For example, l-diversity, Pk-anonymization, and t-closeness are known (see Non-Patent Document 1).

JP 2012-133451 A
JP 2011-108195 A
JP 2011-128862 A
JP 2012-78932 A
JP 2014-102643 A

Hiroshi Nakagawa, "Privacy-Preserving Data Mining", [retrieved May 23, 2014], Internet <URL: http://www.r.dl.itc.u-tokyo.ac.jp/~nakagawa/labintro/2010PPDM-summary.pdf>

FIG. 22 shows an example of the history data (flow data) recorded on the management server side when a user passes through the automatic ticket gates of stations with an IC card and pays the fare. In the history data 91 of FIG. 22, a user ID, date and time of use, station used, details of use, fare, and the like are associated with one another. Each user in the history data 91 can be identified by referring to the user information 92, which associates the user ID with the user's surname, age, and gender.

When the history data 91 is provided to another business operator, one conceivable approach is to delete the user information 92 that associates the user ID with the user's surname and other details, or to manage it so that it cannot be referenced, so that individuals cannot be identified from the user ID (a pseudonymized state).

However, in the pseudonymized state, even if the name cannot be determined from the user ID, re-identification may still be possible from information such as the stations used when that information is limited to a single individual, that is, when no other user has matching station information. For example, if the user with ID = A001 used Shinjuku Station, Akihabara Station, and Ningyocho, and nobody else used the same set of stations, then anyone who knows this user's movements can re-identify the user with ID = A001 from the history data.
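The re-identification risk described above can be illustrated with a minimal sketch. The user IDs other than A001, the station sets, and the function name `reidentifiable_ids` are assumptions made for illustration and do not appear in the source.

```python
from collections import Counter

# Hypothetical pseudonymized history: user ID -> set of stations used.
history = {
    "A001": {"Shinjuku", "Akihabara", "Ningyocho"},
    "A002": {"Shinjuku", "Tokyo"},
    "A003": {"Shinjuku", "Tokyo"},
}

def reidentifiable_ids(history):
    """Return the IDs whose station combination is shared with no other user."""
    counts = Counter(frozenset(stations) for stations in history.values())
    return sorted(uid for uid, stations in history.items()
                  if counts[frozenset(stations)] == 1)

print(reidentifiable_ids(history))
```

In this toy data only A001 has a station combination of its own, so only A001 is exposed even though no names are stored.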

For example, suppose n = 42.47 million users each select from m = 9,262 stations with a uniform distribution. Obtaining from Equation 1 the number of stations S that allows re-identification,
mS = n ... (Equation 1)
gives S = 2.237, which shows that a user can be re-identified if three stations are recorded in the history data.

Thus, when the data item is a station and the number of choices (attribute kinds) is as large as 9,262 stations, a usage history containing only three stations, that is, only three data items (attributes), is enough to make anonymization impossible, even for data with a population as large as 42.47 million people.

In addition, the history data of an IC card may also contain shopping information. If it further contains information with a very large number of choices, such as purchased product names and store names, the possibility of re-identification becomes even higher.

One conceivable countermeasure is to abstract the value of each item so that no combination of item values is limited to a single individual. However, data such as behavior histories tends to be very large, and for so-called big data covering, for example, more than 100,000 people, performing the abstraction by hand is not realistic.

Abstraction can also be performed mechanically, but a mechanically abstracted result is not necessarily useful data even if it satisfies anonymity. For example, if abstraction continues until no combination of item values is limited to a single individual and the item values (words) become so abstract that they have no practical value, satisfying anonymity is pointless. Therefore, even when abstraction is performed mechanically, a person checks the result, and if the data is not useful, changes settings such as which items to abstract and reruns the abstraction process. The work becomes a cycle of repeated trials.

Simply repeating trials is inefficient, however. With big data in particular, the abstraction process and the process of testing anonymity take a great deal of time, which has made it difficult to run enough trials.

The present invention therefore provides a technique that improves the efficiency of anonymization processing by using a reduction coefficient to execute the anonymization with a number of sections that is likely to satisfy anonymity.

The reduction coefficient calculation device according to the present invention comprises:
an anonymous information acquisition unit that acquires anonymous information obtained by anonymizing personal information;
an appearance count acquisition unit that partitions the anonymous information by each kind of word that its attributes can take to obtain the number of sections, and obtains the minimum appearance count of the words; and
a coefficient calculation unit that, based on a plurality of combinations of the number of sections and the minimum appearance count that differ in the number of sections, obtains a reduction coefficient indicating the amount by which the minimum appearance count decreases when the number of sections is increased.

The reduction coefficient calculation device may obtain the reduction coefficient as a linear approximation formula, a polynomial approximation formula, an exponential approximation formula, or a power approximation formula.

An anonymization processing device according to the present invention comprises:
a section number acquisition unit that acquires the number of sections to be used when anonymizing the personal information to be anonymized;
a coefficient acquisition unit that acquires the reduction coefficient calculated by the reduction coefficient calculation device;
a possibility determination unit that, based on the reduction coefficient and the number of sections, determines the possibility that the decrease in the minimum appearance count when the personal information is anonymized with that number of sections exceeds a predetermined reference value; and
an anonymization unit that anonymizes the personal information when the possibility is high and cancels the anonymization of the personal information when the possibility is low.

An anonymization processing device according to the present invention comprises:
a reception unit that receives the personal information to be anonymized;
a coefficient acquisition unit that acquires the reduction coefficient calculated by the reduction coefficient calculation device;
a section number calculation unit that, based on the reduction coefficient and the total number of items of personal information, obtains a number of sections for which the decrease in the minimum appearance count when the personal information is anonymized does not exceed a predetermined reference value; and
an anonymization unit that anonymizes the personal information with that number of sections.

In the reduction coefficient calculation method according to the present invention, a computer executes the steps of:
acquiring anonymous information obtained by anonymizing personal information;
partitioning the anonymous information by each kind of word that its attributes can take to obtain the number of sections, and obtaining the minimum appearance count of the words; and
obtaining, based on a plurality of combinations of the number of sections and the minimum appearance count that differ in the number of sections, a reduction coefficient indicating the amount by which the minimum appearance count decreases when the number of sections is increased.

In an anonymization processing method according to the present invention, a computer executes the steps of:
acquiring the number of sections to be used when anonymizing the personal information to be anonymized;
acquiring the reduction coefficient calculated by the reduction coefficient calculation device;
determining, based on the reduction coefficient and the number of sections, the possibility that the decrease in the minimum appearance count when the personal information is anonymized with that number of sections exceeds a predetermined reference value; and
anonymizing the personal information when the possibility is high and canceling the anonymization of the personal information when the possibility is low.

In an anonymization processing method according to the present invention, a computer executes the steps of:
receiving the personal information to be anonymized;
acquiring the reduction coefficient calculated by the reduction coefficient calculation device;
obtaining, based on the reduction coefficient and the total number of items of personal information, a number of sections for which the decrease in the minimum appearance count when the personal information is anonymized does not exceed a predetermined reference value; and
anonymizing the personal information with that number of sections.

The present invention may also be a program for causing a computer to execute the above methods. Furthermore, the program may be recorded on a computer-readable recording medium.

Here, a computer-readable recording medium is a recording medium that stores information such as data and programs by electrical, magnetic, optical, mechanical, or chemical action and from which a computer can read that information. Examples of such recording media that are removable from the computer include flexible disks, magneto-optical disks, CD-ROMs, CD-R/Ws, DVDs, DATs, 8 mm tapes, and memory cards.

Recording media fixed to the computer include hard disks and ROM (read-only memory).

The present invention can provide a technique that improves the efficiency of anonymization processing by using a reduction coefficient to execute the anonymization with a number of sections that is likely to satisfy anonymity.

FIG. 1 is an explanatory diagram of anonymization processing.
FIG. 2 is an explanatory diagram of diversification processing.
FIG. 3 is a schematic configuration diagram of the anonymization system according to the embodiment.
FIG. 4A is a diagram showing an example of personal information.
FIG. 4B is a diagram showing an example of anonymous information.
FIG. 5 is an explanatory diagram of sections.
FIG. 6 is a diagram showing an example of anonymous data stored in the anonymous result DB.
FIG. 7 is a diagram showing the hardware configuration of the anonymization processing device and the reduction coefficient calculation device.
FIG. 8 is an explanatory diagram of anonymization processing.
FIG. 9 is an explanatory diagram of the process of acquiring appearance counts.
FIG. 10 is a diagram showing an example of attribute patterns.
FIG. 11 is an explanatory diagram of the reduction coefficient calculation process.
FIG. 12 is an explanatory diagram of the reduction coefficient calculation process.
FIG. 13 is an explanatory diagram of the process of obtaining appearance frequencies.
FIG. 14 is an explanatory diagram of anonymization processing using the reduction coefficient.
FIG. 15 is an explanatory diagram of anonymization processing using the reduction coefficient.
FIG. 16 is a diagram showing a modification of the anonymization processing of FIG. 14.
FIG. 17 is an explanatory diagram of the reduction coefficient calculation process.
FIG. 18 is an explanatory diagram of an example approximated by a power approximation formula.
FIG. 19 is an explanatory diagram of anonymization processing using the reduction coefficient.
FIG. 20 is an explanatory diagram of anonymization processing using the reduction coefficient.
FIG. 21 is an explanatory diagram of the reduction coefficient calculation process.
FIG. 22 is a diagram showing an example of a user's behavior history.

Embodiments for carrying out the present invention are described below with reference to the drawings. The configurations of the following embodiments are examples, and the present invention is not limited to them.

<Embodiment 1>
FIG. 1 is an explanatory diagram of anonymization processing, and FIG. 2 is an explanatory diagram of diversification processing. FIG. 1(A) shows an example in which the surname item has been deleted from member information containing surname, age, and gender items. As shown in FIG. 1(A), if the member information records ages and contains only one 16-year-old woman, that person can be identified as soon as it becomes known that a certain 16-year-old woman is a member. In other words, when only one person has the attributes "16 years old" and "female", the individual may be identified by cross-referencing other information.

In FIG. 1(B), the ages in the member list are abstracted into age bands: 0s (under 10 years old), 10s, 20s, and so on. Even in this case, however, there is only one woman in her teens, so the individual can be identified just as in FIG. 1(A), and the anonymization is insufficient.

In FIG. 1(C), the data is therefore abstracted further by changing the age bands to "teens or younger" (19 or younger) and "20s". In FIG. 1(C) there are two women in their teens or younger, so the attribute combination [teens or younger] and [female] is no longer unique. Consequently, even if it becomes known that a certain 16-year-old woman is a member, it cannot be determined which of the two records belongs to her. A state in which k or more people share the same attributes is said to satisfy "k-anonymity", and processing data so that it reaches this state is called "k-anonymization".
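The progression from FIG. 1(A) to FIG. 1(C) can be sketched as follows. The member records are invented for illustration (the figure's actual data is not reproduced in the text), and the helper names `by_decade`, `teens_or_younger`, and `k_value` are assumptions.

```python
from collections import Counter

# Hypothetical member records (age, gender) with the surname already removed.
members = [(16, "F"), (8, "F"), (23, "F"), (27, "F"), (25, "M"), (21, "M")]

def exact(age):
    """No abstraction: keep the exact age, as in FIG. 1(A)."""
    return str(age)

def by_decade(age):
    """Ten-year bands, as in FIG. 1(B): 0s, 10s, 20s, ..."""
    return f"{age // 10 * 10}s"

def teens_or_younger(age):
    """Coarser bands, as in FIG. 1(C): everyone under 20 shares one band."""
    return "19 or younger" if age < 20 else by_decade(age)

def k_value(records, generalize):
    """Size of the smallest group sharing the same (age band, gender)."""
    groups = Counter((generalize(age), gender) for age, gender in records)
    return min(groups.values())

for scheme in (exact, by_decade, teens_or_younger):
    print(scheme.__name__, k_value(members, scheme))
```

With this toy data the first two schemes leave a group of size 1, while the coarser banding raises the smallest group to 2, which is the situation the text describes as satisfying k-anonymity for k = 2.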

FIG. 2 shows an example in which the data on the stations used by each user is abstracted into data on the wards to which those stations belong. In the data before abstraction the stations are specified, so a user may be identified by cross-referencing data such as "lives near Shinjuku Station and works near Tokyo Station". By abstracting each station into the ward it belongs to, several users come to share the pattern of using a station in Shinjuku Ward and a station in Chiyoda Ward, and individual users can no longer be identified. Abstracting attribute values so that they admit l possibilities, as in "uses a station in Shinjuku Ward and a station in Chiyoda Ward", is called l-diversification.
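A minimal sketch of the station-to-ward abstraction follows. The station-to-ward dictionary `WARD_OF`, the user IDs, and the station choices are hypothetical illustration data, not taken from FIG. 2.

```python
from collections import Counter

# Hypothetical abstraction dictionary: station -> ward it belongs to.
WARD_OF = {
    "Shinjuku": "Shinjuku-ku", "Takadanobaba": "Shinjuku-ku", "Yotsuya": "Shinjuku-ku",
    "Tokyo": "Chiyoda-ku", "Akihabara": "Chiyoda-ku", "Kanda": "Chiyoda-ku",
}

# Hypothetical per-user station usage before abstraction.
users = {
    "U1": ["Shinjuku", "Tokyo"],
    "U2": ["Takadanobaba", "Akihabara"],
    "U3": ["Yotsuya", "Kanda"],
}

def to_wards(stations):
    """Replace each station by its ward and drop duplicates."""
    return tuple(sorted({WARD_OF[s] for s in stations}))

before = Counter(tuple(sorted(stations)) for stations in users.values())
after = Counter(to_wards(stations) for stations in users.values())

print(max(before.values()), max(after.values()))
```

Before abstraction every user has a station pattern of their own; after abstraction all three share the same ward pattern, so each ward-level record could correspond to any of several station combinations.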

FIG. 3 is a schematic configuration diagram of the anonymization system 10 in this embodiment. As shown in FIG. 3, the anonymization system 10 has an anonymization processing device 1 and a reduction coefficient calculation device 2.

The anonymization processing device 1 includes a data reception unit 11, a section number acquisition unit 12, a coefficient acquisition unit 13, a possibility determination unit 14, an anonymization unit 15, a test unit 16, a column registration unit 17, a data output unit 18, an anonymous result DB (database) 31, and an anonymous information column DB 32.

The data reception unit 11 receives target data (personal information) containing a plurality of items associated with individuals, anonymization conditions, commands relating to anonymization, and the like. The personal information, anonymization conditions, and so on may be received via a network such as the Internet, read from a storage medium, or entered through input means such as a keyboard. FIG. 4 shows an example of personal information. In the example shown in FIG. 4, the information for each user includes an ID, surname, age, gender, purchased product, purchase place, and the like.

The section number acquisition unit 12 acquires the number of sections to be used when anonymizing the personal information to be anonymized. The number of sections is the number of kinds of attribute values (words) that an attribute contained in the anonymous information can take; in other words, it is the number of groups obtained when the attribute values are grouped by identical value. FIG. 5 is an explanatory diagram of sections. For example, when the attribute is gender, the attribute values form 2 sections: male and female. When the attribute is age, the attribute values may form 3 sections (minor, adult, elderly), 5 sections (20s or younger, 30s, 40s, 50s, 60s or older), or 9 sections (0s, 10s, 20s, 30s, 40s, 50s, 60s, 70s, 80s or older). When the attribute is a region such as an address or purchase place, the attribute values may form 2 sections (western Japan and eastern Japan), 9 sections (Hokkaido, Tohoku, Kanto, Chubu, Kinki, Chugoku, Shikoku, Kyushu, Okinawa), or the 47 sections of the prefectures (Hokkaido, Aomori, Iwate, ..., Tokyo, ..., Osaka).
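As a rough illustration of how one attribute maps to different numbers of sections, the age schemes above might be written as follows. The cut-off ages for "minor" (under 20) and "elderly" (65 or older) are assumptions, since the text does not define them, and the function names are hypothetical.

```python
def three_sections(age):
    # Assumed boundaries: under 20 is a minor, 65 or older is elderly.
    if age < 20:
        return "minor"
    return "adult" if age < 65 else "elderly"

def five_sections(age):
    if age < 30:
        return "20s or younger"
    if age >= 60:
        return "60s or older"
    return f"{age // 10 * 10}s"

def nine_sections(age):
    return "80s or older" if age >= 80 else f"{age // 10 * 10}s"

def section_count(scheme, ages=range(0, 100)):
    """Number of distinct attribute values the scheme can produce."""
    return len({scheme(age) for age in ages})

for scheme in (three_sections, five_sections, nine_sections):
    print(scheme.__name__, section_count(scheme))
```

Counting the distinct labels a scheme can emit gives the number of sections for that level of abstraction; finer schemes give more sections and therefore smaller groups per section.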

The section number acquisition unit 12 obtains the number of sections by, for example, input from the operator who instructs the anonymization processing, reading from past history, or counting the words registered in an anonymization dictionary as words (attribute values) that abstract the attributes of the target data.

The coefficient acquisition unit 13 acquires the reduction coefficient calculated by the reduction coefficient calculation device 2. The reduction coefficient is, for example, the amount by which the minimum appearance count decreases when the number of sections is increased in anonymizing the target data, or the ratio of that decrease to the total number of records.

The possibility determination unit 14 determines, based on the reduction coefficient and the number of sections, the possibility that the decrease in the minimum appearance count when the personal information is anonymized with that number of sections exceeds a predetermined reference value.

When anonymizing or diversifying the target data, the anonymization unit 15 performs the anonymization by replacing the words that are the item values in the target data with more abstract words, and treats the result as anonymous candidate data. In this embodiment, a word is a coherent unit of language such as a single word or a phrase, and may include numerical values such as location information or telephone numbers, identification information such as e-mail addresses or IP addresses, and symbols that carry the same meaning as words. The anonymization unit 15 of this embodiment anonymizes the target data when the possibility determination unit 14 determines that anonymity is likely to be satisfied, and cancels the anonymization of the target data when it determines that anonymity is unlikely to be satisfied.
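The decision made by the possibility determination unit 14 and the anonymization unit 15 could be sketched as below. This is only one possible reading: it assumes the reduction coefficient is a constant decrease in the minimum appearance count per added section (a linear model), and all names and numbers are hypothetical.

```python
def predicted_min_count(base_sections, base_min_count, coefficient, sections):
    """Estimate the minimum appearance count for a requested number of sections.

    Assumes (hypothetically) that each additional section lowers the minimum
    appearance count by `coefficient`, starting from a known reference point.
    """
    return base_min_count - coefficient * (sections - base_sections)

def should_anonymize(base_sections, base_min_count, coefficient, sections, k_ref):
    """True when the estimate still meets the reference value k_ref."""
    estimate = predicted_min_count(base_sections, base_min_count, coefficient, sections)
    return estimate >= k_ref

# Reference point: 2 sections gave a minimum appearance count of 480.
print(should_anonymize(2, 480, 20, sections=9, k_ref=100))
print(should_anonymize(2, 480, 20, sections=47, k_ref=100))
```

The point of such a check is to skip an expensive anonymization run and its test when the estimate already indicates that the requested number of sections is unlikely to satisfy anonymity.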

The test unit 16 tests the anonymous candidate data on the condition that the combination of item values corresponding to one individual is not unique within the anonymous candidate data. For example, the test unit 16 tests whether the anonymous candidate data satisfies k-anonymity or l-diversity. That is, the test unit 16 tests whether the k value (minimum appearance count) of the anonymous candidate data is at or above the reference value so that k-anonymity is satisfied, or whether the l value of the anonymous candidate data is at or above the reference value so that l-diversity is satisfied. The test unit 16 stores anonymous candidate data that passes this test in the anonymous result DB 31 as anonymous information.
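A small sketch of the two tests follows. It uses the common textbook reading of l-diversity (each group of identical quasi-identifier values contains at least l distinct values of another attribute), which is an assumption about how the l value is computed here; the column names and rows are hypothetical.

```python
from collections import Counter, defaultdict

def k_value(rows, quasi):
    """Smallest number of rows sharing the same values in the `quasi` columns."""
    groups = Counter(tuple(row[c] for c in quasi) for row in rows)
    return min(groups.values())

def l_value(rows, quasi, sensitive):
    """Smallest number of distinct `sensitive` values within any quasi group."""
    values = defaultdict(set)
    for row in rows:
        values[tuple(row[c] for c in quasi)].add(row[sensitive])
    return min(len(v) for v in values.values())

# Hypothetical anonymous candidate data.
rows = [
    {"age": "10s", "gender": "M", "item": "ramen"},
    {"age": "10s", "gender": "M", "item": "curry"},
    {"age": "20s", "gender": "F", "item": "ramen"},
    {"age": "20s", "gender": "F", "item": "ramen"},
]

quasi = ("age", "gender")
print(k_value(rows, quasi) >= 2)          # k-anonymity test with reference value 2
print(l_value(rows, quasi, "item") >= 2)  # l-diversity test with reference value 2
```

In this toy data every (age, gender) group has two rows, so the k test passes, but the second group contains a single purchased item, so the l test fails.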

FIG. 4A shows an example of target data, and FIG. 4B shows an example of anonymous data stored in the anonymous result DB 31. In the anonymous data of FIG. 4B, relative to the target data of FIG. 4A, each user's ID has been changed to an ID for anonymous information, the surname has been deleted, and the age, purchased product, and purchase place have been abstracted. Because the ID for anonymous information is different from the ID in the target data, an individual cannot be identified from it. A correspondence table between the IDs for anonymous information and the IDs of the target data may also be stored together with the target data so that the anonymous information can be associated with the target data.

The column registration unit 17 divides the anonymous information by attribute and registers it in columnar form in the anonymous information column DB 32. In the anonymous result DB 31 of FIG. 4B, the attributes of each user, such as age, gender, purchased product, and purchase place, are registered side by side in the row direction. In the anonymous information column DB 32 of FIG. 6, by contrast, these attributes are split into separate records, one for each attribute and one for each combination of attributes, and registered in a column. For example, in the anonymous result DB 31 of FIG. 4A, the attributes "17 years old", "male", "Shinjuku", and "ramen" are registered in the record whose ID is X. In the anonymous information column DB 32 of FIG. 6, they are registered in separate records: "17 years old" in the record with ID Z001, "male" in the record with ID Z004, "Shinjuku" in the record with ID X001, "ramen" in the record with ID Y001, "17 years old-male" in the record with ID Y008, "Shinjuku-ramen" in the record with ID V003, and so on.
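The splitting of row-oriented records into per-attribute and per-combination records might look like the sketch below. Keeping a count for each record is an assumption made here for illustration (the text only says the values are registered as separate records), and the rows and the function name `to_column_records` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def to_column_records(rows, attributes):
    """Split each row into one record per attribute and per attribute combination.

    Returns a Counter keyed by (attribute names, attribute values).
    """
    counts = Counter()
    for row in rows:
        for size in range(1, len(attributes) + 1):
            for combo in combinations(attributes, size):
                counts[(combo, tuple(row[a] for a in combo))] += 1
    return counts

# Hypothetical row-oriented anonymous information.
rows = [
    {"age": "17", "gender": "male", "place": "Shinjuku"},
    {"age": "17", "gender": "female", "place": "Shinjuku"},
]

records = to_column_records(rows, ("age", "gender", "place"))
for (names, values), count in sorted(records.items()):
    print("-".join(values), count)
```

Storing single attributes and their combinations separately makes it straightforward to look up how often a given value or value combination appears without scanning full rows.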

データ出力部１８は、匿名情報縦列ＤＢ３２から匿名化情報を読み出して出力する。ここで、匿名化情報の出力とは、例えば、表示装置による表示出力や、プリンタによる印刷出力、他のコンピュータへの送信、記憶媒体への書き込み等である。 The data output unit 18 reads out the anonymized information from the anonymous information column DB 32 and outputs it. Here, the output of anonymization information includes, for example, display output by a display device, print output by a printer, transmission to another computer, writing to a storage medium, and the like.

The section number calculation unit 19 determines, from the reduction coefficient and the total number of items of personal information, a number of sections for which the decrease in the minimum number of appearances caused by anonymizing the personal information does not exceed a predetermined reference value.

また、減少係数算出装置２は、匿名情報取得部２１や、出現数取得部２２、係数算出部２３、頻出パターンＤＢ３３、減少係数ＤＢ３４を備えている。 The reduction coefficient calculation device 2 includes an anonymous information acquisition unit 21, an appearance number acquisition unit 22, a coefficient calculation unit 23, a frequent pattern DB 33, and a reduction coefficient DB 34.

匿名情報取得部２１は、個人情報を匿名化した匿名情報縦列ＤＢ３２から匿名情報を取得する。 The anonymous information acquisition unit 21 acquires anonymous information from the anonymous information column DB 32 in which personal information is anonymized.

The appearance number acquisition unit 22 divides the anonymous information into sections, one for each kind of word that an attribute of the anonymous information can take, obtains the number of sections, and obtains the minimum number of appearances among the words. For example, it groups identical words in the anonymous information into sections, counts the sections, and obtains the minimum number of appearances of a word across the sections.

Based on multiple combinations of a number of sections and a minimum number of appearances, each with a different number of sections, the coefficient calculation unit 23 obtains, as the reduction coefficient, either the decrease in the minimum number of appearances when the number of sections is increased or the ratio of that decrease to the total number, and stores it in the reduction coefficient DB 34.

図７は匿名処理装置１及び減少係数算出装置２のハードウェア構成を示す図である。匿名処理装置１及び減少係数算出装置２は、ＣＰＵ１０１、メモリ１０２、通信制御部１０３、記憶装置１０４、入出力インタフェース１０５を有する所謂コンピュータである。 FIG. 7 is a diagram illustrating a hardware configuration of the anonymous processing device 1 and the reduction coefficient calculation device 2. The anonymous processing device 1 and the reduction coefficient calculation device 2 are so-called computers having a CPU 101, a memory 102, a communication control unit 103, a storage device 104, and an input / output interface 105.

The CPU 101 executes a program loaded into the memory 102 in executable form. The CPU 101 of the anonymous processing device 1 thereby provides the functions of the data reception unit 11, the section number acquisition unit 12, the coefficient acquisition unit 13, the possibility determination unit 14, the anonymization unit 15, the test unit 16, the column registration unit 17, and the data output unit 18 described above. Likewise, the CPU 101 of the reduction coefficient calculation device 2 provides the functions of the anonymous information acquisition unit 21, the appearance number acquisition unit 22, and the coefficient calculation unit 23.

メモリ１０２は、主記憶装置ということもできる。メモリ１０２は、例えば、ＣＰＵ１０１が実行するプログラムや、通信制御部１０３を介して受信したデータ、記憶装置１０４から読み出したデータ、その他のデータ等を記憶する。 The memory 102 can also be called a main storage device. The memory 102 stores, for example, a program executed by the CPU 101, data received via the communication control unit 103, data read from the storage device 104, other data, and the like.

通信制御部１０３は、ネットワークを介して他の装置と接続し、当該装置との通信を制御する。入出力インタフェース１０５は、表示装置やプリンタ等の出力手段や、キーボードやポインティングデバイス等の入力手段、ドライブ装置等の入出力手段が適宜接続される。ドライブ装置は、着脱可能な記憶媒体の読み書き装置であり、例えば、フラッシュメモリカードの入出力装置、ＵＳＢメモリを接続するＵＳＢのアダプタ等である。また、着脱可能な記憶媒体は、例えば、ＣＤ（Compact Disc）、ＤＶＤ（Digital Versatile Disk）、ブルーレイディスク（Blu-ray(登録商標) Disc）等のディスク媒体であってもよい。ドライブ装置は、着脱可能な記憶媒体からプログラムを読み出し、記憶装置１０４に格納する。 The communication control unit 103 is connected to another device via a network and controls communication with the device. The input / output interface 105 is appropriately connected to output means such as a display device and a printer, input means such as a keyboard and pointing device, and input / output means such as a drive device. The drive device is a removable storage medium read / write device, such as an input / output device for a flash memory card, a USB adapter for connecting a USB memory, or the like. The removable storage medium may be a disk medium such as a CD (Compact Disc), a DVD (Digital Versatile Disk), or a Blu-ray (registered trademark) disc. The drive device reads the program from the removable storage medium and stores it in the storage device 104.

The storage device 104 can also be called an external storage device. The storage device 104 may be, for example, an SSD (Solid State Drive) or an HDD. The storage device 104 exchanges data with the drive device. For example, it stores an information processing program installed from the drive device. It also reads out programs and passes them to the memory 102. In this embodiment, the storage device 104 of the anonymous processing device 1 holds the anonymous result DB 31 and the anonymous information column DB 32 described above, and the storage device 104 of the reduction coefficient calculation device 2 holds the frequent pattern DB 33 and the reduction coefficient DB 34.

Next, the anonymous processing method and the reduction coefficient calculation method that the anonymous processing device 1 and the reduction coefficient calculation device 2 of the anonymization system 10 execute according to programs in this embodiment are described. FIG. 8 is an explanatory diagram of the anonymization process. The anonymous processing device 1 first receives target data from another computer or a storage device (step S10). The anonymous processing device 1 of this embodiment has multiple anonymization algorithms, and the operator can choose among them freely. Examples include an algorithm specialized for anonymizing medical information, an algorithm specialized for anonymizing flow data such as purchase histories, and algorithms specialized for particular industries such as fashion, education, and restaurants. The selection may cover not only the anonymization method but also the anonymization dictionary, the preprocessing method, the filtering method, and so on. In other words, the operator inputs information selecting these algorithms together with the data to be anonymized.

Next, the anonymous processing device 1 anonymizes the target data with the selected algorithm (step S20) and tests anonymity according to whether the minimum number of appearances exceeds the reference value (step S30).

After the test, the anonymous processing device 1 stores the anonymous information in the anonymous result DB 31 (step S40). In FIG. 8, data pattern A is an example of anonymous information obtained by anonymizing the age in the target data into five sections. In the test, the four sections for 16, 17, 18, and 20 years old exceed the reference value of 10 people and satisfy anonymity (marked ○ in the figure), while the one section for 15 years old or younger falls below the reference value of 10 people and does not satisfy anonymity (marked × in the figure). Similarly, data pattern B is an example in which the age is anonymized into three sections, and data pattern C is an example in which the age is anonymized into two sections.
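The anonymity test of step S30 can be sketched as a count per section compared against the reference value. The function name `check_anonymity` and the sample counts are hypothetical, and the handling of a count exactly equal to the reference value is an assumption, since the text only describes sections that exceed or fall below it.

```python
from collections import Counter

def check_anonymity(values, reference_value):
    """Return (minimum appearances, per-section pass/fail) for one attribute."""
    counts = Counter(values)
    # Assumption: a section passes only when its count exceeds the reference
    # value, following the "exceeds" wording; the tie case is not specified.
    per_section = {value: count > reference_value for value, count in counts.items()}
    return min(counts.values()), per_section

# Hypothetical anonymized ages; the counts are invented for illustration.
ages = ["16"] * 12 + ["17"] * 15 + ["15 or younger"] * 4
minimum, per_section = check_anonymity(ages, reference_value=10)
print(minimum, per_section)
```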

The anonymous processing device 1 then periodically reads anonymous information from the anonymous result DB 31 and registers it as columns in the anonymous information column DB 32 (step S50). In FIG. 8, the column anonymous information 81 is an example of the anonymous information registered in the anonymous information column DB 32; it stores the data pattern, the row number within each data pattern, the attribute, the existence count, and the anonymity reference value in association with one another (step S60).

FIG. 9 is an explanatory diagram of the process of acquiring the number of appearances. As shown in FIG. 9, the reduction coefficient calculation device 2 first acquires anonymous data for each data pattern from the anonymous information column DB 32 (step S110).

Next, the reduction coefficient calculation device 2 acquires the number of sections and the existence counts for each data pattern (step S120). It registers attribute values whose existence count is at or above a predetermined value in the frequent pattern DB 33 as frequent patterns 82, and registers the number of sections and the minimum number of appearances for each attribute in the reduction coefficient DB 34 as attribute patterns 83 (step S130). As shown in FIG. 10, information such as the date and time, company name, and number of uses may be added to the attribute pattern 83.
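Steps S120 and S130 amount to summarizing one data pattern into a frequent pattern and an attribute pattern. The sketch below shows one way to do that; `summarize_data_pattern`, the dictionary keys, and the sample data are illustrative assumptions rather than the stored layout of the frequent pattern DB 33 or the reduction coefficient DB 34.

```python
from collections import Counter

def summarize_data_pattern(values, frequent_minimum):
    """Derive a frequent pattern and an attribute pattern for one attribute."""
    counts = Counter(values)
    # Attribute values whose existence count is at or above the given minimum.
    frequent_pattern = {v: c for v, c in counts.items() if c >= frequent_minimum}
    attribute_pattern = {
        "sections": len(counts),                  # number of distinct values
        "min_appearances": min(counts.values()),  # smallest existence count
    }
    return frequent_pattern, attribute_pattern

# Hypothetical purchase locations; the counts are invented for illustration.
places = ["Shinjuku"] * 30 + ["Shibuya"] * 12 + ["Ueno"] * 3
frequent_pattern, attribute_pattern = summarize_data_pattern(places, frequent_minimum=10)
print(frequent_pattern, attribute_pattern)
```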

The reduction coefficient calculation device 2 then determines whether there is a next data pattern. If there is, the process returns to step S110; if not, the process ends (step S140).

図１１は、減少係数算出の処理の説明図である。減少係数算出装置２は、図１０に示すように、先ず減少係数ＤＢ３４から属性パターン８３を取得する(ステップＳ１５０)。 FIG. 11 is an explanatory diagram of the reduction coefficient calculation process. As shown in FIG. 10, the reduction coefficient calculation device 2 first acquires the attribute pattern 83 from the reduction coefficient DB 34 (step S150).

Next, for each attribute in the acquired attribute patterns 83, the reduction coefficient calculation device 2 obtains, as the reduction coefficient, the decrease in the minimum number of appearances when the number of sections is increased, or the ratio of that decrease to the total number, based on multiple combinations of a number of sections and a minimum number of appearances with differing numbers of sections (step S160). It stores this per-attribute reduction coefficient 84 in the reduction coefficient DB 34.

As shown in FIG. 12, suppose the combinations 85 of number of sections and minimum number of appearances are as follows: the total number is 20000, and the minimum number of appearances is 10000 with 2 sections, 1000 with 7 sections, 500 with 11 sections, and so on. The regression line 86 for these combinations is y = -95.905x + 8727.9. The slope of this regression line shows that each increase of 1 in the number of sections decreases the minimum number of appearances by about 95. That is, the reduction constant is 95.905. Since the total number is 20000, the fixed reduction rate is 95.905 / 20000 ≈ 0.47%.
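The derivation of a reduction constant and a fixed reduction rate from a regression line can be sketched with an ordinary least-squares fit. The helper names and the data points are hypothetical; the points are chosen to lie exactly on a line so the result is easy to check by hand, and they are not the data of FIG. 12.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def reduction_coefficients(points, total):
    """Return (reduction constant, fixed reduction rate) from the fitted slope."""
    slope, _ = fit_line(points)
    constant = -slope          # decrease in minimum appearances per added section
    return constant, constant / total

# Hypothetical (number of sections, minimum number of appearances) pairs.
points = [(2, 900), (4, 700), (6, 500), (8, 300)]
constant, rate = reduction_coefficients(points, total=20000)
print(constant, rate)
```

The sign is flipped so that a falling regression line yields a positive reduction constant, matching the way the constant 95.905 is read off the slope in the example.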

In this embodiment a reduction coefficient is obtained for each attribute, but this is not a limitation; multiple attributes may be merged to obtain a reduction coefficient. For example, an attribute with 2 sections (attribute kinds) and an attribute with 3 sections may be merged and used as 6 sections in calculating the reduction coefficient. The same attribute may also be used at different levels of abstraction, as attributes with several different numbers of sections. Furthermore, because a reduction coefficient based on similar attributes is more accurate, priorities may be assigned to the attributes used for anonymization, and the reduction coefficient may be obtained from a predetermined number of attributes taken in descending order of priority. Attributes may also be classified into predetermined genres (for example, region, season, music, or fashion), and the reduction coefficient may be calculated from attributes classified into the same genre. The reduction coefficient may further be obtained from data for each company name, such as the ages recorded by company A or the purchase locations recorded by company A.

そして、減少係数算出装置２は、次のデータがあるか否かを判定し、次のデータがあればステップＳ１５０に戻り、次のデータがなければ終了する(ステップＳ１８０)。 Then, the reduction coefficient calculation device 2 determines whether or not there is next data. If there is next data, the process returns to step S150, and if there is no next data, the process ends (step S180).

FIG. 13 is an explanatory diagram of the process of obtaining appearance frequencies. As shown in FIG. 13, the reduction coefficient calculation device 2 first acquires the frequent patterns 82 from the frequent pattern DB 33 (step S210). From these frequent patterns 82 it computes, as statistical information, the average existence count for each attribute value and the ratio of the existence count to the total number (the appearance rate) (step S220), and registers them in the frequent pattern DB 33 (step S230).

そして、減少係数算出装置２は、次のデータがあるか否かを判定し、次のデータがあればステップＳ２１０に戻り、次のデータがなければ終了する(ステップＳ２４０)。 Then, the reduction coefficient calculation device 2 determines whether or not there is next data. If there is next data, the process returns to step S210, and if there is no next data, the process ends (step S240).

図１４は、減少係数を用いた匿名化処理の説明図である。匿名処理装置１は、先ず他のコンピュータから対象データと共に匿名化のリクエストを受け付ける（ステップＳ３１０）。このとき例えば男女２区分×年代８区分＝１６区分等のように、オペレータが指定した区分数のリクエストを受ける。なお、図１４には省略したが、前述の図８と同様に匿名処理装置１は、複数の匿名化アルゴリズムを有し、オペレータが任意に選択できる。 FIG. 14 is an explanatory diagram of anonymization processing using a reduction coefficient. The anonymity processing device 1 first receives an anonymization request together with target data from another computer (step S310). At this time, for example, a request for the number of categories designated by the operator is received, such as 2 categories of men and women × 8 categories of age = 16 categories. Although omitted in FIG. 14, the anonymous processing device 1 has a plurality of anonymization algorithms and can be arbitrarily selected by the operator, as in FIG. 8 described above.

Next, the anonymous processing device 1 acquires from the reduction coefficient DB 34 the reduction coefficient for each attribute of the target data to be anonymized (step S320). If reduction coefficients are stored in association with company names, the reduction coefficient whose company name matches is acquired. That is, the reduction coefficient derived from anonymous data that the company used in the past is acquired.

The anonymous processing device 1 then determines, based on the acquired reduction coefficient and the number of sections, whether the decrease in the minimum number of appearances when the target data is anonymized with that number of sections is likely to exceed a predetermined reference value (step S330). For example, with a fixed reduction rate of 10% and 16 sections, 16 sections × 10% = 160%, which exceeds 100% (the reference value), so anonymity is judged unlikely to be satisfied. With 8 sections, 8 sections × 10% = 80%, which does not exceed 100% (the reference value), so anonymity is judged likely to be satisfied. Likewise, with a reduction constant of 8 and 16 sections, 16 sections × 8 = 128, which exceeds the total number of 100 (the reference value), so anonymity is judged unlikely to be satisfied. With 8 sections, 8 sections × 8 = 64, which does not exceed 100 (the reference value), so anonymity is judged likely to be satisfied.
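The two worked judgments of step S330 can be reproduced with a short sketch. The function names are hypothetical, and `Fraction` is used only so that percentages such as 10% compare exactly; the comparison itself follows the "does not exceed the reference value" wording.

```python
from fractions import Fraction

def likely_anonymous_by_rate(sections, reduction_rate, limit=Fraction(1)):
    """Rate variant: sections x rate must not exceed 100% (the default limit)."""
    return sections * reduction_rate <= limit

def likely_anonymous_by_constant(sections, reduction_constant, total):
    """Constant variant: sections x constant must not exceed the total number."""
    return sections * reduction_constant <= total

ten_percent = Fraction(10, 100)
print(likely_anonymous_by_rate(16, ten_percent))       # 160% against 100%
print(likely_anonymous_by_rate(8, ten_percent))        # 80% against 100%
print(likely_anonymous_by_constant(16, 8, total=100))  # 128 against 100
print(likely_anonymous_by_constant(8, 8, total=100))   # 64 against 100
```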

匿名性を満たす可能性が高いと判定した場合、匿名処理装置１は、選択されたアルゴリズムで対象データを匿名化し（ステップＳ３４０）、最少出現数が基準値を超えているか否かによって匿名性の検定を行う(ステップＳ３５０)。また、検定後、匿名処理装置１は、匿名情報を匿名結果ＤＢ３１に蓄積する（ステップＳ３６０）。 If it is determined that there is a high possibility of satisfying the anonymity, the anonymous processing device 1 anonymizes the target data with the selected algorithm (step S340), and the anonymity depends on whether the minimum number of appearances exceeds the reference value. A test is performed (step S350). Moreover, after the test, the anonymous processing device 1 accumulates anonymous information in the anonymous result DB 31 (step S360).

一方、ステップＳ３３０で、匿名性を満たす可能性が低いと判定した場合、匿名処理装置１は、匿名化処理を中止し、処理を終了する(ステップＳ３７０)。 On the other hand, if it is determined in step S330 that the possibility of satisfying anonymity is low, the anonymous processing device 1 stops the anonymization process and ends the process (step S370).

このように図１４の処理によれば、匿名性を満たす可能性が低ければ、匿名化処理を行わないので、無駄に匿名化処理を試行することが無くなり、匿名化処理の効率化が図れる。 As described above, according to the process of FIG. 14, if the possibility of satisfying anonymity is low, the anonymization process is not performed. Therefore, the anonymization process is not tried unnecessarily, and the efficiency of the anonymization process can be improved.

図１５は、減少係数を用いた匿名化処理の説明図である。匿名処理装置１は、先ず他のコンピュータから対象データを受け付ける（ステップＳ４１０）。なお、図１５には省略したが、前述の図８と同様に匿名処理装置１は、複数の匿名化アルゴリズムを有し、オペレータが任意に選択できる。 FIG. 15 is an explanatory diagram of anonymization processing using a reduction coefficient. The anonymous processing device 1 first receives target data from another computer (step S410). Although omitted in FIG. 15, the anonymous processing device 1 has a plurality of anonymization algorithms and can be arbitrarily selected by the operator as in FIG. 8 described above.

Next, the anonymous processing device 1 acquires from the reduction coefficient DB 34 the reduction coefficient for each attribute of the target data to be anonymized (step S420). It also acquires from the frequent pattern DB 33 the frequent patterns for each attribute of the target data to be anonymized (step S430). If reduction coefficients and frequent patterns are stored in association with company names, those whose company name matches are acquired. That is, the reduction coefficients and frequent patterns derived from anonymous data that the company used in the past are acquired.

In addition, based on the acquired reduction coefficient and the total number of items of target data, the anonymous processing device 1 determines a number of sections that is likely to satisfy anonymity when the target data is anonymized (step S440). For example, with a fixed reduction rate of 10% and a total number of 100, the number of sections is obtained as 100 × 10% = 10 sections. With a reduction constant of 12 and a total number of 100, 100 / 12 ≈ 8.3, so 8 sections are used.
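A minimal sketch of the section count in step S440 follows. The constant case mirrors the 100 / 12 example directly. The rate case is an interpretation made here: it takes the largest section count whose accumulated rate stays within 100%, chosen to agree with the step S330 criterion, and it reproduces the 10 sections of the example. Both function names are hypothetical.

```python
import math
from fractions import Fraction

def sections_from_constant(total, reduction_constant):
    """Largest whole number of sections with sections x constant <= total."""
    return math.floor(total / reduction_constant)

def sections_from_rate(reduction_rate):
    """Largest whole number of sections with sections x rate <= 100%.

    Assumption: the rate is the reduction constant divided by the total,
    as in the regression example, so the total cancels out.
    """
    return math.floor(1 / reduction_rate)

print(sections_from_constant(100, 12))
print(sections_from_rate(Fraction(10, 100)))
```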

The anonymous processing device 1 then performs anonymization using sections contained in the frequent patterns acquired in step S430 and with no more than the number of sections calculated in step S440 (step S450), and tests anonymity according to whether the minimum number of appearances exceeds the reference value (step S460). After the test, the anonymous processing device 1 stores the anonymous information in the anonymous result DB 31 (step S470).

In this way, the process of FIG. 15 sets the number of sections from the reduction coefficient and the total number of items of target data so that the decrease caused by anonymization does not exceed the total number. Anonymization is therefore not attempted needlessly, and the anonymization process becomes more efficient. In addition, anonymizing with frequently occurring sections, based on the frequent patterns, avoids cases in which the minimum number of appearances after anonymization becomes too small to satisfy anonymity, which also makes the anonymization process more efficient.

<変形例>
<Modification>
FIG. 16 shows a modification of the anonymization process of FIG. 14. In the process of FIG. 14, processing is aborted when step S330 determines that anonymity is unlikely to be satisfied. In the process of FIG. 16, when step S330 makes that determination, the process of FIG. 15 is executed (step S390) and anonymization is performed with a number of sections based on the reduction coefficient and the total number. The rest of the configuration is the same, so its description is not repeated.

このように本変形例によれば、ステップＳ３１０でリクエストされた区分数で匿名性を満たす可能性が低い場合でも、減少係数と全体数に基づき匿名性を満たす可能性の高い区分数で匿名化を行うことができるため、匿名化処理の更なる効率化を図ることができる。 As described above, according to the present modification, anonymization is performed with the number of divisions that are highly likely to satisfy anonymity based on the decrease coefficient and the total number even when the number of divisions requested in step S310 is low. Therefore, the efficiency of the anonymization process can be further improved.

<Embodiment 2>
Embodiment 1 above used, as the reduction coefficient, a reduction constant or reduction rate obtained from a linear approximation formula, but the reduction coefficient is not limited to these. Embodiment 2 shows an example that uses a power approximation formula as the reduction coefficient. Embodiment 2 differs from Embodiment 1 only in its use of the power approximation formula; the other configurations are the same, so identical elements are given the same reference numerals and their description is not repeated.

In Embodiment 2, the coefficient calculation unit 23 of the reduction coefficient calculation device 2 obtains a power approximation formula as the reduction coefficient indicating the decrease in the minimum number of appearances when the number of sections is increased, based on multiple combinations of a number of sections and a minimum number of appearances, each with a different number of sections.
For example, the coefficient calculation unit 23 obtains power approximation formula 1 based on the minimum number of appearances as the number of sections is increased. In power approximation formula 1, y denotes the anonymity level (k value) and x denotes the number of sections.

The possibility determination unit 14 of the anonymous processing device 1 in Embodiment 2 then determines, based on power approximation formula 1 and the number of sections to be used for anonymization, the possibility that the decrease in the minimum number of appearances when the personal information is anonymized with that number of sections exceeds a predetermined reference value.
For example, the possibility determination unit 14 expands power approximation formula 1 into formula 2 to estimate the minimum number of appearances, and judges the possibility by whether this estimated value x exceeds the reference value. In formula 2, the anonymity level y is written as k.

Because the k value does not converge to 0, 1 may be added to formula 1 to give formula 3, and formula 3 may be expanded and used as formula 4.
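Formulas 1 to 4 are referred to above without their bodies. As a hedged reconstruction only, assuming formula 1 has the generic power form with fitted constants a and b (symbols introduced here, not taken from the source), the four formulas would plausibly read as follows. The example y = 114659x^{-1.414} given for FIG. 18 is consistent with this form, with a = 114659 and b = -1.414.

```latex
\begin{align}
y &= a\,x^{b}
  % assumed form of power approximation formula 1 (y: k value, x: sections)
  \\
x &= \left(\frac{k}{a}\right)^{1/b}
  % assumed formula 2: formula 1 solved for x, with y written as k
  \\
y &= a\,x^{b} + 1
  % assumed formula 3: formula 1 with 1 added
  \\
x &= \left(\frac{k - 1}{a}\right)^{1/b}
  % assumed formula 4: formula 3 solved for x, with y written as k
\end{align}
```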

図１７は、減少係数算出の処理の説明図である。減少係数算出装置２は、図１０に示すように、先ず減少係数ＤＢ３４から属性パターン８３を取得する(ステップＳ１５０)。 FIG. 17 is an explanatory diagram of the reduction coefficient calculation process. As shown in FIG. 10, the reduction coefficient calculation device 2 first acquires the attribute pattern 83 from the reduction coefficient DB 34 (step S150).

次に減少係数算出装置２は、取得した属性パターン８３のうち、各属性について、区分数の異なる複数の区分数及び最少出現数の組み合わせを求め(ステップＳ１６０Ａ)、これらの区分数及び最少出現数に基づいて累乗近似式１を減少係数として求めて、この減少係数を減少係数ＤＢ３４に記憶させる(ステップＳ１７０Ａ)。 Next, the reduction coefficient calculation device 2 obtains a combination of a plurality of division numbers and minimum appearance numbers having different numbers for each attribute in the acquired attribute pattern 83 (step S160A), and the division number and the minimum appearance number. Based on the above, the power approximation formula 1 is obtained as a reduction coefficient, and this reduction coefficient is stored in the reduction coefficient DB 34 (step S170A).

As shown in FIG. 18, suppose the combinations 85 of number of sections and minimum number of appearances are as follows: the total number is 20000, and the minimum number of appearances is 10000 with 2 sections, 1000 with 7 sections, 500 with 11 sections, and so on. Power approximation formula 1 for these combinations is y = 114659x^{-1.414}. From this power approximation formula 1, the decrease in the minimum number of appearances when the number of sections increases by 1 can be found. For example, if this decrease is 95.905, then since the total number is 20000, the reduction rate is 95.905 / 20000 ≈ 0.47%.
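One common way to obtain a power approximation of the form y = a·x^b is a least-squares line fit on the logarithms of both variables. The sketch below uses that technique as an assumption; the text does not state how the fit is computed. The function name `fit_power` and the data are hypothetical, with the points generated from a known curve so the fit can be checked.

```python
import math

def fit_power(points):
    """Fit y = a * x**b by least squares on (log x, log y)."""
    logs = [(math.log(x), math.log(y)) for x, y in points]
    n = len(logs)
    mean_u = sum(u for u, _ in logs) / n
    mean_v = sum(v for _, v in logs) / n
    suu = sum((u - mean_u) ** 2 for u, _ in logs)
    suv = sum((u - mean_u) * (v - mean_v) for u, v in logs)
    b = suv / suu                        # exponent: slope in log-log space
    a = math.exp(mean_v - b * mean_u)    # scale: exp of the log-log intercept
    return a, b

# Hypothetical (sections, minimum appearances) pairs on the curve 1000 * x**-1.5.
points = [(x, 1000 * x ** -1.5) for x in (2, 4, 8, 16)]
a, b = fit_power(points)
print(a, b)
```

Fitting in log-log space requires every x and y to be positive, which holds for section counts and appearance counts of at least 1.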

FIG. 19 is an explanatory diagram of anonymization processing using a reduction coefficient. The anonymous processing device 1 first receives an anonymization request together with target data from another computer (step S310). The request carries the number of sections specified by the operator, for example 2 sections for sex × 8 sections for age group = 16 sections. Although omitted from FIG. 19, the anonymous processing device 1 has multiple anonymization algorithms, as in FIG. 8 described above, and the operator can choose among them freely.

Next, the anonymous processing device 1 acquires from the reduction coefficient DB 34 the reduction coefficient (power approximation formula 1) for each attribute of the target data to be anonymized (step S320). If reduction coefficients are stored in association with company names, the reduction coefficient whose company name matches is acquired. That is, the reduction coefficient derived from anonymous data that the company used in the past is acquired.

The anonymous processing device 1 then expands the acquired reduction coefficient as in formula 2 and determines, based on the number of sections, whether the decrease in the minimum number of appearances when the target data is anonymized with that number of sections is likely to exceed the predetermined reference value (k value) (step S330A).
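The judgment of step S330A can be sketched under the assumption that the stored power approximation has the form y = a·x^b, as in the y = 114659x^{-1.414} example. The function names, the inverted form used for the section limit, and the choice to treat an estimate equal to k as acceptable are all assumptions made for this illustration.

```python
def estimate_k(sections, a, b):
    """Estimated minimum number of appearances for a given section count."""
    return a * sections ** b

def max_sections_for_k(k, a, b):
    """Section count at which the estimate equals k (the curve solved for x)."""
    return (k / a) ** (1 / b)

def likely_anonymous(sections, a, b, k):
    """Assumption: anonymity is likely when the estimate is at least k."""
    return estimate_k(sections, a, b) >= k

# Hypothetical coefficients chosen so the arithmetic is easy to follow.
a, b, k = 100, -1, 10
print(estimate_k(4, a, b), max_sections_for_k(k, a, b))
print(likely_anonymous(4, a, b, k), likely_anonymous(20, a, b, k))
```

Comparing the requested section count with `max_sections_for_k` gives the same answer as comparing the estimate with k whenever b is negative, because the curve then falls as sections are added.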

Thus, according to the processing of FIG. 19, anonymization is not performed when anonymity is unlikely to be satisfied, so anonymization is no longer attempted needlessly and the anonymization processing becomes more efficient.

FIG. 20 is an explanatory diagram of anonymization processing using the reduction coefficient. The anonymous processing device 1 first receives the target data from another computer (step S410). Although omitted from FIG. 20, the anonymous processing device 1 has a plurality of anonymization algorithms from which the operator can choose freely, as in FIG. 8 described above.

Next, the anonymous processing device 1 acquires, for each attribute of the target data to be anonymized, the reduction coefficient (power approximation formula 1) from the reduction coefficient DB 34 (step S420). The anonymous processing device 1 also acquires, for each attribute of the target data to be anonymized, the frequent patterns from the frequent pattern DB 33 (step S430). If reduction coefficients and frequent patterns are stored in association with company names, those whose company name matches are acquired; that is, the reduction coefficient and frequent patterns obtained from anonymous data that the company used in the past.

The anonymous processing device 1 also determines, based on the acquired reduction coefficient and the total count of the target data, a number of categories that is likely to satisfy anonymity when the target data is anonymized (step S440A).

Thus, according to the processing of FIG. 20, the number of categories is chosen, based on the reduction coefficient and the total count of the target data, so that the decrease resulting from anonymization does not exceed the total count. Anonymization is therefore no longer attempted needlessly, and the anonymization processing becomes more efficient. In addition, performing anonymization with frequently occurring categories, based on the frequent patterns, prevents the minimum number of appearances after anonymization from becoming too small to satisfy anonymity, which also makes the anonymization processing more efficient.
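The category count sought in step S440A can be sketched by inverting a power approximation y = a·x^b for the largest x whose predicted minimum number of appearances still reaches the k value. The sketch is hedged in several ways: the function name is hypothetical, the comparison against k is one reading of "likely to satisfy anonymity", and the linear scaling of the coefficient by the ratio of the target total to the total of the fitted data is an assumption that the description does not spell out.

```python
import math


def max_categories(a, b, k, total, fit_total):
    """Largest category count x with (total / fit_total) * a * x**b >= k.

    Assumptions (not from the source): the coefficient a scales linearly
    with the total count, and b < 0 so the curve decreases with x.
    Returns 0 if even a single category is predicted to fall below k.
    """
    if b >= 0 or a <= 0 or k <= 0 or total <= 0 or fit_total <= 0:
        raise ValueError("expects a > 0, b < 0, k > 0 and positive totals")
    scaled_a = a * total / fit_total
    if scaled_a < k:
        return 0
    x = math.floor((k / scaled_a) ** (1.0 / b))
    # Correct for floating-point rounding around the boundary.
    while x > 1 and scaled_a * x ** b < k:
        x -= 1
    while scaled_a * (x + 1) ** b >= k:
        x += 1
    return x


# Coefficients quoted for FIG. 18, fitted on 20000 records; k = 10 is illustrative.
limit = max_categories(114659, -1.414, 10, total=20000, fit_total=20000)
print(limit)
```

Any category count up to the returned limit would then be a candidate for the anonymization itself, with the frequent patterns guiding which categories to use.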

<Embodiment 3>
Embodiment 3 shows an example in which anonymization is performed with a unified number of categories so that anonymous information can be compared among multiple business operators. Embodiment 3 differs from Embodiment 2 described above in the configuration for anonymizing with a unified number of categories; the other configurations are the same, so identical elements are given the same reference numerals and their description is not repeated.
When data is compared among multiple business operators, it must be anonymized with the same attributes. However, because the operators do not know what personal information each other holds, they could not tell how many categories would allow anonymization with common attributes. Trials were therefore repeated needlessly, and the anonymization processing was inefficient. The reduction coefficient calculation device 2 of Embodiment 3 therefore estimates, based on anonymous information from multiple operators, a number of categories with which anonymization with common attributes is likely to be possible, and notifies each operator.

FIG. 21 is an explanatory diagram of the reduction coefficient calculation processing. The reduction coefficient calculation device 2 first acquires anonymous information from a plurality of business operators (step S150B).

Next, for each attribute in the acquired attribute patterns 83, the reduction coefficient calculation device 2 obtains a plurality of combinations of number of categories and minimum number of appearances with differing numbers of categories (step S160B). Based on these numbers of categories and minimum numbers of appearances, it obtains power approximation formula 1 as the reduction coefficient and, based on power approximation formula 1, obtains the lower limit of the number of categories for which the minimum number of appearances is at or above the reference value, and stores it in the reduction coefficient DB 34 (step S170B).

The reduction coefficient calculation device 2 then determines whether there is further data (step S180). If there is, the process returns to step S150B. If there is not, the smallest of the lower-limit category counts of the operators is taken as the common number of categories, that is, the number of categories with which anonymization with common attributes is likely to be possible (step S190B), and is notified to the terminal of each operator (step S200B).
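The selection in step S190B amounts to taking the smallest of the per-operator values. A minimal sketch follows, assuming the per-operator limits have already been computed and are held in a dictionary keyed by an operator label; the function name, the dictionary layout, and the sample values are hypothetical.

```python
def common_category_count(limits_by_operator):
    """Return the smallest per-operator category limit (cf. step S190B).

    limits_by_operator: hypothetical mapping of operator label -> category
    limit obtained from that operator's reduction coefficient.
    """
    if not limits_by_operator:
        raise ValueError("at least one operator is required")
    return min(limits_by_operator.values())


# Illustrative limits for three operators (not patent data).
limits = {"operator_a": 16, "operator_b": 12, "operator_c": 20}
common = common_category_count(limits)
print(common)
```

Choosing the smallest value means the operator with the most restrictive data determines the common number of categories, so every operator is expected to be able to anonymize at that granularity.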

Each operator then performs the anonymization described above with the notified number of categories, and can thereby anonymize without repeating trials needlessly.
Thus, according to Embodiment 3, anonymization with common attributes can be performed efficiently based on anonymous information from a plurality of business operators.

<Others>
The present invention is not limited to the illustrated examples described above, and various modifications can of course be made without departing from the gist of the present invention. For example, Embodiments 2 and 3 use a power approximation formula as the reduction coefficient, but another approximation formula, such as a polynomial approximation formula or an exponential approximation formula, may be used instead.
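As an illustration of one such alternative, the sketch below fits an exponential approximation y = a·e^(b·x) by least squares on (x, ln y). The fitting method, the function name, and the sample points are assumptions made for illustration; the description only names the family of approximation formulas.

```python
import math


def fit_exponential(points):
    """Fit y = a * exp(b * x) by least squares on (x, ln y); return (a, b).

    Assumption: log-linear regression; the source does not specify how an
    exponential approximation formula would be computed.
    """
    n = len(points)
    xs = [x for x, _ in points]
    log_ys = [math.log(y) for _, y in points]
    mean_x = sum(xs) / n
    mean_l = sum(log_ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (ly - mean_l) for x, ly in zip(xs, log_ys))
    b = sxy / sxx
    a = math.exp(mean_l - b * mean_x)
    return a, b


# Illustrative points lying exactly on y = 500 * exp(-0.2 * x) (not patent data).
points = [(x, 500 * math.exp(-0.2 * x)) for x in (1, 2, 5, 10)]
a, b = fit_exponential(points)
print(a, b)
```

Which family fits best depends on how quickly the minimum number of appearances falls as categories are added, so comparing the residuals of several fits on the same combinations would be a reasonable way to choose.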

DESCRIPTION OF SYMBOLS
1 Anonymous processing device
2 Reduction coefficient calculation device
10 Anonymization system
11 Data reception unit
12 Category count acquisition unit
13 Coefficient acquisition unit
14 Possibility determination unit
15 Anonymization unit
16 Test unit
17 Column registration unit
18 Data output unit
21 Anonymous information acquisition unit
22 Appearance count acquisition unit
23 Coefficient calculation unit
31 Anonymous result DB
32 Anonymous information column DB
33 Frequent pattern DB
34 Reduction coefficient DB

Claims

An anonymous processing device comprising:
an anonymous information acquisition unit that acquires anonymous information obtained by anonymizing personal information;
an appearance count acquisition unit that takes the number of kinds of words an attribute constituting the anonymous information can take as the number of categories, and obtains the minimum number of appearances of the words for each of a plurality of different numbers of categories;
a coefficient calculation unit that obtains, based on the plurality of combinations of differing numbers of categories and minimum numbers of appearances, a reduction coefficient indicating the amount by which the minimum number of appearances decreases when the number of categories is increased;
a category count acquisition unit that acquires the number of categories to be used when anonymizing personal information to be anonymized;
a possibility determination unit that determines, based on the reduction coefficient and the number of categories to be used for anonymization, the possibility that the decrease in the minimum number of appearances when the personal information is anonymized with that number of categories exceeds a predetermined reference value; and
an anonymization unit that anonymizes the personal information when the possibility is high and stops the anonymization of the personal information when the possibility is low.

An anonymous processing device comprising:
an anonymous information acquisition unit that acquires anonymous information obtained by anonymizing personal information;
an appearance count acquisition unit that takes the number of kinds of words an attribute constituting the anonymous information can take as the number of categories, and obtains the minimum number of appearances of the words for each of a plurality of different numbers of categories;
a coefficient calculation unit that obtains, based on the plurality of combinations of differing numbers of categories and minimum numbers of appearances, a reduction coefficient indicating the amount by which the minimum number of appearances decreases when the number of categories is increased;
a reception unit that accepts personal information to be anonymized;
a category count calculation unit that obtains, based on the reduction coefficient and the total count of the personal information, a number of categories for which the decrease in the minimum number of appearances when the personal information is anonymized does not exceed a predetermined reference value; and
an anonymization unit that anonymizes the personal information with that number of categories.

The anonymous processing device according to claim 1, wherein the reduction coefficient is obtained as a linear approximation formula, a polynomial approximation formula, an exponential approximation formula, or a power approximation formula.

An anonymous processing method in which a computer executes:
a step of acquiring anonymous information obtained by anonymizing personal information;
a step of taking the number of kinds of words an attribute constituting the anonymous information can take as the number of categories, and obtaining the minimum number of appearances of the words for each number of categories;
a step of obtaining, based on a plurality of combinations of differing numbers of categories and minimum numbers of appearances, a reduction coefficient indicating the amount by which the minimum number of appearances decreases when the number of categories is increased, and storing the reduction coefficient in a storage unit;
a step of acquiring the number of categories to be used when anonymizing personal information to be anonymized;
a step of determining, based on the reduction coefficient stored in the storage unit and the number of categories to be used for anonymization, the possibility that the decrease in the minimum number of appearances when the personal information is anonymized with that number of categories exceeds a predetermined reference value; and
a step of anonymizing the personal information when the possibility is high and stopping the anonymization of the personal information when the possibility is low.

An anonymous processing method in which a computer executes:
a step of acquiring anonymous information obtained by anonymizing personal information;
a step of taking the number of kinds of words an attribute constituting the anonymous information can take as the number of categories, and obtaining the minimum number of appearances of the words for each number of categories;
a step of obtaining, based on a plurality of combinations of differing numbers of categories and minimum numbers of appearances, a reduction coefficient indicating the amount by which the minimum number of appearances decreases when the number of categories is increased, and storing the reduction coefficient in a storage unit;
a step of receiving personal information to be anonymized;
a step of obtaining, based on the reduction coefficient stored in the storage unit and the total count of the personal information, a number of categories for which the decrease in the minimum number of appearances when the personal information is anonymized does not exceed a predetermined reference value; and
a step of anonymizing the personal information with that number of categories.

The anonymous processing method according to claim 4 or 5, wherein the reduction coefficient is obtained as a linear approximation formula, a polynomial approximation formula, an exponential approximation formula, or a power approximation formula.

An anonymous processing program that causes a computer to execute:
a step of acquiring anonymous information obtained by anonymizing personal information;
a step of taking the number of kinds of words an attribute constituting the anonymous information can take as the number of categories, and obtaining the minimum number of appearances of the words for each number of categories;
a step of obtaining, based on a plurality of combinations of differing numbers of categories and minimum numbers of appearances, a reduction coefficient indicating the amount by which the minimum number of appearances decreases when the number of categories is increased, and storing the reduction coefficient in a storage unit;
a step of acquiring the number of categories to be used when anonymizing personal information to be anonymized;
a step of determining, based on the reduction coefficient stored in the storage unit and the number of categories to be used for anonymization, the possibility that the decrease in the minimum number of appearances when the personal information is anonymized with that number of categories exceeds a predetermined reference value; and
a step of anonymizing the personal information when the possibility is high and stopping the anonymization of the personal information when the possibility is low.

An anonymous processing program that causes a computer to execute:
a step of acquiring anonymous information obtained by anonymizing personal information;
a step of taking the number of kinds of words an attribute constituting the anonymous information can take as the number of categories, and obtaining the minimum number of appearances of the words for each number of categories;
a step of obtaining, based on a plurality of combinations of differing numbers of categories and minimum numbers of appearances, a reduction coefficient indicating the amount by which the minimum number of appearances decreases when the number of categories is increased, and storing the reduction coefficient in a storage unit;
a step of receiving personal information to be anonymized;
a step of obtaining, based on the reduction coefficient stored in the storage unit and the total count of the personal information, a number of categories for which the decrease in the minimum number of appearances when the personal information is anonymized does not exceed a predetermined reference value; and
a step of anonymizing the personal information with that number of categories.

The anonymous processing program according to claim 7 or 8, wherein the reduction coefficient is obtained as a linear approximation formula, a polynomial approximation formula, an exponential approximation formula, or a power approximation formula.