JP6334915B2

JP6334915B2 - Anonymization system

Info

Publication number: JP6334915B2
Application number: JP2013270368A
Authority: JP
Inventors: 秀暢小栗
Original assignee: 富士通クラウドテクノロジーズ株式会社
Priority date: 2013-12-26
Filing date: 2013-12-26
Publication date: 2018-05-30
Anticipated expiration: 2033-12-26
Also published as: JP2015125646A

Description

The present invention relates to a technique for using personal information by anonymizing or diversifying it.

With the development of information processing technology, information is collected in many everyday situations and then processed. For example, when a consumer becomes a member of a store and purchases products, the consumer's name, age, gender, address, e-mail address, and the like are often registered at membership registration. When the consumer purchases a product, the store's system records the consumer in association with information on the purchased product. By accumulating and analyzing this purchase information, the consumer's preferences can be estimated, enabling services such as sending direct mail when a new product the consumer is likely to prefer is released. Furthermore, by analyzing information on many consumers, insights such as which products are preferred by women in their 20s or in the Kanto area can be derived and used for marketing and similar purposes.

Such information is valuable not only to the store but also to product manufacturers and other companies, and there has been demand to use it for recommendations such as advertisements and coupons.

However, a store cannot provide its consumers' personal information to others without each consumer's consent. Therefore, when information about consumers is provided to others, it must be anonymized so that individuals cannot be identified.

Some conventional anonymization methods delete information that directly identifies an individual, such as a name or telephone number, but this alone may not be sufficient. For example, if a member list that includes ages contains only one 25-year-old, anyone who learns that a 25-year-old acquaintance is a member can identify that person's record. In other words, if only one member has the attribute "25 years old," there is a high likelihood that the individual can be identified indirectly by cross-referencing other information.

If, instead, the ages in the member list are abstracted into 10-year bands so that several people share the same attribute (for example, three people in their 20s), it becomes impossible to tell which of the three a given person is. Thus, when personal information is provided to other businesses, it is desirable to anonymize it sufficiently so that individuals cannot be identified either directly or indirectly.

In addition, access management may be performed by setting, according to the importance of the anonymous information and other factors, a rank of authority required to access it, permitting access only to persons whose authority is at or above that rank and denying access to everyone else.

JP 2003-196391 A
JP 2003-233551 A
JP 2005-100408 A
JP 2004-086383 A
JP 2005-346248 A

If the value of each item is abstracted too much in pursuit of sufficient anonymity, the resulting data may satisfy anonymity yet have no practical use. For example, when data is used to understand fashion trends, age is an important item; if it is over-abstracted for anonymization, the data loses its value as fashion marketing data. Moreover, if age is simply split into bands so that multiple people share each attribute, a group such as "17 or older and under 22" may result. Such a group mixes adults with minors and high school students with working adults, combining people whose tastes and lifestyles differ greatly, so the data loses its value as statistical or marketing information.

The present applicant has therefore proposed an anonymization system that creates multiple abstraction candidates, evaluates the value of each, and selects a high-value candidate as anonymous information, so that anonymous information of high utility is obtained automatically.

On the other hand, for access management, an administrator has conventionally determined in advance each user's access authority rank according to the user's affiliation, position, contract, and so on. The administrator has likewise determined the access authority rank of each piece of anonymous information according to its genre, importance, degree of abstraction, and so on.

Setting access rights in this way is a labor-intensive, high-load process. Consequently, even if an anonymization system such as the one above automatically generates many pieces of high-utility anonymous information, that information cannot be provided smoothly if the access authority level of each piece must still be set manually.

In particular, when a very large amount of anonymous information is generated by varying the degree of abstraction and the items abstracted, so as to provide high-utility anonymous information to many different users, it is impractical to set the access authority level of each piece manually. As a result, it has not been possible to manage access to such varied anonymous information tailored to the needs of different users.

The present invention therefore provides a technique that derives access authority from the number of appearances of the words constituting anonymous information, setting access authority automatically and managing access appropriately.

In order to solve the above problems, the authority setting device of the present invention comprises:
an anonymous information acquisition unit that acquires anonymous information;
an appearance number acquisition unit that obtains the number of appearances of the words constituting the anonymous information; and
an authority determination unit that determines the access authority for the anonymous information based on the number of appearances.

In the authority setting device, the smallest of the appearance counts of the words constituting the anonymous information may be taken as the minimum number of appearances, and the ratio of the minimum number of appearances to the total number of words constituting the anonymous information as the minimum appearance rate; the authority determination unit may then determine the access authority for the anonymous information based on the minimum appearance rate.
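As a minimal sketch of the counting described above, assuming the anonymous information is a table of word-valued cells and that the "total number of words" means the total number of cells (the text does not pin this down), the minimum appearance rate can be computed as follows. The table contents and function names are hypothetical.

```python
from collections import Counter
from typing import Sequence

# Hypothetical representation: anonymous information as a table whose rows
# are sequences of words (cell values). Assumption: "total number of words"
# means the total number of cells, i.e. the sum of all appearance counts.
Table = Sequence[Sequence[str]]


def appearance_counts(table: Table) -> Counter:
    """Count how many times each word appears in the anonymous information."""
    return Counter(word for row in table for word in row)


def minimum_appearance_rate(table: Table) -> float:
    """Minimum number of appearances divided by the total number of words."""
    counts = appearance_counts(table)
    if not counts:
        raise ValueError("anonymous information contains no words")
    return min(counts.values()) / sum(counts.values())


# Hypothetical anonymous information with an age-group item and a sex item.
anonymous_info = [
    ("teens or younger", "female"),
    ("teens or younger", "female"),
    ("20s", "male"),
    ("20s", "female"),
]
print(minimum_appearance_rate(anonymous_info))  # 1 / 8 = 0.125
```

Here "male" appears once among eight words, so it determines the minimum appearance rate; a single rare word is what makes a row easiest to re-identify.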

In the authority setting device, the authority determination unit may determine the access authority based on the number of appearances of the anonymous information by referring to an authority storage unit that stores minimum appearance rates of anonymous information in association with access authorities.

In the authority setting device, the authority determination unit may determine the rank of the access authority according to the minimum appearance rate of the anonymous information.
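One way the authority storage unit could be consulted is a banded lookup from minimum appearance rate to rank. The thresholds, the three-rank scale, and the direction of the mapping (rarer words demand a higher rank) below are all illustrative assumptions, not values from the source.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical contents of the authority storage unit (authority setting DB):
# (lower bound of the minimum appearance rate, required access rank).
# Assumption: a low rate means some word is rare and the data is easier to
# re-identify, so it requires a higher (more privileged) rank.
AUTHORITY_TABLE: List[Tuple[float, int]] = [
    (0.00, 3),  # 0.00 <= rate < 0.05 -> rank 3 (most restricted)
    (0.05, 2),  # 0.05 <= rate < 0.20 -> rank 2
    (0.20, 1),  # 0.20 <= rate        -> rank 1 (least restricted)
]


def required_rank(min_rate: float) -> int:
    """Return the access rank stored for the band containing min_rate."""
    if not 0.0 <= min_rate <= 1.0:
        raise ValueError("minimum appearance rate must lie in [0, 1]")
    lower_bounds = [lower for lower, _ in AUTHORITY_TABLE]
    band = bisect_right(lower_bounds, min_rate) - 1
    return AUTHORITY_TABLE[band][1]


print(required_rank(0.125))  # rank 2
```

Keeping the thresholds in a table rather than in code mirrors the idea of a separate authority storage unit: an administrator can retune the bands without changing the determination logic.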

Moreover, in order to solve the above problems, the anonymization device of the present invention comprises:
a data acquisition unit that acquires target data for anonymization;
an abstraction unit that abstracts at least one of the plurality of words constituting the target data to produce abstraction candidate data;
a test unit that tests the abstraction candidate data on the condition that no combination of its item values is limited to a single individual in the target data;
a selection unit that selects the abstraction candidate data satisfying the test condition as anonymous information;
an appearance number acquisition unit that obtains the number of appearances of the words constituting the anonymous information; and
an authority determination unit that determines the access authority for the anonymous information based on the number of appearances.

In the anonymization device, the smallest of the appearance counts of the words constituting the anonymous information may be taken as the minimum number of appearances, and the ratio of the minimum number of appearances to the total number of words constituting the anonymous information as the minimum appearance rate; the authority determination unit may then determine the access authority for the anonymous information based on the minimum appearance rate.

In the anonymization device, the authority determination unit may determine the access authority based on the number of appearances of the anonymous information by referring to an authority storage unit that stores minimum appearance rates of anonymous information in association with access authorities.

In the anonymization device, the authority determination unit may determine the rank of the access authority according to the minimum appearance rate of the anonymous information.

Moreover, in order to solve the above problems, the anonymization system of the present invention comprises:
a data acquisition unit that acquires target data for anonymization;
an abstraction unit that abstracts at least one of the plurality of words constituting the target data to produce abstraction candidate data;
a test unit that tests the abstraction candidate data on the condition that no combination of its item values is limited to a single individual in the target data;
a selection unit that selects the abstraction candidate data satisfying the test condition as anonymous information;
an appearance number acquisition unit that obtains the number of appearances of the words constituting the anonymous information;
an authority determination unit that determines the access authority for the anonymous information based on the number of appearances; and
an access control unit that, upon receiving a request from a user's terminal to access the anonymous information, compares the user's access authority with the access authority required to access that anonymous information, and permits access to anonymous information whose anonymity level corresponds to the user's authority level.
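A minimal sketch of the access control unit's comparison, assuming ranks are integers where a larger number denotes broader authority and that a user may access any anonymous information whose required rank does not exceed the user's own. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AnonymousInfo:
    name: str
    required_rank: int  # rank assigned by the authority determination unit


def is_access_allowed(user_rank: int, info: AnonymousInfo) -> bool:
    """Assumption: a larger number means broader authority."""
    return user_rank >= info.required_rank


def accessible_names(user_rank: int, catalog: Iterable[AnonymousInfo]) -> List[str]:
    """Names of the anonymous information this user may access."""
    return [info.name for info in catalog if is_access_allowed(user_rank, info)]


catalog = [
    AnonymousInfo("coarse_age_groups", required_rank=1),
    AnonymousInfo("decade_age_groups", required_rank=2),
    AnonymousInfo("exact_ages", required_rank=3),
]
print(accessible_names(2, catalog))  # ['coarse_age_groups', 'decade_age_groups']
```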

In the anonymization system, the smallest of the appearance counts of the words constituting the anonymous information may be taken as the minimum number of appearances, and the ratio of the minimum number of appearances to the total number of words constituting the anonymous information as the minimum appearance rate; the authority determination unit may then determine the access authority for the anonymous information based on the minimum appearance rate.

In the anonymization system, the authority determination unit may determine the access authority based on the number of appearances of the anonymous information by referring to an authority storage unit that stores minimum appearance rates of anonymous information in association with access authorities.

In the anonymization system, the authority determination unit may determine the rank of the access authority according to the minimum appearance rate of the anonymous information.

In addition, in order to solve the above problems, the authority setting method of the present invention causes a computer to execute:
a step of acquiring anonymous information;
a step of obtaining the number of appearances of the words constituting the anonymous information; and
a step of determining the access authority for the anonymous information based on the number of appearances.

In the authority setting method, the computer may take the smallest of the appearance counts of the words constituting the anonymous information as the minimum number of appearances and the ratio of the minimum number of appearances to the total number of words constituting the anonymous information as the minimum appearance rate, and may determine the access authority for the anonymous information based on the minimum appearance rate.

In the authority setting method, the computer may determine the access authority based on the number of appearances of the anonymous information by referring to an authority storage unit that stores minimum appearance rates of anonymous information in association with access authorities.

In the authority setting method, the computer may determine the rank of the access authority according to the minimum appearance rate of the anonymous information.

The present invention may also take the form of an authority setting program that causes a computer to execute the above authority setting method. The program may further be recorded on a computer-readable storage medium.

Here, a computer-readable storage medium is a storage medium that stores information such as data and programs through electrical, magnetic, optical, mechanical, or chemical action and that can be read by a computer. Examples of such media that are removable from the computer include flexible disks, magneto-optical disks, CD-ROMs, CD-R/Ws, DVDs, DATs, 8 mm tapes, and memory cards. Examples of storage media fixed in the computer include hard disks and ROM (read-only memory).

The present invention can provide a technique that derives access authority from the number of appearances of the words constituting anonymous information, setting access authority automatically and managing access appropriately.

FIG. 1 is an explanatory diagram of anonymization.
FIG. 2 is an explanatory diagram of diversification.
FIG. 3 is a functional block diagram of the anonymization system.
FIG. 4 is a diagram illustrating an example of the personal information DB.
FIG. 5 is a diagram illustrating an example of anonymous information stored in the anonymous information DB.
FIG. 6 is a diagram illustrating an example of information for managing access to anonymous information.
FIG. 7 is a diagram illustrating a hardware configuration of the anonymization device.
FIG. 8 is a diagram illustrating a hardware configuration of the management server.
FIG. 9 is a diagram illustrating an example of user management information stored in the user management DB.
FIG. 10 is an explanatory diagram outlining the anonymization method executed by the anonymization device according to a program.
FIG. 11 is a diagram illustrating anonymization processing.
FIG. 12 is an explanatory diagram of a dictionary used for anonymization.
FIG. 13 is an explanatory diagram of a dictionary used for anonymization.
FIG. 14 is an explanatory diagram of a dictionary used for anonymization.
FIG. 15 is an explanatory diagram of abstraction candidate data.
FIG. 16 is a diagram illustrating an example of part of the age item in the target data.
FIG. 17 is a diagram illustrating an example of value data acquired for age.
FIG. 18 is a diagram showing the value of the age item.
FIG. 19 is a diagram showing the value of the age item.
FIG. 20 is a diagram illustrating an example of part of the age item in the abstraction candidate data.
FIG. 21 is a diagram illustrating an example of the value data of each word acquired for age group.
FIG. 22 is a diagram showing the value of the age-group item.
FIG. 23 is a diagram showing the value of the age item.
FIG. 24 is a diagram illustrating processing in which the anonymization device confirms the disclosure conditions of anonymous information.
FIG. 25A is a diagram illustrating an example of the authority setting DB.
FIG. 25B is a diagram illustrating an example of the authority setting DB.
FIG. 26 is a diagram illustrating an example of the disclosure condition DB.
FIG. 27 is a diagram illustrating a specific example of processing for setting access authority.
FIG. 28 is an explanatory diagram of an access management method performed by the management server.
FIG. 29 is a functional block diagram of the anonymization system.
FIG. 30 is a diagram illustrating an example of the dictionary DB.
FIG. 31 is a diagram illustrating an example of the priority DB.
FIG. 32 is a diagram illustrating an example of the common DB.
FIG. 33 is a diagram illustrating an example of the personal information DB.
FIG. 34 is a diagram illustrating an example of anonymous information stored in the anonymous information DB.
FIG. 35 is a diagram illustrating an example of information for managing access to anonymous information.
FIG. 36 is a diagram illustrating a hardware configuration of the management server 20.
FIG. 37 is a diagram illustrating a hardware configuration of the anonymization device 10.
FIG. 38 is an explanatory diagram of processing in which the management server 20 creates an integrated anonymization dictionary.
FIG. 39 is an explanatory diagram of processing for integrating anonymization dictionaries.
FIG. 40 is an explanatory diagram of each dimension created by the processing of FIG. 11.
FIG. 41 is an explanatory diagram of a plurality of dimensions.
FIG. 42 is a diagram showing an example in which each word included in the dimension shown in FIG. 13 is weighted.
FIG. 43 is an explanatory diagram of a process for calculating the priority of each dimension by summing the weights of its words.
FIG. 44 is a diagram illustrating an example of anonymization at Company A.
FIG. 45 is a diagram illustrating an example of anonymization at Company B.

Embodiments for carrying out the present invention are described below with reference to the drawings. The configurations of the following embodiments are examples, and the present invention is not limited to them.

<Embodiment 1>
§1. Anonymization
FIG. 1 is an explanatory diagram of k-anonymization. FIG. 1(A) shows an example in which the surname item has been deleted from member information containing surname, age, and sex items.

As shown in FIG. 1(A), if the member information, which includes ages, contains only one 16-year-old woman, then as soon as a particular 16-year-old woman is known to be a member, her record can be identified. That is, if only one person has the attributes "16 years old" and "female," the individual may be identifiable by cross-referencing other information.

In FIG. 1(B), the ages in the member list are abstracted into age groups: 0s (under 10), 10s, and 20s. Even so, there is only one woman in her teens, so, as in FIG. 1(A), the individual can still be identified and the anonymization is insufficient.

FIG. 1(C) therefore abstracts further, changing the age bands to "teens or younger" (19 and under) and "20s." In FIG. 1(C), there are two women in their teens or younger, so the combination of attributes [teens or younger] and [female] no longer corresponds to a single person. Therefore, even if it becomes known that a certain 16-year-old woman is a member, it cannot be determined which of the two records is hers. A state in which at least k people (two in this example) share the same attributes is said to satisfy "k-anonymity," and processing data to achieve this is called "k-anonymization."
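The k-anonymity condition can be checked by counting how many records share each combination of attribute values. The records below are hypothetical stand-ins shaped like FIG. 1(B) and FIG. 1(C); the figure's actual rows are not reproduced in the text.

```python
from collections import Counter
from typing import Sequence, Tuple

Record = Tuple[str, ...]  # quasi-identifier values of one person


def is_k_anonymous(records: Sequence[Record], k: int) -> bool:
    """True if every combination of attribute values is shared by at least k records."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return all(count >= k for count in Counter(records).values())


# Hypothetical records in the spirit of FIG. 1(B) and FIG. 1(C).
fig1_b = [("0s", "female"), ("10s", "female"), ("20s", "male"), ("20s", "male")]
fig1_c = [("teens or younger", "female"), ("teens or younger", "female"),
          ("20s", "male"), ("20s", "male")]
print(is_k_anonymous(fig1_b, 2), is_k_anonymous(fig1_c, 2))  # False True
```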

FIG. 2 is an explanatory diagram of l-diversification. It shows an example in which each user's station data is abstracted into data on the ward to which each station belongs.

In the data before abstraction, the stations are specified, so a user might be identified by cross-referencing data such as "lives near Shinjuku Station and works near Tokyo Station." Abstracting each station to the ward it belongs to yields multiple users who use a station in Shinjuku Ward and a station in Chiyoda Ward, so no individual user can be singled out. A state in which an attribute value such as "uses a station in Shinjuku Ward and a station in Chiyoda Ward" has l possible underlying values is said to satisfy "l-diversity," and processing data to achieve this is called "l-diversification."
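The description of l-diversity above is informal. A common textbook formalization, distinct l-diversity, requires each group of records that share a generalized value to contain at least l distinct underlying values; the sketch below implements that formalization, which may differ in detail from the embodiment. The ward pairs and station labels ("S1", "C1", and so on) are placeholders.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def is_l_diverse(records: Iterable[Tuple[Hashable, Hashable]], l_min: int) -> bool:
    """Distinct l-diversity: every group of records sharing the same
    generalized value must contain at least l_min distinct underlying values."""
    if l_min < 1:
        raise ValueError("l_min must be at least 1")
    groups: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for generalized, underlying in records:
        groups[generalized].add(underlying)
    return all(len(values) >= l_min for values in groups.values())


# Hypothetical data: (home ward, work ward) after abstraction, paired with the
# original (home station, work station); station names are placeholders.
records = [
    (("Shinjuku", "Chiyoda"), ("S1", "C1")),
    (("Shinjuku", "Chiyoda"), ("S2", "C1")),
    (("Minato", "Chuo"), ("M1", "K1")),
    (("Minato", "Chuo"), ("M1", "K1")),
]
print(is_l_diverse(records, 2))  # False: the Minato/Chuo group has one value
```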

The anonymization system 100 of Embodiment 1 performs anonymization by abstracting the target data so as to satisfy "k-anonymity" and/or "l-diversity," that is, so that no combination of data item values is limited to a single individual in the target data.

§2. System Configuration
FIG. 3 is a functional block diagram of the anonymization system. The anonymization system 100 of Embodiment 1 includes the anonymization device 10, which anonymizes personal information; the anonymous information DB 145, which stores the anonymous information produced by the anonymization device 10; and the management server 20, which receives access requests from user terminals 30 and provides anonymous information according to each user's access authority.

As shown in FIG. 3, the anonymization device 10 includes a data acquisition unit 101, an abstraction unit 102, a test unit 103, a selection unit 104, a value determination unit 106, a value data acquisition unit 107, a word category analysis unit 108, a word value calculation unit 109, an appearance number acquisition unit 111, an authority determination unit 112, a personal information database (DB) 131, a disclosure condition DB 132, a search information accumulation DB 133, a temporary processing DB 134, and an authority setting DB (authority storage unit) 135.

The data acquisition unit 101 acquires, as the target data for anonymization, data containing multiple items associated with individuals, that is, personal information. For example, the data acquisition unit 101 receives data from another computer over a network, or reads target data from a database over the network. Alternatively, questionnaires filled in by visitors at an event venue, or personal information heard from visitors, may be entered from a keyboard or the like and stored in the personal information DB 131, from which the data acquisition unit 101 reads it as target data. Items written on visitors' business cards or questionnaires may also be read and converted to electronic data by OCR (Optical Character Recognition), or visitor information may be acquired from a visitor's RF-ID tag, IC chip, or the like. The data acquisition unit 101 may acquire not only target data for anonymization but also anonymous information already anonymized by a business operator. That is, the data acquisition unit 101 may function as the anonymous information acquisition unit.

The abstraction unit 102 refers to the integrated anonymization dictionary composed of the dimensions, and generates anonymization candidate data by replacing words that are item values in the target data with words abstracted on the basis of the priority.
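The integrated anonymization dictionary, its dimensions, and the priority are not defined in this passage, so the sketch below substitutes a simpler, commonly used technique: a per-item generalization hierarchy, from which one candidate is generated for every combination of abstraction levels. The hierarchy contents and function names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[str, ...]
Generalizer = Callable[[str], str]

# Hypothetical generalization hierarchy per item: index 0 keeps the original
# word; each higher index maps it to a coarser (more abstract) word.
HIERARCHIES: List[List[Generalizer]] = [
    [  # item 0: age
        lambda age: age,
        lambda age: f"{int(age) // 10 * 10}s",
        lambda age: "19 or under" if int(age) < 20 else "20 or over",
    ],
    [  # item 1: sex (kept as-is in this example)
        lambda sex: sex,
    ],
]


def abstract(rows: Sequence[Row], levels: Tuple[int, ...]) -> List[Row]:
    """Replace every item value with its abstraction at the chosen level."""
    return [
        tuple(HIERARCHIES[i][level](value) for i, (value, level) in enumerate(zip(row, levels)))
        for row in rows
    ]


def generate_candidates(rows: Sequence[Row]) -> Dict[Tuple[int, ...], List[Row]]:
    """Produce one abstraction candidate per combination of levels."""
    level_ranges = [range(len(hierarchy)) for hierarchy in HIERARCHIES]
    return {levels: abstract(rows, levels) for levels in product(*level_ranges)}


target = [("16", "female"), ("8", "female"), ("25", "male"), ("27", "male")]
for levels, candidate in generate_candidates(target).items():
    print(levels, candidate)
```

Enumerating every level combination grows multiplicatively with the number of items, which is presumably why the embodiment orders abstractions by priority instead of trying them all.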

The test unit 103 tests the abstraction candidate data on the condition that no combination of its item values is limited to a single individual in the target data. For example, the test unit 103 tests whether the combinations of item values in the abstraction candidate data satisfy k-anonymity or l-diversity.

The selection unit 104 selects abstraction candidate data based on the value of the candidates that satisfy the test condition. For example, the selection unit 104 selects a predetermined number of abstraction candidates satisfying k-anonymity or l-diversity in descending order of value. Alternatively, it may select only the highest-value candidate among those satisfying k-anonymity or l-diversity.

The value determination unit 106 obtains the value of the abstraction candidate data based on the values of the words included in it.

The value data acquisition unit 107 acquires (receives) value data for the words included in the abstraction candidate data from the search information storage DB. The value data acquisition unit 107 also has a function of requesting the value data of a word from another device when it is not registered in the search information storage DB, and registering the acquired value data in the search information storage DB (data request), and a function of periodically visiting other devices to acquire the latest value data and updating the value data registered in the search information storage DB (data crawler). In this embodiment, statistical information on each word is received from the search engine 90 as the value data. The statistical information on each word includes, for example, the SEM advertising unit price (cost per click), the click-through rate, the average position, the number of impressions per day, and the number of clicks per day. The source of the value data is not limited to a search engine and may be a web page, an SNS, or the like. In that case, for example, the frequency with which each word is used on the web page or SNS may be used as its value.

The word category analysis unit 108 analyzes data on websites and the like to find new words and words (categories) that abstract those words, and registers them in the search information storage DB.

Based on the word values acquired by the value data acquisition unit 107, the value calculation unit 109 obtains statistics of each word's value, such as its annual, monthly, or weekly average.
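As an illustration of the statistics the value calculation unit 109 might compute, the sketch below groups dated value samples (for example daily cost-per-click figures) by year, month, or ISO week and averages each group. The sample format, the function name, and the numbers are assumptions.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def period_averages(samples, period="month"):
    """samples: iterable of (date, value). Returns {period key: average value}."""
    buckets = defaultdict(list)
    for day, value in samples:
        if period == "year":
            key = (day.year,)
        elif period == "month":
            key = (day.year, day.month)
        elif period == "week":
            iso = day.isocalendar()
            key = (iso[0], iso[1])          # (ISO year, ISO week number)
        else:
            raise ValueError(f"unknown period: {period}")
        buckets[key].append(value)
    return {key: mean(values) for key, values in buckets.items()}

# Hypothetical daily values for one word.
samples = [(date(2013, 12, 2), 40.0), (date(2013, 12, 20), 60.0), (date(2014, 1, 5), 30.0)]
print(period_averages(samples, "month"))  # {(2013, 12): 50.0, (2014, 1): 30.0}
```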

The appearance number acquisition unit 111 obtains the number of appearances of the words constituting the anonymous information. For example, each individual's information in the anonymous information is treated as one piece of data (one record), and the number of times the same information (word) appears is counted as its number of appearances. If each individual's information consists of a single item, appearances are counted for each distinct word that is the value of that item; if it consists of a plurality of items, appearances are counted for each distinct combination of words that are the values of those items.

The appearance number acquisition unit 111 may also function as an appearance rate acquisition unit that takes the smallest of the numbers of appearances of the words constituting the anonymous information as the minimum number of appearances, and obtains the ratio of the minimum number of appearances to the total number of words constituting the anonymous information as the minimum appearance rate.
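The sketch below computes the number of appearances of each word combination, the minimum number of appearances, and the minimum appearance rate. It treats each record (one individual) as a tuple of item values and, as an assumption, takes "the total number of words constituting the anonymous information" to be the number of records. The sample data are hypothetical.

```python
from collections import Counter

def appearance_stats(records):
    """records: one tuple of (abstracted) item values per individual."""
    counts = Counter(records)              # appearances of each word combination
    min_count = min(counts.values())       # minimum number of appearances
    min_rate = min_count / len(records)    # minimum appearance rate
    return counts, min_count, min_rate

anon = [("20s", "Tokyo"), ("20s", "Tokyo"), ("20s", "Tokyo"),
        ("30s", "Osaka"), ("30s", "Osaka")]
counts, min_count, min_rate = appearance_stats(anon)
print(min_count, min_rate)  # 2 0.4
```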

The authority determination unit 112 determines the access authority for the anonymous information based on its number of appearances or on a value calculated from that number, such as the minimum appearance rate, attaches the authority to the anonymous information, and stores it in the anonymous information DB 145. For example, the authority determination unit 112 refers to an authority storage unit that stores values such as the number of appearances of anonymous information, or the appearance rate calculated from it, in association with access authorities, and determines the access authority corresponding to the number of appearances of the anonymous information.
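A minimal sketch of the lookup against the authority storage unit. The thresholds and level numbers in `AUTHORITY_TABLE` are hypothetical: here level 1 is given to the most strongly anonymized data (largest minimum number of appearances), and data below every threshold receives no level.

```python
# Hypothetical authority storage: (minimum number of appearances required, access level).
AUTHORITY_TABLE = [(30, 1), (10, 2), (5, 3)]

def decide_level(min_count, table=AUTHORITY_TABLE):
    """Return the level of the highest threshold that min_count reaches, or None."""
    for threshold, level in sorted(table, reverse=True):
        if min_count >= threshold:
            return level
    return None

print([decide_level(n) for n in (45, 12, 5, 4)])  # [1, 2, 3, None]
```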

FIG. 4 is a diagram illustrating an example of the personal information DB 131. The personal information DB 131 stores personal information before anonymization, such as personal information that the data acquisition unit 101 received from another computer or that was entered from a keyboard or the like. The personal information in FIG. 4 includes, for example, a personal ID, age, address, and vehicle name.

The personal ID is identification information for identifying an individual, such as a membership number or serial number, and may also be a name, telephone number, or e-mail address.

The vehicle name is information identifying the individual's vehicle, such as its name, common name, or nickname. In the present application, the vehicle name may also include identification information such as the model year and model number.

The disclosure condition DB 132 stores conditions under which anonymous information may be disclosed, for example "may be disclosed when the minimum number of appearances is 30 or more, but not externally" or "if keyword = ○○○ is included, may be disclosed on or after month △, day □". That is, it stores as disclosure conditions the minimum number of appearances required for disclosure, whether disclosure outside the company is allowed, keywords that prevent disclosure, and the like. A disclosure condition may also depend on the dictionary used for anonymization, for example: "When dictionary ID = D1 is used, internal disclosure is allowed if the minimum number of appearances is 5 or more and not allowed otherwise; external disclosure is allowed if the minimum number of appearances is 10 or more and not allowed otherwise."
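The disclosure conditions quoted above can be represented as data and evaluated as in the following sketch. The `DisclosureRule` type is hypothetical, and combining the two quoted examples into one ordered rule list (dictionary-specific rule first, generic rule as a fallback) is an assumption; the thresholds themselves (5 and 10 for D1, 30 with no external disclosure otherwise) come from the text. Keyword and date conditions are not modeled.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DisclosureRule:
    dictionary_id: Optional[str]  # None means the rule applies to any dictionary
    internal_min: Optional[int]   # minimum appearances for internal disclosure (None = never)
    external_min: Optional[int]   # minimum appearances for external disclosure (None = never)

# First matching rule wins, so dictionary-specific rules come first.
RULES = [
    DisclosureRule("D1", internal_min=5, external_min=10),     # example for dictionary D1
    DisclosureRule(None, internal_min=30, external_min=None),  # "30 or more, but not externally"
]

def disclosure(min_count, dictionary_id, rules=RULES):
    """Return whether internal and external disclosure are allowed."""
    for rule in rules:
        if rule.dictionary_id in (None, dictionary_id):
            return {
                "internal": rule.internal_min is not None and min_count >= rule.internal_min,
                "external": rule.external_min is not None and min_count >= rule.external_min,
            }
    return {"internal": False, "external": False}

print(disclosure(7, "D1"))  # {'internal': True, 'external': False}
```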

The anonymous information DB 145 stores anonymous information anonymized by the anonymization device 10. It stores a plurality of pieces of anonymous information, for example pieces derived from different personal information before anonymization or anonymized with different dictionaries, together with information for managing access to that anonymous information.

FIG. 5 is a diagram illustrating an example of anonymous information stored in the anonymous information DB 145. Anonymous information is personal information in which each word has been abstracted. In the example of FIG. 5, the age group, address (prefecture), vehicle type, and minimum number of appearances are stored in association with one another.

FIG. 6 is a diagram illustrating an example of information for managing access to anonymous information (hereinafter also referred to as access management information). As shown in FIG. 6, the access management information includes, for example, a level, an anonymous information ID, the dictionary used, the minimum appearance rate, the information type, and a summary. The level indicates the authority required to access the anonymous information and, as described later, is determined from the minimum number of appearances of the anonymous information or from a value calculated from it, such as the minimum appearance rate.

The anonymous information ID uniquely identifies the anonymous information. The dictionary used indicates the dictionary used to anonymize the anonymous information, for example by that dictionary's identification information. The minimum appearance rate is the ratio of the minimum number of appearances to the total number of words constituting the anonymous information. Here, the minimum number of appearances is the smallest number of individuals sharing the same attribute values in the anonymous information, that is, the smallest of the numbers of appearances of the words constituting the anonymous information.

The information type indicates, for example, whether the anonymous information is statistical information based on a plurality of sets of personal information or is anonymized personal information held by a specific business operator. In the example of FIG. 6, when the anonymous information is statistical information obtained by averaging or totaling a plurality of pieces of anonymous information, the type is shown as "average" or "total"; when it is anonymized personal information of a specific business operator, the operator's name is shown. The summary describes the anonymous information, for example the items it contains and the conditions under which it was anonymized.

The anonymous information DB 145 may be stored in a storage device of the anonymization device 10 or of the management server 20, or in a separate device such as an independent file server, as long as it is accessible from both the anonymization device 10 and the management server 20.

As shown in FIG. 3, the management server 20 includes a request reception unit 201, an access control unit 202, an output control unit 203, and a user management DB 251.

The request reception unit 201 receives, from a user's terminal, an access request for acquiring anonymous information.

When an access request is received from a user, the access control unit 202 permits access to anonymous information at the anonymity level corresponding to that user's authority level.
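One possible reading of this rule, combined with the user management information of FIG. 9 and the access management information of FIG. 6, is sketched below. It assumes that levels are numbered so that lower numbers denote more strongly anonymized data, that a user with authority level u may access information of level u or lower, and, as a further assumption, that the dictionary used must be among the user's usable dictionaries. All names and sample records are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class User:                      # one row of the user management DB (cf. FIG. 9)
    user_id: str
    authority: int               # assumed: higher number = broader access
    dictionaries: set = field(default_factory=set)

@dataclass
class AnonymousInfo:             # one row of the access management information (cf. FIG. 6)
    info_id: str
    level: int                   # assumed: lower number = more strongly anonymized
    dictionary_id: str

def accessible(user, infos):
    """IDs of anonymous information the user may access under the assumed rule."""
    return [i.info_id for i in infos
            if i.level <= user.authority and i.dictionary_id in user.dictionaries]

u1 = User("U1", authority=2, dictionaries={"D1", "D2"})
infos = [AnonymousInfo("A1", 1, "D1"), AnonymousInfo("A2", 2, "D3"),
         AnonymousInfo("A3", 3, "D1"), AnonymousInfo("A4", 2, "D2")]
print(accessible(u1, infos))  # ['A1', 'A4']
```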

The output control unit 203 reads the anonymous information to which the access control unit 202 has permitted access from the anonymous information DB 145 and outputs it, for example by transmitting it to the terminal 30 of the requesting user. Outputting anonymous information may also mean displaying it on a display device, printing it on a printer, transmitting it to another computer, writing it to a storage medium, or the like.

FIG. 7 is a diagram illustrating the hardware configuration of the anonymization device 10. The anonymization device 10 is a computer having a CPU 11, a memory 12, a communication control unit 13, a storage device 14, and an input/output interface 15.

The CPU 11 executes a program loaded into the memory 12 in executable form, and provides the functions of the data acquisition unit 101, abstraction unit 102, test unit 103, selection unit 104, value determination unit 106, value data acquisition unit 107, word category analysis unit 108, word value calculation unit 109, appearance number acquisition unit 111, and authority determination unit 112 described above.

The memory 12 may also be called the main storage device. The memory 12 stores, for example, programs executed by the CPU 11, data received via the communication control unit 13, data read from the storage device 14, and other data.

The communication control unit 13 connects to other devices via a network and controls communication with them. Output means such as a display device or printer, input means such as a keyboard or pointing device, and input/output means such as a drive device are connected to the input/output interface 15 as appropriate. The drive device reads from and writes to removable storage media; examples include an input/output device for flash memory cards and a USB adapter for connecting USB memory. The removable storage medium may also be a disc medium such as a CD (Compact Disc), DVD (Digital Versatile Disc), or Blu-ray Disc (registered trademark). The drive device reads a program from the removable storage medium and stores it in the storage device 14.

The storage device 14 may also be called the external storage device. The storage device 14 may be an SSD (Solid State Drive), an HDD, or the like. The storage device 14 exchanges data with the drive device; for example, it stores programs installed from the drive device. The storage device 14 also reads programs and passes them to the memory 12. In this embodiment, the storage device 14 stores the personal information DB 131 and the disclosure condition DB 132 described above.

FIG. 8 is a diagram illustrating the hardware configuration of the management server 20. The management server 20 is a computer having a CPU 21, a memory 22, a communication control unit 23, a storage device 24, and an input/output interface 25.

The CPU 21 executes a program loaded into the memory 22 in executable form, and provides the functions of the request reception unit 201, access control unit 202, and output control unit 203 described above.

The memory 22 may also be called the main storage device. The memory 22 stores, for example, programs executed by the CPU 21, data received via the communication control unit 23, data read from the storage device 24, and other data.

The communication control unit 23 connects to other devices via a network and controls communication with them. Output means such as a display device or printer, input means such as a keyboard or pointing device, and input/output means such as a drive device are connected to the input/output interface 25 as appropriate. The drive device reads from and writes to removable storage media; examples include an input/output device for flash memory cards and a USB adapter for connecting USB memory. The removable storage medium may also be a disc medium such as a CD (Compact Disc), DVD (Digital Versatile Disc), or Blu-ray Disc. The drive device reads a program from the removable storage medium and stores it in the storage device 24.

The storage device 24 may also be called the external storage device. The storage device 24 may be an SSD (Solid State Drive), an HDD, or the like. The storage device 24 exchanges data with the drive device; for example, it stores information processing programs installed from the drive device. The storage device 24 also reads programs and passes them to the memory 22. In this embodiment, the storage device 24 stores the user management DB 251 described above.

FIG. 9 is a diagram illustrating an example of the user management information stored in the user management DB 251. As shown in FIG. 9, the user management DB 251 stores, as user management information, each user's identification information (user ID) in association with authority information and information on the dictionaries the user may use.

§3. Anonymization method
Next, the anonymization method of this embodiment will be described. FIG. 10 is an explanatory diagram outlining the anonymization method that the anonymization device 10 executes according to a program. As shown in FIG. 10, the anonymization device 10 first acquires anonymous information (step S1), determines whether the anonymous information satisfies the disclosure conditions (step S2), and sets access authority for the anonymous information that satisfies the disclosure conditions (step S3).

In step S1, the anonymous information may be acquired by the anonymization device 10 receiving, from each business operator holding personal information, anonymous information that the operator has anonymized, or by the anonymization device 10 receiving personal information from each business operator and anonymizing it itself.

FIG. 11 is a diagram illustrating the anonymization process. As shown in FIG. 11, when performing anonymization, the anonymization device 10 first acquires (receives) personal information from another computer or from input means (step S10), normalizes the personal information into a predetermined format, and registers it in the personal information DB 131 (step S20).

The anonymization device 10 reads the personal information from the personal information DB 131 as target data (step S30). Here, the anonymization device 10 may leave out of the target data, by not reading them, items such as the personal ID, name, telephone number, and e-mail address, which serve to identify an individual and would be meaningless if abstracted.

Next, the anonymization device 10 determines, for each word in the target data, whether value data for that word exists in the search information storage DB 133, that is, whether its value data has already been acquired (step S40). If value data for all words exists in the search information storage DB 133 (step S40, Yes), the anonymization device 10 proceeds to step S60. If some value data is missing (step S40, No), it acquires the value data for those words from an external device, in this example the search engine 90 (step S50). Value data for the words already present in the search information storage DB 133, that is, all value data other than that acquired from the search engine, is read from the search information storage DB 133 (step S60).
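Steps S40 to S60 amount to a cache-aside lookup. In the sketch below, a plain dictionary stands in for the search information storage DB 133 and the hypothetical `fetch` callable stands in for the request to the search engine 90; neither reflects a real search-engine API, and the values are made up.

```python
from typing import Callable, Dict, Iterable

def get_word_values(words: Iterable[str], store: Dict[str, float],
                    fetch: Callable[[str], float]) -> Dict[str, float]:
    values = {}
    for word in words:
        if word not in store:            # S40: value data not yet acquired
            store[word] = fetch(word)    # S50: acquire externally and register it
        values[word] = store[word]       # S60: read from the storage DB
    return values

store = {"Tokyo": 120.0}                 # hypothetical cached value data
requested = []

def fake_fetch(word):                    # hypothetical stand-in for the search engine
    requested.append(word)
    return 50.0

print(get_word_values(["Tokyo", "Osaka", "Tokyo"], store, fake_fetch))
# {'Tokyo': 120.0, 'Osaka': 50.0}
print(requested)  # ['Osaka']
```

Only the missing word triggers an external request, and the fetched value is registered so later lookups are served from the store.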

The anonymization device 10 then creates abstraction candidate data by replacing each item of the target data with an abstracted word (category) so as to satisfy anonymity (step S70). As shown in FIGS. 12 to 14, each word is abstracted using a dictionary that stores words before abstraction in association with words after abstraction, and the word before abstraction is replaced with the corresponding abstracted word. FIG. 12 shows an example of a dictionary that abstracts a vehicle name into the corresponding manufacturer name. FIG. 13 shows an example of a dictionary that abstracts a vehicle name into the corresponding vehicle type. FIG. 14 shows an example of a dictionary that abstracts a vehicle name into the corresponding vehicle class. Although FIGS. 12 to 14 show only the vehicle name item, each dictionary likewise contains corresponding words for other items such as age and address. Each dictionary is given a dictionary ID so that it can be uniquely identified within the system 100; for example, the dictionaries of FIGS. 12 to 14 have the dictionary IDs D1 to D3.
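Dictionary-based abstraction is essentially a per-item table lookup. The sketch below uses made-up vehicle names in the roles of dictionaries D1 (manufacturer) and D2 (vehicle type); the dictionary contents, the nested layout, and the choice to leave words missing from a dictionary unchanged are all assumptions.

```python
# Hypothetical dictionary contents; the vehicle names are made up.
DICTIONARIES = {
    "D1": {"vehicle": {"Car X": "Maker P", "Car Y": "Maker P", "Car Z": "Maker Q"}},
    "D2": {"vehicle": {"Car X": "Sedan", "Car Y": "Minivan", "Car Z": "Sedan"}},
}

def abstract_records(records, item, dictionary_id, dictionaries=DICTIONARIES):
    """Replace the value of `item` in each record with its abstracted word."""
    table = dictionaries[dictionary_id][item]
    result = []
    for record in records:
        new_record = dict(record)  # copy so the target data itself is untouched
        # Assumption: words missing from the dictionary are left unchanged.
        new_record[item] = table.get(record[item], record[item])
        result.append(new_record)
    return result

rows = [{"age": 18, "vehicle": "Car X"}, {"age": 19, "vehicle": "Car Z"},
        {"age": 25, "vehicle": "Car W"}]
print([r["vehicle"] for r in abstract_records(rows, "vehicle", "D2")])
# ['Sedan', 'Sedan', 'Car W']
```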

When there are a plurality of items that can be abstracted, every pattern of abstracting or not abstracting each item is created. For example, if the target data includes three items A, B, and C, all of which can be abstracted, and the abstracted items are denoted A′, B′, and C′, seven candidate patterns can be created as shown in FIG. 15, such as (A′, B, C) when only item A is abstracted and (A′, B′, C) when items A and B are abstracted. Candidate patterns are not limited to those using all items; candidate patterns using only some of the items, such as (A′, B) or (B′, C), may also be created.
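The count of seven follows from each of the three items being either abstracted or kept, which gives 2^3 = 8 combinations, minus the one in which nothing is abstracted. The sketch below enumerates them with `itertools.product`; the function name is hypothetical, and patterns that use only some of the items, such as (A′, B), are not generated here and would need an extra step.

```python
from itertools import product

def candidate_patterns(items):
    """All patterns that abstract (') or keep each item, excluding 'keep everything'."""
    patterns = []
    for flags in product((False, True), repeat=len(items)):
        if any(flags):
            patterns.append(tuple(f"{name}'" if abstracted else name
                                  for name, abstracted in zip(items, flags)))
    return patterns

patterns = candidate_patterns(["A", "B", "C"])
print(len(patterns))  # 7
```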

Next, the anonymization device 10 calculates the value of each pattern's abstraction candidate data based on the value data of the words it contains (step S80), and determines the order of testing based on these values (step S90), for example in descending order of value. Although it is desirable to test all candidate patterns, abstraction candidate data whose value is too low may be removed from the order. For example, candidates ranked at or below a predetermined position, or those falling below a predetermined proportion of the ranking such as the lower half, may be removed. Abstraction candidate data whose value is less than a predetermined ratio of the value of the target data may also be removed. This reduces the number of tests and shortens the processing time.
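A minimal sketch of steps S80 and S90 with the two optional pruning rules described above. The parameter names `keep_ratio` and `min_value_ratio`, and the candidate values, are hypothetical.

```python
def order_for_testing(candidates, target_value, keep_ratio=None, min_value_ratio=None):
    """candidates: (name, value) pairs. Returns names in descending order of value."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if min_value_ratio is not None:
        # Drop candidates worth less than a given share of the target data's value.
        ranked = [c for c in ranked if c[1] >= min_value_ratio * target_value]
    if keep_ratio is not None:
        # Keep only the top share of the ranking (e.g. 0.5 keeps the upper half).
        ranked = ranked[:max(1, int(len(ranked) * keep_ratio))]
    return [name for name, _ in ranked]

cands = [("p1", 900), ("p2", 2100), ("p3", 1500), ("p4", 300)]  # hypothetical values
print(order_for_testing(cands, target_value=2500, keep_ratio=0.5))  # ['p2', 'p3']
```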

The anonymization device 10 tests the anonymity of the abstraction candidate data in the determined order (step S100). For example, to test k-anonymity, it counts, for each combination of values of the different items associated with one individual, how many times that combination occurs in the abstraction candidate data (the occurrence count). Alternatively, to test l-diversity, it counts how many times each combination of values of the same item associated with one individual occurs in the abstraction candidate data. The smallest occurrence count is then taken as the minimum number of appearances (the k value or l value) (step S110), and it is determined whether this minimum number of appearances exceeds 1 (step S120). That is, if the k value exceeds 1, k-anonymity is satisfied, and if it is 1, k-anonymity is not satisfied. Likewise, if the l value exceeds 1, l-diversity is satisfied, and if it is 1, l-diversity is not satisfied.

If the minimum number of appearances (k value or l value) does not exceed 1 (step S120, No), the anonymization device 10 further abstracts the value of at least one item of the abstraction candidate data, that is, replaces it with a more abstract word (step S130), and returns to step S100.
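Steps S100 to S130 form a loop that keeps abstracting until the minimum number of appearances exceeds 1. The sketch below shows that loop for a single item, age, using a hypothetical three-level hierarchy (exact age, decade, fully suppressed "*"); which item to abstract further, and how, is left open in the text.

```python
from collections import Counter

# Hypothetical generalization hierarchy for the age item, from least to most abstract.
AGE_LEVELS = [
    lambda age: str(age),               # level 0: exact age
    lambda age: f"{age // 10 * 10}s",   # level 1: decade, e.g. 18 -> "10s"
    lambda age: "*",                    # level 2: suppressed
]

def min_appearances(values):
    return min(Counter(values).values())

def abstract_until_anonymous(ages, levels=AGE_LEVELS):
    """Return (level, values) for the first level whose minimum count exceeds 1."""
    for level, generalize in enumerate(levels):
        values = [generalize(a) for a in ages]
        if min_appearances(values) > 1:     # S120: condition satisfied
            return level, values
        # S130: otherwise move to the next, more abstract level and test again
    return None, None                       # no level in the hierarchy suffices

print(abstract_until_anonymous([18, 19, 18, 25, 27]))
# (1, ['10s', '10s', '10s', '20s', '20s'])
```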

If, on the other hand, the minimum number of appearances (k value or l value) exceeds 1 (step S120, Yes), the anonymization device 10 obtains the difference between the value of the abstraction candidate data and the value of the original target data (step S140), and determines, as the value of the abstraction candidate data, this difference or a value based on it, for example the ratio of the difference to the value of the target data, or the ratio of the value of the abstraction candidate data to the value of the target data (step S150).

The anonymization device 10 then determines whether any candidate pattern remains untested (step S160). If one does (step S160, Yes), it identifies the next abstraction candidate data according to the order determined in step S90 (step S170), and returns to step S100 to test it.

The test is repeated in this way for the abstraction candidate data of each pattern. When no candidate pattern remains (step S160, No), the anonymization device 10 selects the abstraction candidate data to be adopted based on the values obtained in step S150 (step S180), and stores it as anonymous information in the anonymous information DB 145 (step S190).

To select abstraction candidate data, for example, a predetermined number of abstraction candidate data are selected from all candidate patterns in descending order of value. Alternatively, the anonymization device 10 may output a plurality of abstraction candidate data from all candidate patterns in descending order of value, have the operator designate the ones they consider appropriate, and select the designated abstraction candidate data.

Next, the value of data in this embodiment will be described with reference to FIGS. 16 to 23. FIG. 16 is a diagram illustrating an example of part of the age item in the target data. As shown in FIG. 16, the target data holds a number of people $c_i$ for each age $s_i$. For example, there are 30 people ($c_1$) aged 18 ($s_1$) and 10 people ($c_2$) aged 19 ($s_2$).

FIG. 17 shows an example of value data acquired for the ages $s_i$. The value data in FIG. 17 holds an SEM unit price $e_i$ for each age $s_i$.

The value of each age $s_i$ is the SEM unit price $e_i$ multiplied by the number of people $c_i$, as expressed by Equation 1.

$$s_i = c_i \times e_i \qquad \text{(Equation 1)}$$

As shown in FIG. 18, the value $S(e)$ of the age item is the sum of the values of the individual ages $s_i$, as expressed by Equation 2.

$$S(e) = \sum_{i=1}^{n} s_i \qquad \text{(Equation 2)}$$

In FIG. 18, $n$ is 5. Accordingly, the value $S(e)$ of the age item is 2,446 yen, as shown in FIG. 19. The value of the target data is the sum of the values of all items in the target data.

FIG. 20, on the other hand, is a diagram illustrating an example of part of the age item in the abstraction candidate data. As shown in FIG. 20, the abstraction candidate data holds a number of people $c_i$ for each age group $k_i$. For example, there are 40 people ($c_1$) in their teens ($k_1$) and 22 people ($c_2$) in their 20s ($k_2$).

FIG. 21 shows an example of the value data of each word acquired for the age groups $k_i$. The value data in FIG. 21 holds an SEM unit price $e_i$ for each age group $k_i$.

The value of each age bracket $k_i$ is the number of people $c_i$ multiplied by the SEM unit price $e_i$, as given by Equation 3.

$$k_i = c_i \times e_i \qquad \text{(Equation 3)}$$

As shown in FIG. 22, the value $S(k)$ of the age-bracket item is the sum of the values of the individual brackets $k_i$, as given by Equation 4. In FIG. 22, $n = 2$.

$$S(k) = \sum_{i=1}^{n} k_i \qquad \text{(Equation 4)}$$

Accordingly, the value $S(k)$ of the age-bracket item is 2,134 yen, as shown in FIG. 23. In other words, abstracting the age item into age brackets reduced its value by 312 yen (2,446 − 2,134). The value of the abstraction candidate data is the sum of the values of all items in the abstraction candidate data.

As the value of the abstraction candidate data obtained in step S150, an impairment rate $M(k)$ is computed, for example as in Equation 5. It is obtained by dividing the value of the abstraction candidate data by the sum of the value of the abstraction candidate data and the value of the target data.

$$M(k) = \frac{S(k)}{S(k) + S(e)} \qquad \text{(Equation 5)}$$

In this way, the anonymization device 10 of this embodiment evaluates each abstraction candidate data item based on the value of the abstracted words, and can therefore perform appropriate anonymization automatically. That is, it can automatically generate many pieces of anonymous information with different degrees of abstraction.
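The impairment rate of Equation 5 can be checked against the age example. The sketch below is a hypothetical helper, not the device's actual implementation. Its inputs, 2,446 yen and 2,134 yen, are the values stated for FIGS. 19 and 23.

```python
def impairment_rate(candidate_value: float, target_value: float) -> float:
    """Equation 5: M(k) = S(k) / (S(k) + S(e))."""
    total = candidate_value + target_value
    if total == 0:
        raise ValueError("S(k) + S(e) must be non-zero")
    return candidate_value / total


s_e = 2446  # value of the age item in the target data (FIG. 19)
s_k = 2134  # value of the age-bracket item in the candidate data (FIG. 23)
print(s_e - s_k)                            # 312 yen lost by abstraction
print(round(impairment_rate(s_k, s_e), 4))  # 0.4659
```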

FIG. 24 shows the processing by which the anonymization device 10 confirms the disclosure conditions of anonymous information. In step S2, as shown in FIG. 24, the anonymization device 10 reads from the storage device 14, as target data, the anonymous information obtained in step S1 whose disclosure conditions are to be confirmed (step S210). It then determines whether any anonymous information whose disclosure conditions have not yet been confirmed remains (step S220). If none remains (step S220, No), the process ends. If some remains (step S220, Yes), the process proceeds to step S230.

In step S230, the unconfirmed anonymous information is checked against the authority information in the authority setting DB 135. It is then determined whether authority information corresponding to the anonymous information is stored in the authority setting DB 135 (step S240).

If it is determined in step S240 that no corresponding authority information is stored in the authority setting DB 135 (step S240, No), new authority information is added to the authority setting DB 135. This happens, for example, when the DB holds no entry matching the providing source or providing destination of the anonymous information. In that case, the anonymization device 10 acquires the authority information from, for example, a device of the business operator that provided the anonymous information, and stores it in the authority setting DB 135 (step S245). Alternatively, when adding new authority information, the anonymization device 10 may prompt its operator to enter the authority information and store the entered information in the authority setting DB 135.

The anonymous information is stored in the temporary processing DB 134 (step S250) in either of two cases: the authority information for it has been stored in the authority setting DB 135 in step S245, or it is determined in step S240 that all corresponding authority information is already stored there (step S240, Yes).

Next, the anonymization device 10 determines whether the anonymous information stored in the temporary processing DB 134 satisfies the disclosure conditions in the disclosure condition DB 132 (step S260). If it does not (step S260, No), the process returns to step S210 and moves on to the next anonymous information. If it does (step S260, Yes), the anonymous information is stored in the anonymous information DB 145, and the process returns to step S210 and moves on to the next anonymous information.
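The loop of steps S210 to S260 can be sketched as follows. This is a simplified illustration under these assumptions:

- each piece of anonymous information is a dict with hypothetical `source` and `destination` keys;
- the authority setting DB 135 is reduced to a set of (source, destination) pairs;
- the temporary processing DB 134 and the anonymous information DB 145 are plain lists;
- `fetch_authority` and `meets_condition` are hypothetical callbacks standing in for step S245 and the check of step S260.

```python
from typing import Callable, Dict, List, Set, Tuple

Record = Dict[str, object]
Key = Tuple[object, object]


def confirm_disclosure(
    candidates: List[Record],
    authority_keys: Set[Key],
    fetch_authority: Callable[[Record], Key],
    meets_condition: Callable[[Record], bool],
) -> Tuple[List[Record], List[Record]]:
    """Sketch of steps S210-S260; returns (temporary DB, anonymous information DB)."""
    temp_db: List[Record] = []
    anon_db: List[Record] = []
    for info in candidates:                      # S210/S220: next unconfirmed item
        key = (info["source"], info["destination"])
        if key not in authority_keys:            # S230/S240: no matching authority
            authority_keys.add(fetch_authority(info))  # S245: register new authority
        temp_db.append(info)                     # S250
        if meets_condition(info):                # S260: disclosure condition met
            anon_db.append(info)                 # store as disclosable
    return temp_db, anon_db


items = [
    {"source": "P", "destination": "P", "min_rate": 0.03},
    {"source": "P", "destination": "external", "min_rate": 0.08},
]
keys = {("P", "P")}
temp, published = confirm_disclosure(
    items,
    keys,
    fetch_authority=lambda r: (r["source"], r["destination"]),
    meets_condition=lambda r: r["min_rate"] < 0.05,
)
print(len(temp), len(published))  # 2 1
```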

FIG. 25A shows an example of the authority setting DB 135. The authority setting DB 135 stores authority setting information that associates information such as the minimum number of appearances in the anonymized information with an access authority (rank). That is, the authority setting DB 135 is one form of the authority storage unit.

In the example of FIG. 25A, the access authority (rank) is associated with the providing source, the providing destination, and the usable dictionary, in addition to the minimum appearance rate. The providing source indicates the business operator that provided the anonymous information or the personal information before anonymization. The minimum appearance rate and the usable dictionary for each rank are defined per providing source. When anonymous information concerning multiple business operators has been converted into statistical information, the providing-source field of the authority setting DB 135 in FIG. 25A stores the type of statistic, such as average or total. The providing destination indicates where the anonymous information is provided (sent).

The minimum appearance rate is the ratio of the minimum number of appearances to the total number of data records. A small minimum appearance rate means that each individual record makes up a small share of the whole, so the information is diluted; such a rate is associated with a low rank. A large minimum appearance rate means that each individual record makes up a large share of the whole, which makes it easier to grasp the overall data from individual records; such a rate is associated with a high rank.
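The text defines the minimum appearance rate only as the ratio of the minimum number of appearances to the total number of records. The sketch below assumes, for illustration only, that the minimum number of appearances is the size of the smallest group of records sharing the same values for a chosen set of attributes (the idea behind k-anonymity). The patent does not confirm this definition, and the attribute names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def min_appearance_rate(records: List[Dict[str, str]], attrs: Sequence[str]) -> float:
    """Minimum appearance rate in percent.

    Assumes the minimum number of appearances is the size of the smallest
    group of records that share identical values for `attrs`.
    """
    if not records:
        raise ValueError("no records")
    groups = Counter(tuple(r[a] for a in attrs) for r in records)
    return 100.0 * min(groups.values()) / len(records)


rows = [{"age": "10s", "sex": "F"}] * 3 + [{"age": "20s", "sex": "M"}]
print(min_appearance_rate(rows, ["age", "sex"]))  # 25.0
```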

For example, in the authority setting DB 135 of FIG. 25A, when the providing source is store P and the anonymous information is provided within the same business operator (i.e., to store P itself), a minimum appearance rate of less than 0.05% is associated with rank A. When the same anonymous information is provided outside the business operator, a minimum appearance rate of less than 0.05% is associated with rank C, a higher rank than for provision within the business operator.

The providing destination may be left unrestricted when no rank is designated for a particular destination. The providing destination may also be a specific business operator name or industry. For example, when the destination is a competing business operator, it may be associated with a higher rank than provision to other operators outside the business operator. When the destination is a business partner, it may be associated with a lower rank than provision to other operators. Likewise, the destination may be specified by industry, for example an automobile dealer or an automobile repair shop.

Also, in the authority setting DB 135 of FIG. 25A, when the providing source is store P, usable dictionary D1 is associated with each of ranks A to D, and usable dictionary D2 is associated with rank E.

The authority setting DB 135 of FIG. 25A associates conditions that include the minimum appearance rate with ranks, but this is not limiting. As shown in FIG. 25B, conditions that include the minimum number of appearances may instead be stored in association with ranks.

For example, in the authority setting DB 135 of FIG. 25B, when the providing source is store P and the anonymous information is provided within the same business operator, a minimum number of appearances of 50 or more is associated with rank A. When the same anonymous information is provided outside the business operator, a minimum number of appearances of 50 or more is associated with rank C, a higher rank than for provision within the business operator.

FIG. 26 shows an example of the disclosure condition DB 132. The disclosure condition DB 132 stores attribute values of the anonymization conditions in association with disclosure conditions. For example, FIG. 26 specifies a minimum-appearance condition for each attribute value. When the attribute values include the vehicle model, the disclosure condition is a minimum appearance rate of less than 0.05%. That is, anonymous information that contains the vehicle model is stored in the anonymous information DB 145 and made subject to disclosure if its minimum appearance rate is less than 0.05%. If the rate is 0.05% or more, it is not stored in the anonymous information DB 145 and is not disclosed. Similarly, when the attribute values include a manufacturer name, the disclosure condition is a minimum appearance rate of less than 0.1%.

The disclosure condition may also be "domestic manufacturer". In that case, information on domestic manufacturers is extracted and disclosed, and information on foreign manufacturers is withheld. In this example, a table indicating whether each manufacturer name is domestic or foreign is prepared in advance. The anonymization device 10 refers to this table to determine, from the manufacturer name, whether the manufacturer is domestic.

A disclosure date or period may also be set as a disclosure condition. In the example of FIG. 26, when anonymous information contains the predetermined keyword "▽ベンタ○ール", the disclosure condition is "on or after month ○, day ○". Records or anonymous information containing "▽ベンタ○ール" are therefore disclosed from month ○, day ○ onward and withheld until then. Likewise, when anonymous information contains the predetermined keyword "力○一ラ", the disclosure condition is "January 1 to February 28". Records or anonymous information containing "力○一ラ" are therefore disclosed from January 1 to February 28 and withheld outside that period.
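A possible shape for the disclosure check of FIG. 26 is sketched below. The rule tables are hypothetical. The rate thresholds of 0.05% (vehicle model) and 0.1% (manufacturer name) follow the text, but `keyword_a`, `keyword_b` and their dates are placeholders for the masked keywords and the elided date. The domestic-manufacturer rule is omitted. Each rule applies only when its attribute or keyword is present.

```python
from datetime import date
from typing import Dict, Optional, Set, Tuple

# Hypothetical rules in the spirit of FIG. 26; keywords and dates are placeholders.
RATE_LIMITS: Dict[str, float] = {"vehicle_model": 0.05, "maker": 0.1}  # percent
PERIODS: Dict[str, Tuple[date, Optional[date]]] = {
    "keyword_a": (date(2014, 4, 1), None),               # "on or after" rule
    "keyword_b": (date(2014, 1, 1), date(2014, 2, 28)),  # fixed period
}


def may_disclose(attrs: Set[str], min_rate: float, keywords: Set[str], today: date) -> bool:
    """Return True if the anonymous information meets every applicable rule."""
    for attr, limit in RATE_LIMITS.items():
        if attr in attrs and not min_rate < limit:
            return False  # minimum appearance rate too high for this attribute
    for kw, (start, end) in PERIODS.items():
        if kw in keywords and (today < start or (end is not None and today > end)):
            return False  # outside the disclosure period for this keyword
    return True


print(may_disclose({"vehicle_model"}, 0.03, set(), date(2014, 5, 1)))  # True
print(may_disclose({"vehicle_model"}, 0.05, set(), date(2014, 5, 1)))  # False
print(may_disclose(set(), 0.5, {"keyword_b"}, date(2014, 3, 1)))       # False
```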

When the disclosure-condition confirmation of FIG. 24 is complete, the anonymization device 10 next sets the access authority of each piece of anonymous information (step S3). FIG. 27 shows a specific example of this processing. First, the anonymization device 10 acquires authority information from the authority setting DB 135 (step S310). It then reads from the anonymous information DB 145 information about each piece of anonymous information, such as its minimum appearance rate, providing source, providing destination, and dictionary used. It obtains the corresponding access authority from the authority setting DB 135 and stores it in the anonymous information DB 145 as the access authority information of that anonymous information (step S320).

For example, the anonymization device 10 refers to the authority setting DB 135. It determines, as the access authority information of the anonymous information, the rank for which the minimum appearance rate, providing source, providing destination, and dictionary used all match. Some conditions, such as the minimum appearance rate, can satisfy a lower rank and higher ranks at the same time; in that case, the lowest such rank is chosen.

In the example of FIG. 25A:

- Providing source store P, destination within the same business operator (store P), dictionary D1, minimum appearance rate below 0.05%: the anonymization device 10 sets the access authority to rank A.
- Providing source store P, destination outside the business operator, dictionary D1, minimum appearance rate below 0.05%: it sets the access authority to rank C.
- Providing source store P, destination within the same business operator, dictionary D2, minimum appearance rate 0.2%: the conditions other than the usable dictionary satisfy rank C, but dictionary D2 is usable only at rank E, so it sets the access authority to rank E.
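The rank decision can be sketched as a table lookup. It keeps every rule whose providing source, providing destination, usable dictionary and rate bound all match, then returns the lowest matching rank. The rule table below is hypothetical and does not reproduce FIG. 25A. It was chosen only so that the three examples above come out as described. Ranks are compared as letters, assuming A is the lowest.

```python
from typing import FrozenSet, List, NamedTuple, Optional


class Rule(NamedTuple):
    rank: str                  # "A" (lowest) ... "E" (highest)
    source: str
    destination: str
    dictionaries: FrozenSet[str]
    max_rate: float            # rule matches when min_rate < max_rate (percent)


# Hypothetical table in the spirit of FIG. 25A; not the figure's actual values.
RULES: List[Rule] = [
    Rule("A", "P", "internal", frozenset({"D1"}), 0.05),
    Rule("B", "P", "internal", frozenset({"D1"}), 0.1),
    Rule("C", "P", "internal", frozenset({"D1"}), 0.5),
    Rule("D", "P", "internal", frozenset({"D1"}), 1.0),
    Rule("C", "P", "external", frozenset({"D1"}), 0.05),
    Rule("E", "P", "internal", frozenset({"D2"}), 100.0),
]


def decide_rank(source: str, destination: str, dictionary: str, min_rate: float) -> Optional[str]:
    """Return the lowest rank whose conditions all match, or None if none match."""
    matches = [
        r.rank for r in RULES
        if r.source == source and r.destination == destination
        and dictionary in r.dictionaries and min_rate < r.max_rate
    ]
    return min(matches) if matches else None  # "A" < "B" < ... as strings
```

With this table, `decide_rank("P", "internal", "D1", 0.03)` matches ranks A to D and returns `"A"`, and the external case returns `"C"`. `decide_rank("P", "internal", "D2", 0.2)` returns `"E"`, because only the D2 rule accepts that dictionary.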

The anonymization device 10 also computes statistical information, such as the total, average, and standard deviation, over these pieces of anonymous information. It obtains the access authority of the statistical information in the same way as in step S320, and stores the statistical information in the anonymous information DB 145 in association with that access authority (step S330).

Next, access management for anonymous information to which access authority has been added as described above is explained. FIG. 28 illustrates an access management method in which the management server 20 manages access to anonymous information according to its access authority.

When the management server 20 receives a request for access to anonymous information from a user terminal 30, it starts the processing of FIG. 28 and first authenticates the user (step S410). In the authentication processing, the management server 20 receives authentication information, such as a user ID and password, from the user terminal 30 and compares it with registered information. If they match, authentication succeeds and the process proceeds to the next step, S420. If they do not match, authentication fails and the processing of FIG. 28 ends.

The management server 20 may function as a web server that provides information such as anonymous information as web pages, with the user terminal 30 accessing it through a so-called web browser. In that configuration, the authentication information may be sent from the user terminal 30 to the management server 20 by an HTTP cookie or the like. The authentication information may also be entered by the user through an input means such as a keyboard and sent from the user terminal 30 to the management server 20.

When authentication succeeds, the management server 20 acquires the user's user management information from the user management DB 251 (step S420). As shown in FIG. 9, for example, the user management information is recorded in the user management DB 251 and associates a user ID with information such as the access authority and the usable dictionaries. The user ID uniquely identifies each user. The user's access authority indicates the authority the user holds, i.e., the range of anonymous information the user can access.

In the example of FIG. 9 in particular, the range of access authority (the accessible range) is expressed as a rank. For example, ranks A to E may be assigned in ascending order of authority, starting from the narrowest accessible range. Then rank A can access anonymous information of rank A, rank B can access anonymous information of ranks A and B, and rank E can access anonymous information of ranks A to E. Higher authority ranges may thus be set to include lower ones, or each authority may be set independently. For example, a user holding authorities A and E may be allowed to access only anonymous information of ranks A and E, and not that of ranks B, C, and D.
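The two ways of interpreting a user's ranks, hierarchical and independent, can be sketched as below. The rank ordering A to E follows the example in the text. The function name and the `hierarchical` flag are hypothetical.

```python
from typing import Set

RANKS = "ABCDE"  # ascending order of authority, as in the example above


def accessible_ranks(user_ranks: Set[str], hierarchical: bool) -> Set[str]:
    """Ranks of anonymous information that a user may access.

    hierarchical=True: a rank also covers every lower rank (E covers A to E).
    hierarchical=False: each granted rank stands on its own.
    """
    if not hierarchical:
        return set(user_ranks)
    top = max(RANKS.index(r) for r in user_ranks) if user_ranks else -1
    return set(RANKS[: top + 1])


print(sorted(accessible_ranks({"B"}, hierarchical=True)))        # ['A', 'B']
print(sorted(accessible_ranks({"A", "E"}, hierarchical=False)))  # ['A', 'E']
```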

The management server 20 then acquires from the anonymous information DB 145 summary information of the anonymous information within the user's authority, i.e., the anonymous information the user can access with that authority (step S430). As shown in FIG. 6, the summary information may be read from the access management information recorded in advance for each piece of anonymous information. Alternatively, item names or part of the data of the anonymous information may be read out as the summary information.

The management server 20 sends the acquired summary information to the user terminal 30 (step S440) and prompts the user to select the anonymous information to be provided (step S450). For example, the management server 20 provides the user terminal 30 with a web page that lists the summary information, together with input fields for keyword search and filtering.

The user then selects anonymous information from the list of summary information and requests it from the user terminal 30. When the management server 20 receives the request (step S460), it compares the access authority of that anonymous information with the user's access authority (step S470). It then reconfirms whether the user has the authority to access that anonymous information (step S480).

If the management server 20 determines that the user does not have the authority to access the anonymous information (step S480, No), it ends the processing of FIG. 28. If it determines that the user has the authority (step S480, Yes), it stores the date and time of use and the user's information (such as the user ID) in the storage device 24 as history information (step S490). The management server 20 then acquires the requested anonymous information from the anonymous information DB 145 (step S500) and sends it to the requesting user terminal 30 for display (step S510).
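The access-management flow of FIG. 28 can be condensed into the following sketch. It is a hypothetical in-memory illustration:

- the dicts stand in for the user management DB 251 and the anonymous information DB 145, and the list stands in for the history kept in the storage device 24;
- ranks are treated hierarchically;
- passwords are compared as unsalted SHA-256 digests via `hmac.compare_digest`, purely for brevity. A production system would use a salted, deliberately slow password hash.

```python
import hashlib
import hmac
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory stand-ins for user management DB 251,
# anonymous information DB 145 and the history in storage device 24.
USERS: Dict[str, Tuple[str, str]] = {  # user_id -> (SHA-256 hex of password, rank)
    "u1": (hashlib.sha256(b"secret").hexdigest(), "B"),
}
ANON_DB: Dict[str, Tuple[str, str]] = {  # info_id -> (access rank, data)
    "x1": ("A", "data-x1"),
    "x2": ("C", "data-x2"),
}
HISTORY: List[Tuple[datetime, str, str]] = []


def covers(user_rank: str, info_rank: str) -> bool:
    """Hierarchical ranks: a user rank covers its own and every lower rank."""
    return info_rank <= user_rank


def authenticate(user_id: str, password: str) -> Optional[str]:
    """S410/S420: return the user's rank, or None if authentication fails."""
    entry = USERS.get(user_id)
    digest = hashlib.sha256(password.encode()).hexdigest()
    if entry is None or not hmac.compare_digest(digest, entry[0]):
        return None
    return entry[1]


def summaries(user_rank: str) -> List[str]:
    """S430/S440: IDs of anonymous information within the user's authority."""
    return sorted(i for i, (rank, _) in ANON_DB.items() if covers(user_rank, rank))


def fetch(user_id: str, user_rank: str, info_id: str) -> Optional[str]:
    """S460-S510: re-check authority, record the access, and return the data."""
    item = ANON_DB.get(info_id)
    if item is None or not covers(user_rank, item[0]):  # S470/S480
        return None
    HISTORY.append((datetime.now(), user_id, info_id))  # S490
    return item[1]                                      # S500/S510


rank = authenticate("u1", "secret")
print(rank, summaries(rank), fetch("u1", rank, "x2"))  # B ['x1'] None
```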

Because anonymous information is sent only to users who hold the required access authority, access to anonymous information can be controlled appropriately. In particular, in this embodiment the access authority of anonymous information used for access management is derived from information such as the minimum appearance rate, so access authority can be set automatically. Consider a system that anonymizes personal information by abstraction: it generates a plurality of abstraction candidates and uses the candidate selected by its post-abstraction value as the anonymous information. Even in such a system, adding access authority to the anonymous information from information such as the minimum appearance rate, as described above, allows access management without manual work.

<Embodiment 2>
FIG. 29 is a functional block diagram of the anonymization system according to Embodiment 2. At an exhibition where multiple business operators exhibit, each business operator collects personal information from visitors. The anonymization system 200 of Embodiment 2 anonymizes this personal information. It includes the anonymization device 10 of each business operator and a management server 20 that manages the anonymous information anonymized by each operator.

In the anonymization system 200 of Embodiment 2, the management server 20 acquires an anonymization dictionary from the anonymization device 10 of each business operator and integrates these dictionaries into an integrated anonymization dictionary. It assigns an ID to each integrated anonymization dictionary and distributes it to the anonymization device 10 of each operator. The anonymization device 10 of each operator then uses the common integrated anonymization dictionary to anonymize personal information into anonymous information, and registers it in the anonymous information DB (database) 145. Access to the anonymous information is managed based on the ID of the integrated anonymization dictionary and the minimum appearance rate.

As shown in FIG. 29, the management server 20 includes:

- a request reception unit 201, an access control unit 202, and an output control unit 203;
- a user management DB 251;
- a dictionary acquisition unit 211, an integration unit 212, a priority determination unit 213, and a dictionary management unit 214;
- an anonymous information registration unit 215, an anonymous information control unit 216, and a selection unit 217;
- a dictionary DB 231 and a priority DB 232.

That is, the management server 20 of this embodiment is also a dictionary creation device comprising the dictionary acquisition unit 211, the integration unit 212, the priority determination unit 213, and the selection unit 217.

Words in the target data are anonymized by replacing them with abstracted words. For this purpose, the dictionary acquisition unit 211 acquires, from the anonymization device 10 of each business operator, a plurality of anonymization dictionaries that store words in association with their abstracted forms. In this embodiment, the dictionary acquisition unit 211 receives the anonymization dictionaries sent from each operator's anonymization device 10 and registers them in the dictionary DB 231.

The integration unit 212 integrates the anonymization dictionaries acquired from the operators' anonymization devices 10 to create an integrated anonymization dictionary. For example, it uses the correspondences between words in the dictionaries, treating an abstracted word as the upper word and the word before abstraction as the lower word. It links each word in the dictionaries to the upper and lower words found across all of them.

A topmost word has no upper word, and a bottommost word has no lower word. For each topmost word, the integration unit 212 generates a dimension: the tree of words rooted at that topmost word and extending down to the bottommost words. These dimensions are stored in the dictionary DB 231, and together they constitute the integrated anonymization dictionary.

The priority determination unit 213 determines a priority for each dimension of the integrated anonymization dictionary, based on the words contained in that dimension. For example, it bases the priority on at least one of the following:

- the number of words in the dimension;
- the number of upper/lower levels among those words;
- the value of those words.

Predetermined values for the words are stored in, for example, the priority DB 232, and the priority determination unit 213 refers to the priority DB 232 to determine the priority.

Based on the priority, the selection unit 217 selects which of the dimensions generated by the integration unit 212 to adopt in the integrated anonymization dictionary, and which not to adopt.
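Priority determination and selection can be sketched together. The scoring rule is a hypothetical choice: a weighted sum of the number of words, the number of levels, and the total word value looked up from a stand-in for the priority DB 232. Adopting the `top_n` highest-priority dimensions is also hypothetical. The text only says that the priority is based on at least one of these three quantities.

```python
from typing import Dict, List

Tree = Dict[str, "Tree"]  # word -> subtree of its lower words


def count_words(tree: Tree) -> int:
    """Number of words in a (sub)tree."""
    return sum(1 + count_words(sub) for sub in tree.values())


def depth(tree: Tree) -> int:
    """Number of upper/lower levels in a (sub)tree."""
    return 0 if not tree else 1 + max(depth(sub) for sub in tree.values())


def words(tree: Tree) -> List[str]:
    """All words in a (sub)tree, parents before children."""
    out: List[str] = []
    for w, sub in tree.items():
        out.append(w)
        out.extend(words(sub))
    return out


def priority(root: str, tree: Tree, word_value: Dict[str, float],
             w_count: float = 1.0, w_depth: float = 1.0, w_value: float = 1.0) -> float:
    """Hypothetical weighted score over word count, number of levels and word value."""
    dim = {root: tree}
    value = sum(word_value.get(w, 0.0) for w in words(dim))
    return w_count * count_words(dim) + w_depth * depth(dim) + w_value * value


def select(dimensions: Dict[str, Tree], word_value: Dict[str, float], top_n: int) -> List[str]:
    """Adopt the top_n dimensions by priority; the rest are not adopted."""
    ranked = sorted(dimensions, key=lambda r: priority(r, dimensions[r], word_value),
                    reverse=True)
    return ranked[:top_n]


dims = {"age": {"10s": {"18": {}, "19": {}}}, "maker": {"Y": {}}}
print(select(dims, {"Y": 10.0}, top_n=1))  # ['maker']
```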

The dictionary management unit 214 manages the integrated anonymization dictionary created by the integration unit 212. For example, it reads the integrated anonymization dictionary from the dictionary DB 231 and distributes it to the anonymization device 10 of each business operator.

The anonymous information registration unit 215 acquires anonymous information from the anonymization device 10 of each business operator and registers it in the common DB 233.

The anonymous information control unit 216 controls processing of the anonymous information registered in the common DB 233, such as its output. For example, when it receives a request for anonymous information from an information processing device such as an anonymization device 10, it distributes the corresponding anonymous information to the requesting device. In this embodiment, the anonymous information control unit 216 is one form of the output unit.

FIG. 30 is a diagram illustrating an example of the dictionary DB 231. The dictionary DB 231 stores pairs of words. Each pair links a word before abstraction (hereinafter also referred to as a lower word) with the word obtained by abstracting it (hereinafter also referred to as an upper word).
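The lower-word/upper-word pairs can be pictured as a child-to-parent map, and abstraction then means following that map upward. The following is a minimal Python sketch under that assumption. It uses the pairs from the FIG. 30 example, plus "accounting software" → "business software" taken from FIG. 40. The names `DICTIONARY` and `abstract` are hypothetical and do not come from the described system.

```python
from typing import Dict

# Hypothetical lower word -> upper word map modeled on the FIG. 30 / FIG. 40 examples.
DICTIONARY: Dict[str, str] = {
    "Software A": "slip software",
    "Software Z": "slip software",
    "Software B": "accounting software",
    "slip software": "business software",
    "accounting software": "business software",
}


def abstract(word: str, levels: int, dictionary: Dict[str, str] = DICTIONARY) -> str:
    """Replace `word` with the word `levels` steps higher, stopping at the top word."""
    for _ in range(levels):
        upper = dictionary.get(word)
        if upper is None:  # no upper word stored: already the most abstract word
            break
        word = upper
    return word
```

For example, `abstract("Software A", 2)` climbs from "Software A" through "slip software" to "business software". Requests for more levels than exist simply stop at the top word.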

FIG. 31 is a diagram illustrating an example of the priority DB 232. The priority DB 232 stores, for each word, values used to determine the priority. In the example of FIG. 31, each word is associated with values used in search engine marketing (SEM). These include clicks per day, impressions per day, the number of participating companies, cost per day, click-through rate, and SEM price (acquisition price).

FIG. 32 is a diagram illustrating an example of the common DB 233. The common DB 233 stores the anonymous information that each business operator's anonymization device 10 produces using the integrated anonymization dictionary. In the example of FIG. 32, it stores items such as visited booth, age, gender, affiliated company, job title, product of interest, and status. Which items are included, and how far each item is abstracted, depends on the integrated anonymization dictionary, the results of the test, and so on, as described later.

As shown in FIG. 29, the anonymization device 10 of each business operator includes the following processing units:

- a data acquisition unit 101
- an abstraction unit 102
- a test unit 103
- a selection unit 104
- a value determination unit 106
- a value data acquisition unit 107
- a word category analysis unit 108
- a word value calculation unit 109
- an appearance count acquisition unit 111
- an authority determination unit 112
- an output control unit 121

It also includes the following databases:

- a personal information DB 131
- a disclosure condition DB 132
- a search information storage DB 133
- a temporary processing DB 134
- an authority setting DB (authority storage unit) 135

The data acquisition unit 101 acquires target data: data containing multiple items associated with an individual, that is, personal information. For example, a visitor's questionnaire answers, or personal information obtained by interviewing the visitor, are entered with a keyboard or similar device and stored in the personal information DB 131. The data acquisition unit 101 then reads this personal information from the personal information DB 131 as the target data. Alternatively, the contents of a visitor's business card or questionnaire may be read and converted into electronic data by OCR (Optical Character Recognition). The visitor's information may also be acquired from the visitor's RF-ID tag, IC chip, or the like.

The authority determination unit 112 determines the access authority for anonymous information. It bases this on the number of appearances of the anonymous information, or on a value calculated from that number, such as the minimum appearance rate. It attaches the access authority to the anonymous information and stores it in the anonymous information DB 145. For this purpose, the authority determination unit 112 refers, for example, to an authority storage unit. That unit stores values such as the number of appearances, or an appearance rate calculated from it, in association with access authorities. The authority determination unit 112 looks up the access authority that corresponds to the number of appearances of the anonymous information.
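The lookup performed by the authority determination unit 112 can be sketched as a threshold table. The following Python sketch assumes that table rows are (upper bound on the minimum appearance rate in percent, access level). Only the two rows come from the text: the example for dictionary "D4" described in step S550, where 0.05% or less gives rank A and 0.1% or less gives rank C. The fallback level and all names are hypothetical, and the sketch assumes a real table would be read from the authority setting DB 135.

```python
from typing import List, Tuple

# Hypothetical authority table: (upper bound on minimum appearance rate in %, level).
# The two rows mirror the "D4" example; FALLBACK_LEVEL is an assumption.
AUTHORITY_TABLE: List[Tuple[float, str]] = [(0.05, "A"), (0.1, "C")]
FALLBACK_LEVEL = "open"


def determine_access_level(min_rate_percent: float) -> str:
    """Return the level of the smallest threshold that the rate does not exceed."""
    for threshold, level in sorted(AUTHORITY_TABLE):
        if min_rate_percent <= threshold:
            return level
    return FALLBACK_LEVEL
```

Checking thresholds from smallest to largest means that data whose smallest group is very small (and thus easiest to re-identify) receives the most restrictive level first.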

The output control unit 121 outputs, as anonymous information, the abstraction candidate data that satisfies the test conditions. For example, the output control unit 121 transmits the anonymous information to the management server 20.

FIG. 33 is a diagram illustrating an example of the personal information DB 131. The personal information DB 131 stores the personal information acquired by the data acquisition unit 101. In the example of FIG. 33, it stores name, e-mail address, company name, job title, interests, status, and so on.

FIG. 34 is a diagram illustrating an example of anonymous information stored in the anonymous information DB 145. Anonymous information is obtained by abstracting each word of the personal information. In the example of FIG. 34, each record stores age group, address (prefecture), vehicle model, and minimum number of appearances.

FIG. 35 is a diagram illustrating an example of information for managing access to anonymous information (hereinafter also referred to as access management information). As shown in FIG. 35, the access management information includes, for example, a level, an anonymous information ID, the dictionary used, the minimum appearance rate, an information type, and a summary. The level indicates the authority required to access the anonymous information. As described later, it is derived from a value such as the minimum number of appearances of the anonymous information or the minimum appearance rate calculated from it.

The anonymous information ID uniquely identifies the anonymous information. The dictionary used indicates the dictionary with which the anonymous information was anonymized, for example by that dictionary's identification information. The minimum appearance rate is the ratio of the minimum number of appearances to the total number of words constituting the anonymous information. To obtain the minimum number of appearances, group the individuals in the anonymous information by identical attribute values, that is, count the appearances of each combination of words in the anonymous information. The minimum number of appearances is the smallest of these group sizes.
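The minimum number of appearances is the size of the smallest group of records that share identical abstracted attribute values. The following minimal Python sketch computes it together with the minimum appearance rate. It assumes each record is a dict keyed by item name, and it takes the record count as the denominator of the rate. That denominator is one reading of "the total number" in the text and is an assumption. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def minimum_appearance(records: List[Dict[str, str]], items: Sequence[str]) -> Tuple[int, float]:
    """Return (minimum number of appearances, minimum appearance rate in percent)."""
    if not records:
        raise ValueError("records must not be empty")
    # Group records by their combination of (already abstracted) attribute values.
    groups = Counter(tuple(record[item] for item in items) for record in records)
    min_count = min(groups.values())
    # Assumption: the denominator of the rate is the number of records.
    return min_count, 100.0 * min_count / len(records)
```

With three records sharing ("20s", "Tokyo", "Sedan") and one record with ("30s", "Osaka", "SUV"), the smallest group has one member, so the sketch returns a minimum number of appearances of 1 and a rate of 25%.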

The information type indicates, for example, one of two kinds of origin. The anonymous information may be statistical information derived from multiple sets of personal information, or it may be anonymized personal information held by a specific business operator. In the example of FIG. 35, statistical information obtained by averaging or totaling multiple sets of anonymous information has the type "average" or "total". Anonymized personal information of a specific business operator has that operator's name as its type. The summary describes the anonymous information, for example the items it contains and the conditions under which it was anonymized.

The anonymous information DB 145 may be stored in a storage device of the anonymization device 10 or of the management server 20. It may also be stored in a separate apparatus, such as a standalone file server, provided that apparatus is accessible from both the anonymization device 10 and the management server 20.

FIG. 36 is a diagram illustrating the hardware configuration of the management server 20. The management server 20 is a computer having a CPU 21, a memory 22, a communication control unit 23, a storage device 24, and an input/output interface 25.

The CPU 21 executes a program loaded into the memory 22 in executable form. In doing so, it provides the functions of the units described above:

- the dictionary acquisition unit 211
- the integration unit 212
- the priority determination unit 213
- the dictionary management unit 214
- the anonymous information registration unit 215
- the anonymous information control unit 216
- the selection unit 217
- the request reception unit 201
- the access control unit 202
- the output control unit 203

The storage device 24 may also be called an external storage device and may be, for example, an SSD (Solid State Drive) or an HDD. The storage device 24 exchanges data with the drive device. For example, it stores an information processing program installed from the drive device, and it reads out the program and passes it to the memory 22. In this embodiment, the storage device 24 holds the dictionary DB 231, the priority DB 232, and the common DB 233 described above.

FIG. 37 is a diagram illustrating the hardware configuration of the anonymization device 10. The anonymization device 10 is a computer having a CPU 11, a memory 12, a communication control unit 13, a storage device 14, and an input/output interface 15.

The CPU 11 executes a program loaded into the memory 12 in executable form. In doing so, it provides the functions of the units described above:

- the data acquisition unit 101
- the abstraction unit 102
- the test unit 103
- the selection unit 104
- the value determination unit 106
- the value data acquisition unit 107
- the word category analysis unit 108
- the word value calculation unit 109
- the appearance count acquisition unit 111
- the authority determination unit 112
- the output control unit 121

The communication control unit 13 connects to other devices via a network and controls communication with them. Devices are connected to the input/output interface 15 as appropriate. These include output means such as a display device or printer, input means such as a keyboard or pointing device, and input/output means such as a drive device. The drive device reads from and writes to a removable storage medium. Examples include a flash memory card reader/writer and a USB adapter for connecting USB memory. The removable storage medium may also be a disc medium such as a CD (Compact Disc), DVD (Digital Versatile Disc), or Blu-ray Disc. The drive device reads a program from the removable storage medium and stores it in the storage device 14.

The storage device 14 may also be called an external storage device and may be, for example, an SSD (Solid State Drive) or an HDD. The storage device 14 exchanges data with the drive device. For example, it stores a program installed from the drive device, and it reads out the program and passes it to the memory 12. In this embodiment, the storage device 14 holds the databases described above:

- the personal information DB 131
- the disclosure condition DB 132
- the search information storage DB 133
- the temporary processing DB 134
- the authority setting DB (authority storage unit) 135

§3. Anonymization Method
Next, the anonymization method is described with reference to FIGS. 38 to 45. FIG. 38 is an explanatory diagram of the process that creates the integrated anonymization dictionary. The management server 20 executes this process according to a program.

(3-1) Creating the Integrated Anonymization Dictionary
First, the management server 20 receives each business operator's anonymization dictionary from that operator's anonymization device 10 (step S510).

Next, the management server 20 integrates the business operators' anonymization dictionaries (step S520). The specific integration process is described later.

The management server 20 then determines a priority for each dimension of the words constituting the integrated anonymization dictionary (step S530). Based on these priorities, it selects which dimensions are adopted into the integrated anonymization dictionary and which are not (step S540).

Next, the management server 20 assigns a dictionary ID to the created integrated anonymization dictionary so that it can be uniquely identified (step S550). The dictionary ID is generated, for example, by combining "D", indicating a dictionary, with a serial number "1, 2, 3, ..." counted in order of creation.

The management server 20 also registers authority information for the created integrated anonymization dictionary in the authority setting DB 135. For example, suppose anonymous information is produced with the created integrated anonymization dictionary "D4". That information is assigned rank A if its minimum appearance rate is 0.05% or less, and rank C if its minimum appearance rate is 0.1% or less. The authority information used to make this determination can be obtained in several ways:

- Authority table lookup. An authority table is stored in advance in the storage device 24. It associates authority information with information about the anonymization dictionaries and their providers, such as the provider's name, the provider's industry, the genre of the anonymization dictionary, its importance, and the number of dictionaries integrated. The management server 20 looks up the authority information corresponding to the acquired dictionaries and their providers and registers it in the authority setting DB 135.
- Information received in step S510. The management server 20 may receive authority information, such as the rank and the permitted recipients, from each business operator's anonymization device 10 together with the anonymization dictionary. It then registers this information in the authority setting DB 135 as that operator's authority information.
- Manual entry. The management server 20 may prompt the person in charge to enter authority information and register the entered information in the authority setting DB 135.
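The dictionary ID scheme of step S550 ("D" followed by a serial number counted in creation order) can be expressed as a small issuer object. The following Python sketch is an illustration only; the class name is hypothetical. It also assumes a single-process counter. An actual server would presumably need to persist the last serial number so that IDs stay unique across restarts.

```python
import itertools


class DictionaryIdIssuer:
    """Hypothetical helper issuing IDs such as "D1", "D2", "D3", ... in creation order."""

    def __init__(self, prefix: str = "D", start: int = 1) -> None:
        self._prefix = prefix
        self._serial = itertools.count(start)  # serial number counted in creation order

    def next_id(self) -> str:
        return f"{self._prefix}{next(self._serial)}"
```

A fresh issuer returns "D1", "D2", "D3", and "D4" on four successive calls, so the fourth integrated dictionary receives the ID "D4" used in the example above.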

Finally, the management server 20 distributes the integrated anonymization dictionary, consisting of the dimensions selected in step S540, to each anonymization device 10 (step S550).

FIG. 39 is an explanatory diagram of the process of integrating the anonymization dictionaries in step S520. The management server 20 first extracts a lowest word from the dictionary DB 231, which stores each business operator's anonymization dictionary (step S610). For example, as shown in FIG. 30, each operator's anonymization dictionary stores "slip software" as the abstraction of "Software A". This indicates that "slip software" is the word one level above "Software A". Similarly, the abstraction of "Software Z" is "slip software", and the abstraction of "Software B" is "accounting software".

"Slip software" is the word one level above "Software A" and "Software Z". For "slip software" in turn, the dictionary stores "business software" as the word one level higher.

The dictionary DB 231 stores these words together with their upper/lower relationships. From them, the server extracts one word that has no lower word associated with it, that is, one lowest word.

Next, the management server 20 obtains the word one level above the word extracted in step S610 and sets that next-higher level (abstraction level) (step S620). For example, if the word extracted in step S610 is "Software A", "slip software" is obtained as the word one level higher.

The management server 20 then looks one level below the word obtained in step S620 and extracts all words at that same level (abstraction level) (step S630). For example, if the word obtained in step S620 is "slip software", the server extracts "Software Z", which is at the same level as "Software A".

If a word extracted in step S630 has lower words associated with it, the management server 20 extracts those lower words as well. It repeats this extraction until no lower words remain (step S640).

When no lower words remain in step S640, the management server 20 checks whether the level set in the immediately preceding step S620 or S660 is the highest, that is, whether any higher word exists. If it is not the highest (step S650, No), the server obtains the word one level higher, sets that level (abstraction level), and returns to step S630 (step S660). For example, if the word set in step S620 is "slip software", the server obtains the next-higher word "business software" and sets it as the next-higher level.

The process then returns to step S630 and executes steps S630 and S640 again. Suppose step S650 then determines that the level set in the immediately preceding step S620 or S660 is the highest (step S650, Yes). The server next checks whether all words contained in the anonymization dictionaries have been processed (step S670). If words remain (step S670, No), the process returns to step S610 and repeats. Once all words have been processed (step S670, Yes), the process of FIG. 39 ends.
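Steps S610 to S670 walk the merged hierarchy level by level, starting from the lowest words. The Python sketch below is a simpler alternative that arrives at the same kind of result, namely trees grouped by their highest word. It is not the bottom-up procedure of FIG. 39. It makes several assumptions:

- Every dictionary is a list of (lower word, upper word) pairs.
- Any word that never appears as a lower word is a root, that is, a highest word.
- Each root's dimension is gathered with a depth-first traversal.
- The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (lower word, upper word)


def build_dimensions(dictionaries: Iterable[Iterable[Edge]]) -> Dict[str, Set[str]]:
    """Merge anonymization dictionaries into dimensions keyed by their highest word."""
    children: Dict[str, Set[str]] = defaultdict(set)
    lower_words: Set[str] = set()
    upper_words: Set[str] = set()
    for dictionary in dictionaries:
        for lower, upper in dictionary:
            children[upper].add(lower)
            lower_words.add(lower)
            upper_words.add(upper)
    roots = upper_words - lower_words  # words never abstracted further
    dimensions: Dict[str, Set[str]] = {}
    for root in sorted(roots):
        seen: Set[str] = set()
        stack: List[str] = [root]
        while stack:  # walk from the highest word down to every lowest word
            word = stack.pop()
            if word in seen:
                continue
            seen.add(word)
            stack.extend(children.get(word, ()))
        dimensions[root] = seen
    return dimensions
```

A word such as "Software A" that has different upper words in different dictionaries ends up in several dimensions, which matches the multiple-dimension situation of FIG. 41.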

(3-2) Dimensions
FIG. 40 is an explanatory diagram of a dimension created by the process of FIG. 39. The example of FIG. 40 shows the dimension whose root is "IT product". In this dimension, "IT product" is the word at the highest level.

"IT product" has "software" and "hardware" associated with it as words at the next lower level (level 4 in the example of FIG. 40). "Software" in turn has "business software" and "personal software" associated with it as words at the next lower level (level 3 in FIG. 40).

"Business software" has "slip software", "accounting software", and "customer management software" associated with it as words at the next lower level (level 2 in FIG. 40). "Slip software" has "Software A" and "Software Z" at the next lower level (level 1, the lowest level, in FIG. 40). "Personal software" has "Software V" and "Software U" as its next-lower words, and "hardware" has "Server D" and "Server E" as its next-lower words.

In this way, the integration unit of this embodiment creates multiple dimensions, such as the one shown in FIG. 40, from the business operators' anonymization dictionaries. A dimension is a tree of associations rooted at a highest word and extending down to the lowest words. One dimension is generated for each highest word. The integration unit thus integrates the anonymization dictionaries by arranging all the words they contain into trees, each of which forms a dimension. This collection of dimensions constitutes the integrated anonymization dictionary.

FIG. 41 is an explanatory diagram of multiple dimensions. As shown in FIG. 41, a given word may be abstracted along several dimensions. For example, dimension a abstracts "Software A" to "accounting software" and then to "business software". Dimension c abstracts "Software A" to "Company a product" and then to "package". Dimensions b and d abstract it to yet other words.

The integrated anonymization dictionary of this embodiment combines the anonymization dictionaries of many business operators. As a result, it may contain, for example, tens to hundreds of dimensions, and abstracting with all of them would produce an enormous amount of data. This embodiment therefore assigns each dimension of the integrated anonymization dictionary a priority that governs whether the dimension is adopted for abstraction.

(3-3) Priority
Next, the priority determination process of step S530 is described in detail with reference to FIGS. 41 to 43. FIG. 42 shows an example in which each word in the dimensions of FIG. 41 is weighted. In FIG. 42, each word in each dimension is stored together with its level, and three kinds of weighting are applied:

- Weight 1 indicates whether an important flag is set.
- Weight 2 gives the number of searches.
- Weight 3 gives the SEM (Search Engine Marketing) price.

The important flag is a value entered by the user to indicate whether a word is important. A word that is important, that is, one the user wants to use for abstraction, is marked as important (the important flag is set).

The priority determination unit 213 also reads the value of each word from the priority DB 232 shown in FIG. 31 and attaches it as a weight to the corresponding word, as shown in FIG. 42.

For each dimension of FIG. 41, the server then totals the number of words, the sum of levels, and the weights of the words, and uses these totals to determine the priority.

FIG. 43 is an explanatory diagram of the process of totaling the word weights to obtain the priority of each dimension. In FIG. 43, Table 51A totals the number of words, the sum of levels, weight 1, weight 2, and weight 3 over the words of dimension a. Tables 51B and 51C do the same for dimensions b and c.

The number of words is the total number of words in a dimension. In the example of FIG. 43, it is 25 for dimension a, 50 for dimension b, and 9 for dimension c. A larger number of words means more abstraction variations, which may make l-diversity harder to satisfy and thus lower safety. However, more words also give the data more detail, so dimensions with more words are given priority.

The sum of levels is obtained by multiplying each level number by the number of words at that level and adding the products. For example, (level 5 × 1 word) + (level 4 × 2 words) + (level 3 × 2 words) + (level 2 × 3 words) + (level 1 × 9 words) = 34. A larger sum means more upper levels and therefore more highly abstract options. The data can then be abstracted at an appropriate level, and safety is higher, so dimensions with a larger sum of levels are given priority.
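The sum-of-levels arithmetic above is a weighted sum of level numbers. The following Python sketch reproduces the worked example. It assumes the per-dimension counts are given as a mapping from level number to the number of words at that level, and the function name is hypothetical.

```python
from typing import Dict


def level_sum(words_per_level: Dict[int, int]) -> int:
    """Sum over levels of (level number x number of words at that level)."""
    return sum(level * count for level, count in words_per_level.items())


# Counts from the example: 1 word at level 5, 2 at level 4, 2 at level 3,
# 3 at level 2, and 9 at level 1 (17 words in total).
EXAMPLE_COUNTS = {5: 1, 4: 2, 3: 2, 2: 3, 1: 9}
```

`level_sum(EXAMPLE_COUNTS)` evaluates 5 + 8 + 6 + 6 + 9, giving 34 as in the text.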

Similarly, for weights 1 to 3, the server totals the number of important flags, the number of searches, and the SEM price. Dimensions with higher totals, that is, higher value, are given priority.

For each of these measures (the number of words, the sum of levels, and weights 1 to 3), the overall appearance rate, meaning the ratio to the overall total, is then obtained by the following formula:

Overall appearance rate = tf/idf
= (value of dimension a) / (value of dimension a + value of dimension b + value of dimension c + ...)

Table 52 compares these overall appearance rates across dimensions. For each dimension in Table 52, the overall appearance rates for the number of words, the sum of levels, and weights 1 to 3 are summed to give the overall priority.
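The overall appearance rate is each dimension's share of a measure's total across all dimensions. The overall priority adds those shares over the measures. The following is a minimal Python sketch of that calculation. It assumes the totals are supplied as {measure name: {dimension: total}}, and it treats a measure whose total is zero as contributing nothing. Both choices, and the function name, are illustrative assumptions.

```python
from typing import Dict


def overall_priority(measures: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """measures maps a measure name to {dimension: total}; returns {dimension: priority}."""
    priority: Dict[str, float] = {}
    for per_dimension in measures.values():
        grand_total = sum(per_dimension.values())
        for dimension, value in per_dimension.items():
            # Overall appearance rate: this dimension's share of the measure's total.
            share = value / grand_total if grand_total else 0.0
            priority[dimension] = priority.get(dimension, 0.0) + share
    return priority
```

With the word counts from FIG. 43 (25, 50, and 9) and hypothetical level sums of 34, 60, and 10, dimension b ranks first, then a, then c. Because each measure's shares add up to 1, the priorities sum to the number of measures.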

The overall priority is thus obtained for each dimension. Based on it, the selection unit 217 decides which dimensions are adopted into the integrated anonymization dictionary and which are not. For example, the selection unit 217 refers to the overall priorities in Table 52 and adopts a predetermined number of dimensions in descending order of overall priority. The remaining lower-priority dimensions are not adopted.

The selection need not rely on overall priority alone. For example, a selection condition may be set so that every dimension containing an important flag is adopted. Among the dimensions without an important flag, a predetermined number are then adopted in descending order of overall priority.
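The flag-aware selection rule can be expressed in a few lines. The following Python sketch makes these assumptions:

- Priorities come from a mapping {dimension: overall priority}.
- Flagged dimensions are given as a set.
- The "predetermined number" is a parameter `top_n`.

All of these names are hypothetical.

```python
from typing import Dict, List, Set


def select_dimensions(priority: Dict[str, float], flagged: Set[str], top_n: int) -> List[str]:
    """Adopt every flagged dimension, then the top_n highest-priority unflagged ones."""
    adopted = sorted(d for d in priority if d in flagged)
    unflagged = sorted(
        (d for d in priority if d not in flagged),
        key=lambda d: priority[d],
        reverse=True,  # descending order of overall priority
    )
    return adopted + unflagged[:top_n]
```

With priorities a = 0.62, b = 1.17, c = 0.20, and d = 0.50, only c flagged, and `top_n = 2`, the sketch adopts c because of its flag, then b and a. Dimension d is left out even though its priority exceeds c's.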

The scope of selection can also vary. For example, the selection may cover all dimensions in the integrated anonymization dictionary and adopt a predetermined number of them based on overall priority. Alternatively, dimensions that contain the same word may be grouped, and a predetermined number adopted from each group based on overall priority.

(3-4) Anonymization method

Each anonymization device 10 performs anonymization using the integrated anonymization dictionary received from the management server (dictionary creation device) 20. It then transmits the resulting anonymous information to the management server 20. The anonymization process is the same as described for FIG. 11 of the first embodiment, except that the integrated anonymization dictionary is used and the created anonymous information is transmitted to the management server 20. After selecting the abstraction candidate to be adopted in step S180 and creating the anonymous information, the anonymization device 10 transmits the anonymous information to the management server 20 and has it registered in the anonymous information DB 145 (step S190).

As shown in FIG. 10, the management server 20 acquires the anonymous information from the anonymization device 10 (step S1) and determines whether the anonymous information satisfies the disclosure conditions (step S2). It then sets access authority for the anonymous information that satisfies the disclosure conditions (step S3). That is, in the second embodiment, the management server 20 checks the disclosure conditions (step S2) in the same manner as described for FIG. 24 of the first embodiment. It sets the access authority (step S3) in the same manner as described for FIG. 27.
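Steps S2 and S3 can be sketched using the quantities defined in the claims: the number of appearances of each identical combination of item values, the minimum number of appearances, and the minimum appearance rate. This chunk does not give the disclosure condition of FIG. 24, so the sketch assumes, for illustration only, that every combination must appear at least `k` times. The rate thresholds in `AUTHORITY_TABLE` and the rank numbering are also hypothetical.

```python
from collections import Counter

# Hypothetical authority table, checked top to bottom:
# (minimum appearance rate threshold, required access rank; 1 = least restricted).
AUTHORITY_TABLE = [(0.10, 1), (0.05, 2), (0.0, 3)]


def appearance_counts(records):
    """Number of records sharing each identical combination of item values."""
    return Counter(tuple(sorted(record.items())) for record in records)


def register(records, k=2, table=AUTHORITY_TABLE):
    """Steps S2-S3 for one batch of anonymous information received in step S1.

    Returns the required access rank, or None when the batch is not disclosed.
    """
    if not records:
        return None
    counts = appearance_counts(records)
    min_count = min(counts.values())   # minimum number of appearances
    # S2 (assumed condition): every combination must appear at least k times.
    if min_count < k:
        return None
    # S3: minimum appearance rate = minimum count / total number of records.
    min_rate = min_count / len(records)
    for threshold, rank in table:
        if min_rate >= threshold:
            return rank
    return None
```

A higher minimum appearance rate means that even the rarest combination is shared by a larger share of the records. In this sketch, such a batch is therefore given a less restrictive rank.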

§4. Specific Examples of Anonymous Information

Specific examples of anonymous information will now be described with reference to FIGS. 44 and 45. FIG. 44 shows an example of anonymization at Company A, in three parts:
- FIG. 44(a) shows the personal information collected by Company A.
- FIG. 44(b) shows the anonymous information obtained when that personal information is anonymized with Company A's own anonymization dictionary.
- FIG. 44(c) shows the anonymous information obtained when the same personal information is anonymized with the integrated anonymization dictionary.

Company A's anonymization device 10 may anonymize the personal information of FIG. 44(a) with its own anonymization dictionary. In that case, as shown in FIG. 44(b), it deletes the name and e-mail address items. It abstracts age into age bracket, affiliated company into listed or unlisted company, and position into manager, employee or part-time worker.

By contrast, Company A's anonymization device 10 may anonymize the same personal information with the integrated anonymization dictionary. In that case, as shown in FIG. 44(c), it deletes the name and e-mail address items and abstracts age into age bracket. It abstracts affiliated company both into listed or unlisted company and into industry. With the integrated dictionary, it also abstracts position into manager or staff, and the product of interest into slip (voucher) software or server. Finally, it adds a visited-booth item and enters the value "Company A", indicating that the record belongs to a person who visited Company A.

FIG. 45, in turn, shows an example of anonymization at Company B, in three parts:
- FIG. 45(a) shows the personal information collected by Company B.
- FIG. 45(b) shows the anonymous information obtained when that personal information is anonymized with Company B's own anonymization dictionary.
- FIG. 45(c) shows the anonymous information obtained when the same personal information is anonymized with the integrated anonymization dictionary.

Company B's anonymization device 10 may anonymize the personal information of FIG. 45(a) with its own anonymization dictionary. In that case, as shown in FIG. 45(b), it deletes the name and e-mail address items. It abstracts age into age bracket, affiliated company into industry, and job type into development or general affairs.

By contrast, Company B's anonymization device 10 may anonymize the same personal information with the integrated anonymization dictionary. In that case, as shown in FIG. 45(c), it deletes the name and e-mail address items and abstracts age into age bracket. It abstracts affiliated company both into listed or unlisted company and into industry. With the integrated dictionary, it also abstracts job type into technical or clerical, and the product of interest into accounting software or server. Finally, it adds a visited-booth item and enters the value "Company B", indicating that the record belongs to a person who visited Company B.

In this way, each business operator's anonymization device 10 abstracts the affiliated-company item along multiple dimensions based on the integrated anonymization dictionary. As described above, the integrated anonymization dictionary adopts high-priority dimensions. Abstracting along the dimensions it contains therefore yields abstractions that are useful to each business operator.

Integrating the anonymization dictionaries as described above also reorganizes the word correspondences used for abstraction. As a result, items unique to one operator, such as Company A's positions and Company B's job types, are abstracted along common dimensions. The data can therefore be compared with data from other companies that have similar items.
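The effect described above can be sketched as follows: one integrated dictionary expands a single word into several dimensions, so records from different operators can be aggregated along the shared dimensions. Everything in this example is hypothetical: the dictionary layout (item, then word, then dimension and value), the company names, product names and sample records. Only the kinds of abstraction follow FIGS. 44(c) and 45(c).

```python
from collections import Counter

# Hypothetical integrated anonymization dictionary: item -> word -> {dimension: value}.
# A single word may be abstracted along several dimensions at once.
INTEGRATED = {
    "company": {
        "Alpha Corp": {"listing": "listed", "industry": "manufacturing"},
        "Beta Ltd": {"listing": "unlisted", "industry": "retail"},
    },
    "position": {"Section chief": {"position": "manager"},
                 "Clerk": {"position": "staff"}},
    "job_type": {"Systems engineer": {"job_type": "technical"},
                 "Accounting": {"job_type": "clerical"}},
    "product": {"Invoice Pro": {"product": "slip software"},
                "Ledger X": {"product": "accounting software"},
                "Rack 1U": {"product": "server"}},
}
DELETED_ITEMS = {"name", "email"}


def anonymize(record, booth):
    """Abstract one record and tag it with the booth that collected it."""
    out = {"visited_booth": booth}
    for item, value in record.items():
        if item in DELETED_ITEMS:
            continue                              # delete identifying items
        if item == "age":
            out["age"] = f"{value // 10 * 10}s"   # age -> age bracket
        else:
            out.update(INTEGRATED[item][value])   # word -> one or more dimensions
    return out


a = anonymize({"name": "Taro", "email": "t@example.com", "age": 27,
               "company": "Alpha Corp", "position": "Clerk",
               "product": "Invoice Pro"}, "Company A")
b = anonymize({"name": "Hanako", "email": "h@example.com", "age": 24,
               "company": "Alpha Corp", "job_type": "Systems engineer",
               "product": "Rack 1U"}, "Company B")

# Records from both operators can be grouped on the dimensions they share.
by_segment = Counter((r["age"], r["listing"], r["industry"]) for r in [a, b])
```

Here `by_segment` counts two visitors in the 20s bracket from a listed manufacturing company, one from each operator's booth, even though the two operators collected different item sets.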

§5. Distribution of Anonymous Information

Access management for anonymous information to which access authority has been added as described above will be described next. The access management procedure is the same as that described for FIG. 28 of the first embodiment, so it is described using FIG. 28.

The management server 20 may receive a request to access anonymous information from a user terminal 30 or from a business operator's anonymization device 10 (hereinafter both are simply called the user terminal 30). In that case, it starts the process of FIG. 28 and first authenticates the user (step S410).

In the authentication process, the management server 20 receives authentication information such as a user ID and password from the user terminal 30 and compares it with the registered information. If they match, authentication succeeds and the process proceeds to step S430. If they do not match, authentication fails and the process of FIG. 28 ends.

The management server 20 may act as a web server that provides information such as anonymous information as web pages, with the user terminal 30 accessing it through a so-called web browser. In that configuration, the authentication information may be sent from the user terminal 30 to the management server 20 by an HTTP cookie or the like. Alternatively, the user may enter the authentication information through an input means such as a keyboard, and the user terminal 30 transmits it to the management server 20.
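The credential check of step S410 can be sketched as follows. The text only says that the received authentication information is compared with the registered information. Storing salted PBKDF2 hashes rather than plain passwords, the iteration count, and the in-memory `USERS` table (a stand-in for something like the user management DB 251) are all assumptions made for this illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed PBKDF2 work factor


def hash_password(password, salt=None):
    """Derive a salted PBKDF2-HMAC-SHA256 hash; returns (salt, digest)."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


# Hypothetical registered information: user ID -> (salt, password hash).
USERS = {"analyst01": hash_password("correct horse")}


def authenticate(user_id, password):
    """Step S410: True means proceed to S430; False ends the FIG. 28 process."""
    record = USERS.get(user_id)
    if record is None:
        return False
    salt, expected = record
    _, actual = hash_password(password, salt)
    return hmac.compare_digest(actual, expected)  # constant-time comparison
```

In the web-server variant, a cookie would more commonly carry a session token issued after this check than the password itself. That design choice goes beyond what the text specifies.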

Next, the management server 20 acquires from the anonymous information DB 145 the summary information of the anonymous information within the user's authority, that is, the anonymous information the user can access with their access authority (step S430). As shown in FIG. 35, the summary information may be recorded in advance in the access management information of each piece of anonymous information and read out from there. Alternatively, item names or part of the anonymous information's data may be read out and used as the summary information.

The user then selects anonymous information from the list of summary information and requests it from the user terminal 30. When the management server 20 receives this request (step S460), it compares the access authority of the anonymous information with the access authority of the user (step S470). It then reconfirms whether the user has authority to access that anonymous information (step S480).

The request may cover all items of the anonymous information or a range specified by item. For example, the request may name the required items, such as age bracket, gender, visited booth and product of interest. Or it may specify item values, such as age bracket = 20s, gender = male, product of interest = hardware, and status = document request or business negotiation.
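Steps S430 to S480 can be sketched as two functions. The first lists summaries within the user's authority. The second re-checks authority on the actual request and returns either selected items or records matching item values. The `AnonymousInfo` record, the integer ranks, and the rule that access is allowed when the information's rank does not exceed the user's rank are hypothetical. They model "within the range of the user's access authority" as an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class AnonymousInfo:
    # Hypothetical entry in the anonymous information DB.
    info_id: str
    summary: str
    required_rank: int                       # access authority set in step S3
    records: list = field(default_factory=list)


def list_summaries(db, user_rank):
    """Step S430: summaries of the anonymous information within the user's authority."""
    return [(info.info_id, info.summary)
            for info in db.values() if info.required_rank <= user_rank]


def _matches(row, conditions):
    # A condition value may be a single value or a set of acceptable values.
    for item, wanted in conditions.items():
        allowed = wanted if isinstance(wanted, (set, frozenset)) else {wanted}
        if row.get(item) not in allowed:
            return False
    return True


def handle_request(db, user_rank, info_id, items=None, conditions=None):
    """Steps S460-S480: re-check authority, then return the requested range."""
    info = db[info_id]
    if info.required_rank > user_rank:       # S470/S480: authority re-check
        raise PermissionError(f"no access to {info_id}")
    rows = [r for r in info.records if _matches(r, conditions or {})]
    if items:                                # keep only the requested items
        rows = [{k: r[k] for k in items if k in r} for r in rows]
    return rows
```

Re-checking in `handle_request` instead of trusting the earlier listing mirrors step S480: a request can name an `info_id` that never appeared in the user's summary list.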

As described above, according to the second embodiment, multiple business operators anonymize the personal information each has collected using a common integrated anonymization dictionary and register the result in the anonymous information DB. The anonymous information can therefore be used centrally. Even so, the anonymization system of the second embodiment transmits anonymous information only to users who hold the required access authority, so access to the anonymous information is controlled appropriately. In particular, when an integrated anonymization dictionary is created by combining the operators' dictionaries, the authority information for that integrated dictionary can be set automatically. Thus, even when operators share anonymous information based on the personal information each has collected, access management requires no manual work.

<Others>
The present invention is not limited to the illustrated examples described above, and various modifications can of course be made without departing from the gist of the present invention.

DESCRIPTION OF SYMBOLS
10 Anonymization device
12 Memory
13 Communication control unit
14 Storage device
15 Input/output interface
20 Management server
22 Memory
23 Communication control unit
24 Storage device
25 Input/output interface
30 User terminal
41 Test DB
61 Navigation system
100 Anonymization system
101 Data acquisition unit
102 Abstraction unit
103 Test unit
104 Selection unit
105 Level registration unit
106 Value determination unit
107 Value data acquisition unit
108 Word category analysis unit
109 Word value calculation unit
111 Appearance count acquisition unit
112 Authority determination unit
120 Search engine
131 Personal information DB
132 Disclosure condition DB
133 Search information storage DB
134 Temporary processing DB
135 Authority setting DB
145 Anonymous information DB
201 Request reception unit
202 Access control unit
203 Output control unit
251 User management DB

Claims

An authority setting device comprising:
an anonymous information acquisition unit that acquires anonymous information;
an appearance count acquisition unit that, where the anonymous information has a plurality of records and each record has a plurality of items, acquires, as a number of appearances, the count of each identical combination among the combinations of words that are the values of the items of the respective records; and
an authority determination unit that determines the access authority of the anonymous information based on the number of appearances.

The authority setting device according to claim 1, wherein, where the smallest of the numbers of appearances is a minimum number of appearances, the authority determination unit determines the access authority of the anonymous information based on the minimum number of appearances.

The authority setting device according to claim 1, wherein, where the smallest of the numbers of appearances is a minimum number of appearances and the ratio of the minimum number of appearances to the total number of records is a minimum appearance rate, the authority determination unit determines the access authority of the anonymous information based on the minimum appearance rate.

The authority setting device according to claim 3, wherein the authority determination unit determines the access authority based on the number of appearances by referring to an authority storage unit that stores minimum appearance rates in association with access authorities of the anonymous information.

The authority setting device according to claim 3 or 4, wherein the authority determining unit determines a rank of the access authority according to a minimum appearance rate of the anonymous information.

An anonymization device comprising:
a data acquisition unit that acquires target data to be anonymized;
an abstraction unit that abstracts at least one of the words that are the values of a plurality of items constituting the target data, to produce abstraction candidate data;
a test unit that tests the abstraction candidate data on the condition that the combination of the values of its items does not narrow down to a single individual in the target data;
a selection unit that selects the abstraction candidate data satisfying the test condition as anonymous information;
an appearance count acquisition unit that, where the anonymous information has a plurality of records and each record has a plurality of items, acquires, as a number of appearances, the count of each identical combination among the combinations of words that are the values of the items of the respective records; and
an authority determination unit that determines the access authority of the anonymous information based on the number of appearances.

An anonymization system comprising:
a data acquisition unit that acquires target data to be anonymized;
an abstraction unit that abstracts at least one of the words that are the values of a plurality of items constituting the target data, to produce abstraction candidate data;
a test unit that tests the abstraction candidate data on the condition that the combination of the values of its items does not narrow down to a single individual in the target data;
a selection unit that selects the abstraction candidate data satisfying the test condition as anonymous information;
an appearance count acquisition unit that, where the anonymous information has a plurality of records and each record has a plurality of items, acquires, as a number of appearances, the count of each identical combination among the combinations of words that are the values of the items of the respective records;
an authority determination unit that determines the access authority of the anonymous information based on the number of appearances; and
an access control unit that, upon receiving a request to access the anonymous information from a user terminal, compares the access authority of the user with the access authority of the anonymous information, and permits access to the anonymous information when the access authority of the anonymous information is within the range of the user's access authority.

An authority setting method in which a computer executes the steps of:
acquiring anonymous information;
where the anonymous information has a plurality of records and each record has a plurality of items, acquiring, as a number of appearances, the count of each identical combination among the combinations of words that are the values of the items of the respective records; and
determining the access authority of the anonymous information based on the number of appearances.

An authority setting program that causes a computer to execute the steps of:
acquiring anonymous information;
where the anonymous information has a plurality of records and each record has a plurality of items, acquiring, as a number of appearances, the count of each identical combination among the combinations of words that are the values of the items of the respective records; and
determining the access authority of the anonymous information based on the number of appearances.