JP2014086037A - Anonymized data modification system - Google Patents

Anonymized data modification system

Info

Publication number
JP2014086037A
JP2014086037A JP2012237041A JP2012237041A JP2014086037A JP 2014086037 A JP2014086037 A JP 2014086037A JP 2012237041 A JP2012237041 A JP 2012237041A JP 2012237041 A JP2012237041 A JP 2012237041A JP 2014086037 A JP2014086037 A JP 2014086037A
Authority
JP
Japan
Prior art keywords
anonymization
data
analysis
anonymized
means
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2012237041A
Other languages
Japanese (ja)
Other versions
JP5747012B2 (en)
Inventor
Yuki Kaseda
佑樹 綛田
Masanobu Koike
正修 小池
Yoshihiro Fujii
吉弘 藤井
Fumihiko Sano
文彦 佐野
Michiyo Ikegami
美千代 池上
Original Assignee
Toshiba Corp
株式会社東芝
Toshiba Solutions Corp
東芝ソリューション株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Corp, 株式会社東芝, Toshiba Solutions Corp, 東芝ソリューション株式会社 filed Critical Toshiba Corp
Priority to JP2012237041A priority Critical patent/JP5747012B2/en
Publication of JP2014086037A publication Critical patent/JP2014086037A/en
Application granted granted Critical
Publication of JP5747012B2 publication Critical patent/JP5747012B2/en
Application status is Active legal-status Critical
Anticipated expiration legal-status Critical

Abstract

The accuracy of an analysis result can be maintained while minimizing anonymization effort and the amount of information to be provided.
An anonymized data change system according to an embodiment can communicate with an anonymized data analysis system that analyzes anonymized data. The anonymized data change system includes database means, anonymization processing means, and analysis accuracy determination means. The anonymization processing means anonymizes a part of the data in the database means to generate the anonymized data. The analysis accuracy determination means determines the accuracy of the analysis by the anonymized data analysis system, ends the process if the analysis result satisfies the analysis accuracy policy, and otherwise outputs a retry request to the anonymization processing means. Upon receiving the retry request, the anonymization processing means generates new anonymized data based on the anonymization method whose application order is one lower than the highest application order.
[Selection] Figure 1

Description

  Embodiments described herein relate generally to an anonymized data change system.

  Data owned by data owners such as companies and individuals is increasing and becoming more complex. In addition, the data owner owns such a large amount of data, but often does not have analysis skills or an analysis system for the large amount of data. The analysis skill here means specialized knowledge of statistics and analysis tools, and the analysis system means an analysis tool and a distributed system capable of analyzing a large amount of data at high speed.

  Accordingly, when analyzing a large amount of data for effective utilization, a form of entrusting data analysis to an external specialist such as an external expert having analysis skills and an analysis system is becoming widespread.

  On the other hand, personal data may be included in the data to be analyzed. Therefore, providing data to a data analyst easily is not desirable because there is a fear of leaking personal information. One technique that can eliminate this concern is an anonymization technique (see Non-Patent Document 1, for example). Anonymization technology is a general term for technology that changes a part of data so that an individual cannot be identified.

Patent Document 1: JP 2010-86179 A
Patent Document 2: JP 2011-123712 A

Non-Patent Document 1: "Personal Information Anonymization Platform", [online], Ministry of Economy, Trade and Industry, Information Grand Voyage Project, [searched October 3, 2012], Internet <URL: http://www.meti.go.jp/policy/it_policy/daikoukai/igvp/cp2_jp/common/024/010/post-9.html>

  The anonymization techniques described above usually pose no particular problem, but according to the study by the present inventors, they leave room for improvement because of the two problems described below.

  First, according to the inventors' investigation, there are three requirements for the data owner.

  The first request is a request to minimize the anonymization effort (hereinafter also referred to as anonymization minimum request).

  The second request is a request to minimize the amount of information of data provided to the data analyst (hereinafter also referred to as a minimum information amount request).

  The third requirement is a requirement to improve the accuracy of the analysis result, or to maintain at least the accuracy within an allowable range (hereinafter also referred to as accuracy maintenance requirement).

  In actual operation, it is difficult to satisfy these three requirements at the same time, and there are the following two problems.

  For example, the minimum information amount requirement and the accuracy maintenance requirement are in a trade-off relationship, because the accuracy of the analysis result deteriorates when the amount of information provided to the data analyst is small. Since data owners often do not have analysis skills, they do not know the appropriate amount of information that satisfies both requirements at the same time. For this reason, it is difficult to satisfy the minimum information amount requirement and the accuracy maintenance requirement simultaneously (first problem).

  In addition, because this appropriate amount of information is not known, the data owner anonymizes the original data excessively and sends the resulting anonymized data to an external data analyst. The data analyst then cannot analyze it sufficiently, and the accuracy of the analysis result falls outside the allowable range. As a result, the data owner must anonymize all the original data again, and the anonymization process takes an enormous amount of time. Therefore, the anonymization minimum request cannot be satisfied (second problem).

  In addition, as shown in Patent Documents 1 and 2, there are many techniques focused on how to anonymize the original data before anonymization.

  However, there is no technology that simultaneously solves the two problems that arise when excessively anonymized data is sent to the outside and all the original data must then be anonymized again. That is, the conventional anonymization technique has room for improvement in that it has the two problems described above. Specifically, from the viewpoint of solving both problems at once, the conventional anonymization technique has room for improvement in maintaining the accuracy of the analysis result while minimizing the anonymization effort and the amount of information to be provided.

  The problem to be solved by the present invention is to provide an anonymized data change system capable of maintaining the accuracy of analysis results while minimizing the anonymization effort and the amount of information to be provided.

  The anonymized data change system of the embodiment can communicate with an anonymized data analysis system that analyzes anonymized data.

  The anonymized data change system includes database means, anonymization processing means, and analysis accuracy determination means.

  The database means stores data including a value for each item for each individual.

  The anonymization processing means anonymizes a part of the data to generate the anonymized data.

  The analysis accuracy determination means determines the accuracy of the analysis when the analysis of the anonymized data by the anonymized data analysis system is completed.

  The anonymization processing means includes item input means, extraction means, anonymization policy storage means, anonymization method determination means, recording means, anonymization means, and control means.

  The item input means accepts input of items used for the analysis.

  The extraction unit extracts data including an item and a value matching the item from the database unit based on the item for which the input is accepted.

  The anonymization policy storage unit stores an anonymization policy in which an application order, an item to be anonymized, and an anonymization method are associated with each other.

  The anonymization method determination means refers to the anonymization policy and, among the unapplied anonymization methods associated with the items to be anonymized that match items in the extracted data, determines the anonymization method associated with the highest application order.

  The recording means records application of the determined anonymization method.

  The anonymization means anonymizes, among the extracted data, the value of the item that matches the item to be anonymized associated with the determined anonymization method, thereby generating the anonymized data from the extracted data.

  When receiving a retry request from the analysis accuracy determining unit, the control unit controls the anonymization method determining unit to retry.

  The analysis accuracy determination unit includes an analysis result input unit, an analysis accuracy policy storage unit, an analysis accuracy determination unit, and a retry request unit.

  The analysis result input means receives an input of an analysis result obtained by analyzing the anonymized data from the anonymized data analysis system.

  The analysis accuracy policy storage means stores an analysis accuracy policy indicating a condition satisfied by the accuracy of the analysis.

  The analysis accuracy determination unit determines whether the received analysis result satisfies the analysis accuracy policy.

  The retry request means terminates the processing if the analysis accuracy policy is satisfied as a result of the determination, and otherwise outputs the retry request to the control means so that the anonymization method determination means, the recording means, the anonymization means, the analysis result input means, and the analysis accuracy determination means retry their processing.

FIG. 1 is a schematic diagram showing an example of the anonymized data change system according to the first embodiment and its peripheral configuration.
FIG. 2 is a schematic diagram showing an example of the data before anonymization in the embodiment.
FIG. 3 is a schematic diagram showing an example of the anonymization policy in the embodiment.
FIG. 4 is a schematic diagram showing an example of the anonymized data in the embodiment.
FIG. 5 is a schematic diagram showing an example of other anonymized data in the embodiment.
FIG. 6 is a schematic diagram showing an example of the analysis accuracy policy in the embodiment.
FIG. 7 is a flowchart for explaining an example of the operation in the embodiment.
FIG. 8 is a schematic diagram showing an outline of the operation in the embodiment.
FIG. 9 is a schematic diagram showing an outline of the operation of each system according to the second embodiment.
FIG. 10 is a schematic diagram showing an example of the anonymization policy in the second embodiment.
FIG. 11 is a schematic diagram showing an example of another anonymization policy in the embodiment.
FIG. 12 is a schematic diagram showing an example of the analysis accuracy policy in the embodiment.
FIG. 13 is a schematic diagram showing an example of the data before anonymization in the embodiment.
FIG. 14 is a schematic diagram showing an example of the anonymized data in the embodiment.
FIG. 15 is a schematic diagram showing an example of other anonymized data in the embodiment.
FIG. 16 is a schematic diagram showing an outline of the first and second embodiments.
FIG. 17 is a schematic diagram showing an outline of the third embodiment.
FIG. 18 is a schematic diagram showing an example of the anonymized data change system according to the embodiment and its peripheral configuration.
FIG. 19 is a flowchart for explaining an example of the operation in the embodiment.

  Each embodiment will be described below with reference to the drawings. Each of the following devices can be implemented with either a hardware configuration or a combination configuration of hardware resources and software. As the software of the combined configuration, a program that is installed in advance on a computer of a corresponding device from a network or a storage medium and that realizes the function of the corresponding device is used.

<First Embodiment>
FIG. 1 is a schematic diagram illustrating a configuration example of an anonymized data change system and an anonymized data analysis system according to the first embodiment. The first embodiment includes two systems that can communicate with each other. The first system is an anonymized data change system 100 on the data owner side, and the second system is an anonymized data analysis system 200 on the data analyst side.

  Here, the anonymized data change system 100 includes an original data storage database device 110, an anonymization device 120, an analysis accuracy determination device 130, and a communication unit 140. Note that the anonymized data change system 100 is not limited to being implemented as a collection of devices. It can also be implemented as a single device by reading “... system 100” as “... device 100” and reading “... device 110”, “... device 120”, and “... device 130” as “... unit 110”, “... unit 120”, and “... unit 130”. The same applies to the following embodiments.

  As shown in FIG. 2, the original data storage database device (database means) 110 has a function of holding the data (original data) D before anonymization owned by the data owner, and a function of passing the held data D to the anonymization device 120.

  The data D includes a value for each item for each individual. Here, for example, height, sex, and age are used as each item. As this type of data, for example, receipt information including age, sex, address and disease name for each individual may be used. In this case, it is preferable to anonymize the age, sex, address, etc. as appropriate without anonymizing the disease name from the viewpoint of obtaining an analysis result on a specific disease name and preventing identification of an individual. Further, as data, tabular data including information on each column (attribute) and information on each row (record) may be used. Here, each column corresponds to each attribute, and each row corresponds to each individual.

  The anonymization device 120 is a device that anonymizes part of the data D and generates anonymized data. The anonymization device (anonymization processing means) 120 includes, for example, an input unit (item input means) 121, a data item extraction unit (extraction means) 122, an anonymization policy storage unit (anonymization policy storage means) 123, an anonymization method determination unit (anonymization method determination means, recording means) 124, and an anonymization unit (anonymization means, control means) 125.

  Units that simply forward the received information unchanged may be integrated as appropriate. For example, the input unit 121 and the data item extraction unit 122 may be integrated. The same applies to the other units and to the following embodiments.

  Here, the input unit 121, the data item extraction unit 122, the anonymization method determination unit 124, and the anonymization unit 125 are, for example, functional blocks realized by a CPU (not shown) executing a program including each step in the anonymization device 120 described below.

  As shown in FIG. 3, the anonymization policy storage unit 123 has a function of holding the anonymization policy PA and a function of passing the anonymization policy PA to the anonymization method determination unit 124. Here, the anonymization policy PA is a table having three items: application order, item to be anonymized, and anonymization method. The application order corresponds to how difficult the anonymization method makes it to identify individuals. For example, the anonymization method that makes it most difficult to identify an individual is associated with the first application order. The item to be anonymized in the anonymization policy PA indicates which of the items of the data D is to be anonymized. For example, when the value of the item of the data D is anonymized based on the anonymization method “grouping (10-year increments)” with the application order “first” in the anonymization policy PA, the anonymized data DA1 is generated as shown in FIG. 4. Likewise, when the value of the item of the data D is anonymized based on the anonymization method “grouping (5-year increments)” with the application order “second” in the anonymization policy PA, the anonymized data DA2 is generated as shown in FIG. 5.

  The analysis accuracy determination device 130 is a device that determines the accuracy of the analysis when the analysis of the anonymized data DA1 and DA2 by the anonymized data analysis system 200 is completed. The analysis accuracy determination device (analysis accuracy determination means) 130 includes, for example, an input unit (analysis result input means) 131, an analysis accuracy policy storage unit (analysis accuracy policy storage means) 132, and an analysis accuracy determination unit (analysis accuracy determination means, retry request means) 133.
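The relationship between the anonymization policy PA and the anonymized data DA1 and DA2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `POLICY_PA`, `DATA_D`, `group_age`, and `anonymize` are hypothetical, the sample rows are invented and are not the values of FIG. 2, and representing a group as a text label such as "10-19" is one possible choice rather than the format used in FIGS. 4 and 5.

```python
def group_age(age: int, width: int) -> str:
    """Replace an exact age with the label of the width-year group containing it."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Hypothetical policy PA: (application order, item to be anonymized, method, group width).
POLICY_PA = [
    (1, "age", "grouping (10-year increments)", 10),
    (2, "age", "grouping (5-year increments)", 5),
]

# Hypothetical original data D (illustrative values only).
DATA_D = [
    {"height": 142, "gender": "F", "age": 11},
    {"height": 171, "gender": "M", "age": 17},
]


def anonymize(rows, item, width):
    """Return copies of the rows with only `item` anonymized; other items are untouched."""
    return [{**row, item: group_age(row[item], width)} for row in rows]


DA1 = anonymize(DATA_D, POLICY_PA[0][1], POLICY_PA[0][3])  # first application order
DA2 = anonymize(DATA_D, POLICY_PA[1][1], POLICY_PA[1][3])  # second application order
```

With 10-year groups both sample ages fall into the same group, while 5-year groups keep them apart, which is why a lower application order reveals more information.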

  Here, the input unit 131 and the analysis accuracy determination unit 133 are, for example, functional blocks realized by a CPU (not shown) executing a program including each step in the analysis accuracy determination device 130 described later.

  As shown in FIG. 6, the analysis accuracy policy storage unit 132 has a function of holding an analysis accuracy policy PB indicating a condition that the accuracy of the analysis should satisfy, and a function of passing the analysis accuracy policy PB to the analysis accuracy determination unit 133 in response to a request from the analysis accuracy determination unit 133. Here, the analysis accuracy policy PB is a table having two items: No, indicating a unique number, and the condition for the analysis accuracy.

  The communication unit 140 has a function of communicating information between the anonymized data change system 100 and the anonymized data analysis system 200.

  On the other hand, the anonymized data analysis system 200 is an apparatus that analyzes anonymized data. The anonymized data analysis system 200 includes, for example, an anonymized data storage database device 210, an analysis unit 220, a data request unit 230, an analysis result transmission unit 240, and a communication unit 250.

  The anonymized data storage database device 210 has a function of receiving the anonymized data DA1 or DA2 from the anonymized data change system 100 via the communication unit 250, a function of holding the received anonymized data DA1 or DA2, and a function of passing the anonymized data DA1 or DA2 to the analysis unit 220.

  The analysis unit 220, the data request unit 230, and the analysis result transmission unit 240 are, for example, functional blocks realized by a CPU (not shown) executing a program including each step in the anonymized data analysis system 200 described later.

  The communication unit 250 has a function of communicating information between the anonymized data change system 100 and the anonymized data analysis system 200.

  Next, operations of the anonymized data change system and the anonymized data analysis system configured as described above will be described with reference to the flowchart of FIG.

  In the anonymized data analysis system, the data request unit 230 inputs items used for analysis to the input unit 121 of the anonymization device 120 via the communication unit 250 and the communication unit 140 (ST1).

  When input unit 121 accepts input of this item, it sends the item to data item extraction unit 122 (ST2).

  Based on the sent item, the data item extraction unit 122 extracts data including an item and a value that match the item from the original data storage database device 110, and sends the data to the anonymization unit 125 (ST3).

  When the anonymization unit 125 receives the data extracted in step ST3, the anonymization unit 125 sends the data to the anonymization method determination unit 124, and inquires of the anonymization method determination unit 124 (ST4).

  The anonymization method determination unit 124 refers to the anonymization policy PA in the anonymization policy storage unit 123. Among the not-yet-applied anonymization methods associated with items to be anonymized in the anonymization policy PA that match items in the data sent in step ST4, it determines the anonymization method associated with the highest application order, and notifies the anonymization unit 125 of the determined anonymization method (ST5). Further, the anonymization method determination unit 124 records the application of the determined anonymization method in the anonymization policy storage unit 123.

  The anonymization unit 125 anonymizes, among the data extracted in step ST3, the value of the item that matches the item to be anonymized in the anonymization policy PA associated with the anonymization method determined in step ST5. Anonymized data is thus generated from the extracted data. Thereafter, the anonymization unit 125 transmits the generated anonymized data to the data request unit 230 via the communication unit 140 and the communication unit 250 (ST6). The anonymization unit 125 may also add, to the data transmitted in step ST6, an ID that uniquely identifies each row, so that the anonymized data storage database device 210 can reflect additional anonymized data in the existing anonymized data as necessary. Further, the anonymization unit 125 holds the generated anonymized data.
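Step ST5 can be sketched as a small selection routine. The class and field names below (`PolicyEntry`, `MethodDeterminer`, `applied`) are hypothetical, and keeping the application record in an in-memory set is an assumption made for brevity; the description itself records the application in the anonymization policy storage unit 123.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PolicyEntry:
    order: int   # application order (1 is applied first)
    item: str    # item to be anonymized
    method: str  # anonymization method


@dataclass
class MethodDeterminer:
    policy: list
    applied: set = field(default_factory=set)  # application orders already used

    def determine(self, items):
        """Pick the unapplied entry with the highest application order (smallest
        number) among entries whose item appears in `items`, and record it."""
        candidates = [
            entry for entry in self.policy
            if entry.item in items and entry.order not in self.applied
        ]
        if not candidates:
            return None  # assumption: nothing left to try
        chosen = min(candidates, key=lambda entry: entry.order)
        self.applied.add(chosen.order)
        return chosen
```

Calling `determine` a second time with the same items returns the entry with the next application order, which is the behavior the retry relies on.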
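The optional row ID mentioned above lets the receiving side update only the re-anonymized column instead of replacing whole rows. A minimal sketch, assuming each row is a dictionary carrying a hypothetical `id` key and that every row in the update already exists on the receiving side (`merge_by_id` is an illustrative name, not part of the described system):

```python
def merge_by_id(existing, update):
    """Overwrite items of `existing` rows with the items of `update` rows
    that carry the same id; rows and items not mentioned are kept as they are."""
    by_id = {row["id"]: dict(row) for row in existing}
    for row in update:
        by_id[row["id"]].update(row)
    return list(by_id.values())
```

For example, merging an update that contains only `id` and a re-grouped `age` leaves `height` and `gender` untouched.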

  The data request unit 230 stores the anonymized data received in step ST6 in the anonymized data storage database device 210 (ST7).

  The analysis unit 220 analyzes the anonymized data stored in the anonymized data storage database device 210 using an arbitrary analysis method (no particular method is specified), and sends the obtained analysis result to the analysis result transmission unit 240 (ST8).

  Upon receiving the analysis result sent in step ST8, the analysis result transmission unit 240 inputs the analysis result to the input unit 131 in the analysis accuracy determination device 130 via the communication unit 250 and the communication unit 140 (ST9).

  When receiving the input of the analysis result obtained by analyzing the anonymized data from the anonymized data analysis system 200, the input unit 131 sends the received analysis result to the analysis accuracy determination unit 133 (ST10).

  The analysis accuracy determination unit 133 determines whether or not the analysis result received in step ST10 satisfies the analysis accuracy policy PB in the analysis accuracy policy storage unit 132 (ST11). As a result of the determination, if it is satisfied (passed), the process is terminated. If not (failed), the process proceeds to step ST12.

  When the determination result of step ST11 is NO, the analysis accuracy determination unit 133 controls the anonymization method determination unit 124 so that the anonymization method determination unit 124, the anonymization unit 125, the input unit 131, and the analysis accuracy determination unit 133 retry their processing. Specifically, the analysis accuracy determination unit 133 outputs, to the anonymization unit 125, a retry request for inquiring about the anonymization method again (ST12). Thereafter, the processing of steps ST4 to ST11 is retried.
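The overall loop of steps ST4 to ST12 can be summarized in a short sketch. The function and parameter names are hypothetical, the two systems are collapsed into one process for illustration, and returning `None` when every method has been tried is an assumption, since the behavior after the last method fails is not stated here.

```python
def run_until_accurate(rows, methods, analyze, is_accurate):
    """Try anonymization methods in application order until the analysis
    result satisfies the accuracy policy.

    methods:     callables ordered by application order, each rows -> anonymized rows
    analyze:     callable standing in for the external analysis system
    is_accurate: callable standing in for the analysis accuracy policy check
    Returns (attempt number, analysis result), or None if no method passes.
    """
    for attempt, method in enumerate(methods, start=1):  # ST4, ST5
        anonymized = method(rows)                         # ST6
        result = analyze(anonymized)                      # ST7 to ST10
        if is_accurate(result):                           # ST11
            return attempt, result
        # ST12: a retry request leads to the next method in the application order.
    return None
```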

Next, a specific example of each step described above will be described using the schematic diagram of each data shown in FIGS. 2 to 6 and the schematic diagram of the operation shown in FIG. The analysis in this embodiment is processing for obtaining a correlation coefficient between the age x and the height y of n minors, each of whom can be identified by a number i. For (x, y) = {(x_i, y_i)} (i = 1, 2, ..., n), the correlation coefficient is obtained by the following equation.
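The equation itself is not reproduced in this text. Assuming it is the standard Pearson correlation coefficient, which is consistent with the later remark that the denominator of [Equation 1] becomes 0 when every age value is the same, it can be written as follows, where the symbols r, \bar{x}, and \bar{y} are notation introduced here:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```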

  In step ST1, the data request unit 230 requests the input unit 121 for data necessary for analysis via the communication unit 250 and the communication unit 140. The request here is “item (height, gender, age)”.

  In step ST2, the input unit 121 passes the request received in step ST1 to the data item extraction unit 122. This requirement is “item (height, gender, age)”.

  In step ST3, the data item extraction unit 122 extracts the request data received in step ST2 from the original data storage database device 110, and passes the extracted data to the anonymization unit 125. The passed data is all rows of “item (height, gender, age)”.

  In step ST4, the anonymization unit 125 inquires of the anonymization method determination unit 124 about the anonymization method. The content to be inquired here is “anonymization method regarding item (height, gender, age)”.

  In step ST5, when the anonymization method determination unit 124 receives the anonymization policy PA from the anonymization policy storage unit 123, the anonymization method determination unit 124 determines the anonymization method and notifies the anonymization unit 125. Here, the anonymization method determination unit 124 notifies the anonymization unit 125 of “grouping (10-year increments)”, which has the highest application order in the received anonymization policy PA. At the same time, it is recorded in the anonymization policy storage unit 123 that the first application order has been applied.

  In step ST6, the anonymization unit 125 performs the anonymization process according to the anonymization method notified in step ST5. Thereafter, the anonymization unit 125 passes the anonymization data DA1 (the location where the age column is anonymized) to the data request unit 230 via the communication unit 140 and the communication unit 250. Further, the anonymization unit 125 holds the passed anonymization data DA1. The data item passed to the data requesting unit 230 is “height, gender, age”.

  In step ST7, the data request unit 230 stores the anonymized data DA1 received in step ST6 in the anonymized data storage database device 210.

  In step ST8, the analysis unit 220 calculates, as the analysis, the correlation coefficient between height and age for the anonymized data DA1 stored in the anonymized data storage database device 210. In this case, every age value is the same (teens), so the denominator of the equation shown in [Equation 1] is 0 and the correlation coefficient cannot be obtained.

  In step ST9, when the analysis result transmission unit 240 receives the result of the analysis in step ST8 (calculation not possible) from the analysis unit 220, it transmits the analysis result to the input unit 131 via the communication unit 250 and the communication unit 140.
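The failure in this step can be reproduced with a few lines of Python. This sketch assumes that the analysis uses the standard Pearson correlation coefficient and that each age group is represented by its lower bound as a number; both are assumptions, and the sample values are invented for illustration.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.
    Returns None when the denominator is 0 (the coefficient is undefined)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    denominator = math.sqrt(spread_x) * math.sqrt(spread_y)
    if denominator == 0:
        return None
    return covariance / denominator


heights = [140, 150, 160, 170]          # illustrative values only
ages_10_year_groups = [10, 10, 10, 10]  # every minor falls into the same group
ages_5_year_groups = [10, 10, 15, 15]   # finer groups keep some variation

undefined_result = correlation(ages_10_year_groups, heights)
defined_result = correlation(ages_5_year_groups, heights)
```

With a single 10-year group the age values have no spread, so the function returns `None`, whereas the 5-year groups yield a defined coefficient.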

  In step ST10, the input unit 131 passes the analysis result received in step ST9 to the analysis accuracy determination unit 133.

  In step ST11, when the analysis accuracy determination unit 133 receives the analysis accuracy policy PB from the analysis accuracy policy storage unit 132, the analysis accuracy determination unit 133 determines whether the accuracy of the analysis result is sufficient based on the analysis accuracy policy PB. This analysis result (calculation not possible) does not satisfy No. 1 of the analysis accuracy policy PB (the correlation coefficient for teenagers' height is 0 or more), so the determination result is a fail. Therefore, the process is repeated from step ST4.

  In step ST4, when the anonymization unit 125 receives a retry request from the analysis accuracy determination unit 133, the anonymization unit 125 inquires of the anonymization method determination unit 124 about the anonymization method. The content of the inquiry is “anonymization method with a lower application order for the item (age)”.

  In step ST5, when the anonymization method determination unit 124 receives the anonymization policy PA from the anonymization policy storage unit 123, the anonymization method determination unit 124 determines the anonymization method and notifies the anonymization unit 125. Here, the anonymization method determination unit 124 notifies the anonymization unit 125 of “grouping (5-year increments)”, which has the next highest application order after the recorded anonymization method (first application order) in the received anonymization policy PA. At the same time, it is recorded in the anonymization policy storage unit 123 that the second application order has been applied.

  In step ST6, the anonymization unit 125 performs the anonymization process according to the anonymization method notified in step ST5. Thereafter, the anonymization unit 125 passes the anonymization data DA2 (the portion where the bold portion is anonymized) to the data request unit 230 via the communication unit 140 and the communication unit 250. Further, the anonymization unit 125 holds the passed anonymization data DA2. The data item passed to the data request unit 230 is only “age” for which anonymization has been performed.

  In step ST7, the data request unit 230 stores the anonymized data DA2 received in step ST6 in the anonymized data storage database device 210.

  In step ST8, the analysis unit 220 analyzes the anonymized data DA2 stored in the anonymized data storage database device 210. As a result of this analysis, the correlation coefficient is obtained as 0.

  In step ST9, upon receiving the analysis result (correlation coefficient = 0) in step ST8 from the analysis unit 220, the analysis result transmission unit 240 sends the analysis result to the input unit 131 via the communication unit 250 and the communication unit 140. Send.

  In step ST10, the input unit 131 passes the analysis result (correlation coefficient = 0) received in step ST9 to the analysis accuracy determination unit 133.

  In step ST11, when the analysis accuracy determination unit 133 receives the analysis accuracy policy PB from the analysis accuracy policy storage unit 132, the analysis accuracy determination unit 133 determines whether the accuracy of the analysis result is sufficient based on the analysis accuracy policy PB. This analysis result (correlation coefficient = 0) satisfies No. 1 of the analysis accuracy policy PB (the correlation coefficient for teenagers' height is 0 or more), and No. 2 (the correlation coefficient for height in the 20s is -0.1 or more) is irrelevant, so the determination result is a pass. For this reason, the analysis accuracy determination unit 133 ends the process.

  As described above, the present embodiment determines the anonymization method of the highest application order among the unapplied anonymization methods, records the application of the determined anonymization method, and generates anonymized data by anonymizing, among the extracted data, the values of the item to be anonymized associated with that method. When the analysis accuracy does not satisfy the analysis accuracy policy, a retry request is issued and the determination of the anonymization method is retried. With this configuration, it is possible to maintain the accuracy of the analysis result while minimizing the anonymization effort and the amount of information to be provided.

  To supplement, the anonymization effort can be minimized by the configuration that anonymizes only the values of the items to be anonymized instead of anonymizing the entire data.

  Further, when the analysis result of the anonymized data does not satisfy the analysis accuracy policy, the application order of the anonymization method is lowered, so that the accuracy of the analysis result can be maintained while minimizing the amount of information to be provided.
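The two determinations in this example (fail for "calculation not possible", pass for a coefficient of 0) can be sketched as a check against the analysis accuracy policy PB. The dictionary layout, the group labels "teens" and "twenties", and the use of `None` for an uncomputable result are assumptions made for illustration; only the two thresholds come from the description.

```python
from typing import Optional

# Analysis accuracy policy PB: No -> (age group the condition applies to, minimum coefficient).
POLICY_PB = {
    1: ("teens", 0.0),
    2: ("twenties", -0.1),
}


def satisfies_policy(age_group: str, coefficient: Optional[float]) -> bool:
    """Return True if the coefficient meets every PB condition for `age_group`.
    Conditions for other age groups are ignored as irrelevant.
    None stands for "calculation not possible" and always fails."""
    if coefficient is None:
        return False
    return all(
        coefficient >= minimum
        for group, minimum in POLICY_PB.values()
        if group == age_group
    )
```

Under these assumptions, `satisfies_policy("teens", None)` corresponds to the first, failed determination and `satisfies_policy("teens", 0.0)` to the second, passed one.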

<Second Embodiment>
Next, an anonymized data change system according to the second embodiment will be described. As shown in FIG. 8, the first embodiment is an example in which anonymization is performed again on the information in a column of the anonymized data. In contrast, the second embodiment is an example in which anonymization is performed again on the information in the rows of the anonymized data, as shown in FIG.

  Accordingly, in the second embodiment, the anonymization policy storage unit 123 stores the anonymization policies PA1 and PA2 illustrated in FIGS. 10 and 11, the analysis accuracy policy storage unit 132 stores the analysis accuracy policy PB' illustrated in FIG. 12, and the original data storage database device 110 stores the data D' before anonymization shown in FIG. 13. The anonymization policy PA1 is the same as the anonymization policy PA described above. Like the anonymization policy PA1, the anonymization policy PA2 is a table having three items: application order, item to be anonymized, and anonymization method. However, the anonymization policy PA2 differs from the anonymization policy PA1 in that “row” is set as the item to be anonymized and “resampling (*%)” is set as the anonymization method (* = 40, 60, 80). There are two anonymization policies PA1 and PA2 because one anonymization policy PAi (where i = 1, 2) is prepared for each item to be anonymized.
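  As an illustration only, the two policy tables can be pictured as small in-memory lists. The following Python sketch is not part of the embodiment: the class `PolicyEntry`, the function `next_method`, the numeric `param` field, and the assignment of 80% to the third application order of PA2 are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyEntry:
    order: int    # application order (1 is applied first)
    item: str     # item to be anonymized
    method: str   # anonymization method
    param: int    # group width in years, or sampling percentage (assumed encoding)


# PA1: grouping of the "age" column, coarse to fine.
PA1 = [
    PolicyEntry(1, "age", "grouping", 10),
    PolicyEntry(2, "age", "grouping", 5),
    PolicyEntry(3, "age", "grouping", 3),
]

# PA2: resampling of rows; 80% as the third order is an assumption.
PA2 = [
    PolicyEntry(1, "row", "resampling", 40),
    PolicyEntry(2, "row", "resampling", 60),
    PolicyEntry(3, "row", "resampling", 80),
]


def next_method(policy, applied_orders):
    """Return the unapplied entry with the highest application order
    (smallest order number), or None when every entry has been applied."""
    remaining = [e for e in policy if e.order not in applied_orders]
    return min(remaining, key=lambda e: e.order) if remaining else None
```

  With the first application order of PA2 already recorded as applied, `next_method(PA2, {1})` returns the 60% entry, which corresponds to the retry case described for the second embodiment.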

  The configuration and processing flow other than the anonymization policies PA1 and PA2, the analysis accuracy policy PB ′, and the data D ′ are the same as those in the first embodiment.

  Next, operations of the anonymized data change system and the anonymized data analysis system configured as described above will be described with reference to the flowchart of FIG. 7 described above. In the following description, it is assumed that anonymized data DA1' (FIG. 14), which has already been anonymized by the grouping (in units of 3 years) of the third application order of the anonymization policy PA1 and the resampling (40%) of the first application order of the anonymization policy PA2, is stored in the anonymized data storage database device 210.

  In step ST1, the data request unit 230 requests the input unit 121 for data necessary for analysis via the communication unit 250 and the communication unit 140. Here, “row” data is requested.

  In step ST2, the input unit 121 passes the request received in step ST1 to the data item extraction unit 122. This request is “row” data.

  In step ST3, the data item extraction unit 122 extracts the request data received in step ST2 from the original data storage database device 110, and passes the extracted data to the anonymization unit 125. The data passed is “all”.

  In step ST4, the anonymization unit 125 inquires of the anonymization method determination unit 124 about the anonymization method. The content of the inquiry is “row”.

  In step ST5, upon receiving the anonymization policies PA1 and PA2 from the anonymization policy storage unit 123, the anonymization method determination unit 124 determines the anonymization method and notifies the anonymization unit 125. Because the content of the inquiry in step ST4 is “row”, and “resampling (40%)” and “grouping (in units of 3 years)” have already been applied, the anonymization method determination unit 124 notifies the anonymization unit 125 of “resampling (60%)” in the second application order and “grouping (in units of 3 years)” in the third application order from the received anonymization policies PA1 and PA2.

  In step ST6, the anonymization unit 125 performs the anonymization process according to the anonymization method notified in step ST5. In this case, the number of rows of the data increases because the resampling rate rises from 40% to 60%. The anonymization unit 125 then obtains the difference from the previously sent anonymized data DA1' that it holds, and performs the anonymization process of grouping (in units of 3 years) on the difference data. By this anonymization process, anonymized data DA2' is generated as shown in FIG. Thereafter, the anonymization unit 125 passes the anonymized data DA2' (in which the age column is anonymized) to the data request unit 230 via the communication unit 140 and the communication unit 250. The anonymization unit 125 holds the passed anonymized data DA2'.
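  The difference-based processing of step ST6 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation of the embodiment: the function names, the use of a fixed shuffled order so that the 60% sample contains the 40% sample, and the representation of a group by its lower bound are all hypothetical.

```python
import random


def sample_ids(row_ids, percent, seed=0):
    """Take the first `percent` % of a fixed shuffled order, so that a larger
    sample always contains every smaller sample (assumed sampling scheme)."""
    order = sorted(row_ids)
    random.Random(seed).shuffle(order)
    return order[: len(order) * percent // 100]


def group_age(age, width):
    """Replace an age with the lower bound of its `width`-year group."""
    return age - age % width


def extend_sample(rows, already_sent, percent, width, seed=0):
    """Anonymize only the rows newly selected at `percent` %, that is,
    the difference from the rows that were already sent."""
    chosen = sample_ids(list(rows), percent, seed)
    diff = [rid for rid in chosen if rid not in already_sent]
    return {rid: {**rows[rid], "age": group_age(rows[rid]["age"], width)}
            for rid in diff}
```

  Because only the rows in the difference are grouped, the work done on a retry is proportional to the newly added rows rather than to the whole sample.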

  In step ST7, the data request unit 230 stores the anonymized data DA2 'received in step ST6 in the anonymized data storage database apparatus 210.

  In step ST8, the analysis unit 220 obtains, as the analysis, the correlation coefficient between height and age for the anonymized data DA1' and DA2' stored in the anonymized data storage database device 210. Here, the correlation coefficient is obtained as about 0.8.

  In step ST9, upon receiving the result (correlation coefficient = about 0.8) analyzed in step ST8 from the analysis unit 220, the analysis result transmission unit 240 sends the analysis result via the communication unit 250 and the communication unit 140. To the input unit 131.

  In step ST10, the input unit 131 passes the analysis result (correlation coefficient = approximately 0.8) received in step ST9 to the analysis accuracy determination unit 133.

  In step ST11, upon receiving the analysis accuracy policy PB' from the analysis accuracy policy storage unit 132, the analysis accuracy determination unit 133 determines whether the accuracy of the analysis result is sufficient based on the analysis accuracy policy PB'. This analysis result (correlation coefficient = about 0.8) satisfies No. 1 (the height of teenagers has a correlation coefficient of 0 or more) and No. 3 (the number of data is 3 or more) of the analysis accuracy policy PB'. Since No. 2 (the height of people in their 20s has a correlation coefficient of −0.1 or more) is irrelevant, the determination result is a pass. For this reason, the analysis accuracy determination unit 133 ends the process.
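  The determination in step ST11 can be pictured as checking each applicable rule of the policy against the analysis result. The Python sketch below is an assumption-laden illustration: the dictionary layout of `PB_PRIME`, the keys `age_band`, `metric`, and `minimum`, and the function `check_policy` are hypothetical names, and only the three conditions quoted in the text are encoded.

```python
# Analysis accuracy policy PB' as quoted in the text (layout is assumed).
PB_PRIME = [
    {"no": 1, "age_band": "10s", "metric": "correlation", "minimum": 0.0},
    {"no": 2, "age_band": "20s", "metric": "correlation", "minimum": -0.1},
    {"no": 3, "age_band": None, "metric": "count", "minimum": 3},
]


def check_policy(result, policy):
    """Return True when every applicable rule is satisfied.
    Rules bound to another age band are skipped as irrelevant."""
    for rule in policy:
        band = rule.get("age_band")
        if band is not None and band != result["age_band"]:
            continue  # e.g. No. 2 when the data covers teenagers
        value = result.get(rule["metric"])
        if value is None or value < rule["minimum"]:
            return False  # a metric that could not be calculated also fails
    return True
```

  For a result such as `{"age_band": "10s", "correlation": 0.8, "count": 5}`, rules No. 1 and No. 3 apply and are satisfied, so the check passes, while a result whose correlation could not be calculated (represented here as `None`) fails.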

  As described above, according to the present embodiment, in addition to the effects of the first embodiment, the processing time of the anonymization process can be reduced by the configuration in which the target of the anonymization process performed again is the difference data from the first anonymization process.

<Third Embodiment>
Next, an anonymized data change system according to the third embodiment will be described. The first and second embodiments are examples in which the analysis accuracy is determined within the anonymized data change system 100, as illustrated in FIG. 16. In contrast, the third embodiment is an example in which the analysis accuracy is determined within the anonymized data analysis system 200, as shown in FIG.

  FIG. 18 is a schematic diagram showing a configuration example of the anonymized data change system and the anonymized data analysis system according to the third embodiment. The same parts as those in FIG. are given the same reference numerals, and the different parts are mainly described here.

  That is, compared with FIG. 1 showing the configuration of the first and second embodiments, the configuration of the third embodiment differs in that the analysis accuracy determination device 130 in the anonymized data change system 100 is omitted, the analysis accuracy determination device 260 is arranged in the anonymized data analysis system 200, and the analysis result transmission unit 240 in the anonymized data analysis system 200 is omitted.

  The functions of the respective units are substantially the same, except that the anonymization unit 125 in the anonymized data change system 100 receives the retry request from the anonymized data analysis system 200, that the analysis accuracy determination device 260 receives the anonymized data from the anonymized data storage database device 210, and that the analysis accuracy determination device 260 transmits the retry request to the anonymized data change system 100.

  The analysis accuracy determination device 260 is the same device as the analysis accuracy determination device 130 described above, and includes the same input unit 261, analysis accuracy policy storage unit 262, and analysis accuracy determination unit 263 as described above.

  Next, the operations of the anonymized data change system and the anonymized data analysis system configured as described above will be described with reference to the flowchart of FIG.

  In the anonymized data analysis system, the data request unit 230 inputs items used for analysis to the input unit 121 of the anonymization device 120 via the communication unit 250 and the communication unit 140 (ST1).

  When input unit 121 accepts input of this item, it sends the item to data item extraction unit 122 (ST2).

  Based on the sent item, the data item extraction unit 122 extracts data including an item and a value that match the item from the original data storage database device 110, and sends the data to the anonymization unit 125 (ST3).

  When the anonymization unit 125 receives the data extracted in step ST3, the anonymization unit 125 sends the data to the anonymization method determination unit 124, and inquires of the anonymization method determination unit 124 (ST4).

  The anonymization method determination unit 124 refers to the anonymization policy PA in the anonymization policy storage unit 123 and is associated with the item to be anonymized in the anonymization policy PA that matches the item in the data transmitted in step ST4. Among the anonymization methods that have not yet been applied, the anonymization method associated with the highest application order is determined, and the determined anonymization method is notified to the anonymization unit 125 (ST5). Further, the anonymization method determination unit 124 records the application of the determined anonymization method in the anonymization policy storage unit 123.

  The anonymization unit 125 anonymizes the value of the item that matches the item to be anonymized in the anonymization policy PA associated with the anonymization method determined in step ST5 among the data extracted in step ST3. Thus, anonymized data is generated from the extracted data. Thereafter, the anonymization unit 125 transmits the generated anonymization data to the data request unit 230 via the communication unit 140 and the communication unit 250 (ST6). The anonymization unit 125 adds an ID that can uniquely specify a row to the data transmitted in step ST6 so that the anonymized data storage database device 210 can reflect the additional anonymized data in the existing anonymous data as necessary. Processing may be executed. Further, the anonymization unit 125 holds the generated anonymization data.
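  The role of the row ID mentioned above can be illustrated with a short Python sketch. The function `merge_by_id` and the dictionary-of-rows representation are assumptions for this example; the text does not specify how the anonymized data storage database device 210 reflects additional data.

```python
def merge_by_id(existing, update):
    """Overlay `update` onto `existing`, matching rows by their unique ID.
    Columns present in `update` replace the stored values; rows with
    unknown IDs are appended. Neither input is modified."""
    merged = {rid: dict(row) for rid, row in existing.items()}
    for rid, cols in update.items():
        merged.setdefault(rid, {}).update(cols)
    return merged
```

  With such an ID, a retry that sends only the re-anonymized column, or only additional rows, can be reflected in the stored data without resending the other items.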

  The data request unit 230 stores the anonymized data received in step ST6 in the anonymized data storage database device 210 (ST7). Note that the processes in steps ST1 to ST7 are the same as those in the first embodiment (FIG. 7).

  The analysis unit 220 analyzes the anonymized data stored in the anonymized data storage database device 210 using an analysis method that is not explicitly specified here (ST8c).

  The analysis unit 220 inputs the analysis result obtained in step ST8c to the input unit 261 in the analysis accuracy determination device 260 (ST9c).

  When input of the analysis result is received, input unit 261 sends the received analysis result to analysis accuracy determination unit 263 (ST10c).

  The analysis accuracy determination unit 263 determines whether or not the analysis result received in step ST10c satisfies the analysis accuracy policy PB in the analysis accuracy policy storage unit 262 (ST11c). As a result of the determination, if it is satisfied (passed), the process is terminated. If not (failed), the process proceeds to step ST12c.

  When the determination result of step ST11c is negative, the analysis accuracy determination unit 263 controls the anonymization method determination unit 124 so that the anonymization method determination unit 124, the anonymization unit 125, the input unit 261, and the analysis accuracy determination unit 263 are retried. Specifically, the analysis accuracy determination unit 263 outputs a retry request for inquiring about the anonymization method again to the anonymization unit 125 via the communication unit 250 and the communication unit 140 (ST12c). Thereafter, the processing of steps ST4 to ST11c is retried.

  During the retry, for example, in step ST5, the anonymization method determination unit 124 determines, based on the anonymization policy PA and the record of application, the anonymization method associated with the application order one rank lower than the current highest application order. Further, the anonymization method determination unit 124 records the application of the anonymization method associated with the lower rank in the anonymization policy storage unit 123.

  Further, during the retry, for example, in step ST6, the anonymization unit 125 generates new anonymization data based on the anonymization method associated with the lower rank. Hereinafter, the analysis is retried based on the new anonymized data, and the analysis result is determined.
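  The retry flow of steps ST4 to ST12c can be summarized as a loop. The following Python sketch is only an illustration under assumptions: the function `run_with_retry`, its callback parameters `analyze` and `is_acceptable`, and the restriction to grouping of an "age" column are hypothetical simplifications of the flow described above.

```python
def run_with_retry(rows, widths, analyze, is_acceptable):
    """Apply grouping widths in application order until the analysis
    result satisfies the accuracy check, recording each application."""
    applied = []
    for width in widths:                       # ST5: next unapplied method
        applied.append(width)                  # record the application
        data = [{**r, "age": r["age"] - r["age"] % width} for r in rows]  # ST6
        result = analyze(data)                 # ST8c: analysis
        if is_acceptable(result):              # ST11c: accuracy determination
            return data, result, applied
        # ST12c: not satisfied, so the loop retries with the next method
    return None, None, applied
```

  The loop stops at the coarsest grouping that satisfies the check, which mirrors the aim of providing no more detailed information than the analysis requires.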

  Next, a specific example of each step described above will be described with reference to schematic diagrams of each data shown in FIGS.

  In step ST1, the data request unit 230 requests the input unit 121 for data necessary for analysis via the communication unit 250 and the communication unit 140. The request here is “item (height, gender, age)”.

  In step ST2, the input unit 121 passes the request received in step ST1 to the data item extraction unit 122. This requirement is “item (height, gender, age)”.

  In step ST3, the data item extraction unit 122 extracts the request data received in step ST2 from the original data storage database device 110, and passes the extracted data to the anonymization unit 125. The passed data is all rows of “item (height, gender, age)”.

  In step ST4, the anonymization unit 125 inquires of the anonymization method determination unit 124 about the anonymization method. The content to be inquired here is “anonymization method regarding item (height, gender, age)”.

  In step ST5, upon receiving the anonymization policy PA from the anonymization policy storage unit 123, the anonymization method determination unit 124 determines the anonymization method and notifies the anonymization unit 125. Here, the anonymization method determination unit 124 notifies the anonymization unit 125 of “grouping (10-year increments)”, which has the highest application order in the received anonymization policy PA. At the same time, the anonymization method determination unit 124 records in the anonymization policy storage unit 123 that the first application order has been applied.

  In step ST6, the anonymization unit 125 performs the anonymization process according to the anonymization method notified in step ST5. Thereafter, the anonymized data DA1 (where the age column is anonymized) is passed to the data requesting unit 230 via the communication unit 140 and the communication unit 250. Further, the anonymization unit 125 holds the passed anonymization data DA1. The data item passed to the data requesting unit 230 is “height, gender, age”.

  In step ST7, the data requesting unit 230 stores the anonymized data DA1 received in step ST6 in the anonymized data storage database device 210.

  In step ST8c, the analysis unit 220 obtains, as the analysis, the correlation coefficient between height and age for the anonymized data DA1 stored in the anonymized data storage database device 210. In this case, every age falls in the teens, so the correlation coefficient cannot be obtained, as described above.
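  The reason the coefficient cannot be obtained is that, after 10-year grouping, the age column takes a single value, so its variance is zero and the Pearson correlation coefficient is undefined. The Python sketch below illustrates this; the function `pearson`, its `None` return convention, and the sample values are assumptions for the example and are not taken from the figures.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient, or None when it is undefined
    (fewer than two points, or zero variance in either variable)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # e.g. every age was grouped into the same 10-year band
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)
```

  Once a finer grouping splits the ages into at least two distinct values, the denominator becomes non-zero and a coefficient can be calculated.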

  In step ST9c, the analysis unit 220 inputs the result of the analysis in step ST8c (calculation not possible) to the input unit 261 in the analysis accuracy determination device 260.

  In step ST10c, the input unit 261 sends the analysis result to the analysis accuracy determination unit 263.

  In step ST11c, upon receiving the analysis accuracy policy PB from the analysis accuracy policy storage unit 262, the analysis accuracy determination unit 263 determines whether the accuracy of the analysis result is sufficient based on the analysis accuracy policy PB. This analysis result (calculation not possible) does not satisfy No. 1 of the analysis accuracy policy PB (the height of teenagers has a correlation coefficient of 0 or more), so the determination result is a fail. Therefore, the process is repeated from step ST4.

  In step ST4, upon receiving the retry request from the analysis accuracy determination unit 263, the anonymization unit 125 inquires of the anonymization method determination unit 124 about the anonymization method. The content of the inquiry is “anonymization method with the next lower application order for the item (age)”.

  In step ST5, upon receiving the anonymization policy PA from the anonymization policy storage unit 123, the anonymization method determination unit 124 determines the anonymization method and notifies the anonymization unit 125. Here, the anonymization method determination unit 124 notifies the anonymization unit 125 of “grouping (5-year increments)”, which has the next highest application order after the recorded anonymization method (first application order) in the received anonymization policy PA. At the same time, the anonymization method determination unit 124 records the application of the second application order in the anonymization policy storage unit 123.

  In step ST6, the anonymization unit 125 performs the anonymization process according to the anonymization method notified in step ST5. Thereafter, the anonymization unit 125 passes the anonymization data DA2 to the data request unit 230 via the communication unit 140 and the communication unit 250. Further, the anonymization unit 125 holds the passed anonymization data DA2. The data item passed to the data request unit 230 is only “age” for which anonymization has been performed.

  In step ST7, the data request unit 230 stores the anonymized data DA2 received in step ST6 in the anonymized data storage database device 210.

  In step ST8c, the analysis unit 220 analyzes the anonymized data DA2 stored in the anonymized data storage database device 210. As a result of this analysis, the correlation coefficient is obtained as 0.

  In step ST9c, the analysis unit 220 inputs the analysis result of step ST8c (correlation coefficient = 0) to the input unit 261 in the analysis accuracy determination device 260.

  In step ST10c, the input unit 261 sends the analysis result (correlation coefficient = 0) to the analysis accuracy determination unit 263.

  In step ST11c, upon receiving the analysis accuracy policy PB from the analysis accuracy policy storage unit 262, the analysis accuracy determination unit 263 determines whether the accuracy of the analysis result is sufficient based on the analysis accuracy policy PB. This analysis result (correlation coefficient = 0) satisfies No. 1 of the analysis accuracy policy PB (the height of teenagers has a correlation coefficient of 0 or more). Since No. 2 (the height of people in their 20s has a correlation coefficient of −0.1 or more) is irrelevant, the determination result is a pass. For this reason, the analysis accuracy determination unit 263 ends the process.

  As described above, according to the present embodiment, the same effects as those of the first embodiment can be obtained even with the configuration in which the analysis accuracy determination device 260 is arranged in the anonymized data analysis system 200 instead of the analysis accuracy determination device 130 in the anonymized data change system 100.

<Fourth Embodiment>
Next, an anonymized data change system according to the fourth embodiment will be described. The third embodiment is an example in which anonymization is performed again on the information in a column of the anonymized data, as in the first embodiment. In contrast, the fourth embodiment is an example in which anonymization is performed again on the information in the rows of the anonymized data, as in the second embodiment.

  Accordingly, in the fourth embodiment, the anonymization policy storage unit 123 stores the anonymization policies PA1 and PA2 shown in FIGS. 10 and 11, the analysis accuracy policy storage unit 132 stores the analysis accuracy policy PB' shown in FIG. 12, and the original data storage database device 110 stores the data D' before anonymization shown in FIG. 13. There are two anonymization policies PA1 and PA2 because one anonymization policy PAi (where i = 1, 2) is prepared for each item to be anonymized.

  The configuration and processing flow other than the anonymization policies PA1 and PA2, the analysis accuracy policy PB ', and the data D' are the same as those in the third embodiment.

  Next, operations of the anonymized data change system and the anonymized data analysis system configured as described above will be described with reference to the flowchart of FIG. 19 described above. In the following description, it is assumed that anonymized data DA1' (FIG. 14), which has already been anonymized by the grouping (in units of 3 years) of the third application order of the anonymization policy PA1 and the resampling (40%) of the first application order of the anonymization policy PA2, is stored in the anonymized data storage database device 210.

  In step ST1, the data request unit 230 requests the input unit 121 for data necessary for analysis via the communication unit 250 and the communication unit 140. Here, “row” data is requested.

  In step ST2, the input unit 121 passes the request received in step ST1 to the data item extraction unit 122. This request is “row” data.

  In step ST3, the data item extraction unit 122 extracts the request data received in step ST2 from the original data storage database device 110, and passes the extracted data to the anonymization unit 125. The data passed is “all”.

  In step ST4, the anonymization unit 125 inquires of the anonymization method determination unit 124 about the anonymization method. The content of the inquiry is “row”.

  In step ST5, upon receiving the anonymization policies PA1 and PA2 from the anonymization policy storage unit 123, the anonymization method determination unit 124 determines the anonymization method and notifies the anonymization unit 125. Because the content of the inquiry in step ST4 is “row”, and “resampling (40%)” and “grouping (in units of 3 years)” have already been applied, the anonymization method determination unit 124 notifies the anonymization unit 125 of “resampling (60%)” in the second application order and “grouping (in units of 3 years)” in the third application order from the received anonymization policies PA1 and PA2.

  In step ST6, the anonymization unit 125 performs the anonymization process according to the anonymization method notified in step ST5. In this case, the number of rows of the data increases because the resampling rate is changed from the previous 40% to 60%. The anonymization unit 125 then obtains the difference from the previously sent anonymized data DA1' that it holds, and performs the anonymization process of grouping (in units of 3 years) on the difference data. By this anonymization process, anonymized data DA2' is generated as shown in FIG. Thereafter, the anonymization unit 125 passes the anonymized data DA2' to the data request unit 230 via the communication unit 140 and the communication unit 250. The anonymization unit 125 holds the passed anonymized data DA2'.

  In step ST7, the data request unit 230 stores the anonymized data DA2 'received in step ST6 in the anonymized data storage database apparatus 210.

  In step ST8c, the analysis unit 220 obtains, as the analysis, the correlation coefficient between height and age for the anonymized data DA1' and DA2' stored in the anonymized data storage database device 210. Here, the correlation coefficient is obtained as about 0.8.

  In step ST9c, the analysis unit 220 inputs the result (correlation coefficient = about 0.8) analyzed in step ST8c to the input unit 261 in the analysis accuracy determination device 260.

  In step ST10c, the input unit 261 sends the analysis result (correlation coefficient = about 0.8) to the analysis accuracy determination unit 263.

  In step ST11c, upon receiving the analysis accuracy policy PB' from the analysis accuracy policy storage unit 262, the analysis accuracy determination unit 263 determines whether the accuracy of the analysis result is sufficient based on the analysis accuracy policy PB'. This analysis result (correlation coefficient = about 0.8) satisfies No. 1 (the height of teenagers has a correlation coefficient of 0 or more) and No. 3 (the number of data is 3 or more) of the analysis accuracy policy PB'. Since No. 2 (the height of people in their 20s has a correlation coefficient of −0.1 or more) is irrelevant, the determination result is a pass. For this reason, the analysis accuracy determination unit 263 ends the process.

  As described above, according to the present embodiment, the analysis accuracy determination device 260 is arranged in the anonymized data analysis system 200, and the target of the anonymization process to be performed again is the difference data of the first anonymization process. The effects of the second and third embodiments can be obtained simultaneously.

  According to at least one embodiment described above, the anonymization method with the highest application order among the unapplied anonymization methods is determined, the application of the determined anonymization method is recorded, and anonymized data is generated by anonymizing, in the extracted data, the value of the item to be anonymized that is associated with the determined anonymization method. When the analysis accuracy does not satisfy the analysis accuracy policy, the determination of the anonymization method is retried. With this configuration, the accuracy of the analysis result can be maintained while minimizing the anonymization effort and the amount of information to be provided.

  Note that the methods described in the above embodiments can be stored, as programs executable by a computer, in storage media such as magnetic disks (floppy (registered trademark) disks, hard disks, etc.), optical disks (CD-ROMs, DVDs, etc.), magneto-optical disks (MO), and semiconductor memories, and distributed.

  In addition, as long as the storage medium can store a program and can be read by a computer, the storage format may be any form.

  In addition, an OS (operating system) running on the computer, or MW (middleware) such as database management software or network software, may execute a part of each process for realizing the above-described embodiments, based on instructions of the program installed in the computer from the storage medium.

  Furthermore, the storage medium in each embodiment is not limited to a medium independent of a computer, but also includes a storage medium in which a program transmitted via a LAN, the Internet, or the like is downloaded and stored or temporarily stored.

  Further, the number of storage media is not limited to one, and the case where the processing in each of the above embodiments is executed from a plurality of media is also included in the storage media in the present invention, and the media configuration may be any configuration.

  Note that the computer in each embodiment executes each process in each of the above embodiments based on a program stored in a storage medium. Any configuration of the system or the like may be used.

  In addition, the computer in each embodiment is not limited to a personal computer, and includes an arithmetic processing device, a microcomputer, and the like included in an information processing device; it is a generic term for equipment and devices that can realize the functions of the present invention by a program.

  Although some embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and modifications thereof are included in the scope and gist of the invention, and are included in the invention described in the claims and the equivalents thereof.

  DESCRIPTION OF SYMBOLS: 100 ... anonymized data change system, 110 ... original data storage database device, 120 ... anonymization device, 121, 131, 261 ... input unit, 122 ... data item extraction unit, 123 ... anonymization policy storage unit, 124 ... anonymization method determination unit, 125 ... anonymization unit, 130, 260 ... analysis accuracy determination device, 132, 262 ... analysis accuracy policy storage unit, 133, 263 ... analysis accuracy determination unit, 140, 250 ... communication unit, 200 ... anonymized data analysis system, 210 ... anonymized data storage database device, 220 ... analysis unit, 230 ... data request unit, 240 ... analysis result transmission unit, D, D' ... data, PA, PA1, PA2 ... anonymization policy, DA1, DA2, DA1', DA2' ... anonymized data, PB, PB' ... analysis accuracy policy.

Claims (2)

  1. An anonymized data change system capable of communicating with an anonymized data analysis system for analyzing anonymized data,
    Database means for storing, for each individual, data including a value for each item;
    Anonymization processing means for anonymizing a part of the data to generate the anonymized data; and
    Analysis accuracy determination means for determining the accuracy of the analysis when the anonymized data analysis system has completed its analysis of the anonymized data,
    wherein
    The anonymization processing means comprises:
    Item input means for receiving input of the items used in the analysis;
    Extraction means for extracting from the database means, based on the items received as input, data including the items and values that match those items;
    Anonymization policy storage means for storing an anonymization policy that associates an application order, an item to be anonymized, and an anonymization method with one another;
    Anonymization method determination means for determining, with reference to the anonymization policy, the anonymization method that is associated with an item to be anonymized matching an item in the extracted data and that is associated with the highest application order among the unapplied anonymization methods;
    Recording means for recording that the determined anonymization method has been applied;
    Anonymization means for generating the anonymized data from the extracted data by anonymizing the values of the items in the extracted data that match the item to be anonymized associated with the determined anonymization method; and
    Control means for controlling the anonymization method determination means to retry upon receiving a retry request from the analysis accuracy determination means,
    and
    The analysis accuracy determination means comprises:
    Analysis result input means for receiving, from the anonymized data analysis system, input of an analysis result obtained by analyzing the anonymized data;
    Analysis accuracy policy storage means for storing an analysis accuracy policy indicating a condition to be satisfied by the accuracy of the analysis;
    Analysis accuracy determination means for determining whether or not the received analysis result satisfies the analysis accuracy policy; and
    Retry request means for ending the process if the determination shows that the condition is satisfied and, if not, outputting the retry request to the control means so that the anonymization method determination means, the recording means, the anonymization means, the analysis result input means, and the analysis accuracy determination means are retried.
    An anonymized data change system characterized by comprising the above.
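The flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. All names (`PolicyEntry`, `determine_method`, `anonymize_until_accurate`) are hypothetical, and the sketch assumes that application order 1 is the highest rank, that each order appears once in the policy, and that the analysis system and the accuracy policy can be modeled as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

Row = Dict[str, object]


@dataclass(frozen=True)
class PolicyEntry:
    order: int                           # application order; 1 = highest rank (assumption)
    item: str                            # item to be anonymized
    method: Callable[[object], object]   # anonymization method applied to each value


def determine_method(policy: List[PolicyEntry], items: Set[str],
                     applied: Set[int]) -> Optional[PolicyEntry]:
    """Pick the unapplied entry with the highest application order for the extracted items."""
    candidates = [p for p in policy if p.item in items and p.order not in applied]
    return min(candidates, key=lambda p: p.order) if candidates else None


def anonymize_until_accurate(database: List[Row], items: List[str],
                             policy: List[PolicyEntry],
                             analyze: Callable[[List[Row]], object],
                             satisfies_policy: Callable[[object], bool]) -> Optional[List[Row]]:
    # Extraction: keep only the items used in the analysis.
    extracted = [{k: v for k, v in row.items() if k in items} for row in database]
    applied: Set[int] = set()            # record of applied methods
    while True:
        entry = determine_method(policy, set(items), applied)
        if entry is None:
            return None                  # policy exhausted; the claim does not address this case
        applied.add(entry.order)
        anonymized = [
            {k: entry.method(v) if k == entry.item else v for k, v in row.items()}
            for row in extracted
        ]
        if satisfies_policy(analyze(anonymized)):
            return anonymized            # accuracy policy met: end the process
        # Otherwise this corresponds to a retry request; loop with the next method.


# Hypothetical data: suppress age first, fall back to 10-year bands on retry.
policy = [
    PolicyEntry(1, "age", lambda v: "*"),
    PolicyEntry(2, "age", lambda v: f"{(int(v) // 10) * 10}s"),
]
database = [
    {"name": "A", "age": 23, "disease": "flu"},
    {"name": "B", "age": 37, "disease": "cold"},
]
result = anonymize_until_accurate(
    database, ["age", "disease"], policy,
    analyze=lambda rows: len({r["age"] for r in rows}),   # stand-in analysis
    satisfies_policy=lambda distinct: distinct >= 2,      # stand-in accuracy policy
)
```

In this toy run the first method collapses every age to `*`, the stand-in accuracy check fails, and the retry applies the second method, so the returned rows carry 10-year age bands.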
  2. An anonymized data change system capable of communicating with an anonymized data analysis system for analyzing anonymized data,
    Database means for storing, for each individual, data including a value for each item; and
    Anonymization processing means for generating the anonymized data by anonymizing a part of the data,
    wherein
    The anonymization processing means comprises:
    Item input means for receiving input of the items used in the analysis;
    Extraction means for extracting from the database means, based on the items received as input, data including the items and values that match those items;
    Anonymization policy storage means for storing an anonymization policy that associates an application order, an item to be anonymized, and an anonymization method with one another;
    Anonymization method determination means for determining, with reference to the anonymization policy, the anonymization method that is associated with an item to be anonymized matching an item in the extracted data and that is associated with the highest application order among the unapplied anonymization methods;
    Recording means for recording that the determined anonymization method has been applied;
    Anonymization means for generating the anonymized data from the extracted data by anonymizing the values of the items in the extracted data that match the item to be anonymized associated with the determined anonymization method; and
    Control means for controlling the anonymization method determination means to retry upon receiving a retry request from the anonymized data analysis system,
    and wherein
    The control means controls the anonymization method determination means to retry when it receives the retry request, which the anonymized data analysis system issues when its analysis result for the anonymized data does not satisfy a predetermined accuracy,
    The anonymization method determination means, when controlled by the control means to retry, determines, based on the anonymization policy and the record of applied methods, the anonymization method associated with the application order that is lower by one than the highest application order,
    The recording means records that the anonymization method associated with the lower application order has been applied, and
    The anonymization means generates new anonymized data based on the anonymization method associated with the lower application order.
    An anonymized data change system characterized by the above.
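In claim 2 the retry request arrives from the external analysis system, so the anonymization side can be pictured as a small stateful object. The Python sketch below is illustrative only. The class and method names are hypothetical, application order 1 is assumed to be the highest rank, and "lower by one" is read here as the order immediately following the most recently recorded one.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]
Policy = Dict[int, Tuple[str, Callable[[object], object]]]  # order -> (item, method)


class AnonymizationProcessor:
    """Stateful stand-in for the anonymization processing means (hypothetical names)."""

    def __init__(self, extracted: List[Row], policy: Policy) -> None:
        self.extracted = extracted
        self.policy = policy
        self.applied: List[int] = []     # record of applied application orders

    def _determine(self) -> Optional[int]:
        if not self.policy:
            return None
        if not self.applied:
            order = min(self.policy)     # first pass: highest application order
        else:
            order = self.applied[-1] + 1 # retry: one rank lower than the last applied
        return order if order in self.policy else None

    def generate(self) -> List[Row]:
        order = self._determine()
        if order is None:
            raise LookupError("no unapplied anonymization method left")
        self.applied.append(order)       # record the application
        item, method = self.policy[order]
        return [
            {k: method(v) if k == item else v for k, v in row.items()}
            for row in self.extracted
        ]

    def on_retry_request(self) -> List[Row]:
        # Control step: a retry request from the analysis system re-runs determination.
        return self.generate()


# Hypothetical data: mask the whole postal code first, then only the last two digits.
processor = AnonymizationProcessor(
    extracted=[{"zip": "12345", "purchases": 3}],
    policy={
        1: ("zip", lambda v: "*****"),
        2: ("zip", lambda v: str(v)[:3] + "**"),
    },
)
first = processor.generate()
second = processor.on_retry_request()
```

Keeping the applied orders in a list makes the record available both for choosing the next method and for auditing which methods were tried.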
JP2012237041A 2012-10-26 2012-10-26 Anonymized data change system Active JP5747012B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2012237041A JP5747012B2 (en) 2012-10-26 2012-10-26 Anonymized data change system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2012237041A JP5747012B2 (en) 2012-10-26 2012-10-26 Anonymized data change system

Publications (2)

Publication Number Publication Date
JP2014086037A (en) 2014-05-12
JP5747012B2 (en) 2015-07-08

Family

ID=50788994

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2012237041A Active JP5747012B2 (en) 2012-10-26 2012-10-26 Anonymized data change system

Country Status (1)

Country Link
JP (1) JP5747012B2 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100114840A1 (en) * 2008-10-31 2010-05-06 At&T Intellectual Property I, L.P. Systems and associated computer program products that disguise partitioned data structures using transformations having targeted distributions
JP2011100116A (en) * 2009-10-07 2011-05-19 Nippon Telegr & Teleph Corp <Ntt> Disturbance device, disturbance method, and program therefor
WO2011142327A1 (en) * 2010-05-10 2011-11-17 日本電気株式会社 Information processing device, control method and program
US20130239226A1 (en) * 2010-11-16 2013-09-12 Nec Corporation Information processing system, anonymization method, information processing device, and its control method and control program

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3136284A1 (en) 2015-08-31 2017-03-01 Fujitsu Limited Personal information anonymization method, personal information anonymization program, and information processing apparatus
US10289869B2 (en) 2015-08-31 2019-05-14 Fujitsu Limited Personal information anonymization method, recording medium, and information processing apparatus

Also Published As

Publication number Publication date
JP5747012B2 (en) 2015-07-08

Legal Events

Date Code Title Description
A621 Written request for application examination (Free format text: JAPANESE INTERMEDIATE CODE: A621; Effective date: 20140822)
A977 Report on retrieval (Free format text: JAPANESE INTERMEDIATE CODE: A971007; Effective date: 20150409)
TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model) (Free format text: JAPANESE INTERMEDIATE CODE: A01; Effective date: 20150414)
A61 First payment of annual fees (during grant procedure) (Free format text: JAPANESE INTERMEDIATE CODE: A61; Effective date: 20150511)
R150 Certificate of patent or registration of utility model (Ref document number: 5747012; Country of ref document: JP; Free format text: JAPANESE INTERMEDIATE CODE: R150)