KR20230105569A

KR20230105569A - A method and a device for determining predicted values for anonymous values

Info

Publication number: KR20230105569A
Application number: KR1020220001129A
Authority: KR
Inventors: 서동민
Original assignee: 비씨카드(주)
Priority date: 2022-01-04
Filing date: 2022-01-04
Publication date: 2023-07-11
Also published as: KR102627734B1

Abstract

According to an embodiment, there is provided a method by which a device determines a predicted value for an anonymous value, the method comprising: obtaining a first record, which includes an anonymous value anonymized from original data according to a preset de-identification level and an identifiable first identification value, and a plurality of second records composed of second identification values; determining, based on a result of comparing the first identification value with the second identification values, one or more reference records corresponding to the first record from among the plurality of second records; and determining, using the one or more reference records, a predicted value for the anonymous value within the range of the de-identification level.

Description

Method and device for determining predicted values for anonymous values {A METHOD AND A DEVICE FOR DETERMINING PREDICTED VALUES FOR ANONYMOUS VALUES}

The present disclosure relates to a method and a device for determining a predicted value for an anonymous value and, more particularly, to a technique for providing prediction information so that anonymous information, processed to a level at which personal information cannot be identified, can be used as meaningful data.

As the types and volume of data being processed grow with advances in technologies such as big data and cloud computing, data containing personal information is actively collected and shared across various industries. However, because such personal information can identify a specific individual, the risk of incidents resulting from information leakage is high; data containing personal information must therefore be anonymized, under the Personal Information Protection Act and similar laws, into a form in which a specific individual cannot be recognized. One example of a data anonymization method is K-anonymity, which reduces the linkability of databases so that individuals cannot be identified within a data set.

As described above, conventional techniques for utilizing data containing personal information anonymize the personal information into a form in which a specific individual cannot be recognized and use it for purposes such as compiling statistics and academic research.

However, although such conventional techniques can guarantee the anonymity of data by using anonymized information, they have the drawback that the value of the data as a statistical feature is reduced. For example, the more finely anonymous information is subdivided, the more finely the statistics are subdivided as well, which reduces the amount of data that can actually be used and makes the data difficult to utilize; conversely, when the data is grouped coarsely, detailed attributes are absent, which also makes the data difficult to utilize.

Accordingly, there is a growing demand for a technique that overcomes the above drawbacks and limitations and increases the value of anonymized information as data.

Korean Patent Application Publication No. 10-2017-0126804 (2017.11.20.), Anonymized Statistical Database Query System and Method

An embodiment of the present disclosure is intended to solve the above-described problems of the prior art by providing prediction information for anonymized information so that the anonymized information can be used as more meaningful data.

The objects of the present disclosure are not limited to the object mentioned above, and other objects not mentioned will be clearly understood from the description below.

A method by which a device determines a predicted value for an anonymous value according to a first aspect of the present disclosure may include: obtaining a first record, which includes an anonymous value anonymized from original data according to a preset de-identification level and an identifiable first identification value, and a plurality of second records composed of second identification values; determining, based on a result of comparing the first identification value with the second identification values, one or more reference records corresponding to the first record from among the plurality of second records; and determining, using the one or more reference records, a predicted value for the anonymous value within the range of the de-identification level.

In addition, the determining of the predicted value may include determining the predicted value based on at least one of the presence or absence of upper-level data for the first record, a result of statistical analysis of the one or more reference records, and the number of first records.

In addition, the determining of the predicted value may include, when there is no upper-level data for the first record, determining the predicted value using a unit price per case determined based on the one or more reference records.

In addition, in each of the one or more reference records, the second identification value may differ from the first identification value for only one of the plurality of attributes that define each record.

In addition, the determining of the predicted value may include, when there are a plurality of reference records, determining a plurality of unit prices per case, one based on each of the plurality of reference records, and determining the predicted value using an average or a weighted average of the plurality of unit prices per case.
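The averaging step above can be sketched as follows. This is only an illustration under stated assumptions: the text at this point does not define how a unit price per case is computed, so the sketch assumes it is a reference record's usage amount divided by its usage count, and the field names `usage_amount` and `usage_count` as well as the choice of weights are hypothetical.

```python
from typing import Optional, Sequence


def unit_price(reference_record: dict) -> float:
    """Unit price per case of one reference record.

    Assumption: unit price = usage amount / usage count.
    """
    return reference_record["usage_amount"] / reference_record["usage_count"]


def average_unit_price(
    reference_records: Sequence[dict],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Plain average of the unit prices, or a weighted average if weights are given."""
    prices = [unit_price(rec) for rec in reference_records]
    if weights is None:
        return sum(prices) / len(prices)
    # Weighted average: sum(price * weight) / sum(weight).
    return sum(p * w for p, w in zip(prices, weights)) / sum(weights)
```

With no weights the function returns the arithmetic mean; passing, for instance, each reference record's usage count as its weight gives more influence to records backed by more cases, which is one possible weighting and not one prescribed by the disclosure.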

In addition, the determining of the predicted value may include: when there is upper-level data for the first record and there are a plurality of anonymous values, determining, based on the reference records, a total anonymous value representing the sum of the plurality of anonymous values; and when the total anonymous value is equal to the number of the plurality of anonymous values, determining each of the plurality of anonymous values to be 1.

In addition, the determining of the predicted value may include: when there is upper-level data for the first record and there is one anonymous value, determining, based on the reference records, a total anonymous value representing the sum of the plurality of anonymous values; and determining the anonymous value to be the total anonymous value.

In addition, the determining of the predicted value may include: when there is upper-level data for the first record and there are a plurality of anonymous values, determining, based on the reference records, a total anonymous value representing the sum of the plurality of anonymous values; when the total anonymous value is greater than the number of the plurality of anonymous values, determining combinations of the plurality of anonymous values within the range of the total anonymous value; and determining the plurality of anonymous values according to one of the combinations, using a unit price per case determined based on the one or more reference records.

In addition, the determining of the plurality of anonymous values may include: determining a first predicted value for each of the plurality of anonymous values using the unit price per case determined based on the one or more reference records; determining a second predicted value for each of the plurality of anonymous values using the difference between the total anonymous value and the number of the plurality of anonymous values; and determining a predicted value for each of the plurality of anonymous values using the first predicted value and the second predicted value.
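The three cases in which upper-level data exists can be sketched as below. This is a hedged illustration and not the disclosed implementation: the total anonymous value is taken as an input, each anonymous value is assumed to be a positive integer (consistent with every value being 1 when the total equals the count), the optional `max_value` cap is a hypothetical stand-in for the de-identification range, and the final selection among candidates using unit prices is deliberately left out because the text does not specify how the first and second predicted values are combined.

```python
from itertools import product
from typing import List, Optional, Tuple


def candidate_combinations(
    total: int, n: int, max_value: Optional[int] = None
) -> List[Tuple[int, ...]]:
    """All assignments of positive integers to n anonymous values that sum to total."""
    upper = total - (n - 1)  # largest value one slot can take if the others are 1
    if max_value is not None:
        upper = min(upper, max_value)
    return [c for c in product(range(1, upper + 1), repeat=n) if sum(c) == total]


def resolve_anonymous_values(
    total: int, n: int, max_value: Optional[int] = None
) -> List[Tuple[int, ...]]:
    """Return the candidate value assignments for n anonymous values."""
    if n == 1:
        return [(total,)]       # single anonymous value: it equals the total
    if total == n:
        return [(1,) * n]       # total equals the count: every value is 1
    if total > n:
        return candidate_combinations(total, n, max_value)
    raise ValueError("total anonymous value is smaller than the number of values")
```

For example, two anonymous values with a total of 4 yield the candidates (1, 3), (2, 2), and (3, 1); a later step would pick one of them, for instance the one closest to the estimates derived from the unit price per case.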

A device for determining a predicted value for an anonymous value according to a second aspect of the present disclosure may include: a receiving unit that obtains a first record, which includes an anonymous value anonymized from original data according to a preset de-identification level and an identifiable first identification value, and a plurality of second records composed of second identification values; and a processor that determines, based on a result of comparing the first identification value with the second identification values, one or more reference records corresponding to the first record from among the plurality of second records, and determines, using the one or more reference records, a predicted value for the anonymous value within the range of the de-identification level.

In addition, the processor may determine the predicted value based on at least one of the presence or absence of upper-level data for the first record, a result of statistical analysis of the one or more reference records, and the number of first records.

In addition, when there is no upper-level data for the first record, the processor may determine the predicted value using a unit price per case determined based on the one or more reference records.

In addition, when there is upper-level data for the first record and there are a plurality of anonymous values, the processor may determine, based on the reference records, a total anonymous value representing the sum of the plurality of anonymous values and, when the total anonymous value is equal to the number of the plurality of anonymous values, determine each of the plurality of anonymous values to be 1.

In addition, when there is upper-level data for the first record and there are a plurality of anonymous values, the processor may determine, based on the reference records, a total anonymous value representing the sum of the plurality of anonymous values; when the total anonymous value is greater than the number of the plurality of anonymous values, determine combinations of the plurality of anonymous values within the range of the total anonymous value; and determine the plurality of anonymous values according to one of the combinations, using a unit price per case determined based on the one or more reference records.

A third aspect of the present disclosure may provide a computer-readable recording medium on which a program for executing the method according to the first aspect on a computer is recorded. Alternatively, a fourth aspect of the present disclosure may provide a computer program stored in a recording medium to implement the method according to the first aspect.

According to an embodiment of the present disclosure, a predicted value estimated to approximate the actual value can be provided for an anonymous value whose value can no longer be identified as a result of anonymization; by providing reliable predicted values for data whose anonymity is guaranteed, the value of anonymous data as data can be greatly improved.

In addition, reliability can be improved by estimating the predicted value so that it approximates the actual value in a manner better suited to the situation, taking into account various factors such as the presence or absence of upper-level data, the result of comparing the reference records with the target record, the number of anonymous values, and the total anonymous value.

The effects of the present disclosure are not limited to those described above and should be understood to include all effects that can be inferred from the detailed description of the present disclosure or from the configuration of the invention set forth in the claims.

FIG. 1 is a block diagram illustrating the configuration of a device according to an embodiment.
FIG. 2 is a flowchart illustrating a method by which a device determines a predicted value for an anonymous value according to an embodiment.
FIG. 3 is a diagram illustrating a first record, a second record, and a reference record according to an embodiment.
FIG. 4 is a diagram for explaining an operation in which a device according to an embodiment determines a predicted value for an anonymous value according to a first embodiment.
FIG. 5 is a diagram for explaining an operation in which a device according to an embodiment determines a predicted value for an anonymous value according to a second embodiment.
FIG. 6 is a diagram for explaining an operation in which a device according to an embodiment determines a predicted value for an anonymous value according to a third embodiment.

The terms used in the embodiments were chosen, as far as possible, from general terms that are currently in wide use while taking into account their functions in the present disclosure, but they may vary depending on the intentions of those skilled in the art, precedents, the emergence of new technologies, and the like. In certain cases, a term may have been selected arbitrarily by the applicant, in which case its meaning will be described in detail in the corresponding part of the description. Terms used in the present disclosure should therefore be defined based on their meaning and on the overall content of the present disclosure, not simply on their names.

Throughout the specification, when a part is said to “include” a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them. In addition, terms such as “… unit” and “… module” used in the specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

Throughout the specification, 'providing' includes a process in which a subject obtains specific information or transmits it to, or receives it from, a specific subject, directly or indirectly, and may be interpreted as comprehensively including the performance of the related operations required in that process.

Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings so that those of ordinary skill in the art to which the present disclosure pertains can readily practice them. However, the present disclosure may be implemented in many different forms and is not limited to the embodiments described herein.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings.

FIG. 1 is a block diagram illustrating the configuration of a device 100 according to an embodiment.

Referring to FIG. 1, the device 100 may include a receiving unit 110, a processor 120, and a storage unit 130.

The receiving unit 110 may obtain anonymous data anonymized from original data. Here, anonymization may be understood as a concept that includes deletion; for example, 'anonymization' may include permanently deleting personal identification information or replacing all or part of the personal identification information with an identifier unique to the relevant institution. Personal identification information may be understood as a concept encompassing identifiers, which can identify an individual by a single attribute alone, quasi-identifiers, which can identify an individual through a combination of multiple attributes, and the like. In one embodiment, it may refer to personal information such as identification information (e.g., resident registration number, mobile phone number), payment information (e.g., purchase history), and credit information (e.g., credit rating).

In one embodiment, the original data may contain personal identification information and may include a plurality of records that are defined by a plurality of attributes and composed of identifiable identification values. In one embodiment, the anonymous data contains anonymous values obtained by anonymizing the personal identification information of the original data, and may include first records, in which the identification values of some attributes have been anonymized for one or more of the plurality of records constituting the original data, and second records, which are the remaining records for which anonymization was omitted. For example, the anonymous data may be in a state in which the values of attributes corresponding to identifiers, that is, attributes that alone can identify an individual in the original data, have been removed, and the values of some attributes corresponding to quasi-identifiers, that is, attributes that cannot identify an individual alone but can do so in combination, have been anonymized.

In one embodiment, the receiving unit 110 may receive pre-stored anonymous data from the storage unit 130. In another embodiment, the receiving unit 110 may receive pre-stored original data from the storage unit 130, and the processor 120 may obtain anonymous data by performing a preset anonymization process on the original data. In one embodiment, the anonymization process may be performed based on a pre-stored K-anonymity algorithm; for example, anonymized first records may be obtained from the original data according to the K-anonymity algorithm, based on a preset K value (e.g., 5), which is a coefficient denoting the de-identification level.
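As a rough illustration of the anonymization step, the sketch below masks the usage count of any record whose count falls below a preset K and leaves the other records untouched. The record layout, the field name `usage_count`, and the choice to mask only that one field with `None` are assumptions made for illustration; the disclosure does not specify the K-anonymity algorithm at this level of detail.

```python
from typing import List, Tuple

K = 5  # preset de-identification level (the example value given in the text)


def anonymize(records: List[dict], k: int = K) -> Tuple[List[dict], List[dict]]:
    """Split original records into first records (masked) and second records.

    Assumption: a record is anonymized when its usage_count is below k, and
    anonymizing means replacing usage_count with None (a deleted, null value).
    """
    first_records, second_records = [], []
    for rec in records:
        if rec["usage_count"] < k:
            # Copy the record and mask the count; the input is not modified.
            first_records.append(dict(rec, usage_count=None))
        else:
            second_records.append(dict(rec))
    return first_records, second_records
```

Returning copies keeps the original data intact, which matches the description of the original data and the anonymous data as separate data sets.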

The processor 120 may determine a predicted value for the anonymous value included in a first record based on a result of comparing the first record and the second records included in the anonymous data. In one embodiment, the processor 120 may compare and analyze the per-attribute identification values of the first record and the second records according to preset criteria, and thereby determine a predicted value, estimated to approximate the actual value, for an anonymous value whose value can no longer be identified as a result of the K-anonymity algorithm (e.g., a number of uses or a number of customers below the K value). The processor 120 may also update the first record so that the anonymous value is replaced with the predicted value. Because reliable predicted values are thus provided for data whose anonymity is guaranteed, the value of the anonymous data as data can be greatly improved.
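A minimal sketch of the update step, under several assumptions that the text does not fix at this point: the masked attribute is the usage count, the usage amount remains visible, a unit price per case is already available, and "within the range of the de-identification level" is read as clamping the estimate to 1 through K-1. The function and field names are hypothetical.

```python
def predict_count(usage_amount: float, unit_price: float, k: int = 5) -> int:
    """Estimate a masked usage count and keep it inside the assumed range 1..k-1."""
    estimate = round(usage_amount / unit_price)
    return max(1, min(k - 1, estimate))


def update_first_record(first_record: dict, unit_price: float, k: int = 5) -> dict:
    """Return a copy of the first record with the anonymous value replaced."""
    predicted = predict_count(first_record["usage_amount"], unit_price, k)
    return dict(first_record, usage_count=predicted)
```

The clamp reflects the idea that a value masked because it was below K should not be predicted as K or more; other readings of the range are possible.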

Hereinafter, embodiments in which the device 100 determines a predicted value for an anonymous value are described in more detail with further reference to FIGS. 2 to 6.

FIG. 2 is a flowchart illustrating a method by which the device 100 determines a predicted value for an anonymous value 12 according to an embodiment, and FIG. 3 is a diagram illustrating a first record 10, a second record 20, and a reference record 30 according to an embodiment.

Referring to FIGS. 2 to 3(a), in step S210 the receiving unit 110 may obtain one or more first records 10 and a plurality of second records 20. For example, the receiving unit 110 may load the one or more first records 10 and the plurality of second records 20 from the storage unit 130 based on a user input; as another example, it may receive the one or more first records 10 and the plurality of second records 20 obtained from the original data through the anonymization process performed by the processor 120.

Here, the first record 10 includes a first identification value 11 and an anonymous value 12 and may correspond, for example, to a record in which the identification values of some attributes have been anonymized for one or more of the plurality of records constituting the original data, as described above. The first identification value 11 is a value that can be identified in the first record 10 and is, for example, equal to the identification value of the corresponding attribute of the original record that corresponds to the first record 10 in the original data. The anonymous value 12 is a value anonymized from the original data according to a preset de-identification level (e.g., the K value) and may be, for example, a null value resulting from deletion of, or a unique identifier substituted for, the identification value of the corresponding attribute of the original record that corresponds to the first record 10 in the original data.

In addition, the second record 20 is composed of second identification values 21 and may correspond, for example, to one of the remaining records, among the plurality of records constituting the original data, for which anonymization was omitted because the anonymization condition was satisfied by the first records 10. The second identification value 21 is a value that can be identified in the second record 20 and is, for example, equal to the identification value of the corresponding attribute of the original record that corresponds to the second record 20 in the original data. Throughout the specification, a 'value' may be understood as a concept that encompasses not only numbers but also letters, symbols, combinations thereof, and the like.

Meanwhile, although FIG. 3 shows a plurality of first records 10, the disclosure is not limited thereto, and there may be a single first record depending on the embodiment. When there are a plurality of first records 10, the subsequent steps may proceed after one of them, for which a predicted value is to be determined at the current step, has been selected according to a preset scheme (e.g., random, sequential, or user input). For convenience of description, the following assumes that the anonymous value 12 for which a predicted value is to be determined, and the first record 10 containing that anonymous value 12, have been specified.

In one embodiment, the plurality of attributes defining each record may include at least one of base year-month, metropolitan city/province, city/county/district, town/township/neighborhood, industry, gender, usage amount, and usage count, but are not limited thereto, and various items belonging to personal information may be used in various combinations. In addition, in one embodiment, each of the plurality of attributes may correspond to a quasi-identifier, which cannot identify an individual as a single attribute alone but can do so in combination with some other attributes.

In one embodiment, the plurality of attributes may be divided into first attributes, which are used to determine the one or more reference records 30, and second attributes, which are used to determine the predicted value for the anonymous value 12. In one embodiment, a first attribute may correspond to a categorical variable and may be expressed as discrete data that can take only one of a plurality of preset values; for example, the first attributes may include at least one of base year-month, metropolitan city/province, city/county/district, town/township/neighborhood, industry, and gender. In one embodiment, a second attribute may correspond to a numerical variable and may be expressed as continuous data that can take a value within a preset numerical range; for example, the second attributes may include at least one of usage amount and usage count.
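The split between first (categorical) and second (numerical) attributes could be represented as in the sketch below. The English field names are hypothetical stand-ins for the example attributes listed in the text, and the helper is only one possible way to organize a record.

```python
from typing import Tuple

# Hypothetical English names for the example attributes in the text.
FIRST_ATTRIBUTES = (
    "base_year_month", "province", "district", "neighborhood", "industry", "gender",
)  # categorical: used to select reference records
SECOND_ATTRIBUTES = ("usage_amount", "usage_count")  # numerical: used for prediction


def split_record(record: dict) -> Tuple[dict, dict]:
    """Return the (categorical part, numerical part) of a record."""
    categorical = {k: record[k] for k in FIRST_ATTRIBUTES if k in record}
    numerical = {k: record[k] for k in SECOND_ATTRIBUTES if k in record}
    return categorical, numerical
```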

Referring to FIG. 3(b), in step S220 the processor 120 may determine, based on a result of comparing the first identification value 11 with the second identification values 21, one or more reference records 30 corresponding to the first record 10 from among the plurality of second records 20. Here, the one or more reference records 30 are the records, among the plurality of second records 20, that are used to determine the predicted value for the anonymous value 12 of the first record 10.

In one embodiment, the processor 120 may determine the one or more reference records 30 corresponding to the first record 10 from among the plurality of second records 20, based on whether the first identification value 11 and the second identification value 22 are identical, or similar at or above a preset level, for at least a preset number of the plurality of attributes. For example, when determining a predicted value for the anonymous value 12 (hereinafter, the 'first anonymous value') of the first record (hereinafter, 'record 1-1') among the plurality of first records 10 shown in FIG. 3(a), the processor may filter the plurality of second records 20 for records whose identification values for base year-month, metropolitan city/province, city/county/district, and business type are identical to those of record 1-1, and determine the filtered records as the one or more reference records 30, as shown in FIG. 3(b).
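The filtering step can be sketched in Python as follows. This is a minimal illustration, not the claimed implementation: the dictionary keys (`base_month`, `province`, `district`, `industry`, and so on) and the function name `select_reference_records` are hypothetical, and the fourth second-record (industry `dining`) is invented only to show a non-matching case. The amounts and counts of the three clothing records follow the figures quoted later in the description.

```python
from typing import Dict, List, Sequence

# Hypothetical attribute names; the description names the attributes only in prose.
MATCH_KEYS = ("base_month", "province", "district", "industry")


def select_reference_records(first: Dict, second_records: List[Dict],
                             keys: Sequence[str] = MATCH_KEYS) -> List[Dict]:
    """Keep the second records whose values equal the first record's on every key."""
    return [r for r in second_records if all(r[k] == first[k] for k in keys)]


def make_record(industry: str, gender: str, amount: int, count) -> Dict:
    """Helper that fills in the shared (2021.09, Seoul, Jung-gu) attributes."""
    return {"base_month": "2021.09", "province": "Seoul", "district": "Jung-gu",
            "industry": industry, "gender": gender, "amount": amount, "count": count}


# Record 1-1: the number of uses is anonymized, modeled here as None.
first_record = make_record("clothing", "male", 40_000, None)

second_records = [
    make_record("clothing", "male", 234_000, 7),
    make_record("clothing", "female", 280_000, 6),
    make_record("clothing", "female", 150_000, 5),
    make_record("dining", "male", 90_000, 8),  # invented non-matching record
]

reference_records = select_reference_records(first_record, second_records)
```

Exact equality is used here for simplicity; the description also allows matching on similarity at or above a preset level, which would replace the `==` comparison with a similarity test.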

In one embodiment, for each of the one or more reference records 30, the second identification value 21 may differ from the first identification value 11 for only one of the plurality of attributes. That is, only records that differ in a single attribute and have identical identification values for all remaining attributes may be filtered and determined as the one or more reference records 30.

In one embodiment, for each of the one or more reference records 30, the second identification value 21 may differ from the first identification value 11 for only one of the plurality of first attributes corresponding to categorical variables. For example, a record may be determined as a reference record 30 if, compared with record 1-1, its values for base year-month, metropolitan city/province, city/county/district, eup/myeon/dong, business type, and gender are all identical, or differ for at most one of these attributes.

In one embodiment, each of the one or more reference records 30 consists of third identification values 31. Here, a third identification value 31 is an identifiable value in a reference record 30; for example, it is the per-attribute identification value of the record, among the second records 20, that corresponds to the first record 10.

In step S230, the processor 120 may use the one or more reference records 30 to determine the predicted value for the anonymous value 12 within the range of the de-identification level. In one embodiment, the processor 120 may determine the predicted value for the anonymous value 12 based on a statistical analysis result for the one or more reference records 30. For example, when the K value applied in anonymization is 5, the processor 120 may determine the predicted value for the anonymous value 12, based on the sum of the third identification values 31 (e.g., 12 = 7 + 5) for the item to which the anonymous value 12 belongs (e.g., number of uses), within the range of integers from 0 to 4, which does not exceed the K value.

In one embodiment, the statistical analysis result for the reference records 30 may include the number of reference records 30, the sum of the numbers of uses derived from the reference records 30 when there is more than one, the unit price per case derived from the reference records 30, and the like. As an example, when the item to which the anonymous value 12 belongs is the number of uses, the processor 120 may calculate the sum of the numbers of uses by adding up the third identification values 31 for the number of uses of each reference record 30, and may calculate the unit price per case by dividing the third identification value 31 for the usage amount of each reference record 30 by the third identification value 31 for its number of uses.

In one embodiment, the statistical analysis result for the reference records 30 may be obtained through various operations on the third identification values 31 of the plurality of second attributes corresponding to numerical variables, such as the sum, average, weighted average, or median.

In one embodiment, the processor 120 may determine the predicted value for the anonymous value 12 based on at least one of: whether higher-level data exists for the first record 10, the statistical analysis result for the one or more reference records 30, and the number of first records 10. For example, when there is no higher-level data for the first record 10, the predicted value for the anonymous value 12 may be determined based on the statistical analysis result for the one or more reference records 30; when there is higher-level data for the first record 10, the predicted value may be determined using all of the higher-level data for the first record 10, the statistical analysis result for the one or more reference records 30, and the number of first records 10.

Hereinafter, first to third embodiments in which the device 100 determines a predicted value for an anonymous value are described in detail with further reference to FIGS. 4 to 6.

FIG. 4 is a diagram illustrating an operation in which the device 100 according to one embodiment determines a predicted value for the anonymous value 12 according to the first embodiment. The first embodiment describes how the device 100 determines the predicted value for the anonymous value 12 when there is no higher-level data for the first record 10.

Referring to FIG. 4, when there is no higher-level data for the first record 10, the processor 120 may determine the predicted value for the anonymous value 12 using a unit price per case determined from the one or more reference records 30. For example, when no higher-level data for the first record 10 is stored in the storage unit 130, the processor 120 may compute the sum of usage amounts and the sum of numbers of uses over the reference records 30, calculate the unit price per case in the reference records 30 by dividing the sum of usage amounts by the sum of numbers of uses, and determine the predicted value for the anonymous value 12 according to the result of comparing the usage amount of the first record 10 with the unit price per case.

In one embodiment, the higher-level data for the first record 10 denotes statistics, derived from the original data, over records that have the same values as the first identification value 11 of the first record 10 for at least a preset number of the plurality of attributes. For example, the higher-level data for the first record 10 may include the sum of numbers of uses, the sum of usage amounts, and the like, over those records in the original data whose values for base year-month, metropolitan city/province, city/county/district, business type, and gender are identical to those of the first record 10 (see FIG. 5(a)).

Taking record 1-1 as an example, the processor 120 may calculate the unit price per case of the (2021.09, Seoul, Jung-gu, clothing, male) group, among the reference records 30 having identical values for attributes such as base year-month, region, and business type, as 33,428 won (= 234,000 won / 7 cases; see the first record in FIG. 3(b)), and may obtain 1.19 by dividing the usage amount of record 1-1, 40,000 won, by 33,428 won. Estimating 1.19 within the range of natural numbers, the first anonymous value may be estimated as 1 or 2 cases, and its predicted value may be determined as 1 case by rounding 1.19.

For the anonymous value 12 (hereinafter, the 'second anonymous value') of record 1-2 (the second of the plurality of first records 10 shown in FIG. 3(a)), the processor 120 may use the unit price per case of 33,428 won calculated above for the (2021.09, Seoul, Jung-gu, clothing, male) group, and obtain 2.09 by dividing the usage amount of record 1-2, 70,000 won, by 33,428 won. Estimating 2.09 within the range of natural numbers, the second anonymous value may be estimated as 2 or 3 cases, and its predicted value may be determined as 2 cases by rounding 2.09.

For the anonymous value 12 (hereinafter, the 'third anonymous value') of record 1-3 (the third of the plurality of first records 10 shown in FIG. 3(a)), the processor 120 may calculate the unit price per case of the (2021.09, Seoul, Jung-gu, clothing, female) group, among the reference records 30 having identical values for attributes such as base year-month, region, and business type, as 39,091 won (= 430,000 won / 11 cases; see the second and third records in FIG. 3(b)), and may obtain 2.04 by dividing the usage amount of record 1-3, 80,000 won, by 39,091 won. Estimating 2.04 within the range of natural numbers, the third anonymous value may be estimated as 2 or 3 cases, and its predicted value may be determined as 2 cases by rounding 2.04.
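The three worked examples above can be reproduced with a short Python sketch. The function names `round_half_up` and `predict_count` are hypothetical, and the sketch keeps the unit price at full floating-point precision, whereas the description quotes it truncated to whole won (33,428 won); the rounded predictions are the same either way for these inputs.

```python
import math


def round_half_up(x: float) -> int:
    """Round to the nearest integer, with halves rounded up."""
    return math.floor(x + 0.5)


def predict_count(amount: int, ref_amount_sum: int, ref_count_sum: int) -> int:
    """Divide the first record's usage amount by the group's unit price per case."""
    unit_price = ref_amount_sum / ref_count_sum
    return round_half_up(amount / unit_price)


# (2021.09, Seoul, Jung-gu, clothing, male) group: 234,000 won over 7 cases.
pred_1 = predict_count(40_000, 234_000, 7)    # record 1-1
pred_2 = predict_count(70_000, 234_000, 7)    # record 1-2
# (2021.09, Seoul, Jung-gu, clothing, female) group: 430,000 won over 11 cases.
pred_3 = predict_count(80_000, 430_000, 11)   # record 1-3
```

An explicit half-up helper is used instead of Python's built-in `round`, which rounds exact halves to the nearest even integer.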

In the first embodiment, building on the examples above, the processor 120 may determine the predicted value for the anonymous value 12 based on Equation 1 below.

[Equation 1]

y = f₁(x₁ / x₂)

(Here, x₁ and x₂ denote, respectively, the sum of usage amounts and the sum of numbers of uses of the one or more records, among the reference records 30, whose identification values for base year-month, metropolitan city/province, city/county/district, business type, and gender are identical to those of the first record 10; f₁( ) denotes a function that rounds its argument; and y denotes the predicted value for the anonymous value 12.)

In the first embodiment, when there are a plurality of reference records 30, the processor 120 may determine a plurality of unit prices per case, one from each of the reference records 30, and determine the predicted value for the anonymous value 12 using the average or weighted average of those unit prices. For example, the processor 120 may divide the usage amount by the number of uses for each reference record 30 to obtain unit prices per case of 33,428 won (= 234,000 won / 7 cases), 46,667 won (= 280,000 won / 6 cases), and 30,000 won (= 150,000 won / 5 cases), take their average, 36,698 won, as the unit price per case, and then compare it with the usage amount of 40,000 won for record 1-1, 70,000 won for record 1-2, and 80,000 won for record 1-3, determining the predicted values of the first to third anonymous values as 1 case, 2 cases, and 2 cases, respectively.
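As a minimal sketch of the averaging variant, the following Python snippet recomputes the figures quoted in this paragraph. Variable names are illustrative only, and the comparison of usage amount with unit price is assumed to be the same divide-and-round step used in the earlier worked examples.

```python
import math
from statistics import mean

# (usage amount in won, number of uses) of the three reference records.
reference = [(234_000, 7), (280_000, 6), (150_000, 5)]

# One unit price per reference record, then a plain (unweighted) average.
unit_prices = [amount / count for amount, count in reference]
avg_unit_price = mean(unit_prices)

# Usage amounts of records 1-1, 1-2 and 1-3.
first_amounts = (40_000, 70_000, 80_000)
predictions = [math.floor(a / avg_unit_price + 0.5) for a in first_amounts]
```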

In the first embodiment, the processor 120 may determine the predicted value for the anonymous value 12 using a weighted average of the plurality of unit prices per case, with larger weights given to larger numbers of uses. For example, after calculating the unit prices per case of 33,428 won, 46,667 won, and 30,000 won for the reference records 30 as described above, the processor 120 may assign the largest, first weight to 33,428 won, which has the largest number of uses (7 cases); a second weight smaller than the first weight to 46,667 won, which has 6 cases; and a third weight smaller than the second weight to 30,000 won, which has 5 cases; and may then determine the predicted value for the anonymous value 12 based on the resulting weighted-average unit price per case.

In one embodiment, the weight assigned to each of the plurality of unit prices per case may be determined in proportion to the number of uses or to the difference between numbers of uses. In another embodiment, the weight may be determined based on whether the value of a specific attribute is identical; for example, a larger weight may be assigned when the gender is the same. In yet another embodiment, the weight may be determined in proportion to the number of attributes that have identical values.
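One of the weighting options named above, weights proportional to the number of uses, can be sketched as follows. The description does not give concrete weight values, so using the raw counts 7, 6, and 5 as weights is an assumption made here for illustration; `weighted_average` is a hypothetical helper.

```python
import math
from typing import Sequence


def weighted_average(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted arithmetic mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# (usage amount in won, number of uses) of the three reference records.
reference = [(234_000, 7), (280_000, 6), (150_000, 5)]
unit_prices = [amount / count for amount, count in reference]

# Assumption: the weight of each unit price equals its record's number of uses.
weights = [count for _, count in reference]
weighted_price = weighted_average(unit_prices, weights)

first_amounts = (40_000, 70_000, 80_000)  # records 1-1, 1-2, 1-3
predictions = [math.floor(a / weighted_price + 0.5) for a in first_amounts]
```

With count-proportional weights the weighted average equals the pooled ratio of total amount to total count (664,000 won / 18 cases), so for this data the predictions match the unweighted variant.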

FIG. 5 is a diagram illustrating an operation in which the device 100 according to one embodiment determines a predicted value for the anonymous value 12 according to the second embodiment. The second embodiment describes one of the ways in which the device 100 determines the predicted value for the anonymous value 12 when there is higher-level data for the first record 10.

Referring to FIG. 5, when there is higher-level data for the first record 10 and there are a plurality of anonymous values 12, the processor 120 may determine, based on the reference records 30, a total anonymous value representing the sum of the plurality of anonymous values 12. In one embodiment, the higher-level data for the first record 10 may be classified by gender, as shown in FIG. 5(a). For example, when higher-level data for the first record 10 is stored in the storage unit 130 and there are two anonymous values 12 for the male attribute (e.g., records 1-1 and 1-2), the processor 120 may subtract 7 cases, the sum of numbers of uses of the (2021.09, Seoul, Jung-gu, clothing, male) group in the reference records 30, from 9 cases, the sum of numbers of uses of the same group in the higher-level data for the first record 10, to obtain a total anonymous value of 2. Each of the plurality of anonymous values 12 may then be determined within the range of that total anonymous value.

In the second embodiment, when the total anonymous value equals the number of anonymous values 12, the processor 120 may determine each of the plurality of anonymous values 12 to be 1. In the example above, the total anonymous value, 2, equals the number of anonymous values 12 for the male attribute, 2, so the processor 120 may determine both the first anonymous value a of record 1-1 and the second anonymous value b of record 1-2 to be 1.

In the second embodiment, when there is higher-level data for the first record 10 and there is a single anonymous value 12, the processor 120 may determine the total anonymous value based on the reference records 30 and determine the anonymous value 12 to be the total anonymous value. For example, when higher-level data for the first record 10 is stored in the storage unit 130 and there is only one anonymous value 12 for the female attribute (e.g., record 1-3), the processor 120 may subtract 11 cases, the sum of numbers of uses of the (2021.09, Seoul, Jung-gu, clothing, female) group in the reference records 30, from 13 cases, the sum of numbers of uses of the same group in the higher-level data for the first record 10, to obtain a total anonymous value of 2. Since there is only one anonymous value 12 for the female attribute, the processor 120 may determine the third anonymous value c of record 1-3 to be 2, equal to the total anonymous value.
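The two cases of the second embodiment can be sketched together in Python. The function names are hypothetical, and the sketch assumes, as the examples imply, that every anonymized number of uses is at least 1; it returns `None` when the total exceeds the number of anonymous values, since that case is handled by a different procedure.

```python
from typing import List, Optional


def total_anonymous_value(upper_count_sum: int, reference_count_sum: int) -> int:
    """Number of uses in the higher-level data not accounted for by reference records."""
    return upper_count_sum - reference_count_sum


def resolve_simple_cases(total: int, n_anonymous: int) -> Optional[List[int]]:
    """Resolve the anonymous values when the total leaves no ambiguity."""
    if n_anonymous == 1:
        return [total]                 # the single anonymous value is the total
    if total == n_anonymous:
        return [1] * n_anonymous       # each anonymous value must be 1
    return None                        # ambiguous; not covered by this sketch


# Male group: 9 cases in the higher-level data, 7 in the reference records.
male = resolve_simple_cases(total_anonymous_value(9, 7), n_anonymous=2)
# Female group: 13 cases in the higher-level data, 11 in the reference records.
female = resolve_simple_cases(total_anonymous_value(13, 11), n_anonymous=1)
```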

FIG. 6 is a diagram illustrating an operation in which the device 100 according to one embodiment determines a predicted value for the anonymous value 12 according to the third embodiment. The third embodiment describes another of the ways in which the device 100 determines the predicted value for the anonymous value 12 when there is higher-level data for the first record 10.

Referring to FIG. 6, when there is higher-level data for the first record 10, there are a plurality of anonymous values 12, and the total anonymous value is greater than the number of anonymous values 12, the processor 120 may determine combinations of the plurality of anonymous values within the range of the total anonymous value. For example, when there are two anonymous values 12 for the male attribute (e.g., records 1-1 and 1-2), the processor 120 may subtract 7 cases, the sum of numbers of uses of the (2021.09, Seoul, Jung-gu, clothing, male) group in the reference records 30, from 11 cases, the sum of numbers of uses of the same group in the higher-level data for the first record 10, to obtain a total anonymous value of 4. Since the total anonymous value, 4, is greater than the number of anonymous values 12, 2, the combinations of (first anonymous value, second anonymous value) within the range of the total anonymous value 4 may be derived as {(1, 3), (2, 2), (3, 1)}.
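Enumerating the candidate combinations can be sketched as follows. The function name `candidate_combinations` is hypothetical, and the lower bound of 1 for each anonymous value is inferred from the listed combinations, which omit (0, 4) and (4, 0).

```python
from itertools import product
from typing import List, Tuple


def candidate_combinations(total: int, n: int) -> List[Tuple[int, ...]]:
    """All ordered n-tuples of positive integers that sum to total."""
    return [c for c in product(range(1, total + 1), repeat=n) if sum(c) == total]


# Total anonymous value 4 (= 11 - 7) split over two anonymous values (a, b).
combos = candidate_combinations(total=4, n=2)
```

The brute-force search over `product` is adequate for the small totals that a K-anonymity threshold allows; a recursive enumeration would scale better for larger inputs.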

In the third embodiment, the processor 120 may determine the plurality of anonymous values 12 according to one of the combinations of the plurality of anonymous values, using the unit price per case determined from the one or more reference records 30. For example, the processor 120 may calculate the unit price per case of the (2021.09, Seoul, Jung-gu, clothing, male) group as 33,428 won based on the reference records 30, in a manner similar to the first embodiment; detect, among the combinations {(1, 3), (2, 2), (3, 1)} for records 1-1 and 1-2, the combination (1, 3) as the one that yields the smallest difference from the unit price per case when applied; and determine the predicted values for the first and second anonymous values (a, b) as 1 and 3, respectively, according to the detected combination (1, 3).

In the third embodiment, the processor 120 may determine the predicted value for each of the plurality of anonymous values 12 using a first predicted value according to the first embodiment and a second predicted value according to the second embodiment.

More specifically, the processor 120 may determine a first predicted value for each of the plurality of anonymous values 12 using the unit price per case determined from the one or more reference records 30. For example, in a manner similar to the first embodiment, the processor 120 may calculate the unit price per case of the (2021.09, Seoul, Jung-gu, clothing, male) group, among the reference records 30 having identical values for attributes such as base year-month, region, and business type, as 31,272 won (= 344,000 won / 11 cases); divide the usage amount of record 1-1, 40,000 won, by 31,272 won to estimate the first predicted value for the first anonymous value as 1 or 2 cases; and divide the usage amount of record 1-2, 70,000 won, by 31,272 won to estimate the first predicted value for the second anonymous value as 2 or 3 cases.

The processor 120 may also determine a second predicted value for each of the plurality of anonymous values 12 using the difference between the total anonymous value and the number of anonymous values 12. For example, in a manner similar to the second embodiment, the processor 120 may subtract 7 cases, the sum of numbers of uses of the (2021.09, Seoul, Jung-gu, clothing, male) group in the reference records 30, from 11 cases, the sum of numbers of uses of the same group in the higher-level data for the first record 10, to obtain a total anonymous value of 4, and may determine the second predicted values for the first and second anonymous values as the combinations {(1, 3), (2, 2), (3, 1)} of the first and second anonymous values within the range of the total anonymous value 4.

The processor 120 may then determine the predicted values for the plurality of anonymous values 12 using the first and second predicted values. In one embodiment, the processor 120 may determine the predicted value for an anonymous value 12 according to the values common to the first predicted value and the second predicted value. For example, for the first anonymous value a, the first predicted value is 1 or 2, and the second predicted value ranges over 1 to 3 according to (a, b) = {(1, 3), (2, 2), (3, 1)}, so the combination can be narrowed to (1, 3) or (2, 2). Likewise, for the second anonymous value b, the first predicted value is 2 or 3, and the second predicted value ranges over 3 to 1 according to (a, b) = {(1, 3), (2, 2), (3, 1)}, so the combination can again be narrowed to (1, 3) or (2, 2).
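The narrowing step can be sketched as a filter over the candidate combinations. This is a hedged illustration: `first_prediction_range` is a hypothetical helper that takes the floor and ceiling of the amount-to-unit-price quotient, which reproduces the "1 or 2" and "2 or 3" ranges quoted in the description.

```python
import math
from typing import List, Set, Tuple


def first_prediction_range(amount: int, unit_price: float) -> Set[int]:
    """The two integers bracketing amount / unit_price."""
    q = amount / unit_price
    return {math.floor(q), math.ceil(q)}


unit_price = 344_000 / 11          # about 31,272 won per case
ranges = [
    first_prediction_range(40_000, unit_price),   # first anonymous value a
    first_prediction_range(70_000, unit_price),   # second anonymous value b
]

# Second predicted values: combinations within the total anonymous value 4.
combos: List[Tuple[int, int]] = [(1, 3), (2, 2), (3, 1)]

# Keep the combinations in which every component lies in its first-prediction range.
common = [c for c in combos if all(v in r for v, r in zip(c, ranges))]
```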

In the third embodiment, when there are a plurality of combinations of predicted values common to the first and second predicted values, the processor 120 may determine the plurality of anonymous values 12 according to one of those common combinations, based on the composition ratio of the usage amounts of the first records 10 and the per-combination composition ratio of the numbers of uses. Here, the composition ratio of the usage amount is the ratio of the usage amount of each first record 10 to the sum of the usage amounts of the plurality of first records 10, and the per-combination composition ratio of the number of uses is the ratio of each anonymous value 12 to the total anonymous value. For example, when the combination of the first and second anonymous values (a, b) is estimated to be (1, 3) or (2, 2), the composition ratio of the usage amount of record 1-1 is 0.36 (= 40,000 / (40,000 + 70,000)) and that of record 1-2 is 0.64 (= 70,000 / (40,000 + 70,000)). The per-combination composition ratios of the numbers of uses are, for the combination (1, 3), 0.25 (= 1 / (1 + 3)) for a and 0.75 (= 3 / (1 + 3)) for b, and, for the combination (2, 2), 0.5 (= 2 / (2 + 2)) for each of a and b.

In the third embodiment, the processor 120 may determine the predicted values for the plurality of anonymous values 12 so as to minimize the difference between the composition ratio of the usage amounts and the per-combination composition ratio of the numbers of uses. For example, for the combination (1, 3), the differences for a and b are 0.11 (= |0.36 - 0.25|) and 0.11 (= |0.64 - 0.75|), respectively, whereas for the combination (2, 2) they are 0.14 (= |0.36 - 0.5|) and 0.14 (= |0.64 - 0.5|). The predicted values for the first and second anonymous values a and b may therefore be determined as 1 case and 3 cases, respectively, according to the combination (1, 3), which has the smaller difference.

In the third embodiment, building on the example above, the processor 120 may determine the predicted value for the anonymous value 12 based on Equation 2 below. For example, the processor 120 may calculate z for each of the plurality of combinations of predicted values common to the first predicted value and the second predicted value, and may determine the predicted value for the anonymous value 12 according to the one combination for which the sum of the z values calculated for a and b is the smallest.

[Equation 2]

z₁ = x₄ / x₃

z₂ = x₆ / x₅

z = f₂(z₁ - z₂)

(Here, z₁ denotes the composition ratio of the usage amount of the first record 10, x₃ denotes the sum of the usage amounts of the plurality of first records 10, and x₄ denotes the usage amount of each first record 10. In addition, z₂ denotes the composition ratio of the number of uses for each combination, x₅ denotes the total anonymous value, and x₆ denotes each anonymous value 12. Further, z denotes the difference between the composition ratio of the usage amount and the composition ratio of the number of uses for each combination, and f₂( ) denotes a function that converts its argument to an absolute value.)
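As a minimal sketch of how Equation 2 could be applied to choose among candidate combinations, the following code sums z over the anonymous values of each candidate and keeps the candidate with the smallest sum. The names `total_ratio_gap` and `select_combination` are hypothetical, and resolving ties in favor of the first candidate is an assumption; the text does not specify tie handling.

```python
def total_ratio_gap(amounts, counts):
    """Sum of z = |z1 - z2| over all records for one candidate combination."""
    total_amount = sum(amounts)  # x3: sum of usage amounts of the first records
    total_count = sum(counts)    # x5: total anonymous value
    gap = 0.0
    for x4, x6 in zip(amounts, counts):
        z1 = x4 / total_amount   # composition ratio of the usage amount
        z2 = x6 / total_count    # composition ratio of the number of uses
        gap += abs(z1 - z2)      # f2 is the absolute value
    return gap


def select_combination(amounts, candidates):
    """Pick the candidate with the smallest summed z (first one wins on ties)."""
    return min(candidates, key=lambda counts: total_ratio_gap(amounts, counts))


# Example from the text: usage amounts 40,000 and 70,000, candidates (1, 3) and (2, 2).
best = select_combination([40_000, 70_000], [(1, 3), (2, 2)])
print(best)
```

With unrounded ratios, the summed z is about 0.227 for (1, 3) and about 0.273 for (2, 2), so the sketch selects (1, 3), in line with the example in the text.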

According to an embodiment of the present invention, a predicted value estimated to be close to the actual value can be provided for an anonymous value 12 whose value has been made unidentifiable in order to satisfy an anonymity condition. Providing reliable predicted values for data whose anonymity is guaranteed in this way can greatly improve the usefulness of anonymous data as data.

The receiving unit 110 according to an embodiment may include a wired or wireless communication device that is connected, through a network or electrically, to components of the device 100 or to other devices (e.g., a terminal or a server) and can transmit and receive the various information described throughout the specification. Here, the network may be configured through various communication networks, wired or wireless, such as a local area network (LAN), a metropolitan area network (MAN), or a wide area network (WAN).

The processor 120 according to an embodiment may perform a series of operations for determining a predicted value for an anonymous value, and may be electrically connected to the receiving unit 110, the storage unit 130, and other components to control the data flow among them. To this end, it may be implemented as a central processing unit (CPU) that controls the overall operation of the device 100.

The storage unit 130 according to an embodiment may store data used throughout the operation of the device 100, for example, the first record 10 and the second record 20. In an embodiment, the storage unit 130 may store original data about customers' personal information or anonymous data anonymized from the original data, and may store the predicted values determined through the operations described above. In an embodiment, the storage unit 130 may be implemented in various forms such as a memory or a database, or may be implemented as a cloud or a separate storage server that provides the data and storage space required by the device 100 through a wired or wireless communication network.

In addition, a person of ordinary skill in the relevant technical field will understand that the device 100 may further include general-purpose components other than those shown in FIG. 2. For example, the device 100 may further include an input/output interface for receiving user input or outputting information. The device 100 corresponds to a computing device capable of determining a predicted value for an anonymous value and, in an embodiment, may be implemented as a server, such as a computer, that operates through a computer program for realizing the functions described in this specification.

The order and combination of the steps shown above are one embodiment. It will be understood that the order, combination, branching, functions, and performing entities may be added, omitted, or modified in various ways without departing from the essential characteristics of each component described in the specification.

Various embodiments of the present disclosure may be implemented as software including one or more instructions stored in a storage medium (e.g., a memory) readable by a machine (e.g., a display device or a computer). For example, a processor of the machine (e.g., the processor 120) may call at least one of the one or more instructions stored in the storage medium and execute it. This enables the machine to be operated to perform at least one function according to the at least one called instruction. The one or more instructions may include code generated by a compiler or code executable by an interpreter. The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, "non-transitory" means only that the storage medium is a tangible device and does not include a signal (e.g., an electromagnetic wave); the term does not distinguish between data being stored semi-permanently in the storage medium and data being stored temporarily.

According to an embodiment, the method according to the various embodiments disclosed herein may be provided as part of a computer program product. A computer program product may be traded between a seller and a buyer as a commodity. The computer program product may be distributed in the form of a machine-readable storage medium (e.g., compact disc read only memory (CD-ROM)), or may be distributed online (e.g., downloaded or uploaded) through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least part of the computer program product may be at least temporarily stored, or temporarily created, in a machine-readable storage medium such as the memory of a manufacturer's server, an application store's server, or a relay server.

The foregoing description of the present disclosure is for illustration, and a person of ordinary skill in the art to which the present disclosure pertains will understand that it can easily be modified into other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present disclosure is indicated by the claims below, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present disclosure.

100: device
110: receiving unit 120: processor
10: first record
11: first identification value 12: anonymous value
20: second record 21: second identification value
30: reference record 31: third identification value

Claims

A method for a device to determine a predicted value for an anonymous value, the method comprising:
obtaining a first record including an anonymous value, anonymized from original data according to a preset de-identification level, and an identifiable first identification value, and a plurality of second records each composed of a second identification value;
determining one or more reference records corresponding to the first record from among the plurality of second records, based on a result of comparing the first identification value with the second identification value; and
determining a predicted value for the anonymous value within a range of the de-identification level by using the one or more reference records.

The method of claim 1,
wherein the determining of the predicted value comprises
determining the predicted value based on at least one of whether upper-level data exists for the first record, a statistical analysis result for the one or more reference records, and the number of first records.

The method of claim 1,
wherein the determining of the predicted value comprises,
when no upper-level data exists for the first record, determining the predicted value using a unit price per case determined based on the one or more reference records.

The method of claim 3,
wherein each of the one or more reference records is
a record whose second identification value differs from the first identification value in only one of a plurality of attributes defining each record.

The method of claim 3,
wherein the determining of the predicted value comprises:
when there are a plurality of reference records, determining a unit price per case based on each of the plurality of reference records; and
determining the predicted value using an average or a weighted average of the plurality of unit prices per case.

The method of claim 1,
wherein the determining of the predicted value comprises,
when upper-level data exists for the first record and there are a plurality of anonymous values:
determining a total anonymous value representing a sum of the plurality of anonymous values based on the reference record; and
determining each of the plurality of anonymous values to be 1 when the total anonymous value equals the number of the plurality of anonymous values.

The method of claim 1,
wherein the determining of the predicted value comprises,
when upper-level data exists for the first record and there is a single anonymous value:
determining a total anonymous value representing the sum of the anonymous values based on the reference record; and
determining the anonymous value to be the total anonymous value.

The method of claim 1,
wherein the determining of the predicted value comprises,
when upper-level data exists for the first record and there are a plurality of anonymous values:
determining a total anonymous value representing a sum of the plurality of anonymous values based on the reference record;
determining combinations of the plurality of anonymous values within a range of the total anonymous value when the total anonymous value is greater than the number of the plurality of anonymous values; and
determining the plurality of anonymous values according to one of the combinations of the plurality of anonymous values, using a unit price per case determined based on the one or more reference records.

The method of claim 8,
wherein the determining of the plurality of anonymous values comprises:
determining a first predicted value for each of the plurality of anonymous values using the unit price per case determined based on the one or more reference records;
determining a second predicted value for each of the plurality of anonymous values using a difference between the total anonymous value and the number of the plurality of anonymous values; and
determining a predicted value for each of the plurality of anonymous values using the first predicted value and the second predicted value.

A device for determining a predicted value for an anonymous value, the device comprising:
a receiving unit configured to obtain a first record including an anonymous value, anonymized from original data according to a preset de-identification level, and an identifiable first identification value, and a plurality of second records each composed of a second identification value; and
a processor configured to determine one or more reference records corresponding to the first record from among the plurality of second records, based on a result of comparing the first identification value with the second identification value,
and to determine a predicted value for the anonymous value within a range of the de-identification level by using the one or more reference records.

The device of claim 10,
wherein the processor
determines the predicted value based on at least one of whether upper-level data exists for the first record, a statistical analysis result for the one or more reference records, and the number of first records.

The device of claim 10,
wherein the processor,
when no upper-level data exists for the first record, determines the predicted value using a unit price per case determined based on the one or more reference records.

The device of claim 10,
wherein the processor,
when upper-level data exists for the first record and there are a plurality of anonymous values,
determines a total anonymous value representing a sum of the plurality of anonymous values based on the reference record, and
determines each of the plurality of anonymous values to be 1 when the total anonymous value equals the number of the plurality of anonymous values.

The device of claim 10,
wherein the processor,
when upper-level data exists for the first record and there are a plurality of anonymous values,
determines a total anonymous value representing a sum of the plurality of anonymous values based on the reference record,
determines combinations of the plurality of anonymous values within a range of the total anonymous value when the total anonymous value is greater than the number of the plurality of anonymous values, and
determines the plurality of anonymous values according to one of the combinations of the plurality of anonymous values, using a unit price per case determined based on the one or more reference records.

A computer-readable recording medium having recorded thereon a program for executing the method of any one of claims 1 to 9 on a computer.