KR102126795B1

KR102126795B1 - Deep learning-based system, apparatus, and method for processing image personal information

Info

Publication number: KR102126795B1
Application number: KR1020180152503A
Authority: KR
Inventors: 강명주; 곽지훈; 서현; 김현욱; 정승원; 노형민
Original assignee: 서울대학교 산학협력단
Priority date: 2018-11-30
Filing date: 2018-11-30
Publication date: 2020-06-26
Also published as: KR20200072586A

Abstract

The present invention relates to a technique for reducing the identifiability of image personal information. Among the data collected as input for deep learning, image personal information such as a user's face is directly tied to the risk of privacy infringement. The invention processes such information efficiently, even with limited computational resources, by distributing the deep learning process. The apparatus may include a data generation unit, a layer operation unit, an inversion data acquisition unit, a de-identified image generation unit, a loss rate calculation unit, and an identifiability determination unit. Because images can be de-identified with relatively little computing power, even on personal portable terminals and personal computers that lack supercomputer-level capability, deep learning training as well as big data transmission and processing can be carried out on various devices for various purposes without the possibility of personal information leakage.

Description

Deep learning-based system, apparatus, and method for processing image personal information

The present invention relates to a technique for reducing the identifiability of image personal information. Among the data collected as input for deep learning, image personal information such as a user's face is directly tied to the risk of privacy infringement, so the invention processes it efficiently, even with limited computational resources, by distributing the deep learning process.

Deep learning is a field of machine learning that teaches computers a human way of thinking. It can be defined as a set of machine learning algorithms that attempt high-level abstraction (summarizing the core content or function in large amounts of data or complex material) through a combination of several nonlinear transformation techniques.

The deep learning architecture is a concept designed on the basis of artificial neural networks (ANNs). An artificial neural network is an algorithm that mathematically models virtual neurons and then simulates them in order to obtain a learning ability like that of the human brain, and it is used mainly for pattern recognition. The artificial neural network models used in deep learning have a structure built by repeatedly stacking linear fitting and nonlinear transformation (activation).

Neural network models used in deep learning include the deep neural network (DNN), convolutional neural network (CNN), recurrent neural network (RNN), restricted Boltzmann machine (RBM), deep belief network (DBN), and deep Q-network.

In the training process of deep learning, the parameters of the artificial neural network model are optimized using a large amount of input data. The error backpropagation algorithm, gradient descent, and similar methods may be used in this training process.

In deep learning, the more training data there is, the more accurate the resulting model's predictions, so training requires a large amount of data. A big data processing method is therefore needed that collects massive data and preprocesses it into input data.

Collecting such input data is difficult not only because big data is voluminous by nature, but also because most big data inevitably contains personal information. Since leakage of personal information is very likely to lead to legal disputes, the exchange and distribution of big data between organizations has been limited.

Accordingly, organizations able to collect big data, seeking to avoid legal disputes over leaks of personal information, do not process and distribute the big data itself for business purposes. Instead they take only the information needed for a specific purpose and provide it at the level of statistical information, processed through clustering or statistical analysis. As a result, organizations that need to use big data have difficulty obtaining the analysis data essential to their own particular business environment.

Under these circumstances, in order to process and distribute for business purposes the big data itself, as material for statistical analysis rather than as statistical result data, some have applied methods that de-identify personal attributes through masking, substitution, semi-identification, categorization, and the like.

Masking masks or deletes the target information (e.g., 670101-10491910 → **************). Substitution replaces the target information with information generated to correspond to it (e.g., 670101-10491910 → ID2311331). Semi-identification shows only part of the target information (e.g., 670101-10491910 → 67-1). Categorization classifies the target information into a category (e.g., 670101-10491910 → male).
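The four techniques above can be sketched on the example string from the text. This is only an illustration under stated assumptions: the function names are hypothetical, a truncated SHA-256 digest stands in for the unspecified ID generator, and the rule that an odd first digit after the hyphen means male is an assumption about the number format rather than something stated in the text.

```python
import hashlib


def mask(value: str) -> str:
    """Masking: replace every character so nothing of the original remains."""
    return "*" * len(value)


def substitute(value: str, prefix: str = "ID") -> str:
    """Substitution: replace the value with a generated token.

    Assumption: a truncated SHA-256 digest is used here; the text does not
    say how tokens such as 'ID2311331' are generated.
    """
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return prefix + digest[:8]


def semi_identify(value: str) -> str:
    """Semi-identification: keep only the first two digits and the first
    digit after the hyphen."""
    front, back = value.split("-")
    return f"{front[:2]}-{back[0]}"


def categorize(value: str) -> str:
    """Categorization: reduce the value to a category.

    Assumption: an odd first digit after the hyphen denotes male.
    """
    back = value.split("-")[1]
    return "male" if int(back[0]) % 2 == 1 else "female"


if __name__ == "__main__":
    sample = "670101-10491910"
    for fn in (mask, substitute, semi_identify, categorize):
        print(fn.__name__, "->", fn(sample))
```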

However, even when personal information is de-identified by masking, substitution, semi-identification, categorization, or the like, there remains a risk of leakage through combination (mash-up) of data or through tracing back from an individual's specific information and its combinations. In addition, de-identification itself requires separate computational resources, so de-identifying massive big data inevitably demands enormous computational resources.

The present invention aims to solve two problems of the prior art: the risk of personal information leakage, and the enormous computational resources required for de-identification to prevent such leakage. To this end, execution of the whole neural network, which includes a plurality of layers, is divided into two broad stages. The image processing apparatus performs only the minimum layer operations needed for de-identification, so that de-identification is possible with little computing power, and a server with relatively high computing power performs the rest. By distributing the deep learning process in this way, image personal information is processed efficiently, even with limited computational resources, so that its identifiability is reduced.

According to an embodiment of the present invention, a deep learning-based personal information processing apparatus uses a neural network, operated by a processor, that has a plurality of sequentially arranged layers (Li, i = 1, 2, 3, ..., n), and includes: a data generation unit that generates at least one piece of input data from personal information including an identifiable image from which a person can be identified; a layer operation unit that passes the input data through layer Li of the neural network to obtain operation result data; an inversion data acquisition unit that performs the operation of layer Li in reverse on the obtained operation result data to obtain inversion data; a de-identified image generation unit that uses the obtained inversion data to generate a de-identified image in which part of the identifiable image is missing; a loss rate calculation unit that calculates a loss rate by comparing the identifiable image with the de-identified image; and an identifiability determination unit that compares the calculated loss rate with a preset reference value. When the loss rate is less than or equal to the reference value, the identifiability determination unit judges the result to be at an identifiable level, sends the operation result data to layer Li+1 of the layer operation unit, and uses it as input data to compute the next stage's operation result data. When the loss rate is greater than or equal to the reference value, it judges the result to be at an unidentifiable level and transmits the operation result data and the executed-layer information. The identifiability determination unit may repeat the computation of the next stage's operation result data until the loss rate reaches the reference value.

According to an embodiment of the present invention, a server receives the transmitted operation result data and executed-layer information. Based on the executed-layer information, the server may perform the remaining deep learning operations by passing the operation result data sequentially through at least one layer of the neural network that has not yet been executed.
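The server-side step can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each layer is a plain Python callable, that the server holds the same ordered layer list as the apparatus, and that the executed-layer information is the 1-based index i of the last layer run on the apparatus. The name run_remaining_layers is hypothetical.

```python
from typing import Callable, Sequence

import numpy as np

Layer = Callable[[np.ndarray], np.ndarray]


def run_remaining_layers(
    layers: Sequence[Layer], data: np.ndarray, executed_layer: int
) -> np.ndarray:
    """Apply layers L_{i+1} .. L_n to data, where i = executed_layer.

    layers[0] is L_1, so the layers not yet executed start at index i.
    """
    out = data
    for layer in layers[executed_layer:]:
        out = layer(out)
    return out


if __name__ == "__main__":
    # Toy layers standing in for L1, L2, L3.
    toy_layers = [lambda x: x + 1.0, lambda x: x * 2.0, lambda x: x - 3.0]
    # The apparatus already ran L1; the server finishes L2 and L3.
    print(run_remaining_layers(toy_layers, np.array([1.0]), executed_layer=1))
```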

According to an embodiment of the present invention, the neural network may be a deep learning model that includes a plurality of layers (Li, i = 1, 2, 3, ...) and performs a convolution operation through those layers.

According to an embodiment of the present invention, the neural network may be a deep learning model that includes a plurality of layers (Li, i = 1, 2, 3, ...) and performs an MLP operation through those layers.

According to an embodiment of the present invention, the neural network may include a plurality of layers (Li, i = 1, 2, 3, ...) and perform a pooling operation through those layers.

According to an embodiment of the present invention, the loss rate calculation unit may use at least one of MSE, PSNR, and image correlation to calculate the loss rate.

According to an embodiment of the present invention, the identifiability determination unit uses image correlation to calculate the loss rate, and may determine identifiability by comparing the image correlation value against a preset reference value of 0.002.

According to an embodiment of the present invention, the value of i for the layer Li judged to be at the unidentifiable level is calculated. Of the layers in the neural network, layers L1 through Li are computed on the personal information processing apparatus and layers Li+1 through Ln are computed on the server, so that de-identification and deep learning operations are possible with relatively little physical computing power.

According to an embodiment of the present invention, a deep learning-based personal information processing method uses a neural network, operated by a processor, that has a plurality of sequentially arranged layers (Li, i = 1, 2, 3, ..., n), and includes: generating at least one piece of input data from personal information including an identifiable image from which a person can be identified; passing the input data through layer Li of the neural network to obtain operation result data; performing the operation of layer Li in reverse on the obtained operation result data to obtain inversion data; using the obtained inversion data to generate a de-identified image in which part of the identifiable image is missing; calculating a loss rate by comparing the identifiable image with the de-identified image; and comparing the calculated loss rate with a preset reference value. When the loss rate is less than or equal to the reference value, the result is judged to be at an identifiable level, and the operation result data is sent to layer Li+1 of the layer operation unit and used as input data to compute the next stage's operation result data. When the loss rate is greater than or equal to the reference value, the result is judged to be at an unidentifiable level, and the operation result data and the executed-layer information are transmitted. In the step of judging the identifiable level, the computation of the next stage's operation result data may be repeated until the loss rate reaches the reference value.

According to an embodiment of the present invention, a server receives the transmitted operation result data and executed-layer information. Based on the executed-layer information, the server may perform the remaining deep learning operations by passing the operation result data sequentially through at least one layer of the neural network that has not yet been executed.

According to an embodiment of the present invention, the neural network may include a plurality of layers (Li, i = 1, 2, 3, ...) and perform an MLP operation through those layers.

According to an embodiment of the present invention, the step of calculating the loss rate may use at least one of MSE, PSNR, and image correlation to calculate the loss rate.

According to an embodiment of the present invention, the step of comparing the calculated loss rate with the preset reference value uses image correlation to calculate the loss rate, and may determine identifiability by comparing the image correlation value against a preset reference value of 0.002.

The present invention solves the problem that de-identification requires enormous computational resources. The image processing apparatus performs only the minimum layer operations needed for de-identification, so that de-identification is possible with little computing power, and a server with high computing power performs the rest. As a result, images can be de-identified with relatively little computing power even on personal portable terminals and personal computers that lack supercomputer-level capability, so that deep learning training as well as big data transmission and processing can be carried out on various devices for various purposes without the possibility of personal information leakage.

FIG. 1 is a configuration diagram of a deep learning-based personal information processing apparatus according to an embodiment of the present invention.
FIG. 2 is a configuration diagram of a deep learning-based personal information processing system according to an embodiment of the present invention.
FIG. 3 is a diagram showing an MLP operation according to an embodiment of the present invention.
FIG. 4 is a diagram showing a convolution operation according to an embodiment of the present invention.
FIG. 5 is a diagram showing a pooling operation according to a first embodiment of the present invention.
FIG. 6 is a diagram showing a pooling operation according to a second embodiment of the present invention.
FIG. 7 is a diagram showing the de-identified image generated at each layer, the loss rate calculated for it, and the unidentifiable image, according to an embodiment of the present invention.
FIG. 8 is a flowchart of a deep learning-based personal information processing method according to an embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings so that a person of ordinary skill in the art to which the invention pertains can readily practice them. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here.

To describe the present invention clearly, parts of the drawings unrelated to the description are omitted, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components, not that other components are excluded, unless specifically stated otherwise.

A deep learning-based personal information processing apparatus and method according to an embodiment of the present invention are described below with reference to the drawings.

FIG. 1 is a configuration diagram of a deep learning-based personal information processing apparatus (1000) according to an embodiment of the present invention.

Referring to FIG. 1, the deep learning-based personal information processing apparatus (1000) may include a data generation unit (100), a layer operation unit (200), an inversion data acquisition unit (300), a de-identified image generation unit (400), a loss rate calculation unit (500), and an identifiability determination unit (600).

According to an embodiment of the present invention, the deep learning-based personal information processing apparatus (1000) may perform each layer operation using a neural network, operated by a processor, that has a plurality of sequentially arranged layers (Li, i = 1, 2, 3, ..., n).

The data generation unit (100) may generate at least one piece of input data from personal information including an identifiable image from which a person can be identified.

Here, an identifiable image may mean an image in which a face or body appearance is shown clearly enough for an individual to be identified. Any image that would pose a risk of personal information infringement if sent to a server without de-identification processing may be used, regardless of format, resolution, or size.

Because the degree to which identification is possible may differ from person to person, it is not fixed to a particular standard, and any degree may be used without limitation as long as a specific person cannot be distinguished.

According to an embodiment of the present invention, the personal information may go through a preprocessing step before it is used as input data. The preprocessing step may standardize the image to a specific file format, resolution, size, and so on, and may also generate metadata describing the characteristics of the information.
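A preprocessing step of this kind might look like the sketch below. Everything specific in it is an assumption made for illustration: the 64×64 target size, nearest-neighbor resampling, scaling 8-bit pixel values to [0, 1], the metadata fields, and the function name preprocess. The text does not specify any of these.

```python
import numpy as np


def preprocess(image: np.ndarray, size: int = 64):
    """Standardize an 8-bit image to size x size and describe it with metadata.

    Nearest-neighbor resampling is used so that only NumPy is required.
    Works for grayscale (H, W) and color (H, W, C) arrays.
    """
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    resized = image[rows][:, cols].astype(np.float32) / 255.0
    metadata = {
        "original_shape": (h, w),
        "shape": resized.shape,
        "dtype": str(resized.dtype),
    }
    return resized, metadata


if __name__ == "__main__":
    face = np.full((100, 80), 255, dtype=np.uint8)  # placeholder image
    data, meta = preprocess(face)
    print(data.shape, meta)
```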

The layer operation unit (200) may pass the input data through layer Li of the neural network to obtain operation result data.

According to an embodiment of the present invention, the deep learning-based personal information processing apparatus (1000) may use a neural network including a plurality of layers and perform deep learning operations by passing the input data sequentially through layers L1, L2, L3, L4, ..., Ln.

According to an embodiment of the present invention, after the input data passes through layer Li, the operation result data is not passed directly to layer Li+1; instead, it may be sent to the inversion data acquisition unit (300) for inversion.

According to an embodiment of the present invention, the neural network may include a plurality of layers (Li, i = 1, 2, 3, ...) and may be a deep learning model that performs a convolution operation through those layers.

According to another embodiment of the present invention, the neural network may include a plurality of layers (Li, i = 1, 2, 3, ...) and may be a deep learning model that performs an MLP operation through those layers.

According to yet another embodiment of the present invention, the neural network may include a plurality of layers (Li, i = 1, 2, 3, ...) and may be a deep learning model that performs a pooling operation through those layers.

According to an embodiment of the present invention that uses a convolution operation, the operation result data may be obtained by applying Forward CNN to the input data.

Here, Forward CNN may mean an ordinary convolution operation.
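As a rough illustration of an ordinary convolution, the sketch below applies a single kernel to a single-channel image. It assumes "valid" padding, stride 1, and the CNN convention of sliding the kernel without flipping it (cross-correlation); none of these details are given in the text, and the name conv2d_valid is hypothetical.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide kernel over image (stride 1, no padding) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out


if __name__ == "__main__":
    img = np.arange(16.0).reshape(4, 4)
    print(conv2d_valid(img, np.ones((2, 2))))
```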

According to an embodiment of the present invention that uses an MLP operation, the operation result data may be obtained by applying Forward MLP to the input data.

Here, Forward MLP may be an operation that uses an activation function, and functions such as sigmoid, tanh, and ReLU may be used as the activation function.
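A single Forward MLP layer can be sketched as an affine map followed by one of the named activation functions. The form y = f(Wx + b) is an assumption: the text names only the activation functions, and the bias b and the function name forward_mlp are introduced here for illustration.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)


ACTIVATIONS = {"sigmoid": sigmoid, "tanh": np.tanh, "relu": relu}


def forward_mlp(x, W, b, activation: str = "tanh") -> np.ndarray:
    """One MLP layer: affine transform, then the chosen activation."""
    return ACTIVATIONS[activation](W @ x + b)


if __name__ == "__main__":
    x = np.array([-1.0, 2.0])
    for name in ACTIVATIONS:
        print(name, forward_mlp(x, np.eye(2), np.zeros(2), name))
```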

According to an embodiment of the present invention that uses a pooling operation, the operation result data may be obtained using an operation that reduces resolution, such as max pooling or average pooling.
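Both pooling variants can be sketched with a reshape over non-overlapping blocks. The 2×2 window, the handling of edge rows and columns (dropped when the size is not divisible), and the name pool2d are assumptions for illustration.

```python
import numpy as np


def pool2d(x: np.ndarray, size: int = 2, mode: str = "max") -> np.ndarray:
    """Reduce resolution by taking the max or mean of each size x size block."""
    h2, w2 = x.shape[0] // size, x.shape[1] // size
    # Axes after reshape: (block row, row in block, block col, col in block).
    blocks = x[:h2 * size, :w2 * size].reshape(h2, size, w2, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "average":
        return blocks.mean(axis=(1, 3))
    raise ValueError(f"unknown pooling mode: {mode}")


if __name__ == "__main__":
    x = np.arange(16.0).reshape(4, 4)
    print(pool2d(x, mode="max"))
    print(pool2d(x, mode="average"))
```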

The convolution, MLP (multi-layer perceptron), and pooling operations described above are explained in more detail with reference to FIGS. 3 to 6.

The inversion data acquisition unit (300) may obtain inversion data by performing the operation of layer Li in reverse on the obtained operation result data.

Here, performing the operation in reverse means performing the opposite of the operation carried out in the layer operation unit (200).

According to an embodiment of the present invention that uses a convolution operation, the inverse result may be obtained by applying Inverse CNN to the data input to the inversion data acquisition unit.

Here, Inverse CNN is an operation that reverses the convolution operation; computationally, it may mean multiplying by the transpose of the kernel.
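One way to read "multiplying by the transpose of the kernel" is to write the convolution as a matrix product y = Kx and apply K transposed to the result. The sketch below does this for a 1-D signal to keep it short; the 1-D setting, the valid/stride-1 convolution, and the names conv_matrix and inverse_cnn are assumptions. Note that the transpose is not an exact inverse, so the reconstruction generally differs from the original input.

```python
import numpy as np


def conv_matrix(kernel: np.ndarray, n: int) -> np.ndarray:
    """Matrix K such that K @ x is the valid, stride-1 convolution
    (CNN convention, kernel not flipped) of a length-n signal x."""
    k = len(kernel)
    K = np.zeros((n - k + 1, n))
    for r in range(n - k + 1):
        K[r, r:r + k] = kernel
    return K


def inverse_cnn(y: np.ndarray, kernel: np.ndarray, n: int) -> np.ndarray:
    """Map the convolution output back to input size with K transposed."""
    return conv_matrix(kernel, n).T @ y


if __name__ == "__main__":
    kernel = np.array([1.0, 2.0])
    x = np.array([1.0, 0.0, 0.0, 0.0])
    y = conv_matrix(kernel, len(x)) @ x
    print("forward:", y)
    print("transposed back:", inverse_cnn(y, kernel, len(x)))
```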

According to an embodiment of the present invention that uses an MLP operation, the inverse result may be obtained by applying Inverse MLP to the data input to the inversion data acquisition unit.

Here, Inverse MLP may mean applying the inverse function of the activation function and multiplying by the inverse of W.
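A sketch of this inversion for one layer follows, under several assumptions that the text does not state: W is taken to be the layer's weight matrix, the forward layer is y = tanh(Wx + b), and the Moore-Penrose pseudo-inverse is used so that non-square W is also handled. With an activation such as ReLU, which discards negative values, an exact inverse does not exist, which is one way information can be lost. The name inverse_mlp is hypothetical.

```python
import numpy as np


def inverse_mlp(y: np.ndarray, W: np.ndarray, b=None) -> np.ndarray:
    """Undo y = tanh(W @ x + b): invert the activation, then the weights."""
    if b is None:
        b = np.zeros(W.shape[0])
    # arctanh is only defined on (-1, 1); clip to stay inside the domain.
    z = np.arctanh(np.clip(y, -1.0 + 1e-7, 1.0 - 1e-7))
    return np.linalg.pinv(W) @ (z - b)


if __name__ == "__main__":
    W = np.array([[2.0, 0.0], [0.0, 0.5]])
    x = np.array([0.3, -0.4])
    y = np.tanh(W @ x)
    print("recovered:", inverse_mlp(y, W))
```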

According to an embodiment of the present invention that uses a pooling operation, the inverse result may be obtained by applying Un-Pooling to the data input to the inversion data acquisition unit.

Here, Un-Pooling is used to increase resolution. One method performs the pooling operation in reverse and fills every position with the same value; another stores the positions at max-pooling time and fills in only those positions.
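The two un-pooling methods can be sketched as follows, assuming non-overlapping 2×2 windows and zeros in the positions that are not filled in by the second method. The function names are hypothetical.

```python
import numpy as np


def unpool_fill(y: np.ndarray, size: int = 2) -> np.ndarray:
    """Method 1: copy each pooled value into its whole size x size block."""
    return np.kron(y, np.ones((size, size)))


def max_pool_with_positions(x: np.ndarray, size: int = 2):
    """Max pooling that also records where in each block the max was found."""
    h2, w2 = x.shape[0] // size, x.shape[1] // size
    blocks = (
        x[:h2 * size, :w2 * size]
        .reshape(h2, size, w2, size)
        .transpose(0, 2, 1, 3)          # (block row, block col, row, col)
        .reshape(h2, w2, size * size)   # flatten each block
    )
    return blocks.max(axis=2), blocks.argmax(axis=2)


def unpool_positions(y: np.ndarray, idx: np.ndarray, size: int = 2) -> np.ndarray:
    """Method 2: put each value back at its stored position, zeros elsewhere."""
    h2, w2 = y.shape
    out = np.zeros((h2, w2, size * size))
    rows, cols = np.indices((h2, w2))
    out[rows, cols, idx] = y
    return (
        out.reshape(h2, w2, size, size)
        .transpose(0, 2, 1, 3)
        .reshape(h2 * size, w2 * size)
    )


if __name__ == "__main__":
    x = np.arange(16.0).reshape(4, 4)
    pooled, idx = max_pool_with_positions(x)
    print(unpool_fill(pooled))
    print(unpool_positions(pooled, idx))
```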

The de-identified image generation unit (400) may use the obtained inversion data to generate a de-identified image in which part of the identifiable image is missing.

According to an embodiment of the present invention, some information is lost while the input data passes through a layer operation, so the inversion data obtained by inverting the operation result data may also be missing information compared with the original training data.

Therefore, an image generated from the inversion data necessarily has missing parts compared with the identifiable image, and an image with such missing parts can be defined as a de-identified image.

손실율 계산부(500)는 식별 가능 이미지와 비식별화 이미지를 대비하여 손실율을 계산할 수 있다.The loss rate calculator 500 may calculate a loss rate by comparing the identifiable image and the non-identified image.

본 발명의 일 실시 예에 따르면 손실율 계산부(500)는 식별 가능 이미지와 비식별화 이미지를 대비하여 누락 부분을 파악하여 이를 통해 손실율을 계산할 수 있다.According to an embodiment of the present invention, the loss rate calculator 500 may calculate a loss rate by identifying a missing part in contrast to an identifiable image and an unidentified image.

본 발명의 일 실시 예에 따르면 손실율은 MSE(mean square error), PSNR(Peak signal-to-noise ratio), Image correlation의 방법 중 적어도 하나를 사용하여 산출할 수 있다.According to an embodiment of the present invention, the loss rate may be calculated using at least one of a method of mean square error (MSE), peak signal-to-noise ratio (PSNR), and image correlation.

According to an embodiment of the present invention in which the loss rate is calculated using MSE, the loss rate may be calculated using Equation 1.

[Equation 1]

According to an embodiment of the present invention in which the loss rate is calculated using PSNR, the loss rate may be calculated using Equation 2.

[Equation 2]

According to an embodiment of the present invention in which the loss rate is calculated using image correlation, the correlation between the original image and the restored image may be computed to measure how strongly the two are correlated.
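The three measures named above can be sketched as follows. This is a minimal illustration using the standard textbook definitions (MSE as the mean of squared pixel differences, PSNR as 10·log10(MAX²/MSE), and the Pearson correlation coefficient of the pixel values); it is an assumption that Equations 1 and 2 take these forms. The function names, the 8-bit peak value of 255, and the handling of constant images are likewise illustrative choices. Note that MSE grows as more information is lost, whereas PSNR and correlation fall, so the direction of a threshold comparison depends on the measure chosen.

```python
import math

def mse(a, b):
    """Mean squared error between two equally sized grayscale images (lists of rows)."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def psnr(a, b, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)

def correlation(a, b):
    """Pearson correlation coefficient of the flattened pixel values."""
    xs = [x for row in a for x in row]
    ys = [y for row in b for y in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant image; returning 0.0 is a
        # convention chosen for this sketch only.
        return 0.0
    return cov / math.sqrt(var_x * var_y)

original = [[0, 50], [100, 150]]
restored = [[0, 50], [100, 140]]   # one pixel differs by 10

print(mse(original, restored))      # 25.0
print(psnr(original, restored))
print(correlation(original, restored))
```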

The identifiability determination unit 600 may compare the calculated loss rate with a preset reference value. When the loss rate is less than or equal to the reference value, the unit determines that the image is at an identifiable level and sends the operation result data to the Li+1 layer of the layer operation unit, where it is used as input data to calculate the operation result data of the next step.

Here, the preset reference value may be set differently depending on the loss rate calculation method, and any value may be used, without limitation to a specific value, as long as the loss is judged sufficient to make identifying the person impossible.

Also according to this embodiment, when the loss rate is greater than or equal to the reference value, the unit may determine that the image is at an unidentifiable level and transmit the corresponding operation result data and execution layer information.

According to an embodiment of the present invention, the identifiability determination unit 600 may repeat the calculation of the next step's operation result data until the loss rate reaches the reference value.

According to an embodiment of the present invention, the loss rate calculation unit 500 may use at least one of MSE, PSNR, and image correlation to calculate the loss rate.

According to an embodiment of the present invention, when image correlation is used to calculate the loss rate, the identifiability determination unit 600 may set the reference value to 0.002 and compare the loss rate against it to determine whether the image is identifiable.

This embodiment is described in more detail with reference to FIG. 7.

FIG. 2 is a configuration diagram of a deep learning-based personal information processing system according to an embodiment of the present invention.

Referring to FIG. 2, the deep learning-based personal information processing system may include a deep learning-based personal information processing apparatus 1000 and a server 2000.

According to an embodiment of the present invention, the value of i for the Li layer at which the unidentifiable level is reached may be calculated. Among the layers of the neural network, the personal information processing apparatus 1000 may perform the operations of layers L1 through Li, and the server 2000 may perform the operations of layers Li+1 through Ln, so that de-identification and deep learning operations can be carried out with relatively little physical computing power.

A scenario according to this embodiment is as follows.

Suppose the layer operation unit 200 of the personal information processing apparatus 1000 has passed the data through the L3 layer and obtained operation result data. The loss rate may then be calculated by comparing the identifiable image with the de-identified image generated from the inverse data obtained by inverting the L3 layer's operation result data.

If the calculated loss rate is less than or equal to the preset reference value, the L3 layer's operation result data may be sent to the layer operation unit 200 and used as input data for the L4 layer, after which identifiability is determined again.

If, on the other hand, the calculated loss rate is greater than or equal to the preset reference value, the L3 layer's operation result data and execution layer information may be transmitted to the server 2000, and the server 2000 may perform the operations of layers L4 through Ln.
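The scenario above can be sketched as a loop that runs one layer at a time on the device, reconstructs an image by inversion, and stops once the loss reaches the reference value. Everything concrete here is an assumption made for illustration: the toy layers (pairwise averaging on a 1-D signal), their inverses, MSE as the loss rate, the threshold of 3.0, and the choice to invert back through layers Li to L1 so the reconstruction has the same size as the input. The names `find_split_layer`, `avg_pool`, and `un_pool` are hypothetical.

```python
def avg_pool(xs):
    """Toy lossy 'layer': average adjacent pairs, halving the length."""
    return [(xs[k] + xs[k + 1]) / 2 for k in range(0, len(xs), 2)]

def un_pool(xs):
    """Toy inverse: repeat each value twice to restore the previous length."""
    return [v for v in xs for _ in range(2)]

def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def find_split_layer(signal, layers, inverses, threshold):
    """Run layers on the device until the reconstruction loss reaches the threshold.

    Returns (i, result): the 1-based index of the last layer run on the device
    and that layer's output, which would then be handed to the server.
    """
    data = signal
    for i, layer in enumerate(layers, start=1):
        data = layer(data)
        # Invert layers Li ... L1 to rebuild something comparable to the input.
        restored = data
        for inverse in reversed(inverses[:i]):
            restored = inverse(restored)
        if mse(signal, restored) >= threshold:
            return i, data          # unidentifiable level: stop and hand off
    return len(layers), data        # threshold never reached on the device

signal = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
layers = [avg_pool, avg_pool, avg_pool]
inverses = [un_pool, un_pool, un_pool]

split, payload = find_split_layer(signal, layers, inverses, threshold=3.0)
print(split, payload)  # 2 [3.0, 11.0]
```

In this toy run the loss after L1 is 1.0 (still below the threshold) and after L2 is 5.0, so the device stops at i = 2 and would send the L2 output together with the layer index.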

FIG. 3 is a diagram illustrating an MLP operation according to an embodiment of the present invention.

Referring to FIG. 3, the data flow of the MLP operation is shown. The MLP operation may be divided into Forward MLP, the layer operation in the forward direction, and Inverse MLP, the inverse operation in the reverse direction.

In the Forward MLP operation according to the present invention, the j-th value of the Li+1 layer may be calculated as in Equation 3 below.

[Equation 3]

The Inverse MLP operation according to the present invention may be calculated as in Equation 4 below.

[Equation 4]
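As a rough illustration of a forward and inverse MLP pair, the sketch below uses the usual fully connected form y_j = f(Σ_k W[j][k]·x[k] + b[j]) for the forward pass. The inverse shown here is only an assumption: it undoes an invertible activation and the bias and then multiplies by the transpose of the weight matrix, by analogy with the kernel-transpose approach described for the convolution case. It is not claimed to match Equations 3 and 4. The leaky-ReLU activation, its slope, and all function names are hypothetical.

```python
def leaky_relu(v, slope=0.5):
    """Invertible activation chosen for this sketch."""
    return v if v >= 0 else slope * v

def leaky_relu_inv(v, slope=0.5):
    """Exact inverse of leaky_relu for the same slope."""
    return v if v >= 0 else v / slope

def forward_mlp(x, weights, bias):
    """y_j = f(sum_k W[j][k] * x[k] + b[j]) for one fully connected layer."""
    return [leaky_relu(sum(w * xk for w, xk in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def inverse_mlp(y, weights, bias):
    """Approximate inverse: undo activation and bias, then multiply by W^T."""
    z = [leaky_relu_inv(v) - b for v, b in zip(y, bias)]
    n_in = len(weights[0])
    return [sum(weights[j][k] * z[j] for j in range(len(z))) for k in range(n_in)]

weights = [[1, 0, 0],
           [0, 1, 0]]          # maps 3 inputs to 2 outputs, dropping the third input
bias = [0.5, -0.5]
x = [1.0, -2.0, 3.0]

y = forward_mlp(x, weights, bias)
x_hat = inverse_mlp(y, weights, bias)
print(y)
print(x_hat)
```

With these weights the third input never reaches the output, so the reconstruction recovers the first two components and returns 0 for the third, which illustrates how information lost in the forward pass stays missing after inversion.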

FIG. 4 is a diagram illustrating a convolution operation according to an embodiment of the present invention.

Referring to FIG. 4, the data flow of the convolution operation is shown. The convolution operation may be divided into Forward CNN, the layer operation in the forward direction, and Inverse CNN, the inverse operation in the reverse direction.

In the Forward CNN operation according to the present invention, the j-th value of the Li+1 layer may be calculated as in Equation 5 below.

[Equation 5]

The Inverse CNN operation according to the present invention is the inverse of the Forward CNN operation and may be computed by multiplying by the transpose of the kernel.
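Multiplying by the transpose of the kernel can be illustrated with a transposed convolution. The sketch below assumes a 1-D signal, stride 1, and no padding, none of which is specified in the text; `conv1d` and `conv1d_transpose` are hypothetical names. Writing the forward pass as y = K·x for a convolution matrix K, the second function computes Kᵀ·y without building K explicitly.

```python
def conv1d(x, kernel):
    """'Valid' 1-D convolution in the cross-correlation form used by CNN layers."""
    k = len(kernel)
    return [sum(kernel[t] * x[p + t] for t in range(k))
            for p in range(len(x) - k + 1)]

def conv1d_transpose(y, kernel):
    """Multiply by the transpose of the convolution matrix (transposed convolution)."""
    k = len(kernel)
    out = [0.0] * (len(y) + k - 1)
    for p, v in enumerate(y):
        for t in range(k):
            # Each output value is scattered back to the inputs that produced it.
            out[p + t] += kernel[t] * v
    return out

x = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.5]

y = conv1d(x, kernel)
x_hat = conv1d_transpose(y, kernel)
print(y)      # [1.5, 2.5, 3.5]
print(x_hat)  # [0.75, 2.0, 3.0, 1.75]
```

The reconstruction has the same length as the input but is not equal to it, which is consistent with the statement that inverse data lacks some of the original information.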

FIG. 5 is a diagram illustrating a pooling operation according to a first embodiment of the present invention.

Referring to FIG. 5, in the pooling operation according to the first embodiment of the present invention, the max-pooling operation may store one data value out of a plurality of data values and delete the rest, and the un-pooling operation may assign the stored data value to the deleted positions.

FIG. 6 is a diagram illustrating a pooling operation according to a second embodiment of the present invention.

Referring to FIG. 6, in the pooling operation according to the second embodiment of the present invention, the max-pooling operation may store one data value out of a plurality of data values together with its position and delete the rest, and the un-pooling operation may assign 0 to the deleted positions, leaving the stored data value at its stored position.
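The two un-pooling variants can be sketched side by side. The example assumes a 1-D input and non-overlapping windows of size 2 purely for brevity; the function names are hypothetical.

```python
def max_pool(xs, size=2):
    """Return (values, positions): the max of each window and its index in xs."""
    values, positions = [], []
    for start in range(0, len(xs), size):
        window = xs[start:start + size]
        offset = max(range(len(window)), key=window.__getitem__)
        values.append(window[offset])
        positions.append(start + offset)
    return values, positions

def unpool_fill(values, size=2):
    """First variant: fill every slot of a window with the stored value."""
    return [v for v in values for _ in range(size)]

def unpool_positions(values, positions, length):
    """Second variant: restore each value at its stored position, zeros elsewhere."""
    out = [0] * length
    for v, p in zip(values, positions):
        out[p] = v
    return out

xs = [3, 7, 5, 2, 9, 9]
values, positions = max_pool(xs)

print(values, positions)                                  # [7, 5, 9] [1, 2, 4]
print(unpool_fill(values))                                # [7, 7, 5, 5, 9, 9]
print(unpool_positions(values, positions, len(xs)))       # [0, 7, 5, 0, 9, 0]
```

Both variants restore the original resolution, but neither recovers the discarded values: the first repeats the maximum, and the second keeps it in place and leaves zeros around it.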

FIG. 7 is a diagram showing the de-identified image generated for each layer, the loss rate calculated for it, and the unidentifiable images, according to an embodiment of the present invention.

Referring to FIG. 7, the de-identified images generated by inversion at each layer, as the data passes through the layers in sequence, are shown. The loss rate at the layer where the person can no longer be identified in these images may be taken as the preset reference value.

According to an embodiment of the present invention in which the loss rate is obtained by image correlation, the preset reference value may be set to 0.002 and compared with the loss rate to determine whether the image is identifiable.

FIG. 8 is a flowchart of a deep learning-based personal information processing method according to an embodiment of the present invention.

Input data may be generated (810).

According to an embodiment of the present invention, at least one item of input data may be generated using personal information that includes an identifiable image from which a person can be identified.

Operation result data may be obtained (820).

According to an embodiment of the present invention, the input data may be passed through the Li layer of the neural network to perform an operation and obtain operation result data.

According to an embodiment of the present invention, using a neural network that includes a plurality of layers, the input data may be passed sequentially through layers L1, L2, L3, L4, ..., Ln to perform the deep learning operation.

According to an embodiment of the present invention, after the input data has passed through the Li layer, the operation result data may be sent for inversion rather than being passed directly to the Li+1 layer.

Inverse data may be obtained (830).

According to an embodiment of the present invention, inverse data may be obtained by performing the operation of the Li layer in reverse on the obtained operation result data.

Here, performing the operation in reverse means undoing the operation performed in each layer.

Here, un-pooling is used to increase resolution. One option is to perform the pooling operation in reverse and fill every position with the same value; another is to store the position selected during max-pooling and fill in only that position.

A de-identified image may be generated (840).

According to an embodiment of the present invention, the obtained inverse data may be used to generate a de-identified image in which part of the identifiable image is missing.

A loss rate may be calculated by comparison with the de-identified image (850).

According to an embodiment of the present invention, the loss rate may be calculated by comparing the identifiable image with the de-identified image.

According to an embodiment of the present invention, the identifiable image may be compared with the de-identified image to identify the missing portions, and the loss rate may be calculated from them.

Whether the image is at an identifiable level may be determined (860).

According to an embodiment of the present invention, the calculated loss rate may be compared with a preset reference value. When the loss rate is less than or equal to the reference value, the image is determined to be at an identifiable level, and the operation result data is sent to the Li+1 layer of the layer operation unit, where it is used as input data to calculate the operation result data of the next step.

According to an embodiment of the present invention, the loss rate calculation unit may use at least one of MSE, PSNR, and image correlation to calculate the loss rate.

According to an embodiment of the present invention, when image correlation is used to calculate the loss rate, the identifiability determination unit may set the reference value to 0.002 and compare the loss rate against it to determine whether the image is identifiable.

The operation result data and execution layer information may be transmitted (870).

According to an embodiment of the present invention, when the loss rate is greater than or equal to the reference value, the image may be determined to be at an unidentifiable level, and the corresponding operation result data and execution layer information may be transmitted.

The remaining deep learning operations may be performed (880).

According to an embodiment of the present invention, the server 2000 receives the transmitted operation result data and execution layer information and, based on the execution layer information, may perform the remaining deep learning operations by passing the operation result data sequentially through at least one layer of the neural network that has not yet been executed.

A result value of the deep learning operation may be calculated (890).

According to an embodiment of the present invention, the personal information processing apparatus 1000 may perform the operations of layers L1 through Li among the plurality of layers, and the server 2000 may perform the operations of layers Li+1 through Ln, so that de-identification and deep learning operations can be carried out with relatively little physical computing power.
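The hand-off in steps 870 to 890 can be sketched as a device function that runs layers L1 through Li and a server function that resumes at Li+1 using the layer index it receives. The four toy layers, the dictionary message format with the keys `result` and `last_layer`, and the function names are all hypothetical; the text does not specify how the execution layer information is encoded or transported.

```python
def run_layers(data, layers, start, stop):
    """Apply layers[start:stop] in order (0-based, stop exclusive)."""
    for layer in layers[start:stop]:
        data = layer(data)
    return data

# Hypothetical four-layer network shared by the device and the server.
layers = [
    lambda xs: [x * 2 for x in xs],        # L1
    lambda xs: [x + 1 for x in xs],        # L2
    lambda xs: [max(x, 0) for x in xs],    # L3
    lambda xs: [sum(xs)],                  # L4
]

def device_side(inputs, split):
    """Run L1..Li on the device and build the message sent to the server."""
    result = run_layers(inputs, layers, 0, split)
    return {"result": result, "last_layer": split}

def server_side(message):
    """Resume at L(i+1) using the execution layer information in the message."""
    return run_layers(message["result"], layers,
                      message["last_layer"], len(layers))

message = device_side([1, -3, 2], split=2)
output = server_side(message)

print(message)  # {'result': [3, -5, 5], 'last_layer': 2}
print(output)   # [8]
```

The split run produces the same result as running all four layers in one place, while the raw input never leaves the device.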

Embodiments of the present invention are not implemented only through the apparatus and/or method described above. Although the embodiments have been described in detail, the scope of the present invention is not limited to them; various modifications and improvements made by those skilled in the art using the basic concept of the present invention as defined in the following claims also fall within its scope.

100: data generation unit 200: layer operation unit
300: inverse data acquisition unit 400: de-identified image generation unit
500: loss rate calculation unit 600: identifiability determination unit
1000: personal information processing apparatus 2000: server

Claims

1. A deep learning-based personal information processing system that uses, by a processor, a neural network having a plurality of sequentially arranged layers (Li, i = 1, 2, 3...n), the system comprising:
a personal information processing apparatus including:
a data generation unit that generates at least one item of input data using personal information including an identifiable image from which a person can be identified;
a layer operation unit that passes the input data through the Li layer of the neural network to perform an operation and obtain operation result data;
an inverse data acquisition unit that performs the operation of the Li layer in reverse on the obtained operation result data to obtain inverse data;
a de-identified image generation unit that uses the obtained inverse data to generate a de-identified image in which part of the identifiable image is missing;
a loss rate calculation unit that calculates a loss rate by comparing the identifiable image with the de-identified image; and
an identifiability determination unit that compares the calculated loss rate with a preset reference value, determines, when the loss rate is less than the reference value, that the image is at an identifiable level and sends the operation result data to the Li+1 layer of the layer operation unit to be used as input data for calculating the operation result data of the next step, and determines, when the loss rate is higher than the reference value, that the image is at an unidentifiable level and transmits the operation result data and execution layer information; and
a server that receives the operation result data and the execution layer information and, based on the execution layer information, performs the remaining deep learning operations by passing the operation result data sequentially through at least one layer, among the plurality of layers included in the neural network, that has not yet been executed.

2. The system of claim 1, wherein the identifiability determination unit repeats the calculation of the next step's operation result data until the loss rate reaches the reference value.

3. The system of claim 1, wherein the neural network is a deep learning model that includes a plurality of layers (Li, i = 1, 2, 3...) and performs a convolution operation through the plurality of layers.

4. The system of claim 1, wherein the neural network is a deep learning model that includes a plurality of layers (Li, i = 1, 2, 3...) and performs an MLP operation through the plurality of layers.

5. The system of claim 1, wherein the neural network is a deep learning model that includes a plurality of layers (Li, i = 1, 2, 3...) and performs a pooling operation through the plurality of layers.

6. The system of claim 1, wherein the loss rate calculation unit uses at least one of MSE, PSNR, and image correlation to calculate the loss rate.

7. The system of claim 6, wherein the identifiability determination unit uses image correlation to calculate the loss rate and determines whether the image is identifiable by comparing the image correlation value with a preset reference value of 0.002.

8. The system of claim 2, wherein the value of i for the Li layer determined to be at the unidentifiable level is calculated, the personal information processing apparatus performs the operations of layers L1 through Li among the plurality of layers included in the neural network, and the server performs the operations of layers Li+1 through Ln, so that de-identification and deep learning operations are possible with relatively little physical computing power.

9. A deep learning-based personal information processing method that uses, by a processor in a personal information processing apparatus, a neural network having a plurality of sequentially arranged layers (Li, i = 1, 2, 3...n), the method comprising:
generating, by a data generation unit, at least one item of input data using personal information including an identifiable image from which a person can be identified;
passing, by a layer operation unit, the input data through the Li layer of the neural network to perform an operation and obtain operation result data;
obtaining, by an inverse data acquisition unit, inverse data by performing the operation of the Li layer in reverse on the obtained operation result data;
generating, by a de-identified image generation unit, a de-identified image in which part of the identifiable image is missing, using the obtained inverse data;
calculating, by a loss rate calculation unit, a loss rate by comparing the identifiable image with the de-identified image;
comparing, by an identifiability determination unit, the calculated loss rate with a preset reference value, determining, when the loss rate is less than the reference value, that the image is at an identifiable level and sending the operation result data to the Li+1 layer of the layer operation unit to be used as input data for calculating the operation result data of the next step, and determining, when the loss rate is greater than or equal to the reference value, that the image is at an unidentifiable level and transmitting the operation result data and execution layer information; and
receiving, by a server, the operation result data and the execution layer information and, based on the execution layer information, performing the remaining deep learning operations by passing the operation result data sequentially through at least one layer, among the plurality of layers included in the neural network, that has not yet been executed.

10. The method of claim 9, wherein the determining of the identifiable level repeats the calculation of the next step's operation result data until the loss rate reaches the reference value.

11. The method of claim 9, wherein the neural network is a deep learning model that includes a plurality of layers (Li, i = 1, 2, 3...) and performs a convolution operation through the plurality of layers.

12. The method of claim 9, wherein the neural network is a deep learning model that includes a plurality of layers (Li, i = 1, 2, 3...) and performs an MLP operation through the plurality of layers.

13. The method of claim 9, wherein the neural network is a deep learning model that includes a plurality of layers (Li, i = 1, 2, 3...) and performs a pooling operation through the plurality of layers.

14. The method of claim 9, wherein the calculating of the loss rate uses at least one of MSE, PSNR, and image correlation to calculate the loss rate.

15. The method of claim 14, wherein the comparing of the calculated loss rate with the preset reference value uses image correlation to calculate the loss rate and determines whether the image is identifiable by comparing the image correlation value with a preset reference value of 0.002.

16. The method of claim 9, wherein the value of i for the Li layer determined to be at the unidentifiable level is calculated, the personal information processing apparatus performs the operations of layers L1 through Li among the plurality of layers included in the neural network, and the server performs the operations of layers Li+1 through Ln, so that de-identification and deep learning operations are possible with relatively little physical computing power.