KR20190061446A - Apparatus for generating adversarial example in deep learning environment and method thereof, computer program

Info

Publication number: KR20190061446A
Application number: KR1020170159849A
Authority: KR
Inventors: 권현; 최대선
Original assignee: 공주대학교 산학협력단
Priority date: 2017-11-28
Filing date: 2017-11-28
Publication date: 2019-06-05
Also published as: KR102149132B1

Abstract

The present invention relates to an apparatus and a method for generating an adversarial example in a deep learning environment, and to a computer program. The apparatus comprises: first and second classifiers trained by deep learning to identify the class of input data through a neural network; and a transformer that inputs adversarial data (an adversarial example) converted from original data into each of the first and second classifiers, obtains the identification result that each classifier produces for the class of the original data on the basis of the input adversarial data, and, based on those results, optimizes the adversarial data so that the first classifier correctly identifies the class of the original data while the second classifier misidentifies it.

Description

APPARATUS FOR GENERATING ADVERSARIAL EXAMPLE IN DEEP LEARNING ENVIRONMENT AND METHOD THEREOF, COMPUTER PROGRAM

The present invention relates to an apparatus and method for generating an adversarial example in a deep learning environment, and to a computer program, and more particularly to an apparatus, method, and computer program that generate an adversarial example for which the identified class of the original data differs depending on the neural network.

Deep learning refers to artificial intelligence (AI) technology that allows a machine to think and learn like a human. Based on artificial neural network theory, it enables a machine to learn and solve complex nonlinear problems on its own. With deep learning, a computer can perform recognition, inference, and judgment by itself without a human specifying every decision criterion, so the technology is widely applied in the field of pattern analysis.

A deep neural network (DNN) is an artificial neural network (ANN) composed of a plurality of hidden layers between an input layer and an output layer, and it repeatedly performs linear fitting and nonlinear transformation (activation).

Deep neural networks are applied in a wide range of fields such as image recognition, speech recognition, intrusion tolerance systems, and natural language processing, and their security has therefore been called into question. Specifically, even when a small perturbation of the input data cannot be perceived by the human eye, the perturbed input can cause a deep neural network to misidentify the class of the input data. For example, in an autonomous vehicle that drives by recognizing road signs through a deep neural network, slightly perturbing the road sign image fed to the network can induce unintended behavior (for example, a perturbed left-turn sign image causes the vehicle to turn right). Such slightly perturbed input data is called an adversarial example, and causing an image to be recognized as a class different from its original class through minimal modification is called an evasion attack.

The background art of the present invention is disclosed in Korean Patent Laid-Open Publication No. 10-2017-0095582 (published on Aug. 23, 2017).

Depending on the application, such adversarial examples can also be put to good use. For example, on a battlefield, one may generate an adversarial example by slightly perturbing a road sign image in order to induce a malfunction of an enemy autonomous vehicle that uses a deep neural network. If friendly and enemy forces are present on the same battlefield, there is a need to generate an adversarial example for which the original data is misidentified by the deep neural network of the enemy autonomous vehicle while being correctly identified by the deep neural network of the friendly autonomous vehicle.

Accordingly, an object of one aspect of the present invention is to provide an apparatus and method for generating an adversarial example in a deep learning environment, and a computer program, capable of generating an adversarial example for which the original data is correctly identified by one deep neural network and misidentified by another.

An apparatus for generating an adversarial example in a deep learning environment according to one aspect of the present invention comprises: first and second classifiers trained by deep learning to identify the class of input data through a neural network; and a transformer that inputs adversarial data (an adversarial example) converted from original data into each of the first and second classifiers, obtains the identification result that each of the first and second classifiers produces for the class of the original data on the basis of the input adversarial data, and then, based on the obtained identification results, optimizes the adversarial data so that the first classifier correctly identifies the class of the original data from the input adversarial data and the second classifier misidentifies the class of the original data from the input adversarial data.

In the present invention, the transformer optimizes the adversarial data while further taking into account the degree of distortion from the original data.

In the present invention, the original data and the adversarial data are each image data, and the degree of distortion represents the pixel distance between the original data and the adversarial data.

In the present invention, the transformer optimizes the adversarial data so that the probability that the first classifier correctly identifies the class of the original data from the input adversarial data increases, the probability that the second classifier misidentifies the class of the original data from the input adversarial data increases, and the degree of distortion from the original data decreases.

In the present invention, the transformer optimizes the adversarial data by minimizing the output of a total loss function obtained by summing first to third loss functions, wherein the first loss function outputs a lower value as the probability that the first classifier correctly identifies the class of the original data from the input adversarial data becomes higher, the second loss function outputs a lower value as the probability that the second classifier misidentifies the class of the original data from the input adversarial data becomes higher, and the third loss function outputs a lower value as the degree of distortion of the adversarial data relative to the original data becomes lower.

In the present invention, the transformer optimizes the adversarial data by repeatedly performing a process of updating the adversarial data based on the obtained identification results and inputting it into each of the first and second classifiers, and a process of obtaining the identification results that the first and second classifiers produce for the class of the original data from the updated adversarial data and updating the adversarial data again.

In the present invention, the transformer optimizes the adversarial data so that the second classifier, given the input adversarial data, identifies one specific class other than the class of the original data.

In the present invention, the transformer optimizes the adversarial data so that the second classifier, given the input adversarial data, identifies one or more unspecified classes other than the class of the original data.

A method for generating an adversarial example in a deep learning environment according to one aspect of the present invention comprises: inputting, by a transformer, adversarial data (an adversarial example) converted from original data into each of first and second classifiers trained by deep learning to identify the class of input data through a neural network; obtaining, by the transformer, the identification result that each of the first and second classifiers produces for the class of the original data on the basis of the input adversarial data; and updating, by the transformer and based on the obtained identification results, the adversarial data so that the first classifier correctly identifies the class of the original data from the input adversarial data and the second classifier misidentifies the class of the original data from the input adversarial data.

An apparatus for generating an adversarial example in a deep learning environment according to one aspect of the present invention inputs adversarial data (an adversarial example) converted from original data into each of first and second classification models trained by deep learning to identify the class of input data through a neural network, obtains the identification result that each of the first and second classification models produces for the class of the original data on the basis of the input adversarial data, and then, based on the obtained identification results, optimizes the adversarial data so that the first classification model correctly identifies the class of the original data from the input adversarial data and the second classification model misidentifies the class of the original data from the input adversarial data.

A computer program according to one aspect of the present invention is stored in a medium and, in combination with hardware, executes the steps of: inputting adversarial data (an adversarial example) converted from original data into each of first and second classification models, each trained by machine learning to identify the class of input data through a neural network; obtaining the identification result that each of the first and second classification models produces for the class of the original data on the basis of the input adversarial data; and updating, based on the obtained identification results, the adversarial data so that the first classification model correctly identifies the class of the original data from the input adversarial data and the second classification model misidentifies the class of the original data from the input adversarial data.

According to one aspect of the present invention, applying to military scientific equipment a method of generating an adversarial example for which, in a battlefield situation, the original data is correctly identified by a friendly deep neural network and misidentified by an enemy deep neural network can contribute to the development of the defense industry.

FIG. 1 is a block diagram illustrating an apparatus for generating an adversarial example in a deep learning environment according to an embodiment of the present invention.
FIGS. 2 and 3 illustrate the Targeted Scheme algorithm and the Untargeted Scheme algorithm in the apparatus for generating an adversarial example in a deep learning environment according to an embodiment of the present invention.
FIGS. 4 to 7 are exemplary views showing results of applying the Targeted Scheme in the apparatus for generating an adversarial example in a deep learning environment according to an embodiment of the present invention.
FIGS. 8 and 9 are exemplary views showing results of applying the Untargeted Scheme in the apparatus for generating an adversarial example in a deep learning environment according to an embodiment of the present invention.
FIG. 10 is a flowchart illustrating a method for generating an adversarial example in a deep learning environment according to an embodiment of the present invention.
FIG. 11 is a flowchart illustrating the process by which adversarial data is optimized in the method for generating an adversarial example in a deep learning environment according to an embodiment of the present invention.

Hereinafter, embodiments of the apparatus and method for generating an adversarial example in a deep learning environment and of the computer program according to the present invention will be described with reference to the accompanying drawings. In the drawings, the thickness of lines and the size of components may be exaggerated for clarity and convenience of explanation. In addition, the terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or practice of a user or operator. The definitions of these terms should therefore be based on the content of this specification as a whole.

FIG. 1 is a block diagram illustrating an apparatus for generating an adversarial example in a deep learning environment according to an embodiment of the present invention; FIGS. 2 and 3 illustrate the Targeted Scheme algorithm and the Untargeted Scheme algorithm in the apparatus; FIGS. 4 to 7 are exemplary views showing results of applying the Targeted Scheme in the apparatus; and FIGS. 8 and 9 are exemplary views showing results of applying the Untargeted Scheme in the apparatus.

To generate an adversarial example for which the original data is correctly identified by one deep neural network and misidentified by another, the apparatus for generating an adversarial example in a deep learning environment according to this embodiment (AEG: Adversarial Example Generator) proposes a network architecture that includes a transformer (Transformer), a first classifier (Discriminator_friend: D_friend), and a second classifier (Discriminator_enemy: D_enemy), as shown in FIG. 1. The transformer, the first classifier (D_friend), and the second classifier (D_enemy) may be implemented on a computing device such as a microprocessor or a microcontroller.

For convenience of explanation, in this embodiment the first classifier (D_friend) is described as the classifier that correctly identifies the class of the original data when the adversarial data (adversarial example) described below is input (for example, a classifier applied to a friendly autonomous vehicle), and the second classifier (D_enemy) is described as the classifier that misidentifies the class of the original data when the adversarial data is input (for example, a classifier applied to an enemy autonomous vehicle).

The first and second classifiers (D_friend, D_enemy) may function as classifiers trained by deep learning to identify the class of input data through a neural network, and the neural network applied to them may be a convolutional neural network (CNN). If the operation functions of the first and second classifiers (D_friend, D_enemy) are denoted f_friend(x) and f_enemy(x), respectively, the classifiers may be trained to satisfy Equation 1.

In Equation 1, x is the input data, X is the set of all input data, y is the class of the input data, and Y is the set of all classes of the input data.

The transformer inputs adversarial data (an adversarial example) converted from the original data into each of the first and second classifiers (D_friend, D_enemy) and obtains the identification result that each classifier produces for the class of the original data on the basis of the input adversarial data. Based on the obtained identification results, it can then optimize the adversarial data so that the first classifier (D_friend) correctly identifies the class of the original data from the input adversarial data and the second classifier (D_enemy) misidentifies the class of the original data from the input adversarial data.

In doing so, the transformer may further take into account the degree of distortion from the original data; specifically, it may optimize the adversarial data so that the degree of distortion from the original data is minimized.

That is, the characteristic feature of this embodiment is that the transformer generates and optimizes adversarial data whose distortion from the original data is minimal within the range in which the first classifier (D_friend) correctly identifies the class of the original data from the adversarial data and the second classifier (D_enemy) misidentifies it.

Here, the configuration that optimizes the adversarial data so that the second classifier (D_enemy) misidentifies the class of the original data can be divided into i) a configuration that optimizes the adversarial data so that the second classifier (D_enemy), given the input adversarial data, identifies one specific class (targeted class) other than the class of the original data (Targeted Scheme), and ii) a configuration that optimizes the adversarial data so that the second classifier (D_enemy), given the input adversarial data, identifies one or more unspecified classes (untargeted classes) other than the class of the original data (Untargeted Scheme).
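The difference between the two schemes comes down to the success condition checked on the classifiers' outputs. The following is a minimal sketch of that check; the function name `attack_succeeded` and the convention that `target_class=None` selects the Untargeted Scheme are assumptions made for illustration and do not appear in the specification.

```python
def attack_succeeded(friend_pred, enemy_pred, original_class, target_class=None):
    """Check the friend/enemy condition for one adversarial sample.

    friend_pred / enemy_pred -- classes output by D_friend and D_enemy
    original_class           -- class of the original data
    target_class             -- None for the Untargeted Scheme; otherwise the
                                specific class D_enemy must output
                                (expected to differ from original_class)
    """
    # D_friend must still recognize the original class in both schemes.
    if friend_pred != original_class:
        return False
    if target_class is None:
        # Untargeted Scheme: any class other than the original counts.
        return enemy_pred != original_class
    # Targeted Scheme: D_enemy must output exactly the chosen class.
    return enemy_pred == target_class
```

With an original class of 3, an enemy prediction of 5 satisfies the untargeted condition but not a targeted condition with target class 7.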

The operation of the transformer performing the Targeted Scheme and the Untargeted Scheme described above can be expressed as Equations 2 and 3, respectively.

In Equations 2 and 3, x is the original data, y is the class of the original data, x* is the adversarial data, and y* is the specific class (targeted class). L(·) is a function that calculates the degree of distortion between the original data x and the adversarial data x*: it computes the pixel distance between the original data and the adversarial data, which may be implemented as image data, as a mean squared error (MSE) and outputs it as the degree of distortion.
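The images of Equations 2 and 3 are not reproduced in this text. As a reading aid, the constraint terms that the description quotes from them, f_friend(x*) = y, f_enemy(x*) = y*, and f_enemy(x*) ≠ y, together with the distortion function L(·), suggest the following form. This is a reconstruction under that assumption, not the published equations.

```latex
% Targeted Scheme (assumed form of Equation 2)
x^{*} = \operatorname*{arg\,min}_{x^{*}} L(x, x^{*})
\quad \text{s.t.} \quad
f_{\mathrm{friend}}(x^{*}) = y \ \text{ and } \ f_{\mathrm{enemy}}(x^{*}) = y^{*}

% Untargeted Scheme (assumed form of Equation 3)
x^{*} = \operatorname*{arg\,min}_{x^{*}} L(x, x^{*})
\quad \text{s.t.} \quad
f_{\mathrm{friend}}(x^{*}) = y \ \text{ and } \ f_{\mathrm{enemy}}(x^{*}) \neq y
```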

Based on the configuration described above, the way in which the transformer optimizes the adversarial data is described in detail below.

First, the transformer may initially receive the original data and the class of the original data (hereinafter, the original class) and convert the original data to generate adversarial data. In this embodiment, the transformer may generate the adversarial data according to Equation 4.

In Equation 4, w is a modifier; as described below, the adversarial data can be updated and optimized as the value of w is updated.

Once the original data and the original class have been input and the transformer has generated adversarial data, the transformer can optimize the adversarial data by repeating a process of inputting the generated adversarial data into each of the first and second classifiers (D_friend, D_enemy), obtaining each classifier's identification result, and updating the modifier w.

In doing so, the transformer may optimize the adversarial data so that the probability that the first classifier (D_friend) correctly identifies the original class from the input adversarial data increases, the probability that the second classifier (D_enemy) misidentifies the original class from the input adversarial data increases, and the degree of distortion from the original data decreases. To this end, this embodiment employs first to third loss functions.

That is, the transformer can optimize the adversarial data by minimizing the output of a total loss function obtained by summing the first to third loss functions. The first loss function is designed to output a lower value as the probability that the first classifier (D_friend) correctly identifies the original class from the input adversarial data becomes higher; the second loss function is designed to output a lower value as the probability that the second classifier (D_enemy) misidentifies the original class from the input adversarial data becomes higher; and the third loss function is designed to output a lower value as the degree of distortion of the adversarial data relative to the original data becomes lower.

Specifically, the transformer may update the adversarial data by determining the modifier w that minimizes the output of the total loss function according to Equation 5.

In Equation 5, Loss_T is the total loss function, Loss_friend is the first loss function, Loss_enemy is the second loss function, and Loss_distortion is the third loss function.
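The image of Equation 5 is not reproduced in this text. Since the description states that the total loss function is obtained by summing the first to third loss functions, a plain unweighted sum is the most direct reading; the published equation may include weighting coefficients that cannot be recovered from the text.

```latex
% Assumed form of Equation 5: unweighted sum of the three losses
Loss_{T} = Loss_{\mathrm{friend}} + Loss_{\mathrm{enemy}} + Loss_{\mathrm{distortion}}
```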

First, regarding the first loss function: to satisfy the term f_friend(x*) = y of Equation 2 or 3, the output of the first loss function needs to be minimized. The first loss function may be expressed by Equation 6.

In Equation 6, org is the original class, and Z(k)_i is the probability that the first classifier (D_friend) identifies the class of input data k as i. Accordingly, when the output of the first loss function according to Equation 6, that is, Loss_friend, is minimized, the probability that f_friend(x*) = y is satisfied increases.
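Equation 6 itself is not reproduced in this text, so the following is only a sketch of one loss with the stated property: its output falls as the friend classifier's probability for the original class rises. The hinge-on-margin form, the function name `loss_friend`, and the list-of-probabilities input are assumptions for illustration.

```python
def loss_friend(probs, org):
    """Hypothetical first loss: zero once class `org` has the top probability.

    probs -- Z(x*), the class probabilities D_friend assigns to x*
    org   -- index of the original class
    """
    # Highest probability among all classes other than the original class.
    best_other = max(p for i, p in enumerate(probs) if i != org)
    # Positive only while some other class outranks the original class.
    return max(best_other - probs[org], 0.0)
```

Under this form, the loss is positive while D_friend prefers a wrong class and drops to zero as soon as the original class is ranked first, so minimizing it pushes toward f_friend(x*) = y.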

제2 손실 함수는 전술한 Targeted Scheme 및 Untargeted Scheme의 두 가지 경우로 구분될 수 있다.The second loss function can be divided into two cases: Targeted Scheme and Untargeted Scheme.

Targeted Scheme의 경우, 수학식 2의 f_enemy(x*) = y* 텀을 만족시키기 위해서는 제2 손실 함수의 출력값을 최소화시킬 필요가 있다. Targeted Scheme의 경우 제2 손실 함수는 하기 수학식 7로 표현될 수 있다.In the case of Targeted Scheme, it is necessary to minimize the output value of the second loss function in order to satisfy f _enemy (x *) = y * term in Equation (2). In the case of the targeted scheme, the second loss function can be expressed by Equation (7).

수학식 7에서, t는 특정 클래스(즉, y*), Z(k)_i는 제2 분류기(D_enemy)가 입력 데이터 k의 클래스를 i로 식별하는 확률을 의미한다. 따라서, 수학식 7에 따른 제2 손실 함수, 즉 Loss_enemy의 출력값이 최소화되는 경우, f_enemy(x*) = y*이 만족될 확률이 증가하게 된다.In Equation (7), t represents a specific class (i.e., y *), Z (k) _i represents the probability that the second classifier (D _enemy ) identifies the class of input data k as i. Therefore, when the second loss function according to Equation (7), i.e., the output value of the loss _enemy is minimized, the probability that f _enemy (x *) = y * is satisfied is increased.

Untargeted Scheme의 경우, 수학식 3의 f_enemy(x*) ≠ y 텀을 만족시키기 위해서는 제2 손실 함수의 출력값을 최소화시킬 필요가 있다. Targeted Scheme의 경우 제2 손실 함수는 하기 수학식 8로 표현될 수 있다.In the case of the untargeted scheme, it is necessary to minimize the output value of the second loss function in order to satisfy f _enemy (x *) ≠ y term in Equation (3). In the case of the targeted scheme, the second loss function can be expressed by the following equation (8).

In Equation 8, org denotes the original class, and Z(k)_i denotes the probability with which the second classifier (D_enemy) identifies the class of input data k as i. Therefore, as the output of the second loss function of Equation 8, Loss_enemy, is minimized, the probability that f_enemy(x*) ≠ y holds increases.

The third loss function is designed to output a lower value as the distortion of the adversarial data relative to the original data decreases, and can be expressed as Equation 9 below.

In summary, minimizing the output of the total loss function of Equation 5, which is built from Equations 6 to 9, yields adversarial data whose distortion from the original data is minimal within the range in which the first classifier (D_friend) correctly identifies the class of the original data from the adversarial data and the second classifier (D_enemy) misidentifies it. The transformer can therefore update the adversarial data by determining the modifier w that minimizes the output of the total loss function of Equation 5.
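The equation images for Equations 5 to 9 are not reproduced in this text, so the following Python sketch only illustrates one way the three loss terms could be realized. The margin form (best competing score minus the score of the desired class), the Euclidean distance used for the distortion, and every function name are assumptions made for illustration; only the stated properties (each term falls as its goal is better met, and the total loss is the sum of the three terms) come from this document.

```python
import math

def margin(scores, cls):
    """Best competing score minus the score of `cls` (negative when `cls` wins)."""
    rival = max(s for i, s in enumerate(scores) if i != cls)
    return rival - scores[cls]

def loss_friend(z_friend, org):
    # Assumed form: low when D_friend ranks the original class first.
    return margin(z_friend, org)

def loss_enemy_targeted(z_enemy, target):
    # Assumed form: low when D_enemy ranks the targeted class first.
    return margin(z_enemy, target)

def loss_enemy_untargeted(z_enemy, org):
    # Assumed form: low when D_enemy ranks some other class above the original.
    return -margin(z_enemy, org)

def loss_distortion(x, x_adv):
    # Assumed distance: Euclidean distance between original and adversarial data.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_adv)))

def total_loss(x, x_adv, z_friend, z_enemy, org, target=None):
    """Sum of the three terms; `target=None` selects the untargeted variant."""
    if target is None:
        enemy = loss_enemy_untargeted(z_enemy, org)
    else:
        enemy = loss_enemy_targeted(z_enemy, target)
    return loss_friend(z_friend, org) + enemy + loss_distortion(x, x_adv)
```

In this sketch a negative friend term means D_friend already prefers the original class, and a negative enemy term means D_enemy is already fooled, so further minimization mainly reduces the distortion term.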

The transformer of this embodiment optimizes the adversarial data by repeating the update process described above. Specifically, it repeats two processes: updating the adversarial data based on the identification results obtained from the first and second classifiers (D_friend, D_enemy) and inputting the updated adversarial data into each classifier; and obtaining the identification results with which the first and second classifiers (D_friend, D_enemy) each identify the original class from the updated adversarial data, then updating the adversarial data again. FIGS. 2 and 3 set out the algorithm performed by the transformer, separately for the Targeted Scheme and the Untargeted Scheme. In FIGS. 2 and 3, x denotes the original data (original sample), y the original class, y* the specific class (targeted class), and n the number of iterations of the adversarial data update.

FIGS. 4 to 9 show the results of applying the configuration of this embodiment with MNIST, a set of handwritten digit images of 0 to 9 with classes "0" to "9", as the original data. In what follows, the first classifier accuracy (D_friend Accuracy) is the rate at which the class output by the first classifier (D_friend) for the adversarial data matches the original class. The attack success rate (Attack Success Rate) is, for the Targeted Scheme, the rate at which the class output by the second classifier (D_enemy) for the adversarial data matches the specific class (Targeted Class) and, for the Untargeted Scheme, the rate at which that class differs from the original class. The distortion is the pixel distance between the original data and the adversarial data, computed by a method such as the mean squared error (MSE).
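The three evaluation measures defined above can be sketched in Python as follows. The function names and the list-based data layout are assumptions made for illustration, and MSE is used for the distortion only because the text names it as one possible method.

```python
def friend_accuracy(friend_preds, orig_classes):
    """Fraction of adversarial samples for which D_friend still outputs the original class."""
    hits = sum(p == y for p, y in zip(friend_preds, orig_classes))
    return hits / len(orig_classes)

def attack_success_rate(enemy_preds, orig_classes, target_classes=None):
    """Targeted: D_enemy output equals the targeted class.
    Untargeted (target_classes is None): D_enemy output differs from the original class."""
    if target_classes is None:
        hits = sum(p != y for p, y in zip(enemy_preds, orig_classes))
    else:
        hits = sum(p == t for p, t in zip(enemy_preds, target_classes))
    return hits / len(enemy_preds)

def mse_distortion(x, x_adv):
    """Mean squared pixel difference between original and adversarial data."""
    return sum((a - b) ** 2 for a, b in zip(x, x_adv)) / len(x)
```

For example, if D_friend outputs `[7, 7, 1, 3]` for four adversarial samples whose original classes are `[7, 7, 7, 3]`, `friend_accuracy` returns 0.75.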

FIGS. 4 to 7 show the results of applying the Targeted Scheme.

FIG. 4 shows examples of adversarial data optimized with the Targeted Scheme for original data 0 to 9, with the specific class (Targeted Class) set to each of "0" to "9". FIG. 5 shows the adversarial data for original data 7 from FIG. 4 together with the average of their distortion (Loss_distortion).

FIG. 6 shows the result of applying the Targeted Scheme to original data 7, whose original class is "7", with the specific class (Targeted Class) set to "0". Based on the adversarial data, the second classifier (D_enemy) misidentifies the original class as class "0", which has the highest class score (694). The update of the adversarial data is repeated only until the score of the specific class (694) becomes slightly larger than the score of the original class (693), which induces the second classifier (D_enemy) to misidentify the original class while minimizing the distortion from the original data. Based on the adversarial data, the first classifier (D_friend) correctly identifies the original class as class "7", which has the highest class score (10.5).

FIG. 7 is a graph showing how the first classifier accuracy, the attack success rate, and the distortion change with the number of iterations of the adversarial data update. As the number of iterations increases, the first classifier accuracy and the attack success rate increase and the distortion decreases.

FIGS. 8 and 9 show the results of applying the Untargeted Scheme.

FIG. 8 shows, for each of original data 0 to 9 with original classes "0" to "9", how many times each class other than the original class was output when 100 adversarial data were input into the second classifier (D_enemy). For original data 0, whose original class is "0", inputting 100 adversarial data into the second classifier (D_enemy) produced misidentification as class "9" most often (35 times) and correct identification as class "0" never (0 times).
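A tally of the kind reported in FIG. 8 can be sketched in Python as follows. The function name is hypothetical and the prediction list is made-up sample data, not the values behind FIG. 8.

```python
from collections import Counter

def misclassification_counts(enemy_preds, orig_class):
    """Count how often D_enemy output each class other than the original class."""
    return Counter(p for p in enemy_preds if p != orig_class)

# Made-up D_enemy outputs for six adversarial samples whose original class is 0.
counts = misclassification_counts([9, 9, 2, 0, 9, 2], orig_class=0)
most_frequent_class, times = counts.most_common(1)[0]
```

Here `counts[0]` is 0 because outputs equal to the original class are excluded from the tally, mirroring how FIG. 8 counts only identifications as other classes.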

FIG. 9 is a graph showing how the first classifier accuracy, the attack success rate, and the distortion change with the number of iterations of the adversarial data update. As the number of iterations increases, the first classifier accuracy and the attack success rate increase and the distortion decreases.

Table 1 below shows, for the Targeted Scheme and the Untargeted Scheme, the number of iterations of the adversarial data update (Iteration count), the maximum distortion (Max distortion), the minimum distortion (Min distortion), and the mean distortion (Mean distortion).

| Description | Targeted Scheme: Attack Success Rate | Targeted Scheme: D_friend Accuracy | Untargeted Scheme: Attack Success Rate | Untargeted Scheme: D_friend Accuracy |
|---|---|---|---|---|
| Iteration count | 500 | 500 | 300 | 400 |
| Max distortion | 6.645 | 6.645 | 4.016 | 3.440 |
| Min distortion | 0.232 | 0.232 | 0.249 | 0.234 |
| Mean distortion | 2.183 | 2.183 | 1.788 | 1.536 |

The results above indicate when each scheme is preferable. FIGS. 7 and 9 show that, because the Untargeted Scheme is not constrained to produce identification as a specific class (Targeted Class), its first classifier accuracy and attack success rate reach 100% sooner than those of the Targeted Scheme. Table 1 shows that its distortion is also lower, in both maximum and mean. Therefore, when no particular class needs to be specified for the output of the second classifier (D_enemy) and the distortion from the original data must be minimized, the user can optimize the adversarial data according to the Untargeted Scheme of this embodiment.

Conversely, when the user wants the second classifier (D_enemy) to misidentify the original class as a specific class of the user's choosing (Targeted Class), the adversarial data can be optimized according to the Targeted Scheme of this embodiment.

The adversarial example generating apparatus (AEG) in a deep learning environment according to this embodiment has been described above as divided into the transformer and the first and second classifiers (D_friend, D_enemy). Depending on the embodiment, however, it may also be implemented as an integrated configuration that optimizes the adversarial data based on first and second classification models (that is, convolutional neural networks) used to identify the class of input data.

That is, the adversarial example generating apparatus (AEG) of this embodiment may be implemented in a configuration that inputs adversarial data (an adversarial example) converted from original data into each of first and second classification models trained by deep learning to identify the class of input data through a neural network, obtains the identification results with which the first and second classification models each identify the class of the original data based on the input adversarial data, and then, based on the obtained identification results, optimizes the adversarial data so that the first classification model correctly identifies the class of the original data and the second classification model misidentifies it.

FIG. 10 is a flowchart of a method for generating an adversarial example in a deep learning environment according to an embodiment of the present invention, and FIG. 11 is a flowchart of the process by which the adversarial data is optimized in that method.

Referring to FIG. 10, in the method for generating an adversarial example in a deep learning environment according to an embodiment of the present invention, the transformer first inputs adversarial data (an adversarial example) converted from original data into each of the first and second classifiers (D_friend, D_enemy), which have been trained by deep learning to identify the class of input data through a neural network (S100). In step S100, the transformer initially receives the original data and the original class, converts the original data through Equation 4 described above to generate the adversarial data, and inputs the adversarial data into each of the first and second classifiers (D_friend, D_enemy).

Next, the transformer obtains the identification results with which the first and second classifiers (D_friend, D_enemy) each identify the class of the original data based on the input adversarial data (S200).

Next, based on the identification results obtained in step S200, the transformer updates the adversarial data so that the first classifier (D_friend) correctly identifies the class of the original data based on the input adversarial data and the second classifier (D_enemy) misidentifies the class of the original data based on the input adversarial data (S300).

In step S300, the transformer may update the adversarial data further taking into account the degree of distortion from the original data. It may also update the adversarial data so that the probability that the first classifier (D_friend) correctly identifies the class of the original data increases, the probability that the second classifier (D_enemy) misidentifies the class of the original data increases, and the degree of distortion from the original data decreases. Specifically, it may update the adversarial data by minimizing the output of a total loss function that sums the first to third loss functions. These points are described above, so a detailed description is omitted here.

As shown in FIG. 11, this embodiment may further include, after step S300, a step S400 in which the transformer inputs the adversarial data updated in step S300 into each of the first and second classifiers (D_friend, D_enemy). The transformer can thus optimize the adversarial data by repeatedly performing steps S200, S300, and S400.
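The control flow of steps S100 to S400 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the function and parameter names are hypothetical, the toy classifiers are simple thresholds on a single pixel value, and the toy update rule is a crude stand-in for minimizing the total loss function.

```python
def generate_adversarial(x, y, init, d_friend, d_enemy, update, n_iters):
    """Repeat S200 -> S300 -> S400 for n_iters rounds and return the adversarial data."""
    x_adv = init(x)                      # S100: initial adversarial data from the original
    for _ in range(n_iters):
        friend_out = d_friend(x_adv)     # S200: identification result of D_friend
        enemy_out = d_enemy(x_adv)       # S200: identification result of D_enemy
        x_adv = update(x_adv, y, friend_out, enemy_out)  # S300: update adversarial data
        # S400: the updated x_adv is fed to both classifiers on the next pass
    return x_adv

# Toy stand-ins (assumptions): one-pixel "images" and threshold classifiers.
def toy_friend(sample):
    return 1 if sample[0] > 50 else 0

def toy_enemy(sample):
    return 1 if sample[0] > 150 else 0

def toy_update(x_adv, y, friend_out, enemy_out):
    if enemy_out == y:        # D_enemy is not yet fooled: move away from the original
        return [x_adv[0] - 10]
    if friend_out != y:       # D_friend has been lost: move back toward the original
        return [x_adv[0] + 10]
    return x_adv              # both goals met: stop changing the data

result = generate_adversarial([200], 1, list, toy_friend, toy_enemy, toy_update, 10)
```

With these toy thresholds the loop stops changing the sample as soon as D_enemy flips while D_friend still outputs the original class, which mirrors the idea of keeping the distortion as small as the two classification goals allow.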

In step S300, the transformer may update the adversarial data so that the second classifier (D_enemy) identifies, based on the input adversarial data, any one specific class (Targeted Class) other than the class of the original data (Targeted Scheme), or so that the second classifier (D_enemy) identifies, based on the input adversarial data, one or more unspecified classes (Untargeted Class) other than the class of the original data (Untargeted Scheme).

This embodiment may also be written as a computer program that is stored on a medium and, in combination with hardware, executes the steps of: inputting adversarial data (an adversarial example) converted from original data into each of first and second classification models, each trained by machine learning to identify the class of input data through a neural network; obtaining the identification results with which the first and second classification models each identify the class of the original data based on the input adversarial data; and updating the adversarial data, based on the obtained identification results, so that the first classification model correctly identifies the class of the original data and the second classification model misidentifies it. The embodiment may be implemented on a general-purpose digital computer that runs the computer program stored on a computer-readable recording medium. Computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, and also include implementation in the form of a carrier wave (for example, transmission over the Internet). The computer-readable recording medium may also be distributed over networked computer systems so that computer-readable code is stored and executed in a distributed manner.

As described above, this embodiment can contribute to the development of the defense industry by applying to military science and technology equipment a method of generating adversarial examples for which, in a battlefield situation, the original data is correctly identified by friendly deep neural networks and misidentified by enemy deep neural networks.

The present invention has been described with reference to the embodiments shown in the drawings, but these are merely illustrative, and those of ordinary skill in the art to which the technology belongs will understand that various modifications and equivalent other embodiments are possible. Accordingly, the true technical scope of protection of the present invention should be determined by the claims below.

AEG: adversarial example generating apparatus
Transformer: transformer
D_friend: first classifier
D_enemy: second classifier
Targeted Class: specific class
Untargeted Class: unspecified class

Claims

An apparatus for generating an adversarial example in a deep learning environment, comprising:
first and second classifiers trained by deep learning to identify the class of input data through a neural network; and
a transformer that inputs adversarial data (an adversarial example) converted from original data into each of the first and second classifiers, obtains the identification results with which the first and second classifiers each identify the class of the original data based on the input adversarial data, and, based on the obtained identification results, optimizes the adversarial data so that the first classifier correctly identifies the class of the original data based on the input adversarial data and the second classifier misidentifies the class of the original data based on the input adversarial data.

The apparatus of claim 1,
wherein the transformer optimizes the adversarial data further taking into account the degree of distortion from the original data.

The apparatus of claim 2,
wherein the original data and the adversarial data are each image data, and the degree of distortion is a pixel distance between the original data and the adversarial data.

The apparatus of claim 2,
wherein the transformer optimizes the adversarial data so that the probability that the first classifier correctly identifies the class of the original data based on the input adversarial data increases, the probability that the second classifier misidentifies the class of the original data based on the input adversarial data increases, and the degree of distortion from the original data decreases.

The apparatus of claim 4,
wherein the transformer optimizes the adversarial data by minimizing the output of a total loss function that sums first to third loss functions, wherein the first loss function outputs a lower value as the probability that the first classifier correctly identifies the class of the original data based on the input adversarial data is higher, the second loss function outputs a lower value as the probability that the second classifier misidentifies the class of the original data based on the input adversarial data is higher, and the third loss function outputs a lower value as the degree of distortion of the adversarial data relative to the original data is lower.

The apparatus of claim 1,
wherein the transformer optimizes the adversarial data by repeatedly performing a process of updating the adversarial data based on the obtained identification results and inputting the updated adversarial data into each of the first and second classifiers, and a process of obtaining the identification results with which the first and second classifiers each identify the class of the original data based on the updated adversarial data and updating the adversarial data.

The apparatus of claim 1,
wherein the transformer optimizes the adversarial data so that the second classifier identifies, based on the input adversarial data, any one specific class other than the class of the original data.

The apparatus of claim 1,
wherein the transformer optimizes the adversarial data so that the second classifier identifies, based on the input adversarial data, one or more unspecified classes other than the class of the original data.

A method for generating an adversarial example in a deep learning environment, comprising:
inputting, by a transformer, adversarial data (an adversarial example) converted from original data into each of first and second classifiers trained by deep learning to identify the class of input data through a neural network;
obtaining, by the transformer, the identification results with which the first and second classifiers each identify the class of the original data based on the input adversarial data; and
updating, by the transformer, the adversarial data based on the obtained identification results so that the first classifier correctly identifies the class of the original data based on the input adversarial data and the second classifier misidentifies the class of the original data based on the input adversarial data.

The method of claim 9,
wherein, in the updating of the adversarial data,
the transformer updates the adversarial data further taking into account the degree of distortion from the original data.

The method of claim 10,
wherein, in the updating of the adversarial data,
the transformer updates the adversarial data so that the probability that the first classifier correctly identifies the class of the original data based on the input adversarial data increases, the probability that the second classifier misidentifies the class of the original data based on the input adversarial data increases, and the degree of distortion from the original data decreases.

The method of claim 11,
wherein, in the updating of the adversarial data,
the transformer updates the adversarial data by minimizing the output of a total loss function that sums first to third loss functions, wherein the first loss function outputs a lower value as the probability that the first classifier correctly identifies the class of the original data based on the input adversarial data is higher, the second loss function outputs a lower value as the probability that the second classifier misidentifies the class of the original data based on the input adversarial data is higher, and the third loss function outputs a lower value as the degree of distortion of the adversarial data relative to the original data is lower.

The method of claim 9,
further comprising, after the updating of the adversarial data, inputting, by the transformer, the updated adversarial data into each of the first and second classifiers,
wherein the transformer optimizes the adversarial data by repeatedly performing the obtaining of the identification results, the updating of the adversarial data, and the inputting of the updated adversarial data.

The method of claim 9,
wherein, in the updating of the adversarial data,
the transformer updates the adversarial data so that the second classifier identifies, based on the input adversarial data, any one specific class other than the class of the original data.

The method of claim 9,
wherein, in the updating of the adversarial data,
the transformer updates the adversarial data so that the second classifier identifies, based on the input adversarial data, one or more unspecified classes other than the class of the original data.

An apparatus for generating an adversarial example in a deep learning environment, wherein the apparatus inputs adversarial data (an adversarial example) converted from original data into each of first and second classification models trained by deep learning to identify the class of input data through a neural network, obtains the identification results with which the first and second classification models each identify the class of the original data based on the input adversarial data, and, based on the obtained identification results, optimizes the adversarial data so that the first classification model correctly identifies the class of the original data based on the input adversarial data and the second classification model misidentifies the class of the original data based on the input adversarial data.

A computer program stored on a medium and, in combination with hardware, executing the steps of:
inputting adversarial data (an adversarial example) converted from original data into each of first and second classification models, each trained by machine learning to identify the class of input data through a neural network;
obtaining the identification results with which the first and second classification models each identify the class of the original data based on the input adversarial data; and
updating the adversarial data, based on the obtained identification results, so that the first classification model correctly identifies the class of the original data based on the input adversarial data and the second classification model misidentifies the class of the original data based on the input adversarial data.