KR102592120B1

KR102592120B1 - Method of unsupervised detection of adversarial example and apparatus using the method

Info

Publication number: KR102592120B1
Application number: KR1020210095390A
Authority: KR
Inventors: 조호묵; 고기혁; 임규민
Original assignee: Korea Advanced Institute of Science and Technology (한국과학기술원)
Priority date: 2021-07-21
Filing date: 2021-07-21
Publication date: 2023-10-20
Also published as: KR20230014248A

Abstract

An unsupervised anomaly detection method for adversarial examples and a device using the same are provided. The method includes: generating, from an artificial intelligence model, a normal eXplainable Artificial Intelligence (XAI) signature set for a normal image dataset; generating a compression-decompression model by training an autoencoder on the normal XAI signature set; generating, from the artificial intelligence model, a target XAI signature for a target image; and determining whether the target image is an adversarial example based on a comparison between a reference error, obtained by passing the normal XAI signature set through the compression-decompression model, and a target error, obtained by passing the XAI signature of the target image through the compression-decompression model.

Description

Unsupervised anomaly detection method for adversarial examples and device using the same {METHOD OF UNSUPERVISED DETECTION OF ADVERSARIAL EXAMPLE AND APPARATUS USING THE METHOD}

The present invention relates to an unsupervised anomaly detection method for adversarial examples and a device using the same. More specifically, it relates to a method that uses unsupervised learning to detect anomalous image data, namely adversarial examples that can cause a neural network to produce incorrect outputs, and to a device using the method.

Artificial intelligence technology has recently been applied in many fields, such as image classification and natural language processing, where it has demonstrated strong performance. At the same time, a variety of security attacks that exploit vulnerabilities in the machine learning algorithms used to build AI models are also being studied.

Figure 1a shows an example of an evasion attack. In an evasion attack, input data is manipulated with an adversarial example generation technique so that the model produces an incorrect result. Adversarial examples are crafted so precisely that they are hard to distinguish from normal data, and they can cause security problems by making an AI model malfunction.

Figure 1b shows one approach to this problem: training on adversarial examples. Adversarial examples are generated in advance and a deep learning model is trained on them, so that the AI model learns to distinguish normal data from adversarial examples.

However, this approach is difficult to maintain because the model must be retrained every time a new kind of adversarial example appears. Generating and training on adversarial examples can also consume considerable time and money. Moreover, even after such training, the AI model remains limited in its ability to recognize elaborately crafted adversarial examples.

The technical problem to be solved by the present invention is to provide a detection method that can effectively identify adversarial examples using unsupervised learning, and a device using the method.

The technical problems of the present invention are not limited to those mentioned above. Other technical problems not mentioned will be clearly understood by those skilled in the art from the following description.

According to some embodiments of the present invention, an unsupervised anomaly detection method for adversarial examples that solves the above technical problem includes:

- generating, from an artificial intelligence model, a normal eXplainable Artificial Intelligence (XAI) signature set for a normal image dataset;
- generating a compression-decompression model by training an autoencoder on the normal XAI signature set;
- generating, from the artificial intelligence model, a target XAI signature for a target image; and
- determining whether the target image is an adversarial example. The determination is based on comparing a reference error, obtained by passing the normal XAI signature set through the compression-decompression model, with a target error, obtained by passing the XAI signature of the target image through the same model.

In some embodiments of the present invention, generating the normal XAI signature set may include obtaining feature values from the normal image dataset. These feature values are obtained using at least one of a saliency map, LIME (Local Interpretable Model-Agnostic Explanations), SHAP (SHapley Additive exPlanations), and IG (Integrated Gradients).

In some embodiments of the present invention, generating the normal XAI signature set may include generating a saliency map set from the normal image dataset. Generating the saliency map set may include obtaining, for each normal image in the normal image dataset, the contribution of each pixel to the output of the artificial intelligence model.

In some embodiments of the present invention, determining whether the target image is an adversarial example may include determining that the target image is an adversarial example when the difference between the target error and the reference error is greater than a predetermined value.

In some embodiments of the present invention, the target image may include an image generated by adding noise to an image in the normal image dataset. The noise is produced using at least one of FGSM (Fast Gradient Sign Method), CW (Carlini & Wagner), and PGD (Projected Gradient Descent).

According to some embodiments of the present invention, an unsupervised anomaly detection device for adversarial examples that solves the above technical problem includes a processor and a memory storing instructions executable by the processor. When executed by the processor, the instructions perform:

- generating, from an artificial intelligence model, a normal XAI signature set for a normal image dataset;
- generating a compression-decompression model by training an autoencoder on the normal XAI signature set;
- generating, from the artificial intelligence model, a target XAI signature for a target image; and
- determining whether the target image is an adversarial example. The determination is based on comparing a reference error, obtained by passing the normal XAI signature set through the compression-decompression model, with a target error, obtained by passing the XAI signature of the target image through the same model.

Specific details of other embodiments are included in the detailed description and the drawings.

The unsupervised anomaly detection method and device according to embodiments of the present invention detect adversarial examples through unsupervised learning with an autoencoder. This can greatly reduce the time and cost otherwise spent generating adversarial examples and training on them.

In addition, various algorithms (e.g., saliency maps, LIME, SHAP, IG) can be used to generate the normal XAI signature set from the normal image dataset. Because training is unsupervised, the approach also adapts more readily to new kinds of adversarial examples.

The effects of the present invention are not limited to those mentioned above. Other effects not mentioned will be clearly understood by those skilled in the art from the claims.

Figure 1a shows an example of an evasion attack that produces incorrect results by manipulating input data using an adversarial example generation technique.
Figure 1b shows an approach that addresses the problem by training on adversarial examples.
Figure 2 is a diagram illustrating an unsupervised anomaly detection device for adversarial examples according to some embodiments of the present invention.
Figure 3 is a flowchart illustrating an unsupervised anomaly detection method for adversarial examples according to some embodiments of the present invention.
Figure 4 is a diagram illustrating a series of anomaly detection processes performed by an unsupervised anomaly detection device for adversarial examples according to some embodiments of the present invention.
Figure 5 is a diagram illustrating the effectiveness of an unsupervised anomaly detection method for adversarial examples according to an embodiment of the present invention.

The advantages and features of the present invention, and methods of achieving them, will become clear with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided so that this disclosure will be complete and will fully convey the scope of the invention to those of ordinary skill in the art to which the invention pertains. The present invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

A component may be described as "connected to" or "coupled to" another component. This covers both the case where it is directly connected or coupled to the other component and the case where another component is interposed between them. In contrast, a component described as "directly connected to" or "directly coupled to" another component has no intervening component. "And/or" includes each of the mentioned items and every combination of one or more of them.

The terminology used herein is for describing embodiments and is not intended to limit the invention. In this specification, singular forms include plural forms unless the context specifically indicates otherwise. As used herein, "comprises" and/or "comprising" does not exclude the presence or addition of one or more components, steps, operations, and/or elements other than those mentioned.

Although the terms first, second, etc. are used to describe various components, these components are not limited by these terms. The terms are used only to distinguish one component from another. Therefore, a first component mentioned below may also be a second component within the technical spirit of the present invention.

Unless otherwise defined, all terms used herein (including technical and scientific terms) have the meanings commonly understood by those of ordinary skill in the art to which the present invention pertains. Terms defined in commonly used dictionaries are not to be interpreted in an idealized or overly formal sense unless expressly so defined.

Figure 2 is a diagram illustrating an unsupervised anomaly detection device for adversarial examples according to some embodiments of the present invention.

Referring to Figure 2, the unsupervised anomaly detection device 100 for adversarial examples according to some embodiments of the present invention may include a processor 110, a memory 120, an interface 130, and a storage 140.

The unsupervised anomaly detection device 100 may include a computing device that can perform unsupervised learning with a given dataset and target data, and can perform the computations needed to identify adversarial examples. Such a computing device may be, for example, a personal computer (PC), a notebook PC, or a server computer, but is not limited thereto. It may also be a mobile phone, a smartphone, a tablet PC, or the like.

The processor 110 may control the operation of the unsupervised anomaly detection device 100. Specifically, the processor 110 may run data mining, data analysis, intelligent decision-making, and machine learning algorithms in order to:

- preprocess input data obtained through the interface 130 or stored in the storage 140;
- train an artificial neural network on the preprocessed data; and
- identify adversarial examples with the trained network.

The processor 110 may also receive, classify, store, and output information used by the operating method according to embodiments of the present invention.

The memory 120 may be communicatively connected to the processor 110 and may temporarily store data processed by the processor 110.

The memory 120 may also store the artificial intelligence model and the compression-decompression model used by the unsupervised anomaly detection method described below. It may also store instructions executable by the processor 110.

The interface 130 may receive the normal image dataset and the target images needed for the machine learning performed by the method according to embodiments of the present invention. It may also output the adversarial-example determinations made by the processor 110 to the outside.

The storage 140 may store programs and data necessary for operating the unsupervised anomaly detection device 100. For example, the storage 140 may store the normal image dataset used to generate the normal XAI signature set, and an adversarial image dataset generated from the normal image dataset.

Figure 3 is a flowchart illustrating an unsupervised anomaly detection method for adversarial examples according to some embodiments of the present invention. Figure 4 is a diagram illustrating the series of adversarial-example detection operations performed by the method of Figure 3.

Referring to Figure 3, the unsupervised anomaly detection method according to an embodiment of the present invention may include:

- generating a normal XAI signature set of a normal image dataset from an artificial intelligence model (S110);
- generating a compression-decompression model by training an autoencoder on the normal XAI signature set (S120);
- generating a target XAI signature of a target image from the artificial intelligence model (S130);
- determining whether the target image is an adversarial example by comparing a reference error obtained from the normal XAI signature set with a target error obtained from the target image (S140);
- checking whether the reference error is greater than the target error (S150);
- determining that the target image is an adversarial example when the reference error is smaller than the target error (S160); and
- determining that the target image is not an adversarial example when the reference error is greater than the target error (S165).
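The flow of steps S110 to S165 can be summarized in a short sketch. It is not the patent's implementation. The explainer (`explain`) and the autoencoder trainer (`fit_autoencoder`) are hypothetical callables passed in by the caller. The per-element mean squared error is an assumed choice of reconstruction error. The decision uses the "difference exceeds a predetermined value" form of the rule, with a hypothetical `margin` parameter.

```python
from statistics import mean
from typing import Callable, List, Sequence

Vector = List[float]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Per-element mean squared error between two flattened signatures (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def detect_adversarial(
    normal_images: Sequence[Vector],
    target_image: Vector,
    explain: Callable[[Vector], Vector],
    fit_autoencoder: Callable[[Sequence[Vector]], Callable[[Vector], Vector]],
    margin: float,
) -> bool:
    # S110: XAI signature (e.g. a saliency map) for every normal image.
    normal_sigs = [explain(img) for img in normal_images]
    # S120: train the compression-decompression model on normal signatures only.
    reconstruct = fit_autoencoder(normal_sigs)
    # S130: target XAI signature, produced by the SAME explainer.
    target_sig = explain(target_image)
    # S140: reference error (mean over the normal set) and target error.
    reference_error = mean(mse(s, reconstruct(s)) for s in normal_sigs)
    target_error = mse(target_sig, reconstruct(target_sig))
    # S150-S165: adversarial if the target error exceeds the reference by more than margin.
    return target_error - reference_error > margin
```

Keeping the explainer and the trainer as parameters reflects the point made below: any XAI method can be plugged in, provided the normal and target signatures come from the same one.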

Each step of the method may be performed by the unsupervised anomaly detection device 100 described with reference to Figure 2, in particular by the processor 110. The method is described in more detail below with reference to Figures 3 and 4 together.

First, a normal XAI signature set of the normal image dataset may be generated from the artificial intelligence model (S110).

The normal image dataset 210 may refer to a set of normal image data to which no adversarial example generation technique (described later) has been applied. The device 100 may generate the normal XAI signature set 220 from the normal image dataset 210. The dataset may be received through the interface 130 or read from the storage 140.

An XAI signature is a signature produced by explainable artificial intelligence. It presents, in a form people can understand, the reasons an AI model classified or judged its input data, and so makes it possible to look into the model's decision process.

For image data, an XAI signature is a visualization of the image features used by the AI model. It is obtained with at least one of, for example, a saliency map, LIME (Local Interpretable Model-Agnostic Explanations), SHAP (SHapley Additive exPlanations), or IG (Integrated Gradients). The present invention is not limited to these, and the XAI signature may also be produced by explainable AI methods other than these four.
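Of the methods listed, Integrated Gradients is compact enough to sketch without a deep learning framework. The sketch makes several assumptions:

- A hypothetical one-layer logistic model `sigmoid(w·x + b)` stands in for the target AI model, because its input gradient is known in closed form.
- The path integral from a baseline image is approximated with a right Riemann sum.
- `steps` and the all-zero baseline are illustrative choices, not values from the patent.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def model_score(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Hypothetical stand-in for the target model: a logistic class score."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def model_grad(x: Sequence[float], w: Sequence[float], b: float) -> List[float]:
    # d/dx_i sigmoid(w.x + b) = s * (1 - s) * w_i
    s = model_score(x, w, b)
    return [s * (1.0 - s) * wi for wi in w]


def integrated_gradients(
    x: Sequence[float],
    baseline: Sequence[float],
    w: Sequence[float],
    b: float,
    steps: int = 1000,
) -> List[float]:
    """IG_i = (x_i - x'_i) * mean of df/dx_i along the straight path from x' to x."""
    totals = [0.0] * len(x)
    for k in range(1, steps + 1):
        alpha = k / steps
        point = [xb + alpha * (xi - xb) for xi, xb in zip(x, baseline)]
        grad = model_grad(point, w, b)
        totals = [t + g for t, g in zip(totals, grad)]
    return [(xi - xb) * t / steps for xi, xb, t in zip(x, baseline, totals)]
```

The attributions should approximately sum to `model_score(x) - model_score(baseline)`. This is the completeness property of Integrated Gradients, and it is a convenient sanity check for any implementation.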

The following description assumes that the device 100 generates XAI signatures from saliency maps of the image data.

The device 100 may generate the normal XAI signature set 220 from the normal image dataset 210 and an artificial intelligence model stored in advance in the memory 120. This stored model may be the target model against which evasion attacks using adversarial examples are carried out.

When a saliency map is used to generate the normal XAI signature of normal image data, the per-pixel contribution of the normal image to the AI model's output can be computed. The saliency map may be produced by collecting and mapping the regions of the image where pixel values change sharply. A normal XAI signature is generated for one normal image. Repeating this over the entire normal image dataset 210 yields the normal XAI signature set 220 corresponding to that dataset. The description here assumes that a saliency map set corresponding to the normal image dataset is generated.
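A gradient-based saliency map is one common way to obtain the per-pixel contribution described above. The sketch below is an assumption, not the patent's exact procedure:

- `score` is a hypothetical function that returns the target model's score for the class of interest on a flattened image.
- The gradient is estimated with central finite differences so that the example stays self-contained. A real system would instead take the gradient with its framework's automatic differentiation.

```python
from typing import Callable, List, Sequence


def saliency_map(
    score: Callable[[List[float]], float],
    image: Sequence[float],
    h: float = 1e-4,
) -> List[float]:
    """Per-pixel contribution |d score / d pixel| for a flattened image."""
    saliency = []
    for i in range(len(image)):
        up = list(image)
        down = list(image)
        up[i] += h
        down[i] -= h
        # Central-difference estimate of the partial derivative for pixel i.
        grad_i = (score(up) - score(down)) / (2.0 * h)
        saliency.append(abs(grad_i))
    return saliency


def saliency_signature_set(
    score: Callable[[List[float]], float],
    images: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Repeat over the whole normal dataset to build the normal XAI signature set."""
    return [saliency_map(score, img) for img in images]
```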

Next, a compression-decompression model may be generated by training an autoencoder on the normal XAI signature set (S120).

The autoencoder 230 is a deep neural network specialized for reconstructing and denoising high-dimensional data such as images and speech. It consists of an encoder and a decoder that mirror each other. The encoder compresses high-dimensional data into a low-dimensional representation, discarding unnecessary information. The decoder then reconstructs the high-dimensional data from that representation.
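To make the encoder/decoder idea concrete, here is a deliberately tiny linear autoencoder trained with plain stochastic gradient descent on squared reconstruction error. It is an illustrative assumption, since the patent does not specify the architecture. A practical autoencoder for saliency maps would be a deeper, typically convolutional, network built in a deep learning framework. The class name, latent size `k`, learning rate, and epoch count are all hypothetical.

```python
import random
from typing import List, Sequence

Vector = List[float]


class LinearAutoencoder:
    """Toy autoencoder: encoder E (k x d) compresses, decoder D (d x k) reconstructs."""

    def __init__(self, d: int, k: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.E = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(k)]
        self.D = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(d)]

    def encode(self, x: Sequence[float]) -> Vector:
        # Compress the d-dimensional signature into a k-dimensional code.
        return [sum(e * xi for e, xi in zip(row, x)) for row in self.E]

    def decode(self, z: Sequence[float]) -> Vector:
        # Map the k-dimensional code back to d dimensions.
        return [sum(dj * zj for dj, zj in zip(row, z)) for row in self.D]

    def reconstruct(self, x: Sequence[float]) -> Vector:
        return self.decode(self.encode(x))

    def error(self, x: Sequence[float]) -> float:
        # Per-element mean squared reconstruction error.
        x_hat = self.reconstruct(x)
        return sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)

    def fit(self, data: Sequence[Sequence[float]], epochs: int = 300,
            lr: float = 0.05) -> "LinearAutoencoder":
        k, d = len(self.E), len(self.D)
        for _ in range(epochs):
            for x in data:
                z = self.encode(x)
                x_hat = self.decode(z)
                # Gradient of ||x_hat - x||^2 with respect to x_hat.
                r = [2.0 * (a - b) for a, b in zip(x_hat, x)]
                # Back-propagate to the code before the decoder weights change.
                g_z = [sum(self.D[i][j] * r[i] for i in range(d)) for j in range(k)]
                for i in range(d):
                    for j in range(k):
                        self.D[i][j] -= lr * r[i] * z[j]
                for j in range(k):
                    for i in range(d):
                        self.E[j][i] -= lr * g_z[j] * x[i]
        return self
```

Suppose the model is trained only on signatures that lie along one direction. It then reconstructs similar signatures almost perfectly but cannot reconstruct a signature pointing elsewhere. That gap in reconstruction error is the property the detector relies on.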

Training the autoencoder 230 on the normal XAI signature set 220 produces a compression-decompression model that reconstructs input data from its learned encoded representation. The device 100 may use this model to compute the errors needed to decide whether an input is an adversarial example.

Next, a target XAI signature of the target image may be generated from the artificial intelligence model (S130). Assume that the target image 240 is an adversarial example, created by adding noise to a normal image in order to mount an evasion attack on the AI model. The noise may be generated using, for example, at least one of the FGSM (Fast Gradient Sign Method), CW (Carlini & Wagner), and PGD (Projected Gradient Descent) algorithms, but the present invention is not limited thereto. The description below assumes that the target image 240 was made by adding FGSM noise to a normal image so that the AI model classifies it as a different class.
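FGSM perturbs each input value by `eps` in the direction of the sign of the loss gradient: x_adv = clip(x + eps * sign(grad_x L), 0, 1). The sketch below makes these assumptions:

- A hypothetical logistic model `sigmoid(w·x + b)` with binary cross-entropy loss, for which grad_x L = (p - y) * w.
- Pixel values lie in [0, 1].

It illustrates the technique only and is not the attack code used in the patent's experiments.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Hypothetical logistic model: probability of class 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm(x: Sequence[float], y: int, w: Sequence[float], b: float,
         eps: float) -> List[float]:
    p = predict(x, w, b)
    # Gradient of binary cross-entropy w.r.t. the input for a logistic model.
    grad = [(p - y) * wi for wi in w]

    def sign(g: float) -> int:
        return (g > 0) - (g < 0)

    # One signed step of size eps per pixel, clipped to the valid pixel range.
    return [min(1.0, max(0.0, xi + eps * sign(gi))) for xi, gi in zip(x, grad)]
```

Because every pixel moves by at most `eps`, increasing `eps` strengthens the attack at the cost of a more visible perturbation.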

The device 100 may store the target image 240 in the storage 140 in advance or receive it through the interface 130. Alternatively, the device 100 may build a target image set of multiple target images 240 by adding noise to the pre-stored normal image dataset 210.

The target XAI signature 250 of the target image 240 may be generated by visualizing features of the target image data. This uses, for example, at least one of a saliency map, LIME (Local Interpretable Model-Agnostic Explanations), SHAP (SHapley Additive exPlanations), or IG (Integrated Gradients). The normal XAI signature set 220 of the normal image dataset 210 and the target XAI signature 250 of the target image 240 must be generated with the same algorithm. Therefore, if the normal XAI signature set 220 was generated by applying saliency maps to the normal image dataset 210, the target XAI signature 250 must also be generated by applying a saliency map to the target image 240 in order to decide whether it is an adversarial example.

Then, whether the target image is an adversarial example may be determined by comparing the reference error obtained from the normal XAI signature set with the target error obtained from the target XAI signature (S140).

The device 100 according to an embodiment of the present invention may compute the errors needed for this decision using the compression-decompression model generated with the autoencoder 230.

Specifically, the device 100 may input the normal XAI signature set 220 into the compression-decompression model and perform compression and reconstruction to produce a reconstructed normal XAI signature set. The high-dimensional normal XAI signatures are compressed to a low-dimensional representation and then reconstructed, so a reconstruction error exists between the normal XAI signature set 220 and the reconstructed set. The normal XAI signature set 220 covers many images. The reference error may be derived from the reconstruction errors between this set and the corresponding reconstructed XAI signature set; for example, it may be the mean of these reconstruction errors.

Likewise, the device 100 may input the target XAI signature 250 into the compression-decompression model and perform compression and reconstruction to produce a reconstructed target XAI signature. The target error is the reconstruction error between the target XAI signature 250 and the corresponding reconstructed target XAI signature.
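The reference and target errors can be computed as below. The sketch makes three assumptions:

- The per-element mean squared error is an assumed metric; the text speaks only of a reconstruction error.
- `reconstruct` stands for the trained compression-decompression model.
- The hypothetical `error_summary` helper also reports the minimum and maximum, because Figure 5 summarizes errors that way.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Signature = List[float]
Reconstruct = Callable[[Signature], Signature]


def reconstruction_error(sig: Signature, reconstruct: Reconstruct) -> float:
    """Per-element mean squared error between a signature and its reconstruction."""
    rec = reconstruct(sig)
    return sum((a - b) ** 2 for a, b in zip(sig, rec)) / len(sig)


def error_summary(sigs: Sequence[Signature], reconstruct: Reconstruct) -> Dict[str, float]:
    """Min, max and mean reconstruction error over a set of signatures."""
    errors = [reconstruction_error(s, reconstruct) for s in sigs]
    return {"min": min(errors), "max": max(errors), "mean": mean(errors)}


def reference_error(normal_sigs: Sequence[Signature], reconstruct: Reconstruct) -> float:
    # e.g. the mean reconstruction error over the normal XAI signature set
    return error_summary(normal_sigs, reconstruct)["mean"]


def target_error(target_sig: Signature, reconstruct: Reconstruct) -> float:
    return reconstruction_error(target_sig, reconstruct)
```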

The target image is classified by checking whether the difference between the reference error and the target error exceeds a predetermined criterion (S150). If the difference is greater than the criterion, the target image is determined to be an adversarial example (S160). If it is smaller, the target image is determined not to be an adversarial example (S165).
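In symbols, let s_1, ..., s_N be the normal XAI signatures, s_t the target XAI signature, g the compression-decompression model, and tau the predetermined criterion. The decision of steps S150 to S165 can then be written as follows. The squared-norm error and the mean are illustrative choices, consistent with the "for example, the mean" wording for the reference error.

```latex
e_{\mathrm{ref}} = \frac{1}{N} \sum_{i=1}^{N} \lVert s_i - g(s_i) \rVert_2^2,
\qquad
e_{\mathrm{tgt}} = \lVert s_t - g(s_t) \rVert_2^2,
\qquad
\text{target image} =
\begin{cases}
\text{adversarial example (S160)} & \text{if } e_{\mathrm{tgt}} - e_{\mathrm{ref}} > \tau, \\
\text{not an adversarial example (S165)} & \text{otherwise.}
\end{cases}
```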

A difference greater than the predetermined criterion means that the target error is larger than the reconstruction error observed for the normal XAI signature set (e.g., its mean). Here the target error is the reconstruction loss of the target XAI signature 250 after it passes through the compression-decompression model. Such a difference indicates that the target image is an adversarial example.

Conversely, a difference smaller than the predetermined criterion means that the target error, i.e., the reconstruction loss of the target XAI signature 250, is not significantly larger than the reconstruction error observed for the normal XAI signature set (e.g., its mean). In this case the target image is ordinary image data rather than an adversarial example.

FIG. 5 is a diagram illustrating the effect of the unsupervised anomaly detection method for adversarial examples according to an embodiment of the present invention.

Referring to FIG. 5, the minimum, maximum, and mean values are shown for the reference error of the normal XAI signature set and for the target error of the target XAI signatures of target images generated with FGSM at varying noise magnitudes (eps). The reference error of the normal XAI signature set has a mean of 0.0191, whereas for adversarial examples generated with FGSM the target error after compression and reconstruction is larger than the reference error, and it grows further as the noise magnitude (eps) increases.
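FGSM, the attack used to produce the target images in FIG. 5, perturbs an input by eps times the sign of the loss gradient with respect to that input, so eps bounds the per-pixel (L-infinity) change. The sketch below applies one FGSM step to a toy binary logistic-regression model, chosen only because its input gradient has the closed form (p - y) * w; it is a stand-in for the image classifier in the description, and names such as `fgsm_logistic` are hypothetical. Clipping to a valid pixel range, which practical attacks usually add, is omitted.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cross_entropy(x, y, w, b):
    """Binary cross-entropy of a logistic-regression model on a single input x."""
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))


def fgsm_logistic(x, y, w, b, eps):
    """One FGSM step: x_adv = x + eps * sign(dL/dx), where dL/dx = (p - y) * w."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)


# Usage: a toy 16-"pixel" input with label y = 1 (stand-in for an image).
rng = np.random.default_rng(0)
w, b = rng.normal(size=16), 0.0
x, y = rng.uniform(0.0, 1.0, size=16), 1

eps_values = [0.01, 0.05, 0.1]
losses = []
for eps in eps_values:
    x_adv = fgsm_logistic(x, y, w, b, eps)
    losses.append(cross_entropy(x_adv, y, w, b))
    print(f"eps={eps}: max |x_adv - x| = {np.max(np.abs(x_adv - x)):.2f}, loss = {losses[-1]:.4f}")
```

Since eps directly scales the perturbation, it is the natural parameter to sweep when measuring how the target error responds to attack strength; this toy model does not reproduce the measurements in FIG. 5.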

As described above, the unsupervised anomaly detection method and apparatus for adversarial examples according to embodiments of the present invention detect adversarial examples through unsupervised learning with an autoencoder trained on normal XAI signatures, which can substantially reduce the time and cost otherwise required to generate adversarial examples and train on them.

The present invention may also be implemented as computer-readable code on a computer-readable recording medium. A computer-readable recording medium includes any type of recording device that stores data readable by a computer system. Examples include hard disks, ROM, RAM, CD-ROM, magnetic tapes, floppy disks, and optical data storage devices, as well as implementations in the form of carrier waves (e.g., transmission over the Internet).

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art will understand that the present invention may be embodied in other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

100: Unsupervised anomaly detection apparatus for adversarial examples
110: processor
120: memory
130: interface
140: storage
210: Normal image dataset
220: Normal XAI signature set
230: Autoencoder
240: Target image
250: Target XAI signature

Claims

An unsupervised anomaly detection method for adversarial examples, comprising:
generating, from an artificial intelligence model, a normal explainable artificial intelligence (eXplainable Artificial Intelligence; XAI) signature set for a normal image dataset;
generating a compression-decompression model by training an autoencoder on the normal XAI signature set;
generating, from the artificial intelligence model, a target XAI signature for a target image; and
determining whether the target image is an adversarial example based on a result of comparing a reference error, obtained by passing the normal XAI signature set through the compression-decompression model, with a target error, obtained by passing the target XAI signature through the compression-decompression model,
wherein determining whether the target image is an adversarial example comprises determining the target image to be an adversarial example when the difference between the target error and the reference error is greater than a predetermined value.

The unsupervised anomaly detection method for adversarial examples according to claim 1,
wherein generating the normal XAI signature set comprises obtaining feature values from the normal image dataset using at least one of a saliency map, LIME (Local Interpretable Model-agnostic Explanations), SHAP (SHapley Additive exPlanations), and IG (Integrated Gradients).

The unsupervised anomaly detection method for adversarial examples according to claim 2,
wherein generating the normal XAI signature set comprises generating a saliency map set from the normal image dataset, and
generating the saliency map set comprises obtaining the pixel-wise contributions of a normal image included in the normal image dataset to the artificial intelligence model.

delete

The unsupervised anomaly detection method for adversarial examples according to claim 1,
wherein the target image comprises an image generated by combining an image included in the normal image dataset with noise generated using at least one of FGSM (Fast Gradient Sign Method), CW (Carlini & Wagner), and PGD (Projected Gradient Descent).

An unsupervised anomaly detection apparatus for adversarial examples, comprising:
a processor; and
a memory storing instructions executable by the processor, wherein the instructions, when executed by the processor, perform:
generating, from an artificial intelligence model, a normal XAI signature set for a normal image dataset;
generating a compression-decompression model by training an autoencoder on the normal XAI signature set;
generating, from the artificial intelligence model, a target XAI signature for a target image; and
determining whether the target image is an adversarial example based on a result of comparing a reference error, obtained by passing the normal XAI signature set through the compression-decompression model, with a target error, obtained by passing the target XAI signature through the compression-decompression model,
wherein determining whether the target image is an adversarial example comprises determining the target image to be an adversarial example when the difference between the target error and the reference error is greater than a predetermined value.

The unsupervised anomaly detection apparatus for adversarial examples according to claim 6,
wherein generating the normal XAI signature set comprises obtaining feature values from the normal image dataset using at least one of a saliency map, LIME, SHAP, and IG.

The unsupervised anomaly detection apparatus for adversarial examples according to claim 7,
wherein generating the normal XAI signature set comprises generating a saliency map set from the normal image dataset, and
generating the saliency map set comprises obtaining the pixel-wise contributions of a normal image included in the normal image dataset to the artificial intelligence model.

delete

The unsupervised anomaly detection apparatus for adversarial examples according to claim 6,
wherein the target image comprises an image generated by combining an image included in the normal image dataset with noise generated using at least one of FGSM (Fast Gradient Sign Method), CW (Carlini & Wagner), and PGD (Projected Gradient Descent).