KR20240039407A

KR20240039407A - Robustness measurement system and application of AI model for malware variant analysis

Info

Publication number: KR20240039407A
Application number: KR1020220117936A
Authority: KR
Inventors: 이태진; 이은규; 정시온; 이현우
Original assignee: 호서대학교 산학협력단 (Hoseo University Industry-Academic Cooperation Foundation)
Priority date: 2022-09-19
Filing date: 2022-09-19
Publication date: 2024-03-26

Abstract

The present invention is a system for measuring the robustness of an AI model against adversarial attacks. It comprises three modules:

A preprocessing module uses a distance parameter between original data and an attack sample produced by a successful adversarial attack. From this, it generates adversarial training samples that remain close to the original yet cause misclassification.

A robustness measurement module calculates the average distance between the adversarial training samples and the original data. This average is the value of a robustness index used to evaluate the robustness level of the AI model.

An output module uses the robustness index value calculated by the robustness measurement module to output the measured robustness level for each malware group targeted by the adversarial attack.

The system thereby outputs an evaluation metric for the robustness of the AI model against adversarial attacks.

Description

Robustness measurement system and application of AI model for analyzing malware variants {ROBUSTNESS MEASUREMENT SYSTEM AND APPLICATION OF AI MODEL FOR MALWARE VARIANT ANALYSIS}

The present invention provides a system and an application for measuring the robustness of an AI model. Their purpose is to counter adversarial attacks, which induce classification errors and degrade the reliability of the AI model.

As the digital industry becomes more sophisticated, the methods and types of cyberattacks are becoming more diverse. In high-volume, high-traffic networks, it is becoming especially important to detect suspicious data or malicious access quickly and accurately.

Existing AI models learn from a given dataset and then classify inputs as malicious or normal. However, there is little assurance that an AI model trained on existing attacks can respond effectively to new ones. Newly emerging attacks are likely to be variants that exploit technological advances and are modified to evade existing detection systems.

One example is the adversarial attack, which aims to cause misclassification and reduce the reliability of an AI model. An adversarial attack deceives the AI model by adding a very small, hard-to-notice perturbation to the original input. Interest in adversarial attacks as a way to evade detection by AI models is increasing.

The Zeroth Order Optimization based Black-Box Attack (ZOO attack) is one technique for generating adversarial examples. It estimates gradients from the model's outputs instead of reading them directly. As a result, it can succeed even when gradient values are hidden through gradient masking.

Reported adversarial attacks and papers on such attacks have increased steadily since 2014. Even so, the National Vulnerability Database (NVD) contains no entries related to adversarial attacks, and there is little practical response activity against them. A method is therefore needed for building models robust enough to withstand such attacks.

As related prior art, Korean Patent Publication No. 10-2022-0049967 discloses a method and device for defending against adversarial attacks. The prior patent comprises the following steps:
acquiring a target image to be recognized;
classifying the acquired target image by color to generate a plurality of color maps;
extracting features of the objects in each color map with a Convolutional Neural Network (CNN) model, and classifying those objects based on the extracted features; and
assigning priorities to the objects in the color maps, and recognizing the target image based on the classification and priority information for the objects.
The prior patent defends against adversarial attacks by assigning different priorities to objects.

To counter adversarial attacks that target machine learning, it must be possible to verify whether an AI model is robust enough to handle malware variants, and to what degree. AI models should not be assessed only with conventional metrics such as accuracy, precision, recall, and F1 score. A separate robustness metric needs to be introduced to confirm their reliability.

Nevertheless, no formal, standardized method for evaluating the robustness of AI models has yet been agreed upon. In addition, the metrics used in existing robustness research are unsuitable for data whose feature values do not all share the same range, as they do in image data. The conventional approach assumes that every feature has the same scale. No unified robustness index has been proposed that applies equally when features differ in meaning or range.

Existing research on robustness evaluation also focuses on measuring the size of the perturbation as a distance from the original. It does not consider how much the predicted probability changes as a result of that perturbation. Furthermore, different AI models use different numbers of features, so robustness measurements cannot be compared on a common basis across models. Standardizing robustness measurement therefore remains difficult.

Korean Patent Publication No. 10-2022-0049967

The present invention aims to provide a robustness measurement system with two capabilities:
it can verify whether an AI model is robust enough to handle malware variants and measure its level of robustness; and
it can measure that robustness level in the same way even when AI models use different numbers of features or different feature scales.
Accordingly, the invention aims to provide a system that calculates a robustness index whose values can be compared even across different AI models.

To achieve the above object, one aspect of the present invention provides a system for measuring the robustness of an AI model against adversarial attacks. The system comprises:
a preprocessing module that uses a distance parameter between original data and an attack sample produced by a successful adversarial attack to generate adversarial training samples that remain close to the original yet cause misclassification;
a robustness measurement module that calculates the average distance between the adversarial training samples and the original data as the value of a robustness index for evaluating the robustness level of the AI model; and
an output module that uses the robustness index value calculated by the robustness measurement module to output the measured robustness level for each malware group targeted by the adversarial attack.
The system thereby outputs an evaluation metric for the robustness of the AI model against adversarial attacks.

Preferably, the robustness measurement module defines the robustness index around the difficulty of generating adversarial training samples. The harder a sample is to generate, the larger the noise it requires, and so the larger the value of the robustness index. Conversely, a larger robustness index value means adversarial training samples are harder to generate. The robustness index, which serves as the criterion for evaluating the robustness of the AI model, may therefore be set based on this generation difficulty.

Preferably, the robustness measurement module can calculate the robustness index based on [Relational Expression 1].

[Relational Expression 1]

The symbols are defined as follows:
ARS (Adversarial Robustness Score) denotes the robustness index;
S denotes the set of data in each group for which adversarial training samples were successfully generated;
i denotes the feature index;
n denotes the number of features;
r denotes the range of a feature;
x denotes the original data;
x' denotes the adversarial training sample; and
P denotes the prediction of the AI model.

Preferably, when calculating the robustness index, the robustness measurement module first excludes the data for which adversarial training sample generation failed. It then calculates the average over the N samples in each data group for which generation succeeded.

Another aspect of the present invention provides an application for measuring the robustness of an AI model. The application is stored on a medium for execution on a smartphone, tablet, laptop, or computer. That device has input means for entering data, processing means for processing the entered data, and output means, and an AI model trained on general data is stored on its medium. The application comprises:
a preprocessing step of using a distance parameter between original data and an attack sample produced by a successful adversarial attack to generate adversarial training samples that remain close to the original yet cause misclassification;
a robustness measurement step of calculating the average distance between the adversarial training samples and the original data as the value of a robustness index for evaluating the robustness level of the AI model; and
an output step of using the robustness index value calculated in the robustness measurement step to output the measured robustness level for each malware group targeted by the adversarial attack.

According to the present invention, the preprocessing module finds the optimal attack sample by minimizing the sum of two loss terms: one that keeps the noise minimal and one that raises the attack success rate. Using the distance parameter between the original data and the attack sample produced by a successful attack, it then builds a training dataset. That dataset consists of adversarial training samples that are as close as possible to the original while still causing misclassification.

Based on this training dataset of adversarial training samples, the present invention can evaluate the robustness level of an AI model through the robustness index value. In particular, the robustness index considers the diversity of feature scales when calculating the magnitude of the perturbation. It also accounts for how the perturbation varies with the number of features.

As a result, the present invention can measure robustness on an objective basis that applies equally to different AI models.

Figure 1 is a schematic diagram of the configuration of a robustness measurement system according to an embodiment of the present invention.
Figure 2 is a schematic diagram of the processing flow of the robustness measurement system according to an embodiment of the present invention.
Figure 3 shows the process of calculating the robustness index for evaluating the robustness level of an AI model according to an embodiment of the present invention.
Figure 4 shows the results of an experiment comparing the robustness index before adversarial training with the robustness index after adversarial training according to an embodiment of the present invention.

The present invention is described in detail below with reference to the accompanying drawings. However, the present invention is not restricted or limited by the exemplary embodiments. The same reference numerals in different drawings indicate members that perform substantially the same function.

The objects and effects of the present invention will be understood naturally, or become clearer, from the following description, but they are not limited to that description. In addition, detailed descriptions of known techniques related to the present invention are omitted where they might unnecessarily obscure its gist.

Figure 1 is a schematic diagram of the configuration of the robustness measurement system 1 according to an embodiment of the present invention. Figure 2 is a schematic diagram of the processing flow of the robustness measurement system 1 according to an embodiment of the present invention.

Referring to Figures 1 and 2, the robustness measurement system 1 according to this embodiment may include a preprocessing module 10, a robustness measurement module 30, and an output module 50. The system targets AI models trained on general data. To counter adversarial attacks against such a model, it generates samples based on adversarial attacks and evaluates the model's degree of robustness. This evaluation can then be used to strengthen the model by retraining it on the adversarial samples.

An adversarial attack applies a very small, hard-to-notice modification to the original. This pushes the original into a region where an adversarial example can be formed, causing the model to misclassify it and reducing the model's reliability. Only a minimal perturbation is applied, yet the model assigns the resulting adversarial sample a different label. The perturbation is hard to identify, but it causes misclassification with high probability, which creates problems for deployed services.

Countering adversarial attacks requires a fundamental solution to misclassification and reduced reliability. Three countermeasures are commonly used:
gradient masking, which hides the gradients;
input pre-processing, which sanitizes data so that inputs containing potentially malicious noise cannot be fed to the model; and
adversarial training, which generates samples of potentially malicious data through adversarial attacks, adds them to the training data, and retrains the model.
Adversarial training retrains the model on adversarial attack samples so that it can respond to similar attacks, which is expected to improve its robustness.

The other two methods have clear limits. Gradient masking cannot defend against attacks that do not rely on gradients. With input pre-processing, optimality cannot be verified, so it is hard to be sure it is the best approach. Retraining the model through adversarial training is therefore the most desirable option: it improves accuracy against known attacks and also enables a response to new ones.

Retraining a model through adversarial training requires adversarial training samples. These are generated by running adversarial attacks that insert crafted noise into the training data to cause misclassification and reduce reliability. Retraining also requires a metric for evaluating robustness. This robustness index must take into account the number of features in the training samples, the feature scales, and the change in prediction, so that it can be applied across different AI models.

The following describes in detail how the robustness measurement system 1 of this embodiment measures the robustness of an AI model, introducing a new robustness metric, the ARS index.

The preprocessing module 10 can use the distance parameter between the original data and the attack sample produced by a successful adversarial attack. With it, the module generates adversarial training samples that remain close to the original yet cause misclassification. To generate these samples, the module applies an attack technique that minimizes the sum of two loss functions: one that keeps the noise minimal and one that raises the attack success rate. Minimizing this sum yields the optimal adversarial training sample.

In an embodiment, the preprocessing module 10 may generate the adversarial training sample based on [Equation 1].

[Equation 1]

Here, D is the distance metric used to calculate the distance between the original and the attack sample, f is the objective function, and Z(x)_i is the probability that x is assigned label i.

The function in [Equation 1] can be used to determine whether the adversarial training sample has been assigned the intended label. Since Z(x)_i is the probability that x is assigned label i, the objective function evaluates to a negative value when the adversarial training sample is assigned the target label t as intended. This process generates an adversarial training sample whose output label has changed while staying as close as possible to the original.
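The formula for [Equation 1] is not reproduced in this text. The objective described above has two parts: a distance term D, plus a term f that turns negative once the target label t wins. That description resembles the well-known Carlini-Wagner formulation.

The Python sketch below assumes that form, with f(x') = max(max_{i != t} Z(x')_i - Z(x')_t, -kappa). It further assumes the L2 distance for D and a weighting constant c. The names `objective_f`, `attack_loss`, `c`, and `kappa` are illustrative assumptions, not part of the described system.

```python
import math
from typing import Sequence


def objective_f(z: Sequence[float], target: int, kappa: float = 0.0) -> float:
    """Assumed Carlini-Wagner-style objective for a targeted attack.

    z holds the model's per-label scores Z(x')_i. The value is positive while
    some other label still outranks the target, and becomes negative (bounded
    below by -kappa) once the target label has the highest score and kappa > 0.
    """
    best_other = max(v for i, v in enumerate(z) if i != target)
    return max(best_other - z[target], -kappa)


def attack_loss(x: Sequence[float], x_adv: Sequence[float], z_adv: Sequence[float],
                target: int, c: float = 1.0, kappa: float = 0.0) -> float:
    """Distance term (assumed L2) plus the weighted objective term."""
    return math.dist(x, x_adv) + c * objective_f(z_adv, target, kappa)
```

An optimizer (not shown) would search over `x_adv` to minimize `attack_loss`. The distance term pulls the sample toward the original, while the f term pushes it across the decision boundary toward label t.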

The preprocessing module 10 may use any of three distances as the distance parameter:
Linf, the largest change made to any single feature by the adversarial attack;
L0, the number of features that were changed; or
L2, the Euclidean distance between the original data and the attack sample when each is represented as a point.
The preprocessing module 10 according to this embodiment performs well with the L0 and L2 distances. Of these two, the L2 distance metric is the one used most often.

The preprocessing module 10 generates the adversarial training sample using the L2-based parameter as the distance metric D in [Equation 1]. The L2 distance can be calculated from [Equation 2].

[Equation 2]

Here, x denotes the original data, x' denotes the generated attack sample, and n is the number of features in the data.
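The three distance parameters described above can be sketched in Python as follows. The sketch assumes that each sample is a flat list of n numeric features. L2 is taken to be the standard Euclidean distance sqrt(sum_i (x_i - x'_i)^2), which is what the description of [Equation 2] suggests; the function names and the `tol` parameter are hypothetical.

```python
import math
from typing import Sequence


def _check(x: Sequence[float], x_adv: Sequence[float]) -> None:
    if len(x) != len(x_adv):
        raise ValueError("original and attack sample must have the same n features")


def l0_distance(x: Sequence[float], x_adv: Sequence[float], tol: float = 0.0) -> int:
    """Number of features changed by more than tol."""
    _check(x, x_adv)
    return sum(1 for a, b in zip(x, x_adv) if abs(a - b) > tol)


def l2_distance(x: Sequence[float], x_adv: Sequence[float]) -> float:
    """Euclidean distance between the two samples viewed as points."""
    _check(x, x_adv)
    return math.dist(x, x_adv)


def linf_distance(x: Sequence[float], x_adv: Sequence[float]) -> float:
    """Largest change applied to any single feature."""
    _check(x, x_adv)
    return max(abs(a - b) for a, b in zip(x, x_adv))
```

For x = [0, 0, 0] and x' = [3, 4, 0], L0 counts 2 changed features, L2 is 5, and Linf is 4.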

The robustness measurement module 30 may calculate the average distance between the adversarial training samples and the original data as the value of a robustness index for evaluating the robustness level of the AI model.

The robustness measurement module 30 defines the robustness index around the difficulty of generating adversarial training samples. The harder a sample is to generate, the larger the noise it requires, and so the larger the value of the robustness index. Conversely, a larger robustness index value means adversarial training samples are harder to generate. The robustness index, which serves as the criterion for evaluating the robustness of the AI model, may therefore be set based on this generation difficulty.

When calculating the robustness index, the robustness measurement module 30 may first exclude the data for which adversarial training sample generation failed. It then calculates the average over the N samples in each data group for which generation succeeded.

The robustness measurement module 30 can calculate the robustness index based on [Equation 3].

[Equation 3]

The symbols are defined as follows:
ARS (Adversarial Robustness Score) denotes the robustness index;
S denotes the set of data in each group for which adversarial training samples were successfully generated;
s denotes an adversarial training sample belonging to the group; and
d_s denotes the distance between the original data and s.

Referring to [Equation 3], the basic idea of ARS is an approximate average of the measured distance values in each group. If generating an adversarial training sample fails, its distance is effectively unbounded and ARS could become infinite. To prevent this, the approximate average of the N/2 smallest distance values among all N samples is used. The robustness measurement module 30 may judge that the larger the ARS, the more robust the model.
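The formula for [Equation 3] is not reproduced here, so the sketch below is only one reading of the text. It treats a failed generation as an infinite distance, then averages the N/2 smallest distances in each group. This keeps ARS finite as long as at least half of the group's samples were attacked successfully.

The convention of recording failures as `math.inf` and the helper names `ars` and `ars_by_group` are assumptions made for illustration.

```python
import math
from typing import Dict, List


def ars(distances: List[float]) -> float:
    """Approximate ARS for one group (hypothetical helper).

    distances holds one entry per sample; math.inf marks a sample for which
    no adversarial training sample could be generated.
    """
    if not distances:
        raise ValueError("group has no samples")
    k = max(1, len(distances) // 2)          # keep the N/2 closest samples
    closest = sorted(distances)[:k]
    return sum(closest) / k                  # inf only if more than half failed


def ars_by_group(groups: Dict[str, List[float]]) -> Dict[str, float]:
    """ARS per malware group, as the output module would report it."""
    return {name: ars(d) for name, d in groups.items()}
```

A larger value means the attack needed larger perturbations, which the text reads as a more robust model for that group.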

However, the robustness index based on [Equation 3] simply measures the average distance from the original. It is therefore unsuitable for data whose feature values do not all share the same range, as they do in image data. It also measures only the perturbation size as a distance from the original and does not consider the resulting change in probability.

Accordingly, in a more preferred embodiment, the robustness measurement module 30 may calculate the robustness index based on [Equation 4] below.

[Equation 4]

The symbols are defined as follows:
MARS (Model Adversarial Robustness Score) denotes the robustness index;
S denotes the set of data in each group for which adversarial training samples were successfully generated;
i denotes the feature index;
n denotes the number of features;
r denotes the range of a feature;
x denotes the original data;
x' denotes the adversarial training sample; and
P denotes the prediction of the AI model.

By calculating the robustness index based on [Equation 4], the robustness measurement module 30 according to this embodiment can measure robustness in a general-purpose way. The index makes three adjustments:
Each feature's difference between the original and the adversarial attack sample is divided by that feature's scale, so the index reflects the diversity of feature scales.
The perturbation size is divided by the difference between the probability the AI model assigns to the original data and the probability it assigns to the adversarial training sample, so the index accounts for the change in probability.
The perturbation size is divided by the number of features used, so the index accounts for how the perturbation varies with the number of features.
Together, these adjustments make the index applicable across a variety of AI models.
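Because the formula for [Equation 4] is not shown in this text, the Python sketch below is a hypothetical reading of the three adjustments just described. For each successful sample in S, it takes a range-normalized L2 distance, sum over i of ((x_i - x'_i) / r_i)^2 under a square root. It divides that by n and by |P(x) - P(x')|, then averages over S.

The use of L2 rather than another norm, the choice of P as a single probability per sample, and all function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple

# (original x, adversarial x', P(x), P(x')) for one successful sample in S
Sample = Tuple[Sequence[float], Sequence[float], float, float]


def scaled_perturbation(x: Sequence[float], x_adv: Sequence[float],
                        ranges: Sequence[float]) -> float:
    """L2 distance after dividing each feature difference by its range r_i."""
    if not (len(x) == len(x_adv) == len(ranges)):
        raise ValueError("x, x_adv and ranges must all have n entries")
    if any(r <= 0 for r in ranges):
        raise ValueError("every feature range r_i must be positive")
    return math.sqrt(sum(((a - b) / r) ** 2 for a, b, r in zip(x, x_adv, ranges)))


def sample_score(x: Sequence[float], x_adv: Sequence[float], ranges: Sequence[float],
                 p_orig: float, p_adv: float) -> float:
    """Perturbation per feature, per unit of change in the model's probability."""
    delta_p = abs(p_orig - p_adv)
    if delta_p == 0:
        raise ValueError("prediction did not change; sample does not belong in S")
    return scaled_perturbation(x, x_adv, ranges) / len(x) / delta_p


def mars(samples: List[Sample], ranges: Sequence[float]) -> float:
    """Average per-sample score over the successful set S of one group."""
    if not samples:
        raise ValueError("S is empty")
    scores = [sample_score(x, xa, ranges, po, pa) for x, xa, po, pa in samples]
    return sum(scores) / len(scores)
```

Dividing by each range r_i makes a feature measured in thousands comparable to one measured in fractions. Dividing by the probability change rewards models whose predictions move little for a given perturbation.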

Figure 3 shows the process of calculating the robustness index for evaluating the robustness level of an AI model according to an embodiment of the present invention. Referring to Figure 3, the ARS score (the robustness index) is calculated for two models and the results are compared. The first is the AI model built on a previously collected training dataset. The second is the adversarial training system built by retraining that AI model according to this embodiment.

In this embodiment, the AI model trained on general data may use an artificial neural network with multiple hidden layers as its learning network, and is preferably a DNN model. Once the AI model has been built from a previously collected training dataset, the preprocessing module 10 is applied to that training dataset to generate adversarial training samples, which correspond to the Adversarial Sample in Figure 3. The generated adversarial samples are added to the existing training dataset and the AI model is retrained; to compare robustness with the model before retraining, adversarial samples are then generated again from each group's data. The learning logic of this adversarial learning system is to compare the ARS value before adversarial learning with the ARS value after it, with the goal of increasing the ARS value.
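The retrain-and-compare logic described above can be outlined as a small orchestration function. Here `train_fn`, `attack_fn`, and `ars_fn` are hypothetical placeholders for the DNN training routine, the preprocessing module 10, and the robustness measurement module 30. The sketch also assumes, as is usual in adversarial training, that each adversarial sample keeps the label of the original it was derived from.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label)


def adversarial_retraining(
    train_fn: Callable[[List[Sample]], object],
    attack_fn: Callable[[object, List[Sample]], List[Sample]],
    ars_fn: Callable[[object, Dict[str, List[Sample]]], Dict[str, float]],
    train_set: List[Sample],
    groups: Dict[str, List[Sample]],
):
    """Train, attack, augment, retrain, and compare per-group ARS before and after."""
    model = train_fn(train_set)
    # Baseline robustness of the original model on each group's data.
    ars_before = ars_fn(model, groups)
    # Adversarial training samples generated from the training set; they are
    # assumed to carry the original labels so the model learns to resist them.
    adv_samples = attack_fn(model, train_set)
    retrained = train_fn(train_set + adv_samples)
    # Regenerate adversarial samples per group against the retrained model
    # (inside ars_fn) and compare the scores group by group.
    ars_after = ars_fn(retrained, groups)
    improved = {g: ars_after[g] > ars_before[g] for g in groups}
    return retrained, ars_before, ars_after, improved
```

Passing the three steps in as functions keeps the loop independent of the particular model and attack used.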

The output module 50 may output the robustness measurement status for each malware group subjected to the adversarial attack, using the robustness index values calculated by the robustness measurement module.
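A minimal sketch of what the output module's per-group status could look like. The text does not define an output format, so the layout and function name below are hypothetical. Groups are listed in ascending ARS order because, per claim 2, a lower index means adversarial samples are easier to generate, so the least robust groups appear first.

```python
from typing import Dict


def robustness_report(ars_by_group: Dict[str, float]) -> str:
    """Return one line per malware group, least robust (lowest ARS) first."""
    rows = sorted(ars_by_group.items(), key=lambda kv: kv[1])
    return "\n".join(f"{group:<14}ARS={score:.4f}" for group, score in rows)
```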

The robustness measurement system according to this embodiment can be built and used as a system that adds the adversarial training samples to the training dataset and retrains the AI model so that its robustness index improves.

Adversarial Learning Experiment: Enhancing and Evaluating Robustness

Most adversarial attack techniques are developed for AI models whose input data are images. They may therefore be ill-suited to data whose features, unlike image pixels, do not share a uniform value range. Accordingly, in this experiment the malware feature values were rescaled to the range 0 to 1 using min-max normalization.
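A minimal sketch of the min-max normalization step. It assumes the per-feature minimum and maximum are taken from the training set and reused for the test set, which the text does not state. Mapping constant features (zero range) to 0.0 is also an assumption.

```python
from typing import List, Sequence, Tuple


def fit_min_max(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Compute the per-feature minimum and maximum over the training rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def min_max_transform(rows: Sequence[Sequence[float]],
                      mins: Sequence[float], maxs: Sequence[float]) -> List[List[float]]:
    """Rescale each feature with (v - min) / (max - min); constant features map to 0.0."""
    out = []
    for row in rows:
        scaled = []
        for v, lo, hi in zip(row, mins, maxs):
            span = hi - lo
            scaled.append((v - lo) / span if span > 0 else 0.0)
        out.append(scaled)
    return out
```

Test rows containing values outside the training range will fall outside [0, 1] unless they are clipped afterwards.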

ARS, the robustness index, is measured as follows. The distance between each original sample and its adversarial training sample is measured as the Euclidean (L2) distance. The measured distances are then averaged within each group. Samples for which adversarial training sample generation failed are excluded, so the average for each group is taken over the N samples for which generation succeeded.
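The ARS measurement described above can be sketched as follows. Each group maps to a list of (original, adversarial) pairs, and `None` stands in for a failed generation; this is a hypothetical convention. Failed pairs are dropped, and the mean L2 distance is taken over the N remaining pairs. This is the plain distance average described in this paragraph, without the range, probability, and feature-count normalizations of [Equation 4].

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# (original sample, adversarial sample or None if generation failed)
Pair = Tuple[Sequence[float], Optional[Sequence[float]]]


def ars_per_group(results: Dict[str, List[Pair]]) -> Dict[str, float]:
    """Mean Euclidean distance per group over successfully generated samples only."""
    scores = {}
    for group, pairs in results.items():
        # Exclude originals for which adversarial generation failed.
        dists = [math.dist(x, x_adv) for x, x_adv in pairs if x_adv is not None]
        if dists:  # N = len(dists) successful samples in this group
            scores[group] = sum(dists) / len(dists)
    return scores
```

A group with no successful generations gets no score, rather than a misleading zero.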

The 2019 KISA Datachallenge malware dataset was used for the experiment. Its composition is shown in [Table 1].

[Table 1]

The training dataset contained 29,130 samples (17,562 malicious and 11,568 benign), and the test dataset contained 9,031 samples (4,513 malicious and 4,518 benign). Of the roughly 800 AVClass labels identified in the dataset, only the five most frequently detected were used in this experiment.

The composition of the selected AVClass families is shown in [Table 2].

[Table 2]

The training data comprised 1,653 samples: 517 autoit, 369 ramnit, 281 scar, 288 winactivator, and 198 zegost. The test data comprised 171 autoit, 57 ramnit, 17 scar, 56 winactivator, and 37 zegost samples; the stated total is 388, although these per-class counts sum to 338. Features were extracted by analyzing the PE (Portable Executable) structure of each malware sample. The PE header and PE sections of a PE file contain the information needed to execute the file. From these, 37 features were extracted from the PE header information and 128 features were derived from the entropy of the PE sections, giving a total of 165 features (37 + 128) per PE file, which were used as the features of the training dataset.
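The text does not specify how the PE section entropy is turned into 128 features. The sketch below shows only the standard byte-level Shannon entropy that such features are typically built on. It assumes the raw bytes of a section have already been extracted, for example with a PE parser (not shown), and the function name is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte, from 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    # H = -sum(p * log2(p)) over the byte values that actually occur.
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())
```

Packed or encrypted sections tend to score close to 8.0, which is why section entropy is a common malware feature.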

The AI model trained on general data was a DNN with multiple hidden layers. To prevent overfitting, a dropout rate of 0.25 was applied to the first three layers, and batch_size = 100 and epochs = 93 were used as training parameters. ReLU was used as the activation function, and the final layer has two outputs with softmax so that the model produces a probability for each label. The training results of the AI model are shown in [Table 3].
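The DNN described above can be sketched as a plain-Python forward pass, which avoids depending on a particular deep-learning framework. The text gives neither layer sizes nor weights, so those used with this sketch are hypothetical. Only the stated choices are reflected:

- ReLU hidden layers;
- dropout of 0.25 on the first three hidden layers (inverted dropout, active only during training);
- a two-unit softmax output that gives a probability for each label.

Training with batch_size = 100 and epochs = 93 is not shown.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def dense(x: Sequence[float], w: Matrix, b: Sequence[float]) -> List[float]:
    """Fully connected layer; w has one row of weights per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def dropout(v: Sequence[float], rate: float, training: bool, rng: random.Random) -> List[float]:
    """Inverted dropout: zero units at random and rescale survivors during training only."""
    if not training or rate == 0.0:
        return list(v)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in v]


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in z]
    s = sum(exps)
    return [e / s for e in exps]


def forward(x: Sequence[float], layers: List[Tuple[Matrix, List[float]]],
            out_w: Matrix, out_b: List[float], training: bool = False,
            rate: float = 0.25, rng: Optional[random.Random] = None) -> List[float]:
    rng = rng or random.Random(0)
    h = list(x)
    for k, (w, b) in enumerate(layers):
        h = relu(dense(h, w, b))
        if k < 3:  # dropout only on the first three hidden layers
            h = dropout(h, rate, training, rng)
    return softmax(dense(h, out_w, out_b))  # two outputs: one probability per label
```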

[Table 3]

Next, adversarial samples were generated using the preprocessing module 10. The adversarial attack parameters used for generation are listed in [Table 4].
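The attack used by the preprocessing module and the parameters in [Table 4] are not reproduced here. As a stand-in, the sketch below uses the closed-form minimal L2 perturbation for a binary linear classifier (the linear case of DeepFool). It illustrates the idea of a sample that stays close to the original yet is misclassified. The `overshoot` and `max_distance` parameters are hypothetical. The result is clipped to [0, 1] because the features were min-max normalized. A `None` return marks a failed generation, which the ARS average then excludes.

```python
import math
from typing import List, Optional, Sequence


def linear_min_perturbation(x: Sequence[float], w: Sequence[float], b: float,
                            overshoot: float = 0.02,
                            max_distance: float = 1.0) -> Optional[List[float]]:
    """Push x just across the hyperplane w.x + b = 0 along w; return None on failure."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_sq = sum(wi * wi for wi in w)
    if norm_sq == 0.0:
        return None  # constant classifier: no direction changes the decision
    # Smallest L2 step reaching the boundary, scaled slightly past it by the overshoot.
    step = -(1.0 + overshoot) * score / norm_sq
    x_adv = [min(1.0, max(0.0, xi + step * wi)) for xi, wi in zip(x, w)]
    # Clipping to [0, 1] may undo the flip, so re-check the predicted side.
    new_score = sum(wi * xi for wi, xi in zip(w, x_adv)) + b
    flipped = (score > 0) != (new_score > 0)
    if not flipped or math.dist(x, x_adv) > max_distance:
        return None
    return x_adv
```

For a nonlinear DNN, iterative attacks apply this same linearization repeatedly around the current point.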

[Table 4]

Before adversarial training, adversarial sample generation was attempted for 40 samples in each of the five families, 200 in total. Generation succeeded for 16 autoit, 23 ramnit, 15 scar, 39 winactivator, and 31 zegost samples. The measured ARS values were autoit 0.3650, ramnit 0.7513, scar 4.4400, winactivator 0.6370, and zegost 1.0574. After adversarial training, generation was again attempted for 40 samples per family, as before; the stated total of 120 is inconsistent with this and with the 140 successes reported next. Generation succeeded for 16 autoit, 22 ramnit, 32 scar, 39 winactivator, and 31 zegost samples. The measured ARS values were autoit 0.5765, ramnit 2.3748, scar 0.9097, winactivator 0.6809, and zegost 0.4536.
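The per-family generation results above can be tabulated as success rates, assuming 40 attempts per family as stated. The success counts sum to 124 before and 140 after adversarial training. The largest change is in scar, whose success count rose from 15 to 32.

```python
from typing import Dict

ATTEMPTS_PER_FAMILY = 40  # as stated in the experiment description
SUCCESS_BEFORE = {"autoit": 16, "ramnit": 23, "scar": 15, "winactivator": 39, "zegost": 31}
SUCCESS_AFTER = {"autoit": 16, "ramnit": 22, "scar": 32, "winactivator": 39, "zegost": 31}


def success_rates(successes: Dict[str, int], attempts: int = ATTEMPTS_PER_FAMILY) -> Dict[str, float]:
    """Fraction of attempted originals for which an adversarial sample was generated."""
    return {family: n / attempts for family, n in successes.items()}
```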

Accuracy on the five families was calculated for the model before adversarial training and for the model after adversarial training. The results are shown in [Table 5].

[Table 5]

For the model before adversarial training, the per-family accuracies were autoit 1.0, ramnit 0.98245, scar 0.88235, zegost 1.0, and winactivator 0.98214. The average accuracy over the five AVClasses for this model was 0.99817.

For the model produced by adversarial training, the per-family accuracies were autoit 0.99415, ramnit 1.0, scar 0.88235, zegost 0.97297, and winactivator 0.98214, with an average accuracy of 0.98520 over the five AVClasses. These results show that, after adversarial training, accuracy increased for ramnit, was unchanged for scar and winactivator, and decreased for autoit and zegost.
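The text does not say how the five-family average was computed. This check assumes it is the sample-weighted accuracy over the per-family test counts given earlier (171, 57, 17, 56, 37, which sum to 338). The code recovers the integer number of correct predictions per family and recomputes the average. Under that assumption the post-training figure is reproduced (333/338 ≈ 0.98521, reported as 0.98520). The pre-training figure comes out as 334/338 ≈ 0.98817 rather than the reported 0.99817, a one-digit discrepancy. The unweighted mean of the per-family accuracies (about 0.9694 before and 0.9663 after) matches neither reported value.

```python
from typing import Dict

TEST_COUNTS = {"autoit": 171, "ramnit": 57, "scar": 17, "winactivator": 56, "zegost": 37}
ACC_BEFORE = {"autoit": 1.0, "ramnit": 0.98245, "scar": 0.88235, "winactivator": 0.98214, "zegost": 1.0}
ACC_AFTER = {"autoit": 0.99415, "ramnit": 1.0, "scar": 0.88235, "winactivator": 0.98214, "zegost": 0.97297}


def correct_counts(acc: Dict[str, float], counts: Dict[str, int]) -> Dict[str, int]:
    """Recover the integer number of correctly classified test samples per family."""
    return {f: round(acc[f] * counts[f]) for f in counts}


def weighted_accuracy(acc: Dict[str, float], counts: Dict[str, int]) -> float:
    """Total correct predictions divided by total test samples (micro average)."""
    correct = correct_counts(acc, counts)
    return sum(correct.values()) / sum(counts.values())


def macro_accuracy(acc: Dict[str, float]) -> float:
    """Unweighted mean of the per-family accuracies."""
    return sum(acc.values()) / len(acc)
```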

Figure 4 shows the results of an experiment comparing the robustness index before and after adversarial learning according to an embodiment of the present invention.

Referring to Figure 4, ARS_1 denotes the ARS value before adversarial training and ARS_2 the ARS value after adversarial training. For autoit, ramnit, and winactivator, the ARS value increased as intended, whereas for scar and zegost it decreased.
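The comparison in Figure 4 can be reproduced from the ARS values reported for the experiment. The sketch subtracts ARS_1 from ARS_2 for each family and splits the families by the sign of the change.

```python
from typing import Dict

ARS_1 = {"autoit": 0.3650, "ramnit": 0.7513, "scar": 4.4400, "winactivator": 0.6370, "zegost": 1.0574}
ARS_2 = {"autoit": 0.5765, "ramnit": 2.3748, "scar": 0.9097, "winactivator": 0.6809, "zegost": 0.4536}


def ars_change(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Per-family change in ARS (positive means harder to attack after training)."""
    return {f: after[f] - before[f] for f in before}


changes = ars_change(ARS_1, ARS_2)
increased = sorted(f for f, d in changes.items() if d > 0)
decreased = sorted(f for f, d in changes.items() if d < 0)
```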

Although the present invention has been described in detail through representative embodiments, those of ordinary skill in the art to which the invention pertains will understand that various modifications can be made to the above-described embodiments without departing from the scope of the invention. Therefore, the scope of rights of the present invention should not be limited to the described embodiments, but should be determined by the claims set forth below and by all changes or modifications derived from the claims and their equivalents.

1: Robustness measurement system
10: Preprocessing module
30: Robustness measurement module
50: Output module

Claims

A robustness measurement system for an AI model for responding to adversarial attacks, comprising:
a preprocessing module that generates adversarial training samples that remain close to the original yet cause misclassification, using the distance parameter between the original data and the attack sample generated by a successful adversarial attack;
a robustness measurement module that calculates, as the value of a robustness index for evaluating the robustness level of the AI model, the average distance between the adversarial training samples and the original data; and
an output module that outputs the robustness level measurement status for each malware group under adversarial attack, using the robustness index value calculated by the robustness measurement module;
wherein the AI model robustness measurement system outputs an evaluation index of the robustness of the AI model against adversarial attacks.

According to claim 1,
A robustness measurement system for an AI model, characterized in that, in the robustness measurement module, the robustness index has the property that the more difficult it is to generate the adversarial training samples, the more noise is required and the larger the index value becomes, so that a larger robustness index value indicates that adversarial training samples are harder to generate; and the robustness index, which is the evaluation standard for the robustness of the AI model, is set based on the difficulty of generating the adversarial training samples.

According to claim 2,
A robustness measurement system for an AI model, characterized in that the robustness measurement module calculates the robustness index based on [Relational Expression 1].
[Relational Expression 1]

Here, MARS denotes the Model Adversarial Robustness Score, i.e., the robustness index; S is the set of data in each group for which an adversarial training sample was successfully generated; i is the feature index; n is the number of features; r is the range of a feature; x is the original data; x' is the adversarial training sample; and P is the predicted value of the AI model.

According to claim 1,
A robustness measurement system for an AI model, characterized in that, when calculating the robustness index, the robustness measurement module excludes data for which generation of the adversarial training sample failed and then calculates the average value over the N samples in each data group for which the adversarial training sample was successfully generated.

A robustness measurement application for an AI model, stored in a medium of a smartphone, tablet, laptop, or computer that has input means for inputting data, processing means for processing the input data, and output means, and that stores an AI model trained on general data, the application executing:
a preprocessing step of generating adversarial training samples that remain close to the original yet cause misclassification, using the distance parameter between the original data and the attack sample generated by a successful adversarial attack;
a robustness measurement step of calculating, as the value of the robustness index for evaluating the robustness level of the AI model, the average distance between the adversarial training samples and the original data; and
an output step of outputting the robustness level measurement status for each malware group under adversarial attack, using the robustness index value calculated in the robustness measurement step.