KR20090005934A

KR20090005934A - Apparatus and method for detection of malicious program using program behavior

Info

Publication number: KR20090005934A
Application number: KR1020070099977A
Authority: KR
Inventors: 박태준; 신강근; 신 후; 아비짓 보스
Original assignee: Samsung Electronics Co., Ltd. (삼성전자주식회사); The Regents of the University of Michigan (더 리젠츠 오브 더 유니버시티 오브 미시건)
Priority date: 2007-07-10
Filing date: 2007-10-04
Publication date: 2009-01-14
Also published as: KR101421136B1; KR20090005933A; KR101329141B1

Abstract

An apparatus and a method for detecting a malicious program by using the behavior of a computer program are provided, which generate optimized malicious code diagnosis data used to judge whether a computer program executed in a computer system is normal code or malicious code. A behavior feature vector creation unit (410) creates a first behavior feature vector based on behavior features extracted from a diagnosis target program, and a diagnosis data storage unit (420) stores a plurality of second behavior feature vectors for sample programs whose malicious or normal status is already known. A code diagnosis unit (430) diagnoses whether the diagnosis target program is malicious code by comparing the first behavior feature vector with the plurality of second behavior feature vectors, and consists of a distance calculation unit (440) and a code judgment unit (450).

Description

Apparatus and method for diagnosing whether a computer program is malicious using the behavior of the program {APPARATUS AND METHOD FOR DETECTION OF MALICIOUS PROGRAM USING PROGRAM BEHAVIOR}

The present invention relates to a method and apparatus for diagnosing whether a computer program executed in a computer system is a malicious program. More particularly, it relates to an apparatus and method that use the behavior of a computer program to diagnose whether the program is malicious, and to a method and apparatus for generating that diagnostic apparatus.

In conventional malicious program monitoring, samples of known malicious programs are collected, a fixed character string characteristic of each malicious program is extracted, and a computer system is judged to be infected according to whether that string is present in its files or elsewhere on the system.
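The conventional string-matching approach described above can be sketched as follows. This is a minimal illustration, not any vendor's scanner: the signature names, byte strings, and file contents are all made up for the example.

```python
def find_signatures(data, signatures):
    """Return the names of all signatures whose byte string occurs in data."""
    return [name for name, pattern in signatures.items() if pattern in data]


# Hypothetical signature database: the names and byte strings are invented.
SIGNATURES = {
    "Sample.A": b"\xde\xad\xbe\xef",
    "Sample.B": b"EVIL_PAYLOAD",
}

infected_file = b"header...EVIL_PAYLOAD...trailer"
clean_file = b"header...ordinary data...trailer"
variant_file = b"header...EV1L_PAYLOAD...trailer"  # one byte changed

for label, data in [("infected", infected_file),
                    ("clean", clean_file),
                    ("variant", variant_file)]:
    print(label, find_signatures(data, SIGNATURES))
```

Every stored signature is checked against the data, so scan time grows with the size of the signature database, and the one-byte variant no longer matches the stored string.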

Consequently, whenever a new malicious program appeared, it had to be identified and its characteristic string extracted, and a diagnostic apparatus capable of detecting it had to be developed. Until information about the new malicious program was added to the existing diagnostic apparatus, there was no defense against it, so damage from a new malicious program could not be prevented. Moreover, as the number of malicious programs grows, the number of characteristic strings grows in proportion, so the time the diagnostic apparatus needs to check whether any characteristic string is present inevitably increases as well.

In battery-powered mobile devices such as mobile phones and PDAs, the power consumed in extracting strings from a given program and checking whether they match the characteristic strings of known malicious programs shortens the operating time of the device.

In addition, with conventional technology, once a computer vulnerability has been exposed by a hacker's attack, the vendor of the program can block that attack only by issuing a patch program that fixes the vulnerability. Nothing can be done about other attacks that exploit the many vulnerabilities not yet disclosed.

Most malicious programs are not entirely new programs unrelated to existing ones but variants of existing malicious programs. Nevertheless, detecting a variant requires a new string extracted from that variant rather than the string extracted from the original malicious program, so a separate string has to be kept for each variant.

An object of the present invention is to determine, from the behavior of a computer program executed in a computer system, whether a given computer program is normal code or malicious code.

Another object of the present invention is to generate the malicious code diagnostic data used to determine whether a computer program executed in a computer system is normal code or malicious code.

To achieve these objects and solve the problems of the prior art, the present invention provides a malicious code diagnosis apparatus comprising: a behavioral feature vector generator that generates a first behavioral feature vector based on behavioral features extracted from a diagnosis target program; a diagnostic data storage unit that stores a plurality of second behavioral feature vectors for a plurality of sample programs whose malicious or normal status is already known; and a code diagnosis unit that compares the first behavioral feature vector with the plurality of second behavioral feature vectors to diagnose whether the diagnosis target program is malicious code.

According to one aspect of the present invention, there is provided a malicious code diagnosis method comprising: generating a first behavioral feature vector based on behavioral features extracted from a diagnosis target program; loading a plurality of second behavioral feature vectors for a plurality of sample programs whose malicious or normal status is already known; and comparing the first behavioral feature vector with the plurality of second behavioral feature vectors to diagnose whether the diagnosis target program is malicious code.

According to another aspect of the present invention, there is provided an apparatus for generating malicious code diagnostic data, comprising: a behavioral feature vector generator that generates a behavioral feature vector from each of a plurality of sample programs whose malicious or normal status is already known; a weight vector determiner that determines a weight vector based on the behavioral feature vectors and the malicious status of the sample programs; and a diagnostic data storage unit that stores the behavioral feature vectors and the weight vector, wherein the behavioral feature vectors and the weight vector are used to determine whether a diagnosis target program is malicious code.

According to another aspect of the present invention, there is provided a method for generating malicious code diagnostic data, comprising: generating a behavioral feature vector from each of a plurality of sample programs whose malicious or normal status is already known; determining a weight vector based on the behavioral feature vectors and the malicious status of the sample programs; and storing the behavioral feature vectors and the weight vector, wherein the behavioral feature vectors and the weight vector are used to determine whether a diagnosis target program is malicious code.
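The text at this point does not say how the weight vector is determined. As a stand-in only, the sketch below uses a kernel perceptron, which is not the method of the patent, to show the shape of the data flow: labeled sample vectors go in, one weight per sample comes out, and vectors plus weights are stored together. The Gaussian similarity, the labels (-1 for normal, +1 for malicious, following a later embodiment), and the sample coordinates are assumptions.

```python
import math


def kernel(a, b, sigma=1.0):
    """Similarity that decreases with distance (the Gaussian form is assumed)."""
    sq = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))


def score(x, vectors, labels, weights):
    """Weighted, signed sum of similarities between x and every sample."""
    return sum(w * y * kernel(x, v)
               for v, y, w in zip(vectors, labels, weights))


def determine_weights(vectors, labels, max_epochs=100):
    """Kernel perceptron: raise a sample's weight each time it is misclassified."""
    weights = [0.0] * len(vectors)
    for _ in range(max_epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(vectors, labels)):
            if y * score(x, vectors, labels, weights) <= 0:
                weights[i] += 1.0
                mistakes += 1
        if mistakes == 0:  # every sample is on the correct side
            break
    return weights


# Hypothetical 2-D behavioral feature vectors: two normal, two malicious.
samples = [(0.0, 0.0), (0.0, 1.0), (3.0, 3.0), (3.0, 4.0)]
labels = [-1, -1, +1, +1]

weights = determine_weights(samples, labels)
diagnostic_data = {"vectors": samples, "labels": labels, "weights": weights}
print(diagnostic_data)
```

Samples that never cause a mistake keep a weight of zero, which mirrors the later observation that only some sample vectors need to be stored.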

According to the present invention, the behavior of a computer program executed in a computer system can be used to determine whether a given computer program is normal code or malicious code.

According to the present invention, optimal malicious code diagnostic data can be generated for determining whether a computer program executed in a computer system is normal code or malicious code.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a flowchart showing, step by step, a method of checking for malicious programs using the behavior of computer programs according to the present invention. The method of checking for malicious programs using the modeled behavior of computer programs is described in detail below with reference to FIG. 1.

In this description, a malicious program or malicious code means any program that its author deliberately created with malicious intent to harm users, in any executable form that runs on a computer system, including macros and scripts.

In step S110, the behavior of sample programs is modeled to generate behavioral feature vectors. The sample programs include normal programs as well as malicious programs.
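One way to picture a behavioral feature vector is as a fixed-order list of counts of observed behaviors. The sketch below is only an illustration under that assumption: the behavior categories and the event log are hypothetical, since the text here does not enumerate which behaviors are modeled.

```python
from collections import Counter

# Hypothetical behavior categories; the source does not list them here.
BEHAVIOR_FEATURES = ["file_delete", "file_write", "network_send", "process_spawn"]


def behavioral_feature_vector(events):
    """Count each known behavior and return the counts in a fixed order."""
    counts = Counter(events)
    return [float(counts[name]) for name in BEHAVIOR_FEATURES]


# Hypothetical sequence of behaviors observed while a program runs.
observed = ["file_write", "file_delete", "file_delete", "network_send"]
vector = behavioral_feature_vector(observed)
print(vector)
```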

In step S120, malicious code diagnostic data is generated. The malicious code diagnosis apparatus can distinguish malicious programs from normal programs using the malicious code diagnostic data and the behavioral feature vectors generated in step S110. According to an embodiment of the present invention, the malicious code diagnosis apparatus may be generated so that it yields a value at or above a threshold for a behavioral feature vector that models the behavior of a normal program, and a value below the threshold for a behavioral feature vector that models the behavior of a malicious program.

In step S130, the behavioral feature vector generated from a diagnosis target program and the malicious code diagnostic data generated in step S120 are used to diagnose whether the diagnosis target program executed in the computer system is a malicious program or a normal program.

If the diagnosis target program is a normal program, the malicious program checking apparatus yields a value at or above the threshold for the behavioral feature vector of the diagnosis target program. If the diagnosis target program is a malicious program, the apparatus may yield a value below the threshold for that behavioral feature vector.

Most malicious programs are not entirely new. They are merely variants made by modifying part of an existing malicious program. The behavior of a new malicious program that an existing checking apparatus fails to identify as malicious is therefore still similar to the behavior of existing malicious programs.

In addition, the actual behavior of most malicious programs is largely the same: they intrude into the user's computer system and delete the user's data, delete system files, and so on.

Therefore, if the behavior of a diagnosis target program is used to determine whether it is a normal program or a malicious program, malicious programs can be identified much more accurately than with conventional string comparison. New variants that have not been seen before can also be identified as malicious from their behavior alone.

Unlike the prior art, there is no need to analyze information about each new malicious program, so the damage incurred while a new malicious program is being analyzed can be reduced.

FIG. 2 shows behavioral feature vectors generated from sample programs, plotted in a behavioral feature vector space. The concept of diagnosing whether a program is malicious using the behavioral feature vector generated from the diagnosis target program is described in detail below with reference to FIG. 2.

FIG. 2 shows an embodiment in which behavioral feature vectors 211 and 212 generated from normal programs and behavioral feature vectors 221 and 222 generated from malicious programs lie in a two-dimensional behavioral feature vector space, but the present invention also applies when the behavioral feature vector space has more than two dimensions.

Behavioral features extracted from normal programs are similar to one another, and behavioral features extracted from malicious programs are similar to one another. Behavioral features extracted from normal programs, however, are not similar to those extracted from malicious programs. The normal behavioral feature vectors and malicious behavioral feature vectors generated from these features are therefore dissimilar. The normal behavioral feature vectors 211 and 212 and the malicious behavioral feature vectors 221 and 222 lie in separate regions of the behavioral feature vector space.

Accordingly, the malicious code diagnosis apparatus according to the present invention divides the behavioral feature vector space into a normal behavioral feature vector region 210, in which the normal behavioral feature vectors 211 and 212 lie, and a malicious behavioral feature vector region 220, in which the malicious behavioral feature vectors 221 and 222 lie. The apparatus can determine whether a diagnosis target program is malicious code according to which of the two regions 210 and 220 contains the behavioral feature vector generated from that program.

There may be several malicious code diagnosis apparatuses (231, 232) that separate the normal behavioral feature vector region 210 from the malicious behavioral feature vector region 220.

For diagnosis target programs whose behavioral feature vectors lie in a particular region 240 of the behavioral feature vector space, the diagnosis result may differ depending on which malicious code diagnosis apparatus is used.
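The effect described for region 240 can be reproduced with two straight-line boundaries in a two-dimensional space. The sketch below is illustrative only: the coordinates and line equations are invented, and the names `boundary_231` and `boundary_232` merely echo the reference numerals.

```python
def side(boundary, point):
    """Classify a 2-D point by the sign of w1*x + w2*y + c for a linear boundary."""
    (w1, w2), c = boundary
    value = w1 * point[0] + w2 * point[1] + c
    return "malicious" if value > 0 else "normal"


# Two hypothetical boundaries that both separate the known samples.
boundary_231 = ((1.0, 0.0), -2.0)  # the vertical line x = 2
boundary_232 = ((1.0, 1.0), -5.0)  # the diagonal line x + y = 5

normal_vec = (1.0, 1.0)
malicious_vec = (4.0, 4.0)
ambiguous_vec = (3.0, 1.0)  # lies between the two boundaries, like region 240

for name, vec in [("normal", normal_vec),
                  ("malicious", malicious_vec),
                  ("ambiguous", ambiguous_vec)]:
    print(name, side(boundary_231, vec), side(boundary_232, vec))
```

Both boundaries agree on the known samples but disagree on the vector that falls between them, which is why the choice of boundary matters.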

FIG. 3 illustrates the concept of selecting, according to the present invention, the optimal data from among the malicious code diagnostic data that divide the behavioral feature vector space into a malicious vector space and a normal vector space. The concept of selecting the optimal data is described in detail below with reference to FIG. 3.

Normal behavioral feature vectors 311, 312, and 313 lie in a normal behavioral feature vector region 310, and malicious behavioral feature vectors 321, 322, 323, and 324 lie in a malicious behavioral feature vector region 320. The malicious code diagnosis apparatus may separate the two regions using a first boundary 330 or a second boundary 340. The first boundary 330 and the second boundary 340 are determined by the diagnostic data that the apparatus uses.

When the two regions are separated using the first boundary 330, the margin equals the distance 333 between the normal behavioral feature vector 313 and the malicious behavioral feature vector 323 that lie closest to the first boundary 330.

When the two regions are separated using the second boundary 340, the margin equals the distance 343 between the normal behavioral feature vector 313 and the malicious behavioral feature vector 324 that lie closest to the second boundary 340.

For the malicious code diagnosis apparatus to diagnose accurately whether a diagnosis target program is malicious, the distance (333 or 343) separating the normal behavioral feature vector region 310 from the malicious behavioral feature vector region 320 should be as large as possible. The malicious code diagnosis apparatus according to an embodiment of the present invention therefore uses the first boundary 330, for which the distance separating the two regions 310 and 320 is largest, and can thus determine accurately whether the behavioral feature vector generated from the diagnosis target program belongs to the normal behavioral feature vector region 310 or the malicious behavioral feature vector region 320.

The malicious code diagnosis apparatus diagnoses normal and malicious programs most accurately when the distance between the normal behavioral feature vector region 310 and the malicious behavioral feature vector region 320 is at its maximum. Because the boundary that the apparatus uses to separate the two regions is determined by the malicious code diagnostic data, the diagnostic data that cause the apparatus to use the first boundary 330 are the optimal malicious code diagnostic data.
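The comparison between the two boundaries can be sketched numerically. The example below assumes straight-line boundaries in two dimensions and measures the margin perpendicular to each boundary. All coordinates are hypothetical, and `first_boundary` and `second_boundary` only stand in for boundaries 330 and 340.

```python
import math


def signed_distance(boundary, point):
    """Perpendicular distance from point to the line w1*x + w2*y + c = 0."""
    (w1, w2), c = boundary
    return (w1 * point[0] + w2 * point[1] + c) / math.hypot(w1, w2)


def margin(boundary, normal_vectors, malicious_vectors):
    """Gap between the closest normal and closest malicious vector.

    Normal vectors are expected on the negative side and malicious vectors on
    the positive side; 0.0 is returned if any vector is on the wrong side.
    """
    d_normal = min(-signed_distance(boundary, p) for p in normal_vectors)
    d_malicious = min(signed_distance(boundary, p) for p in malicious_vectors)
    if d_normal <= 0 or d_malicious <= 0:
        return 0.0
    return d_normal + d_malicious


# Hypothetical sample vectors and two candidate boundaries.
normal_vectors = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
malicious_vectors = [(5.0, 2.0), (6.0, 0.0), (5.0, 4.0)]
first_boundary = ((1.0, 0.0), -3.0)   # the line x = 3
second_boundary = ((1.0, 1.0), -5.0)  # the line x + y = 5

candidates = {"first": first_boundary, "second": second_boundary}
best = max(candidates,
           key=lambda k: margin(candidates[k], normal_vectors, malicious_vectors))
print(best)
```

Selecting the candidate with the largest margin corresponds to choosing the diagnostic data that yield the first boundary in FIG. 3.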

FIG. 4 shows the structure of a malicious code diagnosis apparatus that, according to an embodiment of the present invention, diagnoses whether a diagnosis target program is malicious code based on the behavioral feature vector generated from that program. The structure of the apparatus is described in detail below with reference to FIG. 4.

The malicious code diagnosis apparatus 400 according to the present invention includes a behavioral feature vector generator 410, a diagnostic data storage unit 420, and a code diagnosis unit 430.

The code diagnosis unit 430 according to an embodiment of the present invention includes a distance calculator 440 and a code determiner 450.

The behavioral feature vector generator 410 generates a first behavioral feature vector based on behavioral features extracted from the diagnosis target program.

The diagnostic data storage unit 420 stores a plurality of second behavioral feature vectors for a plurality of sample programs whose malicious or normal status is already known. According to an embodiment of the present invention, the plurality of sample programs may include at least one normal program and at least one malicious program.

According to an embodiment of the present invention, the diagnostic data storage unit 420 may store the plurality of second behavioral feature vectors for the sample programs, the malicious status of each sample program, and a weight vector generated based on the plurality of second behavioral feature vectors.

The code diagnosis unit 430 compares the first behavioral feature vector with the plurality of second behavioral feature vectors to diagnose whether the diagnosis target program is malicious code.

According to an embodiment of the present invention, the code diagnosis unit 430 may include the distance calculator 440, which calculates and compares the distances between the first behavioral feature vector and each of the plurality of second behavioral feature vectors, and the code determiner 450, which determines whether the diagnosis target program is malicious code based on the calculated distances.

According to an embodiment of the present invention, the code determiner 450 compares the distance between the behavioral feature vector of a normal program and that of the diagnosis target program with the distance between the behavioral feature vector of a malicious program and that of the diagnosis target program. Based on this comparison, it determines whether the behavioral feature vector of the diagnosis target program belongs to the normal behavioral feature vector region 310 or the malicious behavioral feature vector region 320, and accordingly whether the diagnosis target program is a malicious program.
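A minimal sketch of this distance comparison follows. It assumes the simplest possible rule, namely that the target is assigned to whichever class has the closer nearest sample. The text does not state this exact rule, and the function names and coordinates are hypothetical.

```python
import math


def nearest_distance(x, vectors):
    """Euclidean distance from x to the closest vector in the list."""
    return min(math.dist(x, v) for v in vectors)


def diagnose_by_distance(x, normal_vectors, malicious_vectors):
    """Assign x to the class whose nearest sample vector is closer."""
    d_normal = nearest_distance(x, normal_vectors)
    d_malicious = nearest_distance(x, malicious_vectors)
    return "malicious" if d_malicious < d_normal else "normal"


# Hypothetical second behavioral feature vectors for known samples.
normal_vectors = [(0.0, 0.0), (1.0, 1.0)]
malicious_vectors = [(5.0, 5.0), (6.0, 4.0)]

print(diagnose_by_distance((4.5, 4.0), normal_vectors, malicious_vectors))
print(diagnose_by_distance((0.5, 1.5), normal_vectors, malicious_vectors))
```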

According to an embodiment of the present invention, the distance calculator 440 may calculate the Euclidean distance between the first behavioral feature vector and a second behavioral feature vector.
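For reference, the Euclidean distance between two n-dimensional vectors can be written as below. The symbols a, b, and n are notation chosen for this illustration, and the worked numbers are a made-up example.

```latex
d(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{k=1}^{n} (a_k - b_k)^2},
\qquad
\text{e.g. } \mathbf{a} = (1, 2),\ \mathbf{b} = (4, 6):\quad
d = \sqrt{3^2 + 4^2} = 5 .
```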

The behavioral feature vector space is divided into the normal behavioral feature vector region 310 and the malicious behavioral feature vector region 320.

According to an embodiment of the present invention, the diagnostic data storage unit 420 stores a weight vector determined based on the second behavioral feature vector and the malicious status of each sample program. The code determiner 450 multiplies the calculated distances by the elements of the weight vector and compares the result with a predetermined threshold, and can thereby determine in which of the two regions 310 and 320 the behavioral feature vector of the diagnosis target program lies.

According to an embodiment of the present invention, the code determiner 450 may multiply the calculated distances by the elements of the weight vector according to the value of Equation 1 below.

[Equation 1]

(The formula for Equation 1 is not reproduced in this text.)

Equation 1 is expressed in terms of the following quantities: the first behavioral feature vector extracted from the diagnosis target program; the second behavioral feature vector extracted from the i-th sample program; the malicious status of the i-th sample program, which according to an embodiment of the present invention may be '-1' if the i-th sample program is a normal program and '+1' if it is a malicious program; a value that is inversely proportional to the distance between two behavioral feature vectors a and b and is determined by Equation 2 below; and the i-th element of the weight vector.
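The formula for Equation 1 did not survive in this text. A plausible reconstruction from the quantities it is said to contain is the weighted sum below. This is a sketch, not the published formula: every symbol is an assumption, with x for the first behavioral feature vector, x_i for the second behavioral feature vector of the i-th sample, y_i for its malicious status (-1 or +1), K for the distance-dependent value of Equation 2, alpha_i for the i-th weight, and m for the number of sample programs.

```latex
\sum_{i=1}^{m} \alpha_i \, y_i \, K(\mathbf{x}, \mathbf{x}_i)
```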

[Equation 2]

(The formula for Equation 2 is not reproduced in this text.)

The quantities defined here are the number of sample programs and a constant of arbitrary value, which according to an embodiment of the present invention may be '1'.
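The formula for Equation 2 is likewise missing from this text. The description says only that the value decreases as the distance between the two vectors grows and that it involves a constant that may be 1. A Gaussian form is one common function with those properties, and it is shown here purely as an assumption, with sigma standing for the constant.

```latex
K(\mathbf{a}, \mathbf{b}) = \exp\!\left( -\frac{\lVert \mathbf{a} - \mathbf{b} \rVert^{2}}{2\sigma^{2}} \right)
```

With this form, K equals 1 when the two vectors coincide and approaches 0 as they move apart.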

Only the second behavioral feature vectors whose corresponding weight vector element is non-zero are stored in the diagnostic data storage unit 420 and used in the calculation of Equation 1. The other second behavioral feature vectors therefore have no effect on that calculation. The second behavioral feature vectors whose corresponding weight vector element is non-zero are called support vectors. Because support vectors generally lie near the first boundary 330 or the second boundary 340, they are far fewer than the sample programs, so the storage required in the diagnostic data storage unit 420 and the computation required for Equation 1 are both very small.
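The saving from storing only support vectors can be sketched as follows. The example assumes a weighted, signed sum of similarities as the form of Equation 1 and a Gaussian similarity as the form of Equation 2. Neither formula is reproduced in this text, so both are assumptions, as are the sample values and weights.

```python
import math


def kernel(a, b, sigma=1.0):
    """Similarity that shrinks with distance (the Gaussian form is assumed)."""
    return math.exp(-math.dist(a, b) ** 2 / (2.0 * sigma ** 2))


def keep_support_vectors(vectors, labels, weights):
    """Drop every sample whose weight is zero; the rest are support vectors."""
    return [(v, y, w) for v, y, w in zip(vectors, labels, weights) if w != 0.0]


def weighted_sum(x, entries):
    """Assumed form of Equation 1: sum of weight * label * similarity."""
    return sum(w * y * kernel(x, v) for v, y, w in entries)


# Hypothetical samples: labels are -1 (normal) / +1 (malicious).
vectors = [(0.0, 0.0), (0.0, 1.0), (3.0, 3.0), (3.0, 4.0)]
labels = [-1, -1, +1, +1]
weights = [1.0, 0.0, 1.0, 0.0]

all_entries = list(zip(vectors, labels, weights))
stored = keep_support_vectors(vectors, labels, weights)

target = (2.5, 3.0)
print(len(all_entries), len(stored))
print(weighted_sum(target, all_entries), weighted_sum(target, stored))
```

Zero-weight samples contribute nothing to the sum, so discarding them changes neither the value nor the diagnosis while halving the stored data in this toy case.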

According to an embodiment of the present invention, the code determiner 450 may compare the value of Equation 1 with a predetermined threshold determined by Equation 3 below, and thereby determine in which of the normal behavioral feature vector region 310 and the malicious behavioral feature vector region 320 the behavioral feature vector of the diagnosis target program lies.

[Equation 3]

(The formula for Equation 3 is not reproduced in this text.)

Equation 3 is expressed in terms of the predetermined threshold, a second behavioral feature vector extracted from a normal sample program, and a second behavioral feature vector extracted from a malicious sample program.
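The formula for Equation 3 is not available in this text. One reconstruction that fits the quantities listed is the usual support vector machine bias term, obtained by requiring the weighted sum plus the threshold to equal -1 at a normal support vector and +1 at a malicious one, then adding the two conditions. This is an assumption rather than the published formula, and the symbols b (threshold), x_r (normal sample vector), and x_s (malicious sample vector) are chosen here for illustration.

```latex
b = -\frac{1}{2} \sum_{i=1}^{m} \alpha_i \, y_i
    \left[ K(\mathbf{x}_r, \mathbf{x}_i) + K(\mathbf{x}_s, \mathbf{x}_i) \right]
```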

According to an embodiment of the present invention, the code determiner 450 adds the predetermined threshold to the value of Equation 1. If the result is less than '0', it may determine that the behavioral feature vector of the diagnosis target program lies in the normal behavioral feature vector region 310, and if the result is greater than '0', that it lies in the malicious behavioral feature vector region 320.
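The sign test in this embodiment reduces to a few lines. In the sketch below the function and argument names are hypothetical, and the handling of a result of exactly zero is an assumption, since the text covers only the cases below and above zero.

```python
def diagnose(equation1_value, threshold):
    """Apply the sign rule: negative total -> normal, positive -> malicious."""
    total = equation1_value + threshold
    if total < 0:
        return "normal"
    if total > 0:
        return "malicious"
    # The source does not say how a result of exactly 0 is handled.
    return "undetermined"


print(diagnose(-0.8, 0.1))
print(diagnose(0.6, 0.1))
```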

According to an embodiment of the present invention, the code determining unit 450 may determine that the behavioral feature vector of the diagnosis target program lies in the normal behavioral feature vector region 310 when the value of Equation 4 below is greater than '0', and in the malicious behavioral feature vector region 320 when that value is less than '0'.

[Equation 4]

FIG. 5 illustrates the concept of diagnosing whether a diagnosis target program is malicious based on a characteristic vector obtained by transforming its behavioral feature vector into a higher dimension, for cases where the behavioral feature vector generated from the diagnosis target program is not sufficient to determine whether the program is malicious. This concept is described in detail below with reference to FIG. 5.

FIG. 5(a) shows normal behavioral feature vectors 511 and 512 and malicious behavioral feature vectors 513 and 514 in the behavioral feature vector space. The normal and malicious behavioral feature vectors are scattered over the whole space without any regular pattern, so the space cannot be divided into a normal behavioral feature vector region 310 and a malicious behavioral feature vector region 320.

FIG. 5(b) shows characteristic vectors 521, 522, 523, and 524, generated by transforming the normal behavioral feature vectors 511 and 512 and the malicious behavioral feature vectors 513 and 514, respectively, plotted in the characteristic vector space.

FIG. 5(b) shows an embodiment in which a two-dimensional behavioral feature vector is transformed into a three-dimensional characteristic vector. In other embodiments of the present invention, the behavioral feature vector may be transformed into four or more dimensions, that is, into any dimension higher than its own.

FIG. 5(c) shows characteristic vectors 531 and 532 of normal programs and characteristic vectors 533 and 534 of malicious programs in the characteristic vector space. The characteristic vectors 531, 532, 533, and 534 transformed from the behavioral feature vectors lie in regions of the characteristic vector space that are separated from one another.

The characteristic vectors 531 and 532 of the normal programs can be separated from the characteristic vectors 533 and 534 of the malicious programs by a specific boundary 535.
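The idea of FIG. 5 can be illustrated with a small Python sketch. The mapping `lift`, which sends (x1, x2) to (x1, x2, x1² + x2²), the sample points, and the plane at height 0.5 are all assumptions chosen for illustration; the source does not specify which transformation is used.

```python
from typing import List, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def lift(p: Point2D) -> Point3D:
    """Map a 2-D behavioral feature vector to a 3-D characteristic vector."""
    x1, x2 = p
    return (x1, x2, x1 * x1 + x2 * x2)


# Made-up data: normal samples sit near the origin and malicious samples
# surround them, so no straight line in the 2-D plane separates the groups.
normal: List[Point2D] = [(0.1, 0.0), (-0.2, 0.1), (0.0, -0.3)]
malicious: List[Point2D] = [(1.0, 0.0), (-1.0, 0.2), (0.1, -1.1), (0.0, 1.2)]

# After lifting, the third coordinate alone separates the groups; the plane
# z = 0.5 plays the role of a boundary such as 535.
BOUNDARY_Z = 0.5
normal_below = all(lift(p)[2] < BOUNDARY_Z for p in normal)
malicious_above = all(lift(p)[2] > BOUNDARY_Z for p in malicious)
print(normal_below, malicious_above)
```

The choice of a squared-radius coordinate is only one of many possible higher-dimensional mappings; it is used here because it makes the separation easy to verify by hand.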

FIG. 6 shows the structure of a malicious code diagnosis apparatus that diagnoses whether a diagnosis target program is malicious code based on a characteristic vector generated from the diagnosis target program, according to an embodiment of the present invention. The structure of this apparatus is described in detail below with reference to FIG. 6.

The malicious code diagnosis apparatus 600 according to the present invention includes a behavioral feature vector generator 610, a diagnostic data storage unit 620, and a code diagnosis unit 630.

The code diagnosis unit 630 according to an embodiment of the present invention includes a vector converter 640, a distance calculator 650, and a code determining unit 660.

The behavioral feature vector generator 610 generates a first behavioral feature vector based on behavioral features extracted from the diagnosis target program.

The diagnostic data storage unit 620 stores a plurality of second behavioral feature vectors for a plurality of sample programs, each of which is already known to be either normal or malicious. According to an embodiment of the present invention, the plurality of sample programs may include at least one normal program and at least one malicious program.

According to an embodiment of the present invention, the diagnostic data storage unit 620 may store the plurality of second behavioral feature vectors for the sample programs, whether each sample program is malicious, and a weight vector generated based on the plurality of second behavioral feature vectors.

The code diagnosis unit 630 compares the first behavioral feature vector with the plurality of second behavioral feature vectors to diagnose whether the diagnosis target program is malicious code.

According to an embodiment of the present invention, the code diagnosis unit 630 may include: the vector converter 640, which transforms the first behavioral feature vector into a higher-dimensional first characteristic vector and the plurality of second behavioral feature vectors into a plurality of higher-dimensional second characteristic vectors; the distance calculator 650, which calculates the distance between the first characteristic vector and each of the second characteristic vectors; and the code determining unit 660, which determines, based on the calculated distances, whether the diagnosis target program is malicious code.

According to an embodiment of the present invention, the vector converter 640 may transform the vectors so that the distance between the first characteristic vector and a second characteristic vector is proportional to the distance between the first behavioral feature vector and the second behavioral feature vector from which they were transformed.

The characteristic vector space is divided into a normal characteristic vector region and a malicious characteristic vector region.

According to an embodiment of the present invention, the diagnostic data storage unit 620 stores a weight vector determined based on the second characteristic vector of each sample program and on whether that sample program is malicious. The code determining unit 660 may multiply the calculated distances by the elements of the weight vector and compare the result with a predetermined threshold to determine whether the characteristic vector of the diagnosis target program lies in the normal characteristic vector region or the malicious characteristic vector region.

The code determining unit 660 may diagnose the diagnosis target program as a normal program when its characteristic vector lies in the normal characteristic vector region, and as a malicious program when its characteristic vector lies in the malicious characteristic vector region.

FIG. 7 shows the structure of an apparatus that generates malicious code diagnosis data using the behavioral feature vectors of sample programs, according to an embodiment of the present invention. The structure of this apparatus is described in detail below with reference to FIG. 7.

The malicious code diagnosis data generating apparatus 700 according to the present invention includes a behavioral feature vector generator 710, a distance calculator 720, a weight vector determiner 730, and a diagnostic data storage unit 740.

The behavioral feature vector generator 710 generates a behavioral feature vector from each of a plurality of sample programs, each of which is already known to be either normal or malicious. According to an embodiment of the present invention, the plurality of sample programs may include at least one normal program and at least one malicious program.

The weight vector determiner 730 determines a weight vector based on the behavioral feature vectors of the plurality of sample programs and on whether each of the sample programs is malicious.

According to an embodiment of the present invention, the distance calculator 720 calculates the distances between the behavioral feature vectors generated from the plurality of sample programs, and the weight vector determiner 730 may determine the weight vector based on the calculated distances and on whether each sample program is malicious.

The diagnostic data storage unit 740 stores the plurality of behavioral feature vectors generated from the sample programs and the determined weight vector. The stored behavioral feature vectors and weight vector are used by the malicious code diagnosis apparatus to determine whether a diagnosis target program is malicious code.

The first boundary 330 and the second boundary 340 in FIG. 3 are determined by the weight vector. According to an embodiment of the present invention, the weight vector determiner 730 may select, from among a plurality of weight vectors capable of separating the normal behavioral feature vector region 310 from the malicious behavioral feature vector region 320, the weight vector that separates the two regions most accurately.

According to an embodiment of the present invention, in order to accurately separate the normal behavioral feature vector region 310 from the malicious behavioral feature vector region 320, the weight vector determiner 730 may determine the weight vector so as to maximize the quantity defined by Equation 5 below while satisfying Equation 6 and Equation 7 below.

[Equation 5]

Here, the symbols of Equation 5 denote, in order: the behavioral feature vector extracted from the i-th sample program; whether the i-th sample program is malicious, which according to an embodiment of the present invention may be '-1' for a normal program and '+1' for a malicious program; the distance between two behavioral feature vectors a and b; the number of sample programs; and the i-th element of the weight vector.

[Equation 6]

[Equation 7]
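Equations 5 to 7 themselves do not survive in this text. For orientation only, one standard formulation that fits the surrounding description (a quantity maximized over the weight vector, two constraints, labels of -1 and +1, and a pairwise function of two behavioral feature vectors) is the dual problem of a support vector machine. The symbols $\alpha_i$, $y_i$, $x_i$, $K$, and $n$ below are assumptions rather than notation confirmed by the source, and the actual equations may differ.

```latex
\begin{align}
  W(\alpha) &= \sum_{i=1}^{n} \alpha_i
    - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
      \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
    && \text{(quantity to maximize)} \\
  \sum_{i=1}^{n} \alpha_i y_i &= 0
    && \text{(first constraint)} \\
  \alpha_i &\ge 0, \quad i = 1, \dots, n
    && \text{(second constraint)}
\end{align}
```

Under this reading, $\alpha_i$ is the i-th element of the weight vector, $y_i \in \{-1, +1\}$ marks the i-th sample program as normal or malicious, and $K(a, b)$ is the pairwise function of two behavioral feature vectors.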

According to an embodiment of the present invention, the weight vector determiner 730 may determine, according to Equation 8 below, the predetermined threshold used to separate the normal behavioral feature vector region 310 from the malicious behavioral feature vector region 320.

[Equation 8]

Here, the symbols of Equation 8 denote, respectively, the predetermined threshold and a second behavioral feature vector extracted from a malicious sample program.

FIG. 8 is a flowchart showing, step by step, a method of diagnosing whether a diagnosis target program is malicious using its behavioral feature vector, according to an embodiment of the present invention. The method is described in detail below with reference to FIG. 8.

In step S810, a first behavioral feature vector is generated based on behavioral features extracted from the diagnosis target program. According to an embodiment of the present invention, only some of the behavioral features, rather than all of them, may be extracted from the diagnosis target program, and the first behavioral feature vector may be generated from that subset.

In step S820, a plurality of second behavioral feature vectors for a plurality of sample programs, each of which is already known to be either normal or malicious, are loaded. According to an embodiment of the present invention, the plurality of sample programs may include at least one normal program and at least one malicious program.

In step S830, the first behavioral feature vector is compared with the second behavioral feature vector and the distance between them is calculated. When there are a plurality of second behavioral feature vectors, the distance between the first behavioral feature vector and each second behavioral feature vector is calculated.

According to an embodiment of the present invention, the Euclidean distance between the first behavioral feature vector and the second behavioral feature vector may be calculated in step S830.
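Step S830 with the Euclidean distance can be sketched as follows. The function name `distances_to_samples` and the example vectors are hypothetical; `math.dist` is the standard-library Euclidean distance available from Python 3.8.

```python
import math
from typing import List, Sequence


def distances_to_samples(
    first: Sequence[float],
    seconds: List[Sequence[float]],
) -> List[float]:
    """Euclidean distance from the first vector to each second vector."""
    return [math.dist(first, second) for second in seconds]


first_vector = (1.0, 2.0)                      # diagnosis target program
second_vectors = [(1.0, 2.0), (4.0, 6.0)]      # sample programs
distances = distances_to_samples(first_vector, second_vectors)
print(distances)
```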

In step S840, whether the diagnosis target program is malicious code is diagnosed based on the distances between behavioral feature vectors calculated in step S830. According to an embodiment of the present invention, in step S840 the calculated distances may be multiplied by the elements of the weight vector and the result compared with a predetermined threshold to determine whether the behavioral feature vector of the diagnosis target program lies in the normal behavioral feature vector region 310 or the malicious behavioral feature vector region 320. If the vector lies in the normal behavioral feature vector region 310, the diagnosis target program may be diagnosed as a normal program; if it lies in the malicious behavioral feature vector region 320, the program may be diagnosed as a malicious program.

According to an embodiment of the present invention, in step S840 the calculated distances may be multiplied by the elements of the weight vector according to Equation 9 below.

[Equation 9]

Here, the symbols of Equation 9 denote, in order: the i-th element of the second behavioral feature vector extracted from the i-th sample program; whether the i-th sample program is malicious, which according to an embodiment of the present invention may be '-1' for a normal program and '+1' for a malicious program; the number of sample programs; the i-th element of the weight vector; and a value that is inversely proportional to the distance between two behavioral feature vectors a and b and is determined by Equation 10 below.

Here, only the second behavioral feature vectors whose corresponding weight-vector element is non-zero are used in the calculation of Equation 9; the other second behavioral feature vectors do not affect it. These vectors are called support vectors. Because support vectors generally lie near the boundary between the normal behavioral feature vector region 310 and the malicious behavioral feature vector region 320, they are far fewer than the total number of sample programs, so the amount of computation for Equation 9 is also small.

[Equation 10]

Here, the parameter of Equation 10 is a constant that may take an arbitrary value; according to an embodiment of the present invention, it may be '1'.
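Equations 9 and 10 do not survive in this text. The Python sketch below shows one common choice that fits the description: a pairwise value that shrinks as the distance between two vectors grows, controlled by a constant set to 1, and summed over the support vectors together with their labels and weights. The Gaussian form of `similarity`, the parameter name `sigma`, the function `weighted_sum`, and the data are assumptions, not the source's definitions.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def similarity(a: Vector, b: Vector, sigma: float = 1.0) -> float:
    """Assumed Gaussian form: 1 at distance 0, decreasing as distance grows."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def weighted_sum(
    target: Vector,
    support: List[Tuple[Vector, int, float]],
    sigma: float = 1.0,
) -> float:
    """Sum of weight * label * similarity over the stored support vectors."""
    return sum(w * y * similarity(x, target, sigma) for x, y, w in support)


# Made-up (vector, label, weight) triples: label -1 is normal, +1 is malicious.
support = [((0.0, 0.0), -1, 1.0), ((2.0, 2.0), +1, 1.0)]

near_normal = weighted_sum((0.1, 0.0), support)
near_malicious = weighted_sum((1.9, 2.0), support)
print(near_normal < 0, near_malicious > 0)
```

With this form, a target close to a normal support vector yields a negative sum and a target close to a malicious one yields a positive sum, which matches the '-1' and '+1' labeling given in the text.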

According to an embodiment of the present invention, step S840 may compare the value of Equation 9 with a predetermined threshold determined by Equation 11 below to determine whether the behavioral feature vector of the diagnosis target program lies in the normal behavioral feature vector region 310 or the malicious behavioral feature vector region 320.

[Equation 11]

Equation 11 gives the predetermined threshold.

According to an embodiment of the present invention, in step S840 the predetermined threshold is added to the value of Equation 9. If the result is less than '0', it may be determined that the behavioral feature vector of the diagnosis target program lies in the normal behavioral feature vector region 310; if the result is greater than '0', it may be determined that the vector lies in the malicious behavioral feature vector region 320.

According to an embodiment of the present invention, step S840 may determine that the behavioral feature vector of the diagnosis target program lies in the normal behavioral feature vector region 310 when the value of Equation 12 below is less than '0', and in the malicious behavioral feature vector region 320 when that value is greater than '0'.

[Equation 12]

FIG. 9 is a flowchart showing, step by step, a method of diagnosing whether a diagnosis target program is malicious using its characteristic vector, according to an embodiment of the present invention. The method is described in detail below with reference to FIG. 9.

In step S910, a first behavioral feature vector is generated based on behavioral features extracted from the diagnosis target program. According to an embodiment of the present invention, only some of the behavioral features, rather than all of them, may be extracted from the diagnosis target program, and the first behavioral feature vector may be generated from that subset.

In step S920, a plurality of second behavioral feature vectors for a plurality of sample programs, each of which is already known to be either normal or malicious, are loaded. According to an embodiment of the present invention, the plurality of sample programs may include at least one normal program and at least one malicious program.

According to an embodiment of the present invention, in step S920 a weight vector determined based on the plurality of second behavioral feature vectors of the sample programs and on whether each sample program is malicious may also be loaded.

In step S930, the generated first behavioral feature vector is transformed into a higher-dimensional first characteristic vector, and the loaded second behavioral feature vectors are transformed into higher-dimensional second characteristic vectors.

In step S940, the distance between the first characteristic vector and each of the second characteristic vectors is calculated.

In step S950, whether the diagnosis target program is malicious code is determined based on the calculated distances. According to an embodiment of the present invention, the calculated distances may be multiplied by the elements of the weight vector and the result compared with a predetermined threshold to determine whether the characteristic vector of the diagnosis target program lies in the normal characteristic vector region or the malicious characteristic vector region.

According to an embodiment of the present invention, in step S950 the diagnosis target program may be diagnosed as a normal program when its characteristic vector lies in the normal characteristic vector region, and as a malicious program when its characteristic vector lies in the malicious characteristic vector region.
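Steps S930 to S950 can be strung together in a rough Python sketch. Everything specific here is an assumption made for illustration: the mapping `to_characteristic_vector`, the signed weights (positive for normal samples, negative for malicious ones), the threshold of 0, and the sample data. None of these values is the output of the weight determination the source describes.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def to_characteristic_vector(v: Vector) -> Tuple[float, float, float]:
    """S930 (assumed mapping): lift a 2-D vector into three dimensions."""
    x1, x2 = v
    return (x1, x2, x1 * x1 + x2 * x2)


def diagnose(
    first_behavior: Vector,
    samples: List[Tuple[Vector, float]],  # (second behavior vector, weight)
    threshold: float = 0.0,
) -> str:
    first_char = to_characteristic_vector(first_behavior)        # S930
    score = 0.0
    for second_behavior, weight in samples:
        second_char = to_characteristic_vector(second_behavior)  # S930
        distance = math.dist(first_char, second_char)            # S940
        score += weight * distance                               # S950
    return "malicious" if score > threshold else "normal"       # S950


# Made-up samples: being far from normal samples pushes the score up,
# being far from malicious samples pushes it down.
samples = [
    ((0.1, 0.0), +1.0), ((-0.2, 0.1), +1.0),   # normal samples
    ((1.0, 0.0), -1.0), ((-1.0, 0.2), -1.0),   # malicious samples
]
print(diagnose((0.0, 0.1), samples))
print(diagnose((0.9, 0.1), samples))
```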

FIG. 10 is a flowchart showing, step by step, a method of generating malicious code diagnosis data used to diagnose whether a diagnosis target program is malicious from its behavioral feature vector, according to an embodiment of the present invention. The method is described in detail below with reference to FIG. 10.

In step S1010, a behavioral feature vector is generated from each of a plurality of sample programs, each of which is already known to be either normal or malicious. According to an embodiment of the present invention, the plurality of sample programs may include at least one normal program and at least one malicious program.

In step S1020, the distance between each pair of the behavioral feature vectors generated in step S1010 is calculated. According to an embodiment of the present invention, the distance may be a Euclidean distance.
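Step S1020 amounts to building a symmetric table of pairwise distances. A minimal sketch, assuming the Euclidean distance mentioned in the text; the function name `pairwise_distances` and the sample vectors are hypothetical.

```python
import math
from typing import List, Sequence


def pairwise_distances(vectors: List[Sequence[float]]) -> List[List[float]]:
    """Symmetric matrix of Euclidean distances between all sample vectors."""
    n = len(vectors)
    table = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(vectors[i], vectors[j])
            table[i][j] = d
            table[j][i] = d  # distance is symmetric, so fill both halves
    return table


sample_vectors = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
table = pairwise_distances(sample_vectors)
for row in table:
    print(row)
```

Only the upper triangle is computed, which halves the work for a large number of sample programs.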

In step S1030, a weight vector is determined based on the distances between the behavioral feature vectors calculated in step S1020 and on whether each sample program is malicious.

According to an embodiment of the present invention, in order to accurately separate the normal behavioral feature vector region 310 from the malicious behavioral feature vector region 320, the weight vector may be determined in step S1030 so as to maximize the quantity defined by Equation 13 below while satisfying Equation 14 and Equation 15 below.

[Equation 13]

Here, the symbols of Equation 13 denote, in order: the behavioral feature vector extracted from the i-th sample program; whether the i-th sample program is malicious, which according to an embodiment of the present invention may be '-1' for a normal program and '+1' for a malicious program; the distance between two behavioral feature vectors a and b; the number of sample programs; and the i-th element of the weight vector.

[Equation 14]

[Equation 15]
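Equations 13 to 15 do not survive in this text. The Python sketch below assumes they take the form of a support vector machine dual problem: maximize the sum of the weights minus half of a label-weighted double sum over pairwise values, subject to non-negative weights whose label-weighted sum is zero. That form, the names `dual_objective` and `satisfies_constraints`, and the candidate values are assumptions. The sketch only evaluates candidates; it is not a solver.

```python
from typing import List


def dual_objective(alpha: List[float], y: List[int], K: List[List[float]]) -> float:
    """Assumed objective: sum(alpha) - 1/2 * sum_ij alpha_i alpha_j y_i y_j K_ij."""
    n = len(alpha)
    quadratic = sum(
        alpha[i] * alpha[j] * y[i] * y[j] * K[i][j]
        for i in range(n)
        for j in range(n)
    )
    return sum(alpha) - 0.5 * quadratic


def satisfies_constraints(alpha: List[float], y: List[int], tol: float = 1e-9) -> bool:
    """Assumed constraints: alpha_i >= 0 and sum(alpha_i * y_i) == 0."""
    non_negative = all(a >= -tol for a in alpha)
    balanced = abs(sum(a * yi for a, yi in zip(alpha, y))) <= tol
    return non_negative and balanced


# Made-up problem with one normal (-1) and one malicious (+1) sample.
y = [-1, +1]
K = [[1.0, 0.0], [0.0, 1.0]]
candidates = [[0.5, 0.5], [1.0, 1.0], [1.5, 1.5], [1.0, 0.5]]

feasible = [a for a in candidates if satisfies_constraints(a, y)]
best = max(feasible, key=lambda a: dual_objective(a, y, K))
print(best, dual_objective(best, y, K))
```

In practice a quadratic programming routine would search the whole feasible set rather than a short candidate list.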

According to an embodiment of the present invention, in step S1030 a threshold used by the malicious code diagnosis apparatus to diagnose whether a diagnosis target program is malicious may be determined.

[Equation 16]

Equation 16 gives the predetermined threshold.

In step S1040, each generated behavioral feature vector and the determined weight vector are stored. According to an embodiment of the present invention, the determined predetermined threshold may also be stored in step S1040.

The stored threshold and weight vector are used by the malicious code diagnosis apparatus to determine whether a diagnosis target program is malicious code.
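The storing step can be pictured as writing the three pieces of diagnosis data to one record that the diagnosis apparatus later reads back. The JSON layout and the field names `feature_vectors`, `weight_vector`, and `threshold` below are hypothetical; the source does not specify a storage format.

```python
import json
from typing import Any, Dict, List, Sequence


def dump_diagnostic_data(
    vectors: List[Sequence[float]],
    weights: List[float],
    threshold: float,
) -> str:
    """Serialize behavioral feature vectors, weight vector, and threshold."""
    if len(vectors) != len(weights):
        raise ValueError("one weight element is expected per feature vector")
    record: Dict[str, Any] = {
        "feature_vectors": [list(v) for v in vectors],
        "weight_vector": list(weights),
        "threshold": threshold,
    }
    return json.dumps(record)


def load_diagnostic_data(text: str) -> Dict[str, Any]:
    """Read the record back on the diagnosis side."""
    return json.loads(text)


stored_text = dump_diagnostic_data([(0.4, 0.5), (0.6, 0.5)], [2.5, 2.5], -0.1)
restored = load_diagnostic_data(stored_text)
print(restored["threshold"], len(restored["feature_vectors"]))
```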

Embodiments of the present invention may be recorded on a computer-readable medium containing program instructions for performing various computer-implemented operations. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions may be specially designed and constructed for the present invention, or may be known and available to those skilled in the art. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. When all or part of the operation of the mobile terminal or base station described in the present invention is implemented as a computer program, a computer-readable recording medium storing that computer program is also included in the present invention.

FIG. 1 is a flowchart illustrating, step by step, a method of checking for a malicious program using the behavior of a computer program according to the present invention.

FIG. 2 is a diagram illustrating behavior feature vectors generated from sample programs in a behavior feature vector space.

FIG. 3 is a diagram illustrating the concept of selecting optimal data from among malicious code diagnostic data that divide the behavior feature vector space into a malicious vector space and a normal vector space according to the present invention.

FIG. 4 is a diagram illustrating the structure of a malicious code diagnosis apparatus that diagnoses whether a diagnosis target program is malicious code based on a behavior feature vector generated from the diagnosis target program according to an embodiment of the present invention.

FIG. 5 is a diagram illustrating the concept of diagnosing whether a diagnosis target program is a malicious program based on a characteristic vector obtained by transforming the behavior feature vector into a higher dimension, for the case where the behavior feature vector generated from the diagnosis target program cannot by itself determine whether the program is malicious.

FIG. 6 is a diagram illustrating the structure of a malicious code diagnosis apparatus that diagnoses whether a diagnosis target program is malicious code based on a characteristic vector generated from the diagnosis target program according to an embodiment of the present invention.

FIG. 7 is a diagram illustrating the structure of an apparatus that generates malicious code diagnostic data using behavior feature vectors of sample programs according to an embodiment of the present invention.

FIG. 8 is a flowchart illustrating, step by step, a method of diagnosing whether a diagnosis target program is a malicious program using the behavior feature vector of the diagnosis target program according to an embodiment of the present invention.

FIG. 9 is a flowchart illustrating, step by step, a method of diagnosing whether a diagnosis target program is a malicious program using the behavior feature vector of the diagnosis target program according to an embodiment of the present invention.

FIG. 10 is a flowchart illustrating, step by step, a method of generating the malicious code diagnostic data used to diagnose whether a diagnosis target program is a malicious program using the behavior feature vector of the diagnosis target program according to an embodiment of the present invention.

Claims

1. A malicious code diagnosis apparatus comprising:

a behavior feature vector generator configured to generate a first behavior feature vector based on behavior features extracted from a diagnosis target program;

a diagnostic data storage unit configured to store a plurality of second behavior feature vectors for a plurality of sample programs whose malicious status is known; and

a code diagnosis unit configured to diagnose whether the diagnosis target program is malicious code by comparing the first behavior feature vector with the plurality of second behavior feature vectors.

2. The apparatus of claim 1, wherein the code diagnosis unit comprises:

a distance calculator configured to calculate respective distances between the first behavior feature vector and the plurality of second behavior feature vectors and to compare the calculated distances with one another; and

a code determination unit configured to determine whether the diagnosis target program is malicious code based on the calculated distances.

3. The apparatus of claim 2, wherein the distance is a Euclidean distance.
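The distance calculation recited above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names `euclidean_distance` and `distances_to_samples` and the three-element example vectors are hypothetical, and the claims do not fix which behavior each vector element encodes.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length behavior feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distances_to_samples(first_vector, second_vectors):
    """Distance from the diagnosis target's vector to every stored sample vector."""
    return [euclidean_distance(first_vector, s) for s in second_vectors]

# Hypothetical vectors; the element meanings are illustrative only.
target = [1.0, 0.0, 2.0]
samples = [[1.0, 0.0, 2.0], [4.0, 4.0, 2.0]]
print(distances_to_samples(target, samples))  # prints [0.0, 5.0]
```

A code determination unit could then use these distances, for example by weighting them as described in the later claims.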

4. The apparatus of claim 1, wherein the code diagnosis unit comprises:

a vector transform unit configured to transform the first behavior feature vector into a first characteristic vector of higher dimension and to transform the plurality of second behavior feature vectors into a plurality of second characteristic vectors of higher dimension; and

a distance calculator configured to calculate respective distances between the first characteristic vector and the plurality of second characteristic vectors.
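To illustrate why a higher-dimensional transform can help, the sketch below uses an assumed transform that appends the product of two features. The transform, the sample layout, and the nearest-sample rule at the end are all hypothetical choices made for illustration; the claims do not specify them.

```python
import math

def to_characteristic_vector(v):
    """Hypothetical higher-dimensional transform: (x1, x2) -> (x1, x2, x1 * x2)."""
    x1, x2 = v
    return (x1, x2, x1 * x2)

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Illustrative XOR-like samples: in 2-D no single straight line separates the labels.
second_vectors = [(-1.0, -1.0), (1.0, 1.0), (-1.0, 1.0), (1.0, -1.0)]
is_malicious = [False, False, True, True]

second_characteristic = [to_characteristic_vector(v) for v in second_vectors]
first_characteristic = to_characteristic_vector((0.9, -0.8))  # diagnosis target

# After the transform, the sign of the third coordinate separates the two groups.
for c, bad in zip(second_characteristic, is_malicious):
    print(c, "malicious" if bad else "normal")

# Distances between the first and each second characteristic vector.
distances = [euclidean_distance(first_characteristic, c) for c in second_characteristic]
nearest = distances.index(min(distances))  # assumed rule: take the closest sample
print("nearest sample is malicious:", is_malicious[nearest])
```

In practice the transform is often left implicit and only distances or similarities in the transformed space are computed, but the explicit form makes the idea easier to see.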

5. The apparatus of claim 2, wherein:

the diagnostic data storage unit stores a weight vector determined based on the second behavior feature vector and the malicious status of each sample program; and

the code determination unit determines whether the diagnosis target program is malicious code by multiplying the calculated distance by an element of the weight vector and comparing the result with a predetermined threshold value.

6. The apparatus of claim 5, wherein the code determination unit multiplies the calculated distance by an element of the weight vector according to the value of Equation 1 below:

[Equation 1]

where the symbols of Equation 1 denote, in order: the first behavior feature vector extracted from the diagnosis target program; the second behavior feature vector extracted from the i-th sample program; whether the i-th sample program is malicious; a value that is inversely proportional to the distance between two characteristic vectors a and b and is determined by Equation 2 below; the number of sample programs; and the i-th element of the weight vector.

[Equation 2]

where the remaining symbol of Equation 2 is a constant.
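The weighted comparison described in claims 5 and 6 can be sketched as below. Because the equation images are not reproduced in this text, several points are assumptions: a Gaussian function stands in for the value that decreases with distance, the labels are +1 for malicious and -1 for normal, and a value above the threshold is read as malicious. The names `gaussian_kernel`, `decision_value`, `is_malicious`, and `sigma` are hypothetical.

```python
import math

def gaussian_kernel(a, b, sigma=1.0):
    """Assumed similarity: shrinks as the squared Euclidean distance grows."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def decision_value(x, sample_vectors, labels, weights, sigma=1.0):
    """Weighted sum over samples: sum_i weights[i] * labels[i] * K(x, x_i)."""
    return sum(w * y * gaussian_kernel(x, s, sigma)
               for s, y, w in zip(sample_vectors, labels, weights))

def is_malicious(x, sample_vectors, labels, weights, threshold, sigma=1.0):
    """Assumed decision rule: malicious when the weighted sum exceeds the threshold."""
    return decision_value(x, sample_vectors, labels, weights, sigma) > threshold

# Hypothetical diagnostic data: one normal sample (-1) and one malicious sample (+1).
sample_vectors = [[0.0, 0.0], [3.0, 3.0]]
labels = [-1, +1]
weights = [1.0, 1.0]
threshold = 0.0

print(is_malicious([2.9, 3.1], sample_vectors, labels, weights, threshold))
print(is_malicious([0.1, 0.0], sample_vectors, labels, weights, threshold))
```

A target close to the malicious sample yields a positive sum, and a target close to the normal sample yields a negative one, so the two calls print `True` and `False` under these assumptions.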

7. The apparatus of claim 6, wherein the predetermined threshold value is determined by Equation 3 below:

[Equation 3]

where the symbols of Equation 3 denote, in order: the predetermined threshold value; a second behavior feature vector extracted from a normal sample program; and a second behavior feature vector extracted from a malicious sample program.

8. A malicious code diagnosis method comprising:

generating a first behavior feature vector based on behavior features extracted from a diagnosis target program;

loading a plurality of second behavior feature vectors for a plurality of sample programs whose malicious status is known; and

diagnosing whether the diagnosis target program is malicious code by comparing the first behavior feature vector with the plurality of second behavior feature vectors.

9. The method of claim 8, wherein the diagnosing comprises:

calculating respective distances between the first behavior feature vector and the plurality of second behavior feature vectors; and

determining whether the diagnosis target program is malicious code based on the calculated distances.

10. The method of claim 9, wherein the calculating of the distances comprises calculating a Euclidean distance between the respective behavior feature vectors.

11. The method of claim 8, wherein the diagnosing comprises:

transforming the first behavior feature vector into a first characteristic vector of higher dimension and transforming the plurality of second behavior feature vectors into a plurality of second characteristic vectors of higher dimension; and

calculating respective distances between the first characteristic vector and the plurality of second characteristic vectors.

12. The method of claim 9, wherein the determining comprises determining whether the diagnosis target program is malicious code by multiplying the calculated distance by an element of a weight vector, the weight vector being determined based on the second behavior feature vector and the malicious status of each sample program, and comparing the result with a predetermined threshold value.

13. An apparatus for generating malicious code diagnostic data, the apparatus comprising:

a behavior feature vector generator configured to generate respective behavior feature vectors from a plurality of sample programs whose malicious status is known;

a weight vector determiner configured to determine a weight vector based on the behavior feature vectors and the malicious status of the plurality of sample programs; and

a diagnostic data storage unit configured to store the respective behavior feature vectors and the weight vector,

wherein the behavior feature vectors and the weight vector are used to determine whether a diagnosis target program is malicious code.

14. The apparatus of claim 13, further comprising:

a distance calculator configured to calculate respective distances between the generated plurality of behavior feature vectors,

wherein the weight vector determiner determines the weight vector based on the calculated distances between the plurality of behavior feature vectors and on whether each sample program is malicious.

15. The apparatus of claim 14, wherein the weight vector determiner determines the weight vector so as to maximize the value determined according to Equation 4 below while satisfying Equations 5 and 6 below:

[Equation 4]

where the symbols of Equation 4 denote, in order: the behavior feature vector extracted from the i-th sample program; whether the i-th sample program is malicious; the distance between two behavior feature vectors a and b; the number of sample programs; and the i-th element of the weight vector.

[Equation 5]

[Equation 6]
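The quantities listed for Equation 4 resemble those of the standard dual problem used to train a support vector machine. As a hedged reading only, and not a reproduction of the original equation images, Equations 4 to 6 may take a form like the one below. The symbols are assumed names: x_i for the behavior feature vector of the i-th sample program, y_i for its malicious status (for example +1 or -1), α_i for the i-th element of the weight vector, m for the number of sample programs, and K(a, b) for the distance-based value between two vectors.

```latex
\begin{align}
\max_{\alpha}\; W(\alpha) &= \sum_{i=1}^{m} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m}
    \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j) \\
\text{subject to}\quad & \sum_{i=1}^{m} \alpha_i y_i = 0, \\
& \alpha_i \ge 0 \quad (i = 1, \dots, m).
\end{align}
```

Under this reading, the first line would correspond to Equation 4 and the two constraints to Equations 5 and 6; the original equations may differ in detail, for instance by placing an upper bound on each α_i.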

16. The apparatus of claim 14, wherein the weight vector determiner determines, according to Equation 7 below, a predetermined threshold value used to determine whether the diagnosis target program is malicious code, and the diagnostic data storage unit stores the determined threshold value:

[Equation 7]

where the symbols of Equation 7 denote, in order: the predetermined threshold value; and a second behavior feature vector extracted from a normal sample program.

17. A method of generating malicious code diagnostic data, the method comprising:

generating respective behavior feature vectors from a plurality of sample programs whose malicious status is known;

determining a weight vector based on the behavior feature vectors and the malicious status of the plurality of sample programs; and

storing the respective behavior feature vectors and the weight vector.

18. The method of claim 17, further comprising:

calculating respective distances between the generated plurality of behavior feature vectors,

wherein the determining of the weight vector comprises determining the weight vector based on the calculated distances between the plurality of behavior feature vectors and on whether each sample program is malicious.
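The distance-calculation step that precedes weight determination can be sketched as a pairwise distance matrix over the sample vectors. This is only an illustration: the function name `pairwise_distances` and the example vectors are hypothetical, and Euclidean distance is assumed here by analogy with the diagnosis claims.

```python
import math

def pairwise_distances(vectors):
    """Symmetric matrix d[i][j] of Euclidean distances between sample vectors."""
    n = len(vectors)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.dist(vectors[i], vectors[j])  # available in Python 3.8+
            d[i][j] = d[j][i] = dist
    return d

# Hypothetical behavior feature vectors for three sample programs.
sample_vectors = [[0.0, 0.0], [3.0, 4.0], [0.0, 8.0]]
for row in pairwise_distances(sample_vectors):
    print(row)
```

A weight-determination step could then take this matrix, together with each sample's malicious status, as its input.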

19. The method of claim 17, wherein the determining of the weight vector comprises determining the weight vector based on the value determined according to Equation 8 below so as to satisfy Equations 9 and 10 below:

[Equation 8]

where the symbols of Equation 8 denote, in order: the behavior feature vector extracted from the i-th sample program; whether the i-th sample program is malicious; the distance between two behavior feature vectors a and b; the number of sample programs; and the i-th element of the weight vector.

[Equation 9]

[Equation 10]

20. The method of claim 17, wherein the determining of the weight vector comprises determining, according to Equation 11 below, a predetermined threshold value used to determine whether the diagnosis target program is malicious code, and the storing of the respective behavior feature vectors and the weight vector comprises storing the determined threshold value:

[Equation 11]

where the symbols of Equation 11 denote, in order: the predetermined threshold value; and a second behavior feature vector extracted from a normal sample program.

21. A computer-readable recording medium having recorded thereon a program for executing the method of any one of claims 8-12 and 17-20.