KR101508577B1

KR101508577B1 - Device and method for detecting malware

Info

Publication number: KR101508577B1
Application number: KR20130120032A
Authority: KR
Inventors: 이희조; 이제현; 이수연
Original assignee: 고려대학교 산학협력단 (Korea University Research and Business Foundation)
Priority date: 2013-10-08
Filing date: 2013-10-08
Publication date: 2015-04-07

Abstract

A device for detecting malicious code according to the present invention comprises: a sample information collection unit that collects sample information from pre-stored or input malicious samples and normal samples; a sample information refining unit that generates unique information of the malicious samples and the normal samples by removing from them the code information present in common in the sample information of at least one malicious sample and at least one normal sample; a weight calculation unit that calculates, for each piece of unique information, the ratios of malicious samples and of normal samples that contain it and assigns those ratios to that unique information; and a malicious code detection unit that detects whether a target application is malicious code by determining, based on the ratios, which of the malicious samples the target application is most similar to.

Description

[0001] DEVICE AND METHOD FOR DETECTING MALWARE

The present invention relates to a device and method for detecting malicious code and, more particularly, to a device and method that detect malicious code using its similarity to the unique information of known malicious code.

Android is a software stack and mobile operating system for portable devices, including mobile phones, that comprises an operating system, middleware, a user interface, and standard applications (e.g., a web browser, an email client, short message service (SMS), and multimedia message service (MMS)), and it holds a large share of the global mobile operating system market. As the use of Android devices has grown, malicious code targeting them has also increased rapidly, and Android has been identified as the main operating environment for malicious code on mobile devices.

Malicious code running in the Android environment can access the user's personal information stored on a portable device and tamper with or delete it, or leak it to the outside using the device's network functions. It can also obtain financial gain without the user's permission, or arbitrarily misuse the device's storage, processing, and communication resources to attack other networks or computers, which makes it a serious problem.

Most known Android malicious code has been found to belong to species (families) whose members are similar to one another, and a considerable number of newly discovered malicious codes are reported to be variants of known species. Android malware authors reduce the effectiveness of detection against applications carrying malicious code by rapidly generating and updating many variants: they reuse previously used code, modify, delete, or add only part of a function or structure, or use repackaging to change only the host application that carries the malicious code. Therefore, to respond efficiently to Android malicious code, an effective approach is to first determine whether an application is similar to a known species when judging whether it is malicious, and techniques for doing so have been researched and proposed.

A representative existing Android malicious code detection technique takes the same approach as in the desktop environment: representative unique information that serves as the criterion for identifying a specific species is derived and defined through expert analysis, and detection is performed against it. Techniques proposed to cope with variants instead detect whether an application is a variant by comparing the similarity between applications directly, without defining reference unique information.

Android malicious code detection techniques commonly used in commercial devices or applications have the following drawbacks: the representative unique information used as the criterion for a specific species covers so narrow a range that heavily modified variants go undetected; or information covering an excessively broad range is used; or application-to-application similarity comparison without reference unique information is used, which leads to a high false-positive rate and poor time and space efficiency.

In this regard, Korean Patent Publication No. 10-2010-0069135 (title: "Malicious Code Classification System") describes a malicious code classification system that measures the similarity between malicious codes so that the type of, and degree of relation between, existing and new malicious code can be easily identified.

The present invention is intended to solve the above problems of the prior art, and provides a malicious code detection device and method that define unique information for judging the similarity between malicious samples and a detection target application, and that detect malicious code by selecting the unique information with a statistical technique.

The present invention also provides a malicious code detection device and method that detect malicious code by measuring similarity through a comparison between the unique information of malicious samples and that of a detection target application.

As a technical means for achieving the above objects, a malicious code detection device according to a first aspect of the present invention includes: an information extraction unit that collects sample information from pre-stored or input malicious samples and normal samples; a sample information refining unit that generates unique information of the malicious samples and the normal samples by removing from them the code information present in common in one or more pieces of sample information of one or more malicious samples and one or more pieces of sample information of one or more normal samples; a weight calculation unit that calculates, for each piece of unique information, the ratio of malicious sample species containing it and the ratio of normal sample species containing it, and assigns the ratio to that unique information; and a detection unit that detects that a detection target application is malicious code by determining, based on the ratios, which of one or more malicious sample species the application is similar to.

In addition, a malicious code detection method according to a second aspect of the present invention includes: collecting sample information from pre-stored or input malicious samples and normal samples; generating unique information of the malicious samples and the normal samples by removing from them the code information present in common in one or more pieces of sample information of one or more malicious samples and one or more pieces of sample information of one or more normal samples; calculating, for each piece of unique information, the ratio of malicious sample species containing it and the ratio of normal sample species containing it, and assigning the ratio to that unique information; and detecting that a detection target application is malicious code by determining, based on the ratios, which of one or more malicious sample species the application is similar to.

According to the above means of the present invention, the malicious sample species of a detection target application can be defined by extracting the unique information of malicious samples from which the information shared with normal samples has been removed.

Further, according to the above means of the present invention, the extracted unique information is used to calculate probabilistically which species the detection target application resembles, yielding a similarity, and the malicious sample species can be defined on the basis of that similarity.

Further, according to the above means of the present invention, the malicious code detection device uses the similarity of malicious code unique information to detect intentionally produced variants, such as those created by code reuse, repackaging, or partial code modification, as well as new malicious code that reuses some unique information at the code implementation stage.

FIG. 1 is a block diagram of a malicious code detection device according to an embodiment of the present invention.
FIG. 2 is a flowchart of a method by which the malicious code detection device detects malicious code according to an embodiment of the present invention.
FIG. 3 shows in detail the process of refining sample information and calculating ratios.
FIG. 4 shows an example illustrating the process of refining sample information and calculating ratios.
FIG. 5 is an example of a method by which the malicious code detection device uses the calculated ratios to detect malicious samples similar to a detection target application.
FIG. 6 is a performance chart of the average detection success rate and the average false-positive rate calculated using the malicious code detection device according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily practice them. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described here. In the drawings, parts unrelated to the description are omitted to describe the present invention clearly, and similar parts are denoted by similar reference numerals throughout the specification.

Throughout this specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. The term "step of (doing) ~" as used throughout this specification does not mean "step for ~".

According to an embodiment of the present invention, the malicious code detection device may initially store Android malicious samples and normal samples, and operates in three stages: extracting sample information; refining the sample information and calculating the ratios of unique information; and detecting malicious code. Malicious samples are samples of malicious code classified by type and may be grouped into "species" according to certain criteria. Normal samples (white list) are samples of benign code that cannot be malicious, classified by type, and may likewise be grouped into "species" according to certain criteria; in one example, however, all normal samples form a single species.

FIG. 1 is a block diagram of a malicious code detection device according to an embodiment of the present invention.

To perform the above process, the malicious code detection device according to an embodiment of the present invention includes an information extraction unit 110, a sample information refining unit 120, a weight calculation unit 130, and a detection unit 140.

First, the information extraction unit 110 collects sample information from pre-stored or input malicious samples and normal samples.

The sample information refining unit 120 generates the unique information of the malicious samples and the normal samples by removing from them the code information present in common in the sample information of one or more malicious samples and the sample information of one or more normal samples. That is, because code information contained in both malicious and normal samples cannot serve as a clear criterion for distinguishing the two, all sample information of the normal samples is compared with all sample information of the malicious samples, the code information common to both is removed, and what remains is the unique information.
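A minimal sketch of the refining step, assuming each sample is represented as a Python set of feature strings and that malicious samples are kept in a dict keyed by species name. Both representations and the function name `refine` are illustrative assumptions, not taken from the patent.

```python
from typing import Dict, List, Set, Tuple

Sample = Set[str]  # one sample = its extracted features; a set, so repeats count once


def refine(
    malicious: Dict[str, List[Sample]], normal: List[Sample]
) -> Tuple[Dict[str, List[Sample]], List[Sample], Set[str]]:
    """Strip features shared by the malicious and normal corpora (S220)."""
    malicious_union = set().union(*(s for samples in malicious.values() for s in samples))
    normal_union = set().union(*normal)
    # Features seen on both sides cannot separate malicious from normal code.
    common = malicious_union & normal_union
    refined_malicious = {
        species: [s - common for s in samples] for species, samples in malicious.items()
    }
    refined_normal = [s - common for s in normal]
    return refined_malicious, refined_normal, common
```

Because each sample is a set, the rule that repeated occurrences of the same sample information within one sample count only once follows directly from the representation.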

The weight calculation unit 130 calculates, for each piece of unique information of the refined malicious or normal samples, the ratio of the malicious sample species or of the normal sample species that contain it. According to an embodiment of the present invention, the weight calculation unit 130 calculates the ratio as the number of malicious samples within one malicious sample species that contain a given piece of unique information, relative to the number of malicious samples across all malicious sample species that contain it. A ratio can likewise be calculated for the normal samples: in one example, where all normal samples form a single species, the ratio may be the number of normal samples containing the unique information relative to the total number of normal samples. This ratio can be expressed as a "weight" for each piece of unique information and is assigned to that unique information.
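The two ratio definitions can be sketched as follows, assuming the refined samples are sets of feature strings grouped by species (an illustrative representation). `malicious_weights` and `normal_weights` are hypothetical names; exact fractions are used so the values match the fractions quoted in the text.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, List, Set


def malicious_weights(refined: Dict[str, List[Set[str]]]) -> Dict[str, Dict[str, Fraction]]:
    """Per species: samples in the species containing f / samples in all species containing f."""
    total = Counter(f for samples in refined.values() for s in samples for f in s)
    weights = {}
    for species, samples in refined.items():
        in_species = Counter(f for s in samples for f in s)
        weights[species] = {f: Fraction(n, total[f]) for f, n in in_species.items()}
    return weights


def normal_weights(refined_normal: List[Set[str]]) -> Dict[str, Fraction]:
    """All normal samples form one species: share of normal samples containing f."""
    counts = Counter(f for s in refined_normal for f in s)
    return {f: Fraction(n, len(refined_normal)) for f, n in counts.items()}
```

Note the asymmetry the text describes: the malicious denominator counts samples containing the feature across all species, while the normal denominator is simply the number of normal samples.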

The detection unit 140 detects whether the detection target application is malicious code by determining which of one or more malicious sample species it is similar to, based on the ratios calculated by the weight calculation unit 130. According to an embodiment of the present invention, the detection unit 140 can determine whether the application is similar to a given malicious sample species using the probability that the unique information shared by the application and that species is present in the species, and the probability that the unique information shared by the application and the normal samples is absent from the normal samples. This probability calculation of the detection unit 140 can be expressed as Equation 1.

Here, a is the detection target application, F_i is the union of the sample information contained in the i-th malicious sample species, k is a piece of unique information, p_k is the ratio of the malicious sample species (or of the normal sample species) containing unique information k, W is the union of the sample information contained in the one or more normal samples, and S(a, F_i) is the similarity between the detection target application and the i-th malicious sample species. Equation 1 is described in more detail below with reference to FIG. 5.
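The image of Equation 1 is not reproduced in this text. The form below is a reconstruction rather than the patent's verbatim formula: it is inferred from the definitions above and chosen because it reproduces all three worked values reported for FIG. 5 (0.75, 0.14 and 0.19). Since a ratio p_k is assigned only to unique information, the sums effectively range over unique information, and p_k is the ratio of the species (or of the normal samples) being summed over.

```latex
S(a, F_i) = \frac{\sum_{k \in a \cap F_i} p_k}{\sum_{k \in F_i} p_k}
            \times \left( 1 - \frac{\sum_{k \in a \cap W} p_k}{\sum_{k \in W} p_k} \right)
```

The first factor is the weighted share of species i's unique information that also appears in a; the second factor discounts it by the weighted share of normal unique information that a contains.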

According to another embodiment of the present invention, when this similarity is equal to or greater than a certain threshold, the detection unit 140 can detect that the detection target application is malicious code belonging to that malicious sample species, and the threshold can be adjusted according to the security situation.
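A sketch of the threshold decision, assuming the similarity scores per species have already been computed. The function name `classify` and the threshold values in the usage note are illustrative; the patent leaves the threshold to the security situation. The comparison uses "equal to or greater than", as in the paragraph above.

```python
from typing import Dict, Optional, Tuple


def classify(scores: Dict[str, float], threshold: float) -> Optional[Tuple[str, float]]:
    """Return (species, score) for the most similar species, or None below the threshold."""
    if not scores:
        return None
    species, best = max(scores.items(), key=lambda item: item[1])
    return (species, best) if best >= threshold else None
```

With the FIG. 5 scores (0.75, 0.14 and 0.19 for species 1 to 3), a threshold of 0.5 flags the application as species 1, whereas a stricter threshold of 0.8 reports no match.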

Hereinafter, a method by which the malicious code detection device detects malicious code is described with reference to FIGS. 2 to 5.

FIG. 2 is a flowchart of a method by which the malicious code detection device detects malicious code according to an embodiment of the present invention.

First, the malicious code detection device collects sample information from the malicious samples and the normal samples (S210). For example, the device collects four kinds of information from each malicious or normal sample as sample information for malicious code detection: strings, class and method names, API call sections, and bytecode instruction sections. Concretely, the executable code of an Android application (Dalvik Executable, DEX) generally exists as "classes.dex" inside the compressed "application name.apk" file, and the device uses this standard structure to collect, in the form of binary strings, 1) ASCII strings, 2) class and method name strings, 3) API call binary strings, and 4) method implementation binary strings from the collected malicious or normal samples.
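As a rough illustration of S210, the sketch below reads classes.dex from an APK with Python's zipfile module and lists the DEX string table, which holds string literals as well as the class descriptors and method names referenced by the code; this approximates items 1) and 2) only. Items 3) and 4) would require decoding the code items and are not attempted. The header offsets (string_ids_size at 0x38, string_ids_off at 0x3C) follow the published Dalvik executable format; decoding MUTF-8 as UTF-8 and ignoring additional classesN.dex files are simplifying assumptions, and `dex_strings` / `apk_features` are hypothetical names.

```python
import io
import struct
import zipfile


def read_uleb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode one unsigned LEB128 value; return (value, position after it)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def dex_strings(dex: bytes) -> list[str]:
    """Return every entry of the DEX string_ids table, in table order."""
    if not dex.startswith(b"dex\n"):
        raise ValueError("not a DEX file")
    size, ids_off = struct.unpack_from("<II", dex, 0x38)
    strings = []
    for i in range(size):
        (data_off,) = struct.unpack_from("<I", dex, ids_off + 4 * i)
        _utf16_len, pos = read_uleb128(dex, data_off)  # length prefix, skipped
        end = dex.index(b"\x00", pos)  # string data is NUL-terminated
        strings.append(dex[pos:end].decode("utf-8", errors="replace"))
    return strings


def apk_features(apk) -> set[str]:
    """Collect the string-table features of classes.dex (path or file-like object)."""
    with zipfile.ZipFile(apk) as archive:
        return set(dex_strings(archive.read("classes.dex")))
```

Calling `apk_features("sample.apk")` on a hypothetical APK returns the set of strings that would then be treated as that sample's sample information.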

Next, the malicious code detection device refines the sample information and calculates the ratios of the unique information (S220, S230). Here, when the same kind of sample information occurs several times in one pre-stored or input malicious or normal sample, it is treated as a single piece of sample information, and all normal samples are treated as a single species.

FIG. 3 shows the process of refining the sample information and calculating the ratios in more detail.

When the sample information is refined (S220), for example, if species 1 contains four malicious samples, all sample information contained in the normal samples is removed from samples 1 to 4 of species 1. That is, the sample information contained in both the union of the sample information of the normal samples and the union of the sample information of the malicious samples is removed. This is described in more detail with reference to FIG. 4. A malicious sample from which this common sample information has been removed contains only unique information. The same is done for the normal samples.

Next, the ratio of malicious sample species containing each piece of unique information and the ratio of normal sample species containing it are calculated (S230). That is, the ratio is the number of malicious samples within a given species that contain a piece of unique information divided by the number of malicious samples across all species that contain it (ratio = (number of samples in the species containing information A) / (number of samples in all malicious samples containing information A)). For the normal samples, when all normal samples form one species, the ratio may be the number of normal samples containing the unique information divided by the total number of normal samples. The ratio thus serves as the weight of each piece of unique information of a species, and the ratio (or weight) calculated in this way is assigned to each piece of unique information.
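Written as formulas, with notation introduced here only for illustration (M_i, M and N are not symbols used in the patent): M_i is the set of refined malicious samples of species i, M the set of all refined malicious samples, and N the set of refined normal samples.

```latex
p_k^{(i)} = \frac{\lvert \{ s \in M_i : k \in s \} \rvert}{\lvert \{ s \in M : k \in s \} \rvert},
\qquad
p_k^{(W)} = \frac{\lvert \{ s \in N : k \in s \} \rvert}{\lvert N \rvert}
```

With the FIG. 4 counts given in the text, this yields p_D^{(1)} = 2/3 for unique information D in species 1 and p_J^{(W)} = 1/2 for unique information J in the normal species.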

FIG. 4 shows an example illustrating the process of refining the sample information and calculating the ratios.

The malicious sample species (species 1, species 2, species 3) each contain the malicious samples belonging to that species (malicious sample 1, malicious sample 2), while the normal samples (normal sample 1, normal sample 2) together form a single species. The malicious samples and normal samples contain sample information (A, B, C, D, E, F, G, H, I, J, K).

The malicious code detection device generates the unique information of the malicious samples by removing from them all sample information (E, F, I, J, K) present in normal sample 1 or normal sample 2 (S220). Accordingly, the sample information present in the normal samples (E, F) is removed from the sample information present in all malicious sample species (A, B, C, D, E, F, G, H). Likewise, the unique information of the normal samples is generated by removing from them all sample information present in the malicious sample species (A, B, C, D, E, F, G, H); accordingly, the sample information present in the malicious samples (E, F) is removed from the sample information present in the normal sample species (E, F, I, J, K). In other words, the code information present in both the malicious and the normal samples is removed from every species. As a result, malicious sample species 1 contains unique information (A, D, G), malicious sample species 2 contains unique information (B, D, H), malicious sample species 3 contains unique information (C, G, H), and the normal sample species contains unique information (I, J, K).

Next, the ratio of malicious samples is calculated for each piece of unique information in each malicious sample species (S230). Taking unique information D of malicious sample species 1 as an example, the number of malicious samples across all species that contain D is 3 (samples 1 and 2 of species 1 and sample 1 of species 2), and the number of malicious samples in species 1 that contain D is 2 (samples 1 and 2 of species 1), so the ratio is 2/3. As another example, for unique information H of malicious sample species 2, the number of malicious samples across all species that contain H is 2 (sample 2 of species 2 and sample 1 of species 3), and the number of malicious samples in species 2 that contain H is 1 (sample 2 of species 2), so the ratio is 1/2. The ratios for the remaining unique information (A and G of species 1, B and D of species 2, and C, G, and H of species 3) are calculated in the same way, and each ratio (or weight) is assigned to the corresponding unique information.

The normal samples may form a single species regardless of the individual samples. Therefore, unlike for the malicious samples, when the ratio of normal samples is calculated for unique information J, the total number of normal samples is 2 (normal samples 1 and 2) and the number of normal samples containing J is 1 (normal sample 1), so the ratio for J is 1/2. The ratios for the remaining unique information I and K are calculated in the same way, and each ratio (or weight) is assigned to the corresponding unique information.
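To make the FIG. 4 walk-through checkable, the sketch below runs both steps on one hypothetical assignment of sample information to samples. FIG. 4 itself is not reproduced in this text, so the contents of the individual sets (in particular where A, C, G, E, F, I and K occur) are assumptions, chosen to agree with everything the text states: the unique information of each species, the counts for D, H and J, and the normal-sample ratio total of 2 used in the FIG. 5 example.

```python
from collections import Counter
from fractions import Fraction

# HYPOTHETICAL sample contents: one assignment consistent with the counts in the text.
malicious = {
    1: [{"A", "D", "E"}, {"D", "G", "F"}],
    2: [{"B", "D"}, {"B", "H"}],
    3: [{"C", "H"}, {"C", "G"}],
}
normal = [{"E", "I", "J"}, {"F", "I", "K"}]

# S220: drop sample information present in both the malicious and the normal samples.
malicious_union = set().union(*(s for ss in malicious.values() for s in ss))
common = malicious_union & set().union(*normal)
malicious = {sp: [s - common for s in ss] for sp, ss in malicious.items()}
normal = [s - common for s in normal]

# S230: per-species ratio for malicious samples, share of all normal samples for normal ones.
total = Counter(x for ss in malicious.values() for s in ss for x in s)
weights = {
    sp: {x: Fraction(n, total[x]) for x, n in Counter(y for s in ss for y in s).items()}
    for sp, ss in malicious.items()
}
normal_weights = {
    x: Fraction(n, len(normal)) for x, n in Counter(y for s in normal for y in s).items()
}
```

After the run, `common` is {E, F}, species 1 to 3 keep {A, D, G}, {B, D, H} and {C, G, H}, and the ratios for D (2/3 and 1/3), H in species 2 (1/2) and J (1/2) match the figures quoted above.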

Then, based on the ratio assigned to each piece of unique information, the device determines which malicious sample species the detection target application is similar to, and can thereby detect that the application is malicious code classified into that malicious sample species (S240).

According to an embodiment of the present invention, the malicious code detection device can determine which malicious sample species the detection target application is similar to using the probability that the unique information shared by the application and a given malicious sample species is present in that species, and the probability that the unique information shared by the application and the normal samples is absent from the normal samples.

According to another embodiment of the present invention, the similarity can be calculated using Equation 1, and the malicious sample species that the application resembles can be determined on the basis of that similarity.

According to yet another embodiment of the present invention, when the calculated similarity is equal to or greater than a specific threshold, the device can detect that the detection target application is malicious code belonging to the malicious sample species for which that similarity was calculated. This is described in detail below with reference to FIG. 5.

FIG. 5 is an example of a method by which the malicious code detection device uses the ratios calculated in this way to detect a malicious sample species similar to the detection target application.

According to one embodiment of the present invention, the sample information shared by the normal samples and the malicious samples (E and F) is also removed from the detection target application, as described above. The similarity to each malicious sample species is then determined from the remaining sample information, that is, the unique information.

For example, suppose Equation 1 is used to measure similarity to malicious sample species 2. Species 2 assigns unique information B a ratio of 2/2 (= 1), unique information D a ratio of 1/3 (= 0.33), and unique information H a ratio of 1/2 (= 0.5), so the ratios sum to 1.83. The detection target application (a) contains the unique information A, D, G, and J. Of these, only D is shared with species 2, and its ratio is 0.33.

Equation 1 also accounts for the dissimilarity between the normal samples and the detection target application (a). The ratios of the unique information present in the normal samples (I, J, K) sum to 2. The only unique information the application shares with the normal samples is J, whose normal-sample ratio is 0.5. Substituting these values into Equation 1 gives a similarity of 0.14:

$$S(a, F_2) = \frac{0.33}{1.83} \times \left(1 - \frac{0.5}{2}\right) \approx 0.14$$

Calculating the similarity for malicious sample species 1 and 3 in the same way gives 0.75 for species 1 and 0.19 for species 3. The detection target application (a) can therefore be detected as belonging to species 1, which shows the highest similarity.

According to another embodiment, however, a threshold may be preset for the similarity. In that case, the detection target application (a) is detected as malicious code classified into a malicious sample species only when its similarity to that species exceeds the threshold.
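The FIG. 5 walkthrough for species 2 can be reproduced with the minimal sketch below. It assumes the form of Equation 1 inferred from this example: the matched share of the species weights, times one minus the matched share of the normal-sample weights. The helper names are hypothetical. Only the total of the normal-sample ratios (2) and the ratio of J (0.5) are given in the text, so the split between I and K is an illustrative choice. The raw feature set that still contains E and F before refinement is likewise assumed.

```python
from typing import Dict, Set


def refine(features: Set[str], common: Set[str]) -> Set[str]:
    """Remove sample information shared by malicious and normal samples,
    leaving only the unique information."""
    return features - common


def weight_share(app: Set[str], weights: Dict[str, float]) -> float:
    """Fraction of the total ratio mass that falls on unique information
    the application also contains (0.0 for an empty weight table)."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(w for k, w in weights.items() if k in app) / total


def similarity(app: Set[str], species_weights: Dict[str, float],
               normal_weights: Dict[str, float]) -> float:
    """Assumed S(a, F_i): match with the malicious species multiplied by
    non-match with the normal samples."""
    return weight_share(app, species_weights) * (
        1.0 - weight_share(app, normal_weights))


# Ratios of species 2 as given in the FIG. 5 example.
species2 = {"B": 2 / 2, "D": 1 / 3, "H": 1 / 2}
# Normal ratios: J = 0.5 and total 2 per the text; I/K split is hypothetical.
normal = {"I": 1.0, "J": 0.5, "K": 0.5}

raw_app = {"A", "D", "E", "F", "G", "J"}  # assumed raw features of app (a)
app = refine(raw_app, common={"E", "F"})  # -> {"A", "D", "G", "J"}
print(round(similarity(app, species2, normal), 2))
```

The unrounded value is 3/22 (about 0.136). The text's 0.14 is this value rounded to two decimals.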

FIG. 6 is a performance chart of the average detection success rate and the average false positive rate measured with the malicious code detection device according to an embodiment of the present invention.

The performance chart of FIG. 6 is based on four publicly available Android malicious sample species (79 samples in total) and 1,680 normal samples. In each species, a randomly selected 20% of the samples were used to generate unique information and the remaining 80% were used as test samples. The chart shows the average over 10 such experiments.
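The experimental protocol can be sketched as follows. For each species, a random 20% is used to generate unique information, the remaining 80% is used for testing, and a metric is averaged over 10 repetitions. The function names, the per-run seeding, the rounding of the 20% share, and the rule that every species keeps at least one generation sample are assumptions. The patent does not specify them. `evaluate` stands in for whatever metric is computed on one split.

```python
import random
from typing import Callable, Dict, List, Tuple

Split = Dict[str, List[str]]


def split_species(samples_by_species: Split, gen_ratio: float = 0.2,
                  seed: int = 0) -> Tuple[Split, Split]:
    """Randomly split each species into unique-information generation
    samples (gen_ratio of the species) and test samples (the rest)."""
    rng = random.Random(seed)
    gen: Split = {}
    test: Split = {}
    for species, samples in samples_by_species.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        # Assumption: round the 20% share and keep at least one sample.
        n_gen = max(1, round(len(shuffled) * gen_ratio))
        gen[species] = shuffled[:n_gen]
        test[species] = shuffled[n_gen:]
    return gen, test


def average_over_runs(evaluate: Callable[[Split, Split], float],
                      samples_by_species: Split, runs: int = 10) -> float:
    """Repeat the random split `runs` times and average the metric."""
    scores = [evaluate(*split_species(samples_by_species, seed=run))
              for run in range(runs)]
    return sum(scores) / len(scores)


# Toy data (hypothetical sample IDs), not the patent's data set.
toy = {"species_a": [f"a{i}" for i in range(10)],
       "species_b": [f"b{i}" for i in range(5)]}
gen, test = split_species(toy, seed=1)
print({s: (len(gen[s]), len(test[s])) for s in toy})
```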

Table 1 compares the performance metrics reported in prior foreign studies in the same technical field with the performance obtained in experiments with the present technique. The present technique performs better on all three metrics (accuracy, precision, and recall). It also scores higher on the F-measure, specifically the F1 score, which combines precision and recall into a single metric.

Table 1

Method        Accuracy   Recall    Precision   F-measure
AndroGuard    93.04%     49.58%    99.16%      66.11%
DroidMat      97.87%     87.39%    96.74%      91.83%
Proposed      99.89%     97.73%    99.74%      98.73%
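The F-measure column is the F1 score, the harmonic mean of precision and recall. The short check below recomputes it from the Table 1 precision and recall values. The function name is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 score: harmonic mean of precision and recall (same units in/out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F-measure %) from Table 1.
table1 = {
    "AndroGuard": (99.16, 49.58, 66.11),
    "DroidMat": (96.74, 87.39, 91.83),
    "Proposed": (99.74, 97.73, 98.73),
}
for method, (p, r, reported) in table1.items():
    print(f"{method}: computed {f1_score(p, r):.2f}%, reported {reported:.2f}%")
```

The recomputed values agree with the table to within 0.01 percentage points. The proposed method's value comes out at about 98.72 against the reported 98.73, presumably because the table's precision and recall are themselves rounded.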

When malicious code must be detected among a large number of applications, the present invention collects known malicious code samples and their variants in advance. It then uses the refined unique information of each species to detect malicious code accurately and efficiently. This greatly reduces the number of applications that experts must analyze manually. Ultimately, it protects mobile device users from the risk of malicious code infection and prevents secondary damage that infected mobile devices could cause.

For reference, the components shown in FIG. 1 according to the embodiment of the present invention may be software or hardware components, such as a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC), and perform predetermined roles.

However, the 'components' are not limited to software or hardware. Each component may be configured to reside on an addressable storage medium or to execute on one or more processors.

Thus, by way of example, a component includes software components, object-oriented software components, class components, and task components. It also includes processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables.

The components and the functions provided within them may be combined into fewer components or further separated into additional components.

Each component shown in FIG. 1 may be configured as a 'module'. A 'module' is software or a hardware component such as an FPGA or an ASIC, and performs certain roles. However, a module is not limited to software or hardware. A module may be configured to reside on an addressable storage medium or to execute on one or more processors. The functionality provided by the components and modules may be combined into fewer components and modules or further separated into additional components and modules.

An embodiment of the present invention may also be implemented in the form of a recording medium containing computer-executable instructions, such as program modules executed by a computer. Computer-readable media can be any available media accessible by a computer, and include volatile and nonvolatile media as well as removable and non-removable media. Computer-readable media may include both computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or in another transport mechanism, and include any information delivery media.

The malicious code detection method according to the present invention described above can be implemented as computer-readable code on a computer-readable recording medium. Computer-readable recording media include all kinds of recording media that store data readable by a computer system. Examples include read-only memory (ROM), random access memory (RAM), magnetic tape, magnetic disks, flash memory, and optical data storage devices. The computer-readable recording medium may also be distributed over computer systems connected through a computer network, so that the code is stored and executed in a distributed manner.

Although the method and system of the present invention have been described in connection with specific embodiments, some or all of their components or operations may be implemented using a computer system with a general-purpose hardware architecture.

S210: Collecting unique information
S220, S230: Refinement and ratio calculation
S240: Detecting malicious code
110: Information extraction unit
120: Sample information refinement unit
130: Weight calculation unit
140: Detection unit

Claims

A malicious code detection device, comprising:
an information extraction unit configured to collect malicious samples and normal samples from pre-stored or input malicious code and normal code;
a sample information refinement unit configured to generate unique information of the malicious samples and the normal samples by removing from them the code information common to the collected malicious samples and normal samples;
a weight calculation unit configured to calculate a weight for each piece of unique information contained in the malicious sample species and the normal sample species; and
a detection unit configured to detect malicious code corresponding to the malicious sample species to which a detection target application belongs, based on the calculated weights and the probability of one or more pieces of unique information contained in the detection target application,
wherein the malicious samples and the normal samples are classified into one or more malicious sample species and a normal sample species, and
wherein the probability of the unique information is calculated based on the weight of each piece of unique information contained in the one or more malicious sample species and the normal sample species.

The malicious code detection device of claim 1,
wherein the weight calculation unit
calculates the weight of each piece of unique information contained in a malicious sample species based on the ratio of the number of malicious samples in that species that contain the piece of unique information to the number of malicious samples in the species, and
calculates the weight of each piece of unique information contained in the normal sample species based on the ratio of the number of normal samples that contain the piece of unique information to the number of normal samples in the normal sample species.

The malicious code detection device of claim 1,
wherein the detection unit
determines which malicious sample species the detection target application is similar to by calculating, based on the one or more pieces of unique information contained in the detection target application and the weight of each piece of unique information contained in the malicious sample species and the normal sample species, a probability that the detection target application is included in a malicious sample species and a probability that the detection target application is not included in the normal sample species.

The malicious code detection device of claim 1,
wherein the detection unit
determines which malicious sample species the detection target application is similar to by using the following equation:
[Mathematical Expression]

where a is the detection target application, F_i is the union of the sample information contained in the i-th malicious sample species, k is a piece of unique information, p_k is the ratio of malicious samples, or of normal samples, that contain the unique information k, W is the union of the sample information contained in the one or more normal samples, and S(a, F_i) is the similarity between the detection target application and the i-th malicious sample species.

The malicious code detection device of claim 4,
wherein the detection unit
detects that the detection target application is malicious code belonging to the i-th malicious sample species when the similarity is equal to or greater than a predetermined threshold.

A malicious code detection method, comprising:
collecting malicious samples and normal samples from pre-stored or input malicious code and normal code;
generating unique information of the malicious samples and the normal samples by removing from them the code information common to the collected malicious samples and normal samples;
calculating a weight for each piece of unique information contained in the malicious sample species and the normal sample species; and
detecting malicious code corresponding to the malicious sample species to which a detection target application belongs, based on the calculated weights and one or more pieces of unique information contained in the detection target application,
wherein the malicious samples and the normal samples are classified into one or more malicious sample species and a normal sample species, and
wherein the probability of the unique information is calculated based on the weight of each piece of unique information contained in the one or more malicious sample species and the normal sample species.

The malicious code detection method of claim 6,
wherein calculating the weight comprises:
calculating the weight of each piece of unique information contained in a malicious sample species based on the ratio of the number of malicious samples in that species that contain the piece of unique information to the number of malicious samples in the species; and
calculating the weight of each piece of unique information contained in the normal sample species based on the ratio of the number of normal samples that contain the piece of unique information to the number of normal samples in the normal sample species.

The malicious code detection method of claim 6,
wherein the detecting comprises
determining which malicious sample species the detection target application is similar to by calculating, based on the one or more pieces of unique information contained in the detection target application and the weight of each piece of unique information contained in the malicious sample species and the normal sample species, a probability that the detection target application is included in a malicious sample species and a probability that the detection target application is not included in the normal sample species.

The malicious code detection method of claim 6,
wherein the detecting comprises
determining which malicious sample species the detection target application is similar to by using the following equation:
[Mathematical Expression]

The malicious code detection method of claim 9,
wherein the detecting comprises
detecting that the detection target application is malicious code belonging to the i-th malicious sample species when the similarity is equal to or greater than a predetermined threshold.