KR102053869B1

KR102053869B1 - Method and apparatus for detecting malignant code of linux environment

Info

Publication number: KR102053869B1
Application number: KR1020190063136A
Authority: KR
Inventors: 임정환
Original assignee: 쿤텍 주식회사
Priority date: 2019-05-29
Filing date: 2019-05-29
Publication date: 2020-01-22

Abstract

Disclosed are a method and an apparatus for detecting malicious code in a Linux environment. According to one embodiment of the present invention, the apparatus for detecting malicious code in a Linux environment comprises three modules. A behavior data collection module collects behavior data, including each system call and its system call parameters, invoked while a target file is executed. A behavior data processing module classifies the one or more collected system calls into one of a set of predetermined behavior groups to generate behavior group information for the target file. A similarity determination module measures the similarity between the behavior group information of the target file and the behavior group information of a previously stored malicious file.

Description

METHOD AND APPARATUS FOR DETECTING MALIGNANT CODE OF LINUX ENVIRONMENT

Embodiments of the present invention relate to technology for detecting malicious code in a Linux environment.

The amount of malicious code has been growing. Research on automating malicious code analysis is therefore being actively pursued so that the growing volume can be handled quickly. Static analysis alone has limitations, however, and malicious code detection through dynamic behavior analysis is also being studied.

Dynamic behavior analysis relies on information about function calls in kernel space or user space, such as API (Application Programming Interface) calls or system calls. In the Windows environment, the kernel API is functionally fine-grained, so dynamic behavior analysis based on API calls works well.

For example, Windows APIs are divided by function. Examples include:

- CreateProcess (create a process)
- WinExec (run a program)
- ShellExecute (execute through the shell)
- CreateFile (create a file)
- CopyFile (copy a file)
- RegCreateKeyEx (create a registry key)
- RegSetValueEx (set a registry key value)

As a result, each individual API call reveals on its own which action is being performed.

In the Linux environment, by contrast, individual APIs are not divided by function in this way, so behavior is hard to analyze from individual API calls alone. In addition, system calls for file access, network access, process creation and termination, and so on are interleaved. The similarity to known malicious code is therefore hard to measure from raw system call information and call order alone.

(Patent Document 01) Korean Registered Patent Publication No. 10-1620931 (May 13, 2016)

An embodiment of the present invention provides a new technique for detecting malicious code in a Linux environment.

An apparatus for detecting malicious code according to one disclosed embodiment detects malicious code in a Linux environment. It includes a behavior data collection module, a behavior data processing module, and a similarity determination module. The behavior data collection module collects behavior data, including each system call and its system call parameters, invoked while a target file is executed. The behavior data processing module classifies the one or more collected system calls into one of a set of predetermined behavior groups to generate behavior group information for the target file. The similarity determination module measures the similarity between the behavior group information of the target file and that of a previously stored malicious file.

The behavior data processing module may include a preprocessor, a system call table generator, and a group classifier:

- The preprocessor classifies each collected system call into one of a set of predetermined categories. It assigns the call the category classification number predefined for that category.
- The system call table generator builds a system call table, in the call order of the collected system calls, containing the category classification numbers and the system call parameters.
- The group classifier classifies the one or more collected system calls into one of the predetermined behavior groups, based on the category classification numbers and system call parameters in the system call table. It then assigns the predefined behavior group number of that group.

The group classifier may classify the one or more system calls into one of the predetermined behavior groups according to the system call type. The system call types include a single system call type, a consecutive same-category system call type, and a compound-category system call type.

The behavior data processing module may further include a context map table generator. This generator builds a context map table containing behavior group numbers and behavior group parameters, in the time order of the classified behavior groups. The similarity determination module may then measure the similarity between the context map table of the target file and the context map table of a previously stored malicious file.

The similarity determination module may measure a first similarity by comparing two sequences: the time order of the behavior group numbers in the target file's context map table, and the time order of the behavior group numbers in the stored malicious file's context map table.

The similarity determination module may measure a second similarity by comparing the behavior group numbers and behavior group parameters in the target file's context map table with those in the stored malicious file's context map table.
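One way to picture the second similarity is the following minimal sketch. It assumes each context map table has been reduced to a list of (behavior group number, behavior group parameter) pairs. It scores the two tables with the Jaccard index of their pair sets, so a behavior only counts as shared when both its group number and its main argument agree. The patent does not specify the measure, so the function name and the formula are illustrative assumptions.

```python
def second_similarity(target, malware):
    """Hypothetical second-similarity measure.

    target, malware: lists of (behavior group number, behavior group parameter).
    Returns the Jaccard index of the two sets of pairs, in [0.0, 1.0].
    """
    a, b = set(target), set(malware)
    union = a | b
    if not union:
        return 0.0  # two empty tables: treat as no evidence of similarity
    return len(a & b) / len(union)


# Same process and network behavior, but a different file argument.
t = [(301, "A"), (104, "/etc/passwd"), (201, "8.8.8.8")]
m = [(301, "A"), (104, "/etc/shadow"), (201, "8.8.8.8")]
print(second_similarity(t, m))  # 2 shared pairs out of 4 distinct -> 0.5
```

Because the parameter is part of each pair, this score is stricter than comparing group numbers alone. That is consistent with the text giving the second similarity the larger weight.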

The similarity determination module may assign a first weight to the first similarity and a second weight, greater than the first, to the second similarity. It then sums the weighted values to compute a total similarity score between the target file's context map table and the stored malicious file's context map table.
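As a formula, let S_1 and S_2 be the first and second similarities and w_1 and w_2 their weights. The total score described above is then as follows. The patent gives no weight values, so the numbers in the worked line are assumed purely for illustration.

```latex
S_{\text{total}} = w_1 S_1 + w_2 S_2, \qquad w_2 > w_1

% Illustrative (assumed) values: w_1 = 0.4,\; w_2 = 0.6,\; S_1 = 0.6,\; S_2 = 0.5
S_{\text{total}} = 0.4 \times 0.6 + 0.6 \times 0.5 = 0.24 + 0.30 = 0.54
```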

A method for detecting malicious code in a Linux environment according to one disclosed embodiment is performed on a computing device. The device has one or more processors and a memory storing one or more programs executed by the one or more processors. The method includes the following operations:

- collecting each system call, and its system call parameters, invoked while a target file is executed;
- classifying the one or more collected system calls into one of a set of predetermined behavior groups to generate behavior group information for the target file; and
- measuring the similarity between the behavior group information of the target file and that of a previously stored malicious file.

In the disclosed embodiments, each system call and its parameters invoked while a target file is executed are collected. The collected system calls are then classified into predetermined behavior groups to generate behavior group information for the target file. This makes it possible to classify behaviors from system calls even in a Linux environment, and thus to perform system-call-based dynamic analysis of malicious code.

FIG. 1 is a diagram illustrating interleaved system calls in a Linux environment.
FIG. 2 is a block diagram illustrating the configuration of an apparatus for detecting malicious code in a Linux environment according to one disclosed embodiment.
FIG. 3 is a block diagram illustrating the configuration of a behavior data processing module according to one disclosed embodiment.
FIG. 4 is a diagram illustrating classification of a behavior group for the single system call type in one disclosed embodiment.
FIG. 5 is a diagram illustrating classification of a behavior group for the consecutive same-category system call type in one disclosed embodiment.
FIG. 6 is a diagram illustrating classification of a behavior group for the compound-category system call type in one disclosed embodiment.
FIG. 7 is a graph showing, in chronological order, behavior groups classified according to one disclosed embodiment.
FIG. 8 is a flowchart illustrating a method for detecting malicious code in a Linux environment according to an embodiment of the present invention.
FIG. 9 is a block diagram illustrating a computing environment including a computing device suitable for use in exemplary embodiments.

Specific embodiments of the present invention are described below with reference to the drawings. The following detailed description is provided to aid a comprehensive understanding of the methods, devices, and/or systems described herein. It is merely an example, however, and the present invention is not limited to it.

In describing the embodiments of the present invention, detailed descriptions of related known technology are omitted where they might unnecessarily obscure the gist of the invention. The terms used below are defined in consideration of their functions in the present invention and may vary with the intention or custom of users or operators. Their definitions should therefore be based on the content of this specification as a whole. The terminology used in the detailed description only describes embodiments of the invention and should not be construed as limiting. Unless clearly used otherwise, singular forms include plural meanings. In this description, expressions such as "include" or "comprise" indicate certain features, numbers, steps, operations, elements, or parts or combinations thereof. They should not be construed to exclude the presence or possibility of one or more other features, numbers, steps, operations, elements, or parts or combinations thereof beyond those described.

In the following description, "transfer", "communication", "transmission", "reception", and similar terms for signals or information cover two cases: direct delivery from one component to another, and delivery via other components. In particular, "transferring" or "transmitting" a signal or information to a component indicates its final destination, not its immediate destination. The same applies to "receiving" a signal or information. In addition, in this specification, two or more pieces of data or information being "related" means the following: once one piece of data (or information) is obtained, at least part of the other data (or information) can be obtained on that basis.

Directional terms such as upper side, lower side, one side, and the other side are used with respect to the orientation of the drawings. Components of the embodiments may be positioned in various orientations, so directional terms are used for illustration and not limitation.

Terms such as first and second may be used to describe various components, but the components are not limited by these terms. The terms serve only to distinguish one component from another. For example, a first component could be termed a second component, and a second component could likewise be termed a first component, without departing from the scope of the present invention.

The disclosed embodiments provide a technique for detecting malicious code in a Linux environment (that is, a computing environment whose operating system is Linux). Linux is open source and exists in many distributions with many different libraries. Unlike in the Windows environment, malicious code is therefore difficult to detect from API calls.

In a Linux environment, malicious code can instead be detected from system call invocations. A system call is the interface through which a program requests kernel-space functions such as file access or network access. Analyzing system calls therefore reveals what an application program is doing.

However, Linux system calls are primitive function calls, so the context of a behavior is hard to determine from individual system call information. As Table 1 shows, Linux system calls are primitives such as open, read, write, and close. This makes behavior analysis from individual system calls alone difficult.

Table 1: Examples of Linux system calls

| System call | Function |
|---|---|
| Sys_create | Create |
| Sys_open | Open |
| Sys_read | Read |
| Sys_fork | Fork |
| Sys_write | Write |
| Sys_close | Close |

In addition, in a Linux environment the call order of system calls can differ from one execution to the next even when the underlying logic is identical. As shown in FIG. 1, system calls of several categories (process, network, file access, and so on) are also interleaved. It is therefore difficult to judge similarity to malicious code from system call information and call order alone.

The disclosed embodiments address this in three steps:

1. Each system call and its parameters invoked while a target file is executed are collected.
2. The collected system calls are classified into predetermined behavior groups to generate behavior group information for the target file.
3. The similarity between this information and the behavior group information of a previously stored malicious file is measured to determine whether the target file contains malicious code.

In other words, malicious code can be detected dynamically from system calls even in a Linux environment. This is described in detail below.

FIG. 2 is a block diagram illustrating the configuration of an apparatus for detecting malicious code in a Linux environment according to one disclosed embodiment.

Referring to FIG. 2, the malicious code detection apparatus 100 for the Linux environment may include a behavior data collection module 102, a behavior data processing module 104, and a similarity determination module 106. The components of the apparatus 100 shown in FIG. 2 are functionally distinct elements. They may be functionally connected to one another to perform the functions of the present invention, and any one or more of them may be physically integrated in an actual implementation.

The behavior data collection module 102 may collect behavior data produced by executing a target file (that is, a file to be checked for malicious code). In an exemplary embodiment, the behavior data collection module 102 may execute the target file in a virtual machine to collect the behavior data.

The behavior data collection module 102 may collect the individual system calls, and their system call parameters, invoked while the target file is executed. A system call parameter is an argument passed with the system call, such as a file name or a file descriptor. Table 2 shows system calls and system call parameters according to one disclosed embodiment.

Table 2: System calls and system call parameters

| System call | System call parameters |
|---|---|
| Sys_unlink | ./0x7a30786ceo |
| Sys_open | "/dev/misc/watchdog", 2, 0 |
| Sys_chdir | "/" |
| Sys_socket | 2, 2, 0 |
| Sys_connect | 3, 8.8.8.8, 45084 |
| Sys_close | 3 |
| Sys_fork | |
| Sys_write | 2, 0x14394, 1 |
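As a rough illustration of the collection step, here is a minimal sketch. It assumes the behavior data arrives as simplified strace-style text lines of the form `name(args) = ret`. The patent does not name a tracing tool. The parser, the `SyscallRecord` type, and the `Sys_` naming convention are hypothetical. Arguments are split on commas outside double quotes, which is a simplification: real traces containing structures such as `{sa_family=...}` would need a fuller parser.

```python
import re
from dataclasses import dataclass

# Simplified strace-style line: name(args) = return_value (assumed format).
_LINE = re.compile(r'^(?P<name>\w+)\((?P<args>.*)\)\s*=\s*(?P<ret>-?\w+)')


@dataclass
class SyscallRecord:
    name: str     # e.g. "Sys_open" (hypothetical naming convention)
    params: list  # raw argument strings, in order
    ret: str      # raw return value


def split_args(args: str) -> list:
    """Split an argument string on commas that are not inside double quotes."""
    parts, buf, in_quote = [], [], False
    for ch in args:
        if ch == '"':
            in_quote = not in_quote
        if ch == ',' and not in_quote:
            parts.append(''.join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    tail = ''.join(buf).strip()
    if tail:
        parts.append(tail)
    return parts


def parse_trace(lines):
    """Turn trace lines into SyscallRecords, in call order."""
    records = []
    for line in lines:
        m = _LINE.match(line.strip())
        if m is None:
            continue  # skip signal notices, exit lines, unfinished calls
        records.append(SyscallRecord(
            name="Sys_" + m.group("name"),
            params=split_args(m.group("args")),
            ret=m.group("ret"),
        ))
    return records
```

For example, `parse_trace(['open("/dev/misc/watchdog", O_RDWR) = 3'])` yields one `Sys_open` record whose parameters are the quoted path and `O_RDWR`.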

The behavior data processing module 104 may classify the one or more collected system calls into one of the predetermined behavior groups to generate behavior group information for the target file.

FIG. 3 is a block diagram illustrating the configuration of the behavior data processing module 104 according to one disclosed embodiment. Referring to FIG. 3, the behavior data processing module 104 may include a preprocessor 111, a system call table generator 113, a group classifier 115, and a context map table generator 117.

In one embodiment, the preprocessor 111, the system call table generator 113, the group classifier 115, and the context map table generator 117 may be implemented in several ways: with one or more physically separate devices, with one or more processors, or with a combination of one or more processors and software. Unlike in the illustrated example, their specific operations need not be clearly separated.

The preprocessor 111 may classify the individual system calls collected by the behavior data collection module 102 into predetermined categories, and may assign a unique number to each system call. In an exemplary embodiment, the preprocessor 111 may classify each system call into a process category, a file category, a network category, and so on, according to the behavioral characteristics of the call. Each predetermined category may be given a category classification number. For example, the process category may be 1, the network category 2, and the file category 3, matching the numbering used in Table 3.

For each collected system call, the preprocessor 111 assigns the call to one of the predetermined categories according to its behavioral characteristics. It then gives the call the category classification number set for that category. Table 3 shows the unique numbers and category classification numbers that the preprocessor 111 assigns to each system call.

Table 3: Unique numbers and category classification numbers of system calls

| System call | Unique number | Category classification number |
|---|---|---|
| Sys_open | 1 | 3 |
| Sys_write | 2 | 3 |
| ... | ... | ... |
| Sys_socket | 35 | 2 |
| Sys_connect | 36 | 2 |
| ... | ... | ... |
| Sys_fork | 122 | 1 |
| ... | ... | ... |
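The preprocessing step amounts to a lookup from system call name to (unique number, category classification number). The following minimal sketch hard-codes only the rows actually shown in Table 3; rows elided as "..." are not filled in. The `preprocess` function is hypothetical. It returns `None` for unknown calls rather than guessing a category.

```python
# Category classification numbers as used in Table 3.
PROCESS, NETWORK, FILE = 1, 2, 3

# Excerpt of Table 3: system call -> (unique number, category number).
SYSCALL_INFO = {
    "Sys_open":    (1, FILE),
    "Sys_write":   (2, FILE),
    "Sys_socket":  (35, NETWORK),
    "Sys_connect": (36, NETWORK),
    "Sys_fork":    (122, PROCESS),
}


def preprocess(names):
    """Attach (unique number, category number) to each collected call name.

    Calls missing from the lookup get (None, None) so the caller can decide
    how to handle them.
    """
    return [(name, *SYSCALL_INFO.get(name, (None, None))) for name in names]


print(preprocess(["Sys_fork", "Sys_open"]))
# [('Sys_fork', 122, 1), ('Sys_open', 1, 3)]
```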

The system call table generator 113 may generate a system call table for the individual system calls collected by the behavior data collection module 102. In an exemplary embodiment, the table lists the collected system calls in call order. Each row contains the category classification number, the system call unique number, the file descriptor, and the system call parameters. Table 4 is a system call table according to one disclosed embodiment, built for the system calls shown in Table 2.

Table 4: System call table

| Call order | Category classification number | System call unique number | File descriptor | System call parameter |
|---|---|---|---|---|
| 1 | 3 | 9 | 1 | ./0x7a30786ceo |
| 2 | 3 | 1 | 2 | /dev/misc/watchdog |
| 3 | 3 | 5 | NULL | / |
| 4 | 2 | 35 | 3 | 2 |
| 5 | 2 | 36 | 3 | 8.8.8.8 |
| 6 | 3 | 2 | 3 | 3 |
| 7 | 1 | 12 | NULL | NULL |
| 8 | 1 | 4 | 2 | 2 |

The group classifier 115 may classify one or more system calls into one of the predetermined behavior groups based on the category classification numbers and system call parameters in the system call table.

Specifically, the group classifier 115 may assign system calls to a predetermined behavior group according to the type of system call. The type is determined from the category classification numbers and system call parameters in the system call table. In an exemplary embodiment, the predetermined behavior groups may be divided into the single system call type, the consecutive same-category system call type, the compound-category system call type, and so on.

The single system call type covers cases in which one system call alone identifies the behavior, such as forking a process, creating a file, or deleting a file. FIG. 4 is a diagram illustrating classification of a behavior group for the single system call type in one disclosed embodiment. In FIG. 4, the behavior of the system call is classified into the behavior group "file deletion".
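For the single system call type, classification can be as simple as a rule table keyed by system call name. The following is a minimal sketch. The behavior group numbers follow Table 5, but the choice of calls in the rule table is illustrative. Using the first system call parameter as the behavior group parameter is an assumption.

```python
# Hypothetical rules for the single-system-call type: one call alone
# identifies the behavior. Behavior group numbers follow Table 5.
SINGLE_CALL_GROUPS = {
    "Sys_create": 101,  # file creation
    "Sys_unlink": 102,  # file deletion
    "Sys_fork":   301,  # process creation
}


def classify_single(name, params):
    """Return (behavior group number, behavior group parameter), or None if
    the call is not a single-call behavior.

    The first system call parameter, if any, is used as the group parameter
    (an assumption for illustration).
    """
    group = SINGLE_CALL_GROUPS.get(name)
    if group is None:
        return None
    return group, (params[0] if params else None)


print(classify_single("Sys_unlink", ["./0x7a30786ceo"]))  # (102, './0x7a30786ceo')
```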

The consecutive same-category system call type covers cases in which system calls of the same category are invoked in succession, such as reading a file or receiving data over the network. FIG. 5 is a diagram illustrating classification of a behavior group for the consecutive same-category system call type in one disclosed embodiment. In FIG. 5, system calls of the same category, "file" (category classification number 3), appear in succession. The behavior they represent is classified into the behavior group "file read".
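Detecting the consecutive same-category type means finding maximal runs of calls that share a category classification number. The following minimal sketch uses `itertools.groupby`, which groups only adjacent equal keys and so yields exactly those runs. The labelling rule in `label_run` is a hypothetical example, not a rule stated in the patent.

```python
from itertools import groupby

NETWORK, FILE = 2, 3  # category classification numbers as in Table 3


def consecutive_runs(calls):
    """calls: (category number, system call name) pairs in call order.

    Returns each maximal run of adjacent calls that share a category,
    as (category number, [names]).
    """
    return [(cat, [name for _, name in run])
            for cat, run in groupby(calls, key=lambda c: c[0])]


def label_run(category, names):
    """Hypothetical labelling rule for a same-category run."""
    if category == FILE and "Sys_read" in names:
        return "file read"
    if category == NETWORK and "Sys_recv" in names:
        return "network receive"
    return None


calls = [(3, "Sys_open"), (3, "Sys_read"), (3, "Sys_read"), (3, "Sys_close"),
         (2, "Sys_socket")]
for cat, names in consecutive_runs(calls):
    print(cat, names, label_run(cat, names))
```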

The compound-category system call type covers cases in which system calls from two or more categories are invoked together. FIG. 6 is a diagram illustrating classification of a behavior group for the compound-category system call type in one disclosed embodiment. In FIG. 6, system calls from two categories, "network" (category classification number 2) and "file" (category classification number 3), are invoked together. The behavior they represent is classified into the behavior group "file download". These system calls consist of two parts: one that receives data over the network (the network category) and one that writes the received data to a file (the file category). They are therefore of the compound-category type and can be classified into the behavior group "file download".
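A compound-category pattern such as "file download" can be sketched as a rule over adjacent category runs. In this rule, a network run that receives data must be immediately followed by a file run that writes data. The receive and write call sets and the adjacency requirement are assumptions made for illustration. Only the category numbers (Table 3) and the download group number 210 (Table 5) come from the document.

```python
from itertools import groupby

NETWORK, FILE = 2, 3                  # category numbers as in Table 3
RECV_CALLS = {"Sys_recv", "Sys_recvfrom"}  # assumed "receive data" calls
WRITE_CALLS = {"Sys_write"}                # assumed "write data" calls
DOWNLOAD = 210                        # behavior group number from Table 5


def find_downloads(calls):
    """calls: (category number, system call name) pairs in call order.

    Returns (run index, DOWNLOAD) for every network run that receives data
    and is immediately followed by a file run that writes data
    (hypothetical rule).
    """
    runs = [(cat, {name for _, name in grp})
            for cat, grp in groupby(calls, key=lambda c: c[0])]
    hits = []
    for i in range(len(runs) - 1):
        (cat_a, names_a), (cat_b, names_b) = runs[i], runs[i + 1]
        if (cat_a == NETWORK and names_a & RECV_CALLS
                and cat_b == FILE and names_b & WRITE_CALLS):
            hits.append((i, DOWNLOAD))
    return hits
```

Matching on runs rather than on individual calls keeps the rule insensitive to how many receive or write calls occur inside each run.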

The group classifier 115 may assign the predefined behavior group number to each classified behavior group. Table 5 shows behavior groups and behavior group numbers according to one disclosed embodiment.

Table 5: Behavior groups and behavior group numbers

| Behavior group | Behavior group number |
|---|---|
| File creation | 101 |
| File deletion | 102 |
| ... | ... |
| Network connection | 201 |
| Download | 210 |
| ... | ... |
| Process creation | 301 |
| ... | ... |

The context map table generator 117 may generate a context map table for the behavior groups classified by the group classifier 115. In an exemplary embodiment, the table lists the classified behavior groups in time order, with their behavior group numbers and behavior group parameters. A behavior group parameter indicates the main argument of the behavior group. In an exemplary embodiment, it may be set based on the system call parameters of the system calls that make up the behavior group.
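Building the context map table then reduces to sorting the classified behavior groups by time and numbering them, starting from 0 as Table 6 does. The following minimal sketch assumes each classified group carries a timestamp; the patent says only "time order". `ContextEntry` and `build_context_map` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ContextEntry(NamedTuple):
    order: int                  # position in time order (Table 6 starts at 0)
    group_no: int               # behavior group number (Table 5)
    group_param: Optional[str]  # main argument of the behavior


def build_context_map(groups):
    """groups: (timestamp, behavior group number, behavior group parameter)
    tuples in any order. Sorts them by timestamp and numbers them from 0."""
    ordered = sorted(groups, key=lambda g: g[0])
    return [ContextEntry(i, no, param)
            for i, (_, no, param) in enumerate(ordered)]


cmap = build_context_map([(2.0, 301, "B"), (0.5, 301, "A"), (1.0, 302, "A")])
for entry in cmap:
    print(entry.order, entry.group_no, entry.group_param)
# 0 301 A
# 1 302 A
# 2 301 B
```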

FIG. 7 is a graph showing, in chronological order, the behavior groups classified according to one disclosed embodiment. Table 6 is the context map table for the behavior groups shown in FIG. 7.

| Order | Action group number | Action group parameter |
|---|---|---|
| 0 | 301 | Process name (A) |
| 1 | 302 | Process name (A) |
| 2 | 301 | Process name (B) |
| 3 | 104 | File name (A) |
| 4 | 105 | File name (B) |
| 5 | 201 | IP address (A network) |
| 6 | 210 | File name (C) |
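A context map table like Table 6 can be produced by sorting classified action groups by the time they occurred and numbering them. The sketch below assumes each classified group carries a timestamp; the tuple layout and the function name `build_context_map` are illustrative, not taken from the patent.

```python
from typing import List, Tuple

# (timestamp, action group number, action group parameter)
Group = Tuple[float, int, str]
# (order, action group number, action group parameter)
Row = Tuple[int, int, str]


def build_context_map(groups: List[Group]) -> List[Row]:
    """Order classified action groups by time and number them from 0."""
    ordered = sorted(groups, key=lambda g: g[0])
    return [(order, number, param) for order, (_, number, param) in enumerate(ordered)]


groups = [
    (5.0, 201, "IP address (A network)"),
    (0.0, 301, "Process name (A)"),
    (1.0, 302, "Process name (A)"),
    (6.0, 210, "File name (C)"),
]
for row in build_context_map(groups):
    print(row)
```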

The similarity determination module 106 may determine similarity by comparing the action group numbers and action group parameters of the target file, taken in the time order of its action groups (i.e., the action group related information of the target file), with the action group numbers and action group parameters of a previously stored malicious file, taken in the same way (i.e., the action group related information of the malicious file). In an exemplary embodiment, the similarity determination module 106 may compare the context map table of the target file with the context map table of the previously stored malicious file to determine their similarity.

Specifically, the similarity determination module 106 may measure a first similarity by comparing the time order of the action group numbers in the context map table of the target file with the time order of the action group numbers in the context map table of the previously stored malicious file. For example, the similarity determination module 106 may measure the first similarity using an N-gram algorithm, although the measurement method is not limited thereto.
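The patent does not define its N-gram measure. One common choice, assumed here, is the Jaccard overlap of the sets of consecutive n-grams of action group numbers, which rewards matching local ordering. The sample sequence `stored` is hypothetical.

```python
from typing import Sequence, Set, Tuple


def ngrams(seq: Sequence[int], n: int) -> Set[Tuple[int, ...]]:
    """All contiguous length-n windows of the sequence."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def ngram_similarity(a: Sequence[int], b: Sequence[int], n: int = 2) -> float:
    """Jaccard overlap of the n-gram sets of two group-number sequences."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    union = ga | gb
    if not union:
        return 0.0  # neither sequence is long enough to form an n-gram
    return len(ga & gb) / len(union)


target = [301, 302, 301, 104, 105, 201, 210]  # order from Table 6
stored = [301, 302, 301, 104, 201, 210]       # hypothetical malicious sample
print(ngram_similarity(target, stored))       # 4 shared bigrams out of 7 -> 0.571...
```

A larger `n` makes the measure stricter about ordering; `n = 2` is only a starting point.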

In addition, the similarity determination module 106 may measure a second similarity by comparing the action group numbers and action group parameters in the context map table of the target file with the action group numbers and action group parameters in the context map table of the previously stored malicious file. For example, the similarity determination module 106 may measure the second similarity using cosine similarity or Jaccard similarity, although the measurement method is not limited thereto.
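For the second similarity, one simple reading, assumed here, treats each context map table as a bag of (action group number, action group parameter) pairs and compares the bags with cosine or Jaccard similarity. The sample tables are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

Pair = Tuple[int, str]  # (action group number, action group parameter)


def cosine_similarity(a: List[Pair], b: List[Pair]) -> float:
    """Cosine similarity of the pair-count vectors of two tables."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[k] * cb[k] for k in ca.keys() & cb.keys())
    na = sum(v * v for v in ca.values())
    nb = sum(v * v for v in cb.values())
    return dot / math.sqrt(na * nb) if na and nb else 0.0


def jaccard_similarity(a: List[Pair], b: List[Pair]) -> float:
    """Size of the shared pair set over the size of the combined pair set."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


target = [(301, "Process name (A)"), (104, "File name (A)")]
stored = [(301, "Process name (A)"), (105, "File name (B)")]
print(cosine_similarity(target, stored))   # 0.5
print(jaccard_similarity(target, stored))  # 1 shared pair of 3 -> 0.333...
```

Cosine similarity counts repeated pairs, while Jaccard similarity only asks whether a pair occurs at all.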

The similarity determination module 106 may apply a first weight to the measured first similarity and a second weight to the measured second similarity, and then sum the results to calculate a total similarity score between the action group related information of the target file and the action group related information of the malicious file. Here, the second weight may be set higher than the first weight. For example, the sum of the first weight and the second weight may be set to 1. The total similarity score may be calculated using Equation 1 below.

(Equation 1)

$$S_{\text{total}} = (S_1 \times w_1) + (S_2 \times w_2)$$

where $S_1$ is the first similarity, $S_2$ is the second similarity, $w_1$ is the first weight, and $w_2$ is the second weight ($w_2 > w_1$).
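Equation 1 translates directly into code. The default weights 0.4 and 0.6 below are illustrative only; the source requires only that the second weight exceed the first and, in one example, that the two weights sum to 1.

```python
import math


def total_similarity(s1: float, s2: float, w1: float = 0.4, w2: float = 0.6) -> float:
    """Weighted sum of the first (order) and second (content) similarities."""
    if w2 <= w1:
        raise ValueError("the second weight should be higher than the first")
    if not math.isclose(w1 + w2, 1.0):
        raise ValueError("weights are expected to sum to 1 in this sketch")
    return s1 * w1 + s2 * w2


print(total_similarity(0.5, 1.0))  # 0.5*0.4 + 1.0*0.6
```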

The similarity determination module 106 may determine whether the target file contains malicious code according to the total similarity score between the action group related information of the target file and the action group related information of the malicious file.

In this specification, a module may mean a functional and structural combination of hardware for carrying out the technical idea of the present invention and software for driving that hardware. For example, a "module" may mean a logical unit of predetermined code and the hardware resources for executing that code; it does not necessarily mean physically contiguous code or a single kind of hardware.

FIG. 8 is a flowchart illustrating a method of detecting malicious code in a Linux environment according to an embodiment of the present invention. Although the illustrated flowchart divides the method into a plurality of steps, at least some of the steps may be performed in a different order, combined with other steps, omitted, divided into sub-steps, or performed together with one or more additional steps not shown.

Referring to FIG. 8, the malicious code detection apparatus 100 collects behavior data including each system call, and its system call parameters, invoked while the target file is executed (S101).
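The patent does not name a collection tool. Purely as an assumption, if behavior data were captured with a tracer such as `strace`, whose lines look like `name(args) = ret`, each line could be split into the system call name, its raw parameters, and its return value. The regular expression and `parse_line` are hypothetical helpers.

```python
import re
from typing import Optional, Tuple

# Matches e.g. 'openat(AT_FDCWD, "/tmp/a", O_WRONLY|O_CREAT, 0644) = 3'.
# The greedy (.*) backtracks to the ')' that precedes ' = <ret>'.
LINE_RE = re.compile(r"^(?P<name>\w+)\((?P<args>.*)\)\s*=\s*(?P<ret>-?\d+)")


def parse_line(line: str) -> Optional[Tuple[str, str, int]]:
    """Return (system call name, raw parameter string, return value),
    or None for lines that are not completed system calls."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return m.group("name"), m.group("args"), int(m.group("ret"))


print(parse_line('openat(AT_FDCWD, "/tmp/a", O_WRONLY|O_CREAT, 0644) = 3'))
print(parse_line("+++ exited with 0 +++"))  # None
```

Lines produced with `strace -f` carry a `[pid N]` prefix for child processes, which this sketch does not handle.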

Next, the malicious code detection apparatus 100 classifies each collected system call into one of the predetermined categories and assigns to it the category classification number set for that category (S103). Here, the malicious code detection apparatus 100 may also assign a unique number to each collected system call.
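Step S103 amounts to a table lookup. In the sketch below, the category classification numbers (file = 1, network = 2, process = 3) are hypothetical, chosen to line up with the 1xx/2xx/3xx action group numbers in Table 5. As the unique number it uses the x86_64 Linux system call number, which is one possible choice rather than the patent's stated scheme.

```python
from typing import Tuple

# Hypothetical category classification numbers.
FILE, NETWORK, PROCESS, UNCATEGORIZED = 1, 2, 3, 0

# read/write are placed in the file category by name alone; a fuller
# version would also inspect the file descriptor (file vs. socket).
CATEGORY = {
    "open": FILE, "openat": FILE, "read": FILE, "write": FILE,
    "close": FILE, "unlink": FILE,
    "socket": NETWORK, "connect": NETWORK, "recvfrom": NETWORK,
    "clone": PROCESS, "fork": PROCESS, "execve": PROCESS,
}

# x86_64 Linux system call numbers, used here as each call's unique number.
SYSCALL_NR = {
    "read": 0, "write": 1, "open": 2, "close": 3, "socket": 41,
    "connect": 42, "recvfrom": 45, "clone": 56, "fork": 57,
    "execve": 59, "unlink": 87, "openat": 257,
}


def preprocess(name: str) -> Tuple[int, int]:
    """Return (unique number, category classification number); unknown
    calls get -1 and UNCATEGORIZED."""
    return SYSCALL_NR.get(name, -1), CATEGORY.get(name, UNCATEGORIZED)


print(preprocess("connect"))  # (42, 2)
```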

Next, the malicious code detection apparatus 100 generates a system call table containing the category classification numbers and system call parameters of the collected system calls, in the order in which they were called (S105). The system call table may further include the unique system call number, a file descriptor, and the like.
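A system call table row can then bundle the call order, category classification number, file descriptor, and parameters. The field names, the small category map, and the rule for finding the descriptor (first argument for calls such as `write`, return value for `openat` and `socket`) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical category classification numbers (1 = file, 2 = network, 3 = process).
CATEGORY = {"openat": 1, "write": 1, "close": 1,
            "socket": 2, "connect": 2, "recvfrom": 2, "execve": 3}
FD_IN_FIRST_ARG = {"write", "close", "connect", "recvfrom"}
FD_RETURNED = {"openat", "socket"}


@dataclass
class SyscallRow:
    order: int          # position in the call sequence
    name: str
    category: int       # category classification number (0 = unknown)
    fd: Optional[int]   # file descriptor involved, if any
    params: List[str]   # raw system call parameters


def build_syscall_table(calls: List[Tuple[str, List[str], int]]) -> List[SyscallRow]:
    """calls: (name, parameters, return value) in invocation order."""
    rows = []
    for order, (name, params, ret) in enumerate(calls):
        fd: Optional[int] = None
        if name in FD_IN_FIRST_ARG:
            fd = int(params[0])
        elif name in FD_RETURNED and ret >= 0:
            fd = ret  # a successful openat/socket returns the new descriptor
        rows.append(SyscallRow(order, name, CATEGORY.get(name, 0), fd, params))
    return rows


calls = [
    ("socket", ["AF_INET", "SOCK_STREAM", "0"], 3),
    ("recvfrom", ["3", "buf", "4096"], 512),
    ("openat", ["AT_FDCWD", '"/tmp/payload"', "O_WRONLY|O_CREAT"], 4),
    ("write", ["4", "buf", "512"], 512),
]
for row in build_syscall_table(calls):
    print(row.order, row.name, row.category, row.fd)
```

Keeping the descriptor in each row is what later lets related calls (for example an `openat` and the `write` to the same file) be linked into one action group.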

Next, the malicious code detection apparatus 100 classifies one or more system calls into one of the predetermined action groups based on their category classification numbers and system call parameters, and assigns the action group number set for the classified action group (S107).
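The three system call types can be illustrated with a toy classifier. Only the group numbers come from Table 5; which system calls map to which group, and the rules themselves, are assumptions: `connect` and `unlink` are treated as single system call types, `openat` with `O_CREAT` followed by `write` to the same descriptor as a same-category continuous sequence, and a network receive followed by such a write as the complex-category "download" type.

```python
from typing import List, NamedTuple, Optional, Tuple

# Action group numbers from Table 5.
FILE_CREATION, FILE_DELETION = 101, 102
NETWORK_CONNECTION, DOWNLOAD = 201, 210
PROCESS_CREATION = 301

NETWORK = 2  # hypothetical network category classification number

# Single system call type: one call is enough to identify the behavior.
SINGLE = {"connect": NETWORK_CONNECTION, "unlink": FILE_DELETION,
          "fork": PROCESS_CREATION, "clone": PROCESS_CREATION}


class Row(NamedTuple):
    name: str
    category: int
    fd: Optional[int]
    param: str       # main parameter, e.g. a path or an IP address
    flags: str = ""


def classify(rows: List[Row]) -> List[Tuple[int, str]]:
    """Return (action group number, action group parameter) in call order."""
    groups = []
    created = {}       # fd -> path opened with O_CREAT, awaiting a write
    received = False   # a network receive has been seen
    for r in rows:
        if r.name in SINGLE:
            groups.append((SINGLE[r.name], r.param))
        elif r.name == "openat" and "O_CREAT" in r.flags:
            created[r.fd] = r.param
        elif r.name == "recvfrom" and r.category == NETWORK:
            received = True
        elif r.name == "write" and r.fd in created:
            path = created.pop(r.fd)
            # Complex category (network + file) vs. same category (file + file).
            groups.append((DOWNLOAD if received else FILE_CREATION, path))
    return groups


rows = [
    Row("connect", 2, 3, "10.0.0.1"),
    Row("recvfrom", 2, 3, ""),
    Row("openat", 1, 4, "/tmp/payload", "O_WRONLY|O_CREAT"),
    Row("write", 1, 4, ""),
    Row("unlink", 1, None, "/tmp/log"),
]
print(classify(rows))
```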

Next, the malicious code detection apparatus 100 generates a context map table containing the action group number and action group parameter of each classified action group in time order (S109).

Next, the malicious code detection apparatus 100 measures the similarity between the context map table of the target file and the context map table of a previously stored malicious file (S111).

The malicious code detection apparatus 100 may measure a first similarity by comparing the time order of the action group numbers in the context map table of the target file with the time order of the action group numbers in the context map table of the previously stored malicious file. The malicious code detection apparatus 100 may also measure a second similarity by comparing the action group numbers and action group parameters in the context map table of the target file with the action group numbers and action group parameters in the context map table of the previously stored malicious file.

The malicious code detection apparatus 100 may apply a first weight to the measured first similarity and a second weight to the measured second similarity, and sum the results to calculate a total similarity score.

Next, the malicious code detection apparatus 100 determines whether the target file contains malicious code according to the measured similarity (S113). For example, when the measured similarity exceeds a preset reference similarity, the malicious code detection apparatus 100 may determine that the target file contains malicious code.
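The final decision can be sketched as a threshold test over the total scores against every stored malicious sample. The reference value 0.7, the sample IDs, and the choice to report the best-matching sample are assumptions for the example.

```python
from typing import Dict, Optional, Tuple


def detect(scores: Dict[str, float], reference: float = 0.7) -> Tuple[bool, Optional[str]]:
    """scores maps a stored malicious sample ID to its total similarity score.
    Returns (is_malicious, ID of the best-matching sample)."""
    if not scores:
        return False, None
    best = max(scores, key=scores.get)
    # Malicious only if the score strictly exceeds the reference similarity.
    return scores[best] > reference, best


print(detect({"sample-1": 0.52, "sample-2": 0.83}))  # (True, 'sample-2')
```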

FIG. 9 is a block diagram illustrating a computing environment 10 that includes a computing device suitable for use in exemplary embodiments. In the illustrated embodiment, each component may have functions and capabilities other than those described below, and additional components other than those described below may be included.

The illustrated computing environment 10 includes a computing device 12. In one embodiment, the computing device 12 may be the apparatus 100 for detecting malicious code in a Linux environment.

The computing device 12 includes at least one processor 14, a computer-readable storage medium 16, and a communication bus 18. The processor 14 may cause the computing device 12 to operate according to the exemplary embodiments described above. For example, the processor 14 may execute one or more programs stored in the computer-readable storage medium 16. The one or more programs may include one or more computer-executable instructions, which may be configured to cause the computing device 12 to perform operations according to the exemplary embodiments when executed by the processor 14.

The computer-readable storage medium 16 is configured to store computer-executable instructions or program code, program data, and/or other suitable forms of information. A program 20 stored in the computer-readable storage medium 16 includes a set of instructions executable by the processor 14. In one embodiment, the computer-readable storage medium 16 may be memory (volatile memory such as random access memory, non-volatile memory, or a suitable combination thereof), one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, any other form of storage medium that can be accessed by the computing device 12 and can store the desired information, or a suitable combination thereof.

The communication bus 18 interconnects the various other components of the computing device 12, including the processor 14 and the computer-readable storage medium 16.

The computing device 12 may also include one or more input/output interfaces 22, which provide an interface for one or more input/output devices 24, and one or more network communication interfaces 26. The input/output interface 22 and the network communication interface 26 are connected to the communication bus 18. The input/output device 24 may be connected to the other components of the computing device 12 through the input/output interface 22. Exemplary input/output devices 24 may include input devices such as a pointing device (e.g., a mouse or trackpad), a keyboard, a touch input device (e.g., a touchpad or touchscreen), a voice or sound input device, various types of sensor devices, and/or an imaging device, and/or output devices such as a display device, a printer, a speaker, and/or a network card. An exemplary input/output device 24 may be included inside the computing device 12 as one of its components, or it may be connected to the computing device 12 as a separate device distinct from the computing device 12.

Although representative embodiments of the present invention have been described in detail above, those skilled in the art to which the present invention pertains will understand that various modifications can be made to the above-described embodiments without departing from the scope of the present invention. Therefore, the scope of the present invention should not be limited to the described embodiments, but should be defined by the claims below and their equivalents.

100: Apparatus for detecting malicious code in a Linux environment
102: Behavior data collection module
104: Behavior data processing module
106: Similarity determination module
111: Preprocessing unit
113: System call table generation unit
115: Group classification unit
117: Context map table generation unit

Claims

An apparatus for detecting malicious code in a Linux environment, comprising:
A behavior data collection module for collecting behavior data including each system call and system call parameters called while the target file is executed;
An action data processing module for classifying the collected one or more system calls into any one action group among preset action groups to generate action group related information of the target file; And
And a similarity determination module for measuring similarity between the behavior group related information of the target file and the previously stored behavior group related information of the malicious file.
The system call is composed of basic function calls for the operating system to use Linux kernel region functions in the Linux environment.
The behavior data processing module,
A pre-processing unit for classifying the collected system calls into predetermined categories according to behavior characteristics of the corresponding system calls, and assigning a predetermined category classification number to the classified categories to the corresponding system calls;
A system call table generation unit generating a system call table including the category classification number and the system call parameter according to a call order of the collected system calls;
A group classification unit configured to classify the collected one or more system calls into any one action group among the predetermined action groups based on the category classification number and system call parameter of the system call table, and to assign the action group number predetermined for the classified action group; and
A context map table generator for generating a context map table including an action group number and an action group parameter according to the time sequence of each classified action group,
The action group parameter is information indicating what the main factor of the action group is, and is set based on a system call parameter of a system call corresponding to the action group.
The similarity determination module,
Measures a first similarity by determining the similarity between the time order of the action group numbers in the context map table of the target file and the time order of the action group numbers in the context map table of the previously stored malicious file; measures a second similarity by determining the similarity between the action group numbers and action group parameters in the context map table of the target file and the action group numbers and action group parameters in the context map table of the previously stored malicious file; and gives a first weight to the first similarity and a second weight, higher than the first weight, to the second similarity and sums them to calculate a total similarity score between the context map table of the target file and the context map table of the previously stored malicious file;
The group classification unit classifies the one or more system calls into one of the predetermined action groups according to the type of system call,
The type of system call includes a single system call type, a continuous system call type of the same category, and a complex category system call type,
The single system call type indicates a case in which the behavior can be identified from a single system call, the continuous system call type of the same category indicates a case in which system calls of the same category are called consecutively, and the complex category system call type indicates a case in which system calls from two or more categories are called in combination.

delete

One or more processors, and
A method performed in a computing device having a memory that stores one or more programs executed by the one or more processors, the method comprising:
Collecting each system call and system call parameter that is called while the target file is executed;
Generating the action group related information of the target file by classifying the collected one or more system calls into any one action group among preset action groups; And
Measuring the similarity between the behavior group related information of the target file and the behavior group related information of a pre-stored malicious file;
The system call is composed of basic function calls for the operating system to use Linux kernel area functions in a Linux environment.
The generating of the action group related information may include:
Classifying each of the collected system calls by a predetermined category according to the behavior characteristic of the corresponding system call, and assigning a predetermined category classification number to the classified category to the corresponding system call;
Generating a system call table including the category classification number and the system call parameter according to the calling order of the collected system calls;
Classifying the collected one or more system calls into any one action group among the preset action groups based on the category classification number and system call parameter of the system call table, and assigning the action group number preset for the classified action group; and
Generating a context map table including an action group number and an action group parameter according to the time sequence of each classified action group,
The action group parameter is information indicating what the main factor of the action group is, and is set based on a system call parameter of a system call corresponding to the action group.
Measuring the similarity,
Measuring a first similarity by determining the similarity between the time order of the action group numbers in the context map table of the target file and the time order of the action group numbers in the context map table of the previously stored malicious file; measuring a second similarity by determining the similarity between the action group numbers and action group parameters in the context map table of the target file and the action group numbers and action group parameters in the context map table of the previously stored malicious file; and giving a first weight to the first similarity and a second weight, higher than the first weight, to the second similarity and summing them to calculate a total similarity score between the context map table of the target file and the context map table of the previously stored malicious file;
The classifying into the action group may include classifying the at least one system call into any one action group among predetermined action groups according to a type of a system call,
The type of system call includes a single system call type, a continuous system call type of the same category, and a complex category system call type,
The single system call type indicates a case in which the behavior can be identified from a single system call, the continuous system call type of the same category indicates a case in which system calls of the same category are called consecutively, and the complex category system call type indicates a case in which system calls from two or more categories are called in combination.

A computer program stored in a non-transitory computer readable storage medium,
The computer program is a computer program for detecting malicious code in a Linux environment,
The computer program includes one or more instructions that, when executed by a computing device having one or more processors, cause the computing device to:
Collect each system call and system call parameter called while the target file is executed,
Classify the collected one or more system calls into any one action group among preset action groups to generate action group related information of the target file; and
Measure the similarity between the behavior group-related information of the target file and the behavior group-related information of a pre-stored malicious file;
The system call is composed of basic function calls for the operating system to use Linux kernel region functions in the Linux environment.
Instructions for generating the action group related information,
Classify each of the collected system calls by a predetermined category according to a behavior characteristic of a corresponding system call, and assign a predetermined category classification number to the classified category to the corresponding system call,
Generate a system call table including the category classification number and the system call parameter according to the calling order of the collected system calls;
Classify the collected one or more system calls into any one action group among the predetermined action groups based on the category classification number and system call parameter of the system call table, and assign the action group number predetermined for the classified action group,
Generate a context map table including an action group number and an action group parameter according to the time sequence of each classified action group,
The action group parameter is information indicating what the main factor of the action group is, and is set based on a system call parameter of a system call corresponding to the action group.
Instructions for measuring the similarity,
Determine the similarity between the time order of the action group numbers in the context map table of the target file and the time order of the action group numbers in the context map table of the previously stored malicious file to measure a first similarity; determine the similarity between the action group numbers and action group parameters in the context map table of the target file and the action group numbers and action group parameters in the context map table of the previously stored malicious file to measure a second similarity; and give a first weight to the first similarity and a second weight, higher than the first weight, to the second similarity and sum them to calculate a total similarity score between the context map table of the target file and the context map table of the previously stored malicious file,
The instructions for classifying the action group may include: classifying the one or more system calls into any one action group among predetermined action groups according to a type of a system call,
The type of system call includes a single system call type, a continuous system call type of the same category, and a complex category system call type,
The single system call type indicates a case in which the behavior can be identified from a single system call, the continuous system call type of the same category indicates a case in which system calls of the same category are called consecutively, and the complex category system call type indicates a case in which system calls from two or more categories are called in combination.