KR102495329B1

KR102495329B1 - Malware detection system using LSTM method to provide a service vaccine platform with high detection rate

Info

Publication number: KR102495329B1
Application number: KR1020220099895A
Authority: KR
Inventors: 김재춘
Original assignee: (주)케이엔비씨
Priority date: 2022-08-10
Filing date: 2022-08-10
Publication date: 2023-02-06

Abstract

The present invention relates to a malware detection system that uses a long short-term memory (LSTM) method to provide a service anti-virus platform with a high detection rate. The malware detection system comprises: a data input module that receives an analysis target file; a neural network module including an LSTM neural network; a neural network learning unit that trains the LSTM neural network using characteristics of input malware; and a malware diagnosis unit that inputs data from the analysis target file into the LSTM neural network to diagnose whether the file contains malware. The malware diagnosis unit includes a static diagnosis unit that diagnoses the data of the analysis target file through static analysis, and a dynamic diagnosis unit that executes the analysis target file and diagnoses it through dynamic analysis. The static diagnosis unit diagnoses whether malware is included by means of the LSTM neural network.

Description

Malware detection system using LSTM method to provide service vaccine platform with high detection rate

The present invention relates to a malicious code detection system and, more particularly, to a malicious code detection system whose malware (malicious software) detection performance is improved by applying a long short-term memory (LSTM) neural network.

Malware refers to programs created not to perform legitimate functions but for the malicious purpose of harming users. Malware can be classified by form or purpose into spyware, adware, ransomware, and so on, and anti-malware software implemented on user terminals provides real-time detection of such malware. Anti-malware is often called a "vaccine", and conventional vaccines generally detect malware using signatures, which are identification data collected in advance. Because signature-based vaccines detect malware by pattern-matching signatures against files, continuous updates are essential to detect the latest malware, and detection becomes difficult when malware is forged or altered.
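To make this limitation concrete, here is a minimal Python sketch of the simplest form of signature matching. It assumes a signature is just the SHA-256 digest of a whole known sample (real products typically use richer byte patterns), and the sample bytes and `SIGNATURE_DB` are hypothetical. Changing a single byte produces a new digest, so the altered file no longer matches.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of previously collected malware samples.
KNOWN_BAD_SAMPLE = b"MZ\x90\x00 example malicious payload"
SIGNATURE_DB = {hashlib.sha256(KNOWN_BAD_SAMPLE).hexdigest()}


def signature_match(file_bytes: bytes) -> bool:
    """Return True only if the file's digest exactly matches a stored signature."""
    return hashlib.sha256(file_bytes).hexdigest() in SIGNATURE_DB


print(signature_match(KNOWN_BAD_SAMPLE))            # True: the collected sample is caught
print(signature_match(KNOWN_BAD_SAMPLE + b"\x00"))  # False: one extra byte evades the lookup
```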

Accordingly, to overcome these limitations, Korean Registered Patent Publication No. 10-2174475 (hereinafter, the 'prior art') discloses a system that uses machine learning to identify whether an application is obfuscated or packed, together with a concealed-malware detection and classification system and method that include it. As shown in FIG. 1, the prior art includes a concealment inspection unit, a data extraction unit, a data conversion unit, and a malware diagnosis unit, where the malware diagnosis unit includes a static diagnosis unit such as a convolutional neural network (CNN) and a dynamic diagnosis unit such as a recurrent neural network (RNN), thereby improving the ability to detect malicious code.

As in the prior art document above, moving beyond purely signature-based techniques to an artificial intelligence approach offers superior detection performance overall and also improves detection of forged or altered malware. However, when a conventional RNN is used for malware diagnosis and the distance between relevant information and the point where that information is used is large, the gradient shrinks progressively during backpropagation and learning ability drops sharply. Because of this vanishing gradient problem, information that must be retained over long spans is effectively lost, which can lead to degraded performance.
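The effect can be seen numerically. The sketch below is an illustration under simplifying assumptions, not a model from this description: a scalar RNN h_t = tanh(w * h_{t-1} + x), whose gradient with respect to the initial state is a product of factors w * (1 - h_t^2), is compared with the LSTM cell-state path c_t = f * c_{t-1} + i * g with the gates held constant, so that the same gradient reduces to f^T. The weight 0.5 and forget-gate value 0.99 are arbitrary choices.

```python
import math


def rnn_gradient(w: float, steps: int, x: float = 0.5) -> float:
    """dh_T/dh_0 for a scalar RNN h_t = tanh(w * h_{t-1} + x), starting from h_0 = 0."""
    h, grad = 0.0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # chain rule: dh_t/dh_{t-1} = w * (1 - h_t^2)
    return grad


def lstm_cell_path_gradient(forget_gate: float, steps: int) -> float:
    """dc_T/dc_0 along c_t = f * c_{t-1} + i * g with constant gates (simplification)."""
    return forget_gate ** steps


for T in (5, 20, 50):
    print(f"T={T:2d}  RNN: {rnn_gradient(0.5, T):.2e}  "
          f"LSTM cell path: {lstm_cell_path_gradient(0.99, T):.3f}")
```

Each RNN factor is bounded by 0.5 here, so after 50 steps the gradient is below 0.5^50, while the cell-state path keeps most of its magnitude as long as the forget gate stays close to 1.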

KR 10-2174475 B1 (2020.11.04.)

The present invention was devised to solve these problems of the prior art, and relates to a malicious code detection system that applies artificial intelligence analysis using an LSTM neural network to improve malware detection performance.

To achieve the above object, a malicious code detection system according to the present invention includes: a data input module that receives an analysis target file; a neural network module including a Long Short-Term Memory (LSTM) neural network; a neural network learning unit that trains the LSTM neural network using characteristics of input malware; and a malware diagnosis unit that inputs data of the analysis target file into the LSTM neural network to diagnose whether malware is included. The malware diagnosis unit includes a static diagnosis unit that diagnoses the data of the analysis target file through static analysis and a dynamic diagnosis unit that executes the analysis target file and diagnoses it through dynamic analysis, and the static diagnosis unit can diagnose whether malware is included through the LSTM neural network.
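The composition described above can be sketched as follows. This is a minimal Python outline under stated assumptions: `lstm_score` and `sandbox_run` are hypothetical placeholders for the trained LSTM and the dynamic-analysis environment, the byte normalization is a toy encoding, and the 0.5 threshold is arbitrary. It only shows how the static (LSTM) and dynamic paths feed one verdict.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Verdict:
    static_malicious: bool
    dynamic_malicious: Optional[bool]  # None when dynamic analysis was not run

    @property
    def malicious(self) -> bool:
        return self.static_malicious or bool(self.dynamic_malicious)


class MalwareDiagnosisUnit:
    """Combines a static (LSTM-scored) check with an optional dynamic (execution) check."""

    def __init__(self,
                 lstm_score: Callable[[Sequence[float]], float],  # hypothetical: returns P(malware)
                 sandbox_run: Callable[[bytes], bool],           # hypothetical: executes and judges
                 threshold: float = 0.5):
        self.lstm_score = lstm_score
        self.sandbox_run = sandbox_run
        self.threshold = threshold

    def diagnose(self, data: bytes, run_dynamic: bool = False) -> Verdict:
        sequence = [b / 255.0 for b in data]  # toy encoding: normalized byte values
        static = self.lstm_score(sequence) >= self.threshold
        dynamic = self.sandbox_run(data) if run_dynamic else None
        return Verdict(static, dynamic)


# Usage with stand-in callables (a real system would plug in the trained LSTM and a sandbox).
unit = MalwareDiagnosisUnit(lambda seq: seq[0], lambda data: b"evil" in data)
print(unit.diagnose(b"\xff").malicious)                        # flagged by the static path
print(unit.diagnose(b"\x00evil", run_dynamic=True).malicious)  # flagged by the dynamic path
```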

In addition, the dynamic diagnosis unit may include a virtual machine analyzer that diagnoses the analysis target file through dynamic analysis in a plurality of virtual machine (VM) environments to detect malware.

In addition, in the virtual machine analyzer, some of the plurality of virtual machines may run different operating systems (OS), and the analysis target file may be executed on each operating system to collect information.

In addition, the dynamic diagnosis unit may further include a real machine analyzer that diagnoses the analysis target file through dynamic analysis in at least one real machine environment to detect malware. The virtual machine analyzer detects whether a bypass log exists for the analysis target file, and the real machine analyzer may receive from the virtual machine analyzer any analysis target file containing a bypass log indicating an attempt to evade the virtual machine analyzer.
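One way to picture the VM-to-real-machine hand-off is sketched below. It assumes, hypothetically, that a bypass log shows up as VM-related strings in a run's event log; the marker list, `RunReport`, and the runner callables are illustrative placeholders, not part of the description.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical strings suggesting the sample probed for a VM and bypassed analysis.
EVASION_MARKERS = ("vmware", "virtualbox", "vbox", "qemu", "sandboxie")


@dataclass
class RunReport:
    os_name: str
    events: List[str] = field(default_factory=list)

    def has_bypass_log(self) -> bool:
        text = " ".join(self.events).lower()
        return any(marker in text for marker in EVASION_MARKERS)


def dynamic_diagnose(sample: bytes,
                     vm_runners: Dict[str, Callable[[bytes], RunReport]],
                     real_runner: Callable[[bytes], RunReport]) -> List[RunReport]:
    """Run the sample in one VM per OS; if any run shows a bypass log, re-run it on a real machine."""
    reports = [run(sample) for run in vm_runners.values()]
    if any(report.has_bypass_log() for report in reports):
        reports.append(real_runner(sample))
    return reports


# Usage with stand-in runners.
vms = {
    "win10": lambda s: RunReport("win10", ["queried VBoxGuest service", "exited early"]),
    "win7": lambda s: RunReport("win7", ["CreateFile"]),
}


def real_machine(sample: bytes) -> RunReport:
    return RunReport("real-win10", ["CreateFile", "WriteFile"])


print([r.os_name for r in dynamic_diagnose(b"sample", vms, real_machine)])
```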

In addition, the neural network learning unit collects information on normal files and on malicious files containing malware, analyzes the characteristics of each, and trains the LSTM neural network accordingly. The characteristics of the normal and malicious files may include at least one of header information, size and packing information, imported APIs (Import API), exported APIs (Export API), and file type and size.
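A possible feature extraction step is sketched below, assuming the file has already been parsed into a hypothetical `FileInfo` record. Byte entropy stands in for "packing information" (a common heuristic, not something the description specifies), and the section count stands in for "header information". How such vectors are arranged into sequences for the LSTM is not specified in the description.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List

KNOWN_TYPES = ("exe", "dll")


def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0 to 8); high values often accompany packing."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


@dataclass
class FileInfo:          # hypothetical pre-parsed view of one normal or malicious sample
    raw: bytes
    file_type: str       # e.g. "exe" or "dll"
    num_sections: int    # stand-in for header information
    imports: List[str]   # imported API names
    exports: List[str]   # exported API names
    label: int           # 1 = malicious, 0 = normal (training target)


def to_feature_vector(info: FileInfo) -> List[float]:
    type_onehot = [1.0 if info.file_type == t else 0.0 for t in KNOWN_TYPES]
    type_onehot.append(0.0 if info.file_type in KNOWN_TYPES else 1.0)  # "other" type
    return [
        float(info.num_sections),
        math.log1p(len(info.raw)),  # file size, log-scaled
        byte_entropy(info.raw),     # packing indicator (assumption)
        float(len(info.imports)),
        float(len(info.exports)),
        *type_onehot,
    ]
```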

In addition, an anti-malware cloud server according to the present invention includes: the malicious code detection system described above; a communication unit that receives an analysis target file from a user terminal over a network; and an application installed on the user terminal that receives information on the LSTM neural network from the server. When the server receives an analysis target file from the user terminal, it diagnoses whether the file contains malware and provides the user terminal with the LSTM neural network that has learned from the diagnosis result.
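The server-to-terminal model hand-off can be outlined as follows. The version-number scheme and class names are hypothetical; the sketch only shows a server returning a verdict together with its current model package and a terminal application adopting that package when it is newer. Retraining itself is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ModelPackage:
    version: int
    weights: bytes  # serialized LSTM parameters; the format is an assumption


class AntiMalwareServer:
    def __init__(self, diagnose: Callable[[bytes], bool], model: ModelPackage):
        self.diagnose = diagnose  # hypothetical hook into the detection system
        self.model = model

    def handle_upload(self, data: bytes) -> Tuple[bool, ModelPackage]:
        """Diagnose an uploaded file and return the verdict with the current model."""
        return self.diagnose(data), self.model

    def publish(self, new_weights: bytes) -> None:
        """Called after the LSTM has learned from new results (training omitted)."""
        self.model = ModelPackage(self.model.version + 1, new_weights)


class TerminalApp:
    def __init__(self) -> None:
        self.model = ModelPackage(0, b"")

    def sync(self, offered: ModelPackage) -> bool:
        """Adopt the offered model only if it is newer; return True when updated."""
        if offered.version > self.model.version:
            self.model = offered
            return True
        return False
```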

In addition, an anti-malware cloud server according to the present invention includes: the malicious code detection system described above; a communication unit that receives analysis target files from a user terminal over a network; a memory unit that temporarily stores each received analysis target file until it is analyzed; a database that stores the analysis results diagnosed by the static diagnosis unit and the dynamic diagnosis unit, respectively; and a data correction module that post-processes the analysis results stored in the database. The data correction module may post-process data for which static and dynamic analysis are complete so that a risk level corresponding to the malware type is stored as associated data.
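A minimal sketch of this post-processing step, assuming a hypothetical risk table keyed by malware type (the description gives no concrete levels) and keeping the higher level when static and dynamic analysis report different types:

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical risk levels per malware type; the description gives no concrete values.
RISK_BY_TYPE: Dict[str, int] = {"adware": 1, "spyware": 2, "ransomware": 3}


@dataclass
class AnalysisRecord:
    file_id: str
    static_type: Optional[str]   # malware type from static analysis, None if clean
    dynamic_type: Optional[str]  # malware type from dynamic analysis, None if clean or not run
    risk: Optional[int] = None   # filled in by post-processing


def post_process(record: AnalysisRecord, unknown_risk: int = 2) -> AnalysisRecord:
    """Attach a risk level after both analyses finish, keeping the higher of the two."""
    levels = [RISK_BY_TYPE.get(t, unknown_risk)
              for t in (record.static_type, record.dynamic_type) if t is not None]
    record.risk = max(levels) if levels else 0
    return record
```

Unknown types fall back to `unknown_risk`, and a file with no reported type gets risk 0; both defaults are assumptions.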

In addition, the database includes a plurality of storage devices, which may include: a group DB in which a dataset is built for each group from a plurality of group data; a personal DB in which a dataset is built for each user from a plurality of user data; and a relationship DB that stores information on the group to which each user belongs.

In addition, in the group DB, a plurality of groups, including at least one of institutions, companies, local governments, and schools, may be distinguished by their internal intranets, and data on the associations between groups may be stored in the relationship DB.
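For illustration only, the three stores can be modeled as tables using Python's built-in `sqlite3` module. In the description they are separate storage devices; here they share one in-memory database for brevity, and all table and column names, as well as the demo rows, are hypothetical.

```python
import sqlite3
from typing import List

# Hypothetical schema; in the description these are separate storage devices.
SCHEMA = """
CREATE TABLE group_db (group_id TEXT PRIMARY KEY, kind TEXT, intranet TEXT);
CREATE TABLE group_detections (group_id TEXT, malware_name TEXT);  -- per-group dataset
CREATE TABLE personal_db (user_id TEXT PRIMARY KEY, display_name TEXT);
CREATE TABLE membership (user_id TEXT, group_id TEXT);             -- relationship DB
CREATE TABLE group_links (group_id TEXT, related_group_id TEXT);   -- relationship DB
"""


def build_demo() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO group_db VALUES (?, ?, ?)",
                    [("univ-a", "school", "intranet-a"),
                     ("city-b", "local government", "intranet-b")])
    con.execute("INSERT INTO group_detections VALUES ('city-b', 'Ransom.Demo')")
    con.execute("INSERT INTO personal_db VALUES ('u1', 'demo user')")
    con.execute("INSERT INTO membership VALUES ('u1', 'univ-a')")
    con.execute("INSERT INTO group_links VALUES ('univ-a', 'city-b')")
    return con


def groups_of(con: sqlite3.Connection, user_id: str) -> List[str]:
    rows = con.execute("SELECT group_id FROM membership WHERE user_id = ?", (user_id,))
    return [group_id for (group_id,) in rows]


def related_groups(con: sqlite3.Connection, group_id: str) -> List[str]:
    rows = con.execute("SELECT related_group_id FROM group_links WHERE group_id = ?",
                       (group_id,))
    return [related for (related,) in rows]


def detections_in(con: sqlite3.Connection, group_id: str) -> List[str]:
    rows = con.execute("SELECT malware_name FROM group_detections WHERE group_id = ?",
                       (group_id,))
    return [name for (name,) in rows]
```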

In addition, the memory unit has a predetermined capacity. When its usage exceeds N₁% of the maximum data capacity of the memory unit, the anti-malware cloud server according to the present invention may process the files temporarily stored in the memory unit through the following steps:

a) determining whether there is a group to which the user who uploaded the file belongs;

b) if a matching group is found in step a), searching for malware previously detected in that group; and

c) if malware has previously been detected in that group, statically diagnosing the file against only the previously detected malware.

(where 100 > N₁ > 0)

In addition, when its usage is greater than N₂% and less than N₁% of the maximum data capacity of the memory unit, the anti-malware cloud server according to the present invention may process the files temporarily stored in the memory unit through the following steps:

d) if no malware has previously been detected in the group, searching the relationship DB for at least one associated group related to that group; and

e) if malware has previously been detected in a found associated group, statically diagnosing the file against only the previously detected malware.

(where 100 > N₁ > N₂ > 0)

In addition, N₂ may be set between 30 and 50, and N₁ between 60 and 80.
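Steps a) to e) can be read as a load-dependent triage. The sketch below is one possible reading, with its assumptions labeled in comments: N₁ = 70 and N₂ = 40 are picked from the stated ranges, steps a) to c) are also applied first in the N₂ to N₁ band (since step d) refers back to the group search), malware from all associated groups is pooled, and any case the description leaves open falls back to a full analysis (returned as `None`). The dictionaries stand in for the personal, group, and relationship DBs.

```python
from typing import Dict, List, Optional, Set

N1 = 70.0  # assumption: a value inside the stated 60-80 range
N2 = 40.0  # assumption: a value inside the stated 30-50 range


def plan_static_targets(usage_pct: float, user: str,
                        membership: Dict[str, str],
                        detected: Dict[str, Set[str]],
                        related: Dict[str, List[str]]) -> Optional[Set[str]]:
    """Return the malware set to restrict static diagnosis to, or None for a full analysis."""
    if usage_pct <= N2:
        return None                      # assumption: light load, analyze normally
    group = membership.get(user)         # step a: does the uploader belong to a group?
    if group is None:
        return None                      # assumption: no group, fall back to full analysis
    own = detected.get(group, set())     # step b: malware already detected in that group
    if own:
        return own                       # step c: restrict to those malware only
    if usage_pct < N1:                   # N2 < usage < N1 band
        pooled: Set[str] = set()
        for other in related.get(group, []):  # step d: associated groups from relationship DB
            pooled |= detected.get(other, set())
        if pooled:
            return pooled                # step e: restrict to their detected malware
    return None                          # assumption: otherwise, full analysis


membership = {"u1": "univ-a", "u2": "city-b"}
detected = {"univ-a": {"Trojan.Demo"}, "city-c": {"Ransom.Demo"}}
related = {"city-b": ["city-c"]}
print(plan_static_targets(85, "u1", membership, detected, related))
print(plan_static_targets(50, "u2", membership, detected, related))
```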

With the configuration described above, the malicious code detection system according to the present invention trains its artificial neural network on the latest malware information gathered through a collection server and can single out forged or altered files, so it can provide fast and accurate detection.

In addition, the malicious code detection system according to the present invention can apply static and dynamic analysis selectively according to the user's request, improves the accuracy of static analysis through the LSTM, and improves the accuracy of dynamic analysis by using both virtual and real machines. This in turn helps minimize system load.

FIG. 1 is a block diagram of a concealed malware detection system according to the prior art.
FIG. 2 is a diagram illustrating an anti-malware cloud server communicating with a user terminal according to the present invention.
FIG. 3 is a configuration diagram of an anti-malware cloud server according to the present invention.
FIG. 4 is a configuration diagram of a malicious code detection system according to the present invention.
FIG. 5 is a configuration diagram of a diagnosis unit according to the present invention.
FIG. 6 is a diagram showing the AI features used in the static diagnosis unit according to the present invention.
FIG. 7 is a diagram showing a library used in the static diagnosis unit according to the present invention.
FIG. 8 is a configuration diagram of a dynamic diagnosis unit according to the present invention.
FIG. 9 is a diagram showing a neural network training process according to the present invention.
FIGS. 10 and 11 are exemplary diagrams of neural network models according to the present invention.
FIG. 12 is a distribution diagram of training data according to the present invention.
FIG. 13 is a diagram showing a confusion matrix for a convolutional neural network (CNN) model.
FIG. 14 is a diagram showing a confusion matrix for a recurrent neural network (RNN) model.
FIG. 15 is a diagram showing a confusion matrix for a long short-term memory (LSTM) model.
FIG. 16 is a diagram showing a confusion matrix for a CNN-LSTM model.
FIG. 17 is a flowchart showing a malware analysis method according to the present invention.
FIGS. 18 to 20 are diagrams showing the interface of the present invention displayed on a user terminal.
FIG. 21 is a diagram showing a database including multiple storage devices.

Hereinafter, a malicious code detection system according to the present invention will be described in detail with reference to the accompanying drawings. The drawings introduced below are provided as examples to fully convey the spirit of the present invention to those skilled in the art. Therefore, the present invention is not limited to the drawings presented below and may be embodied in other forms. Like reference numbers indicate like elements throughout the specification.

Unless otherwise defined, technical and scientific terms used herein have the meanings commonly understood by those of ordinary skill in the art to which this invention belongs. In the following description and accompanying drawings, descriptions of well-known functions and configurations that may unnecessarily obscure the gist of the present invention are omitted.

Further, the implementations described herein may be realized as, for example, a method or process, an apparatus, a software program, a data stream, or a signal. Even if discussed only in the context of a single form of implementation (for example, only as a method), the features discussed may also be implemented in other forms (for example, as an apparatus or a program). An apparatus may be implemented in suitable hardware, software, firmware, and the like. A method may be implemented in an apparatus such as a processor, which refers generally to processing devices including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as computers, cell phones, personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end users.

In addition, the terms used in this specification are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly dictates otherwise. In this application, terms such as "include" or "have" are intended to indicate the presence of a feature, number, step, operation, component, part, or combination thereof described in the specification, and should not be understood to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

FIGS. 2 and 3 relate to the anti-malware cloud server according to the present invention: FIG. 2 shows the anti-malware cloud server communicating with a user terminal, and FIG. 3 is a configuration diagram of the anti-malware cloud server.

Referring to FIG. 2, the anti-malware cloud server 10 according to the present invention may be connected to a user terminal 20 through a network, and the user terminal 20 may include a personal user terminal 21 carried by an individual user, an institutional user terminal 22 accessing from an institution, and the like. The personal user terminal 21 may be used by individual users in Korea or abroad, and the institutional user terminal 22 may access the server from private companies (small, medium-sized, and large enterprises), financial institutions, public institutions, and so on. When the anti-malware cloud server 10 receives a request from the user terminal 20 for AI-based malware analysis, it can analyze the received file with its built-in artificial neural network to diagnose whether malware is included.

The network refers to a connection structure through which nodes, such as the anti-malware cloud server 10 and the user terminal 20 according to the present invention, can exchange information with each other. Examples of the network include, but are not limited to, a 3rd Generation Partnership Project (3GPP) network, a Long Term Evolution (LTE) network, a Worldwide Interoperability for Microwave Access (WiMAX) network, the Internet, a local area network (LAN), a wireless LAN, a wide area network (WAN), a personal area network (PAN), a Bluetooth network, a satellite broadcasting network, an analog broadcasting network, and a Digital Multimedia Broadcasting (DMB) network.

The user terminal 20 may be implemented as various types of mobile terminals, such as a digital broadcasting terminal, a mobile phone, a personal digital assistant (PDA), a portable multimedia player (PMP), a navigation device, a tablet PC, a wearable device, or smart glasses. Alternatively, the user terminal 20 may be implemented as a fixed terminal such as a desktop PC, or as a personal computer such as a laptop computer or an ultrabook, and connected to the anti-malware cloud server 10 according to the present invention.

Referring to FIG. 3, the anti-malware cloud server 10 according to the present invention may include a malicious code detection system 100 that analyzes whether a received analysis target file contains malware, a communication unit 200 that is connected to the user terminal 20 over the network for data communication, and a database 300 that records data in association with the malicious code detection system 100. The anti-malware cloud server 10 may further include a memory unit 400 that temporarily stores analysis target files received by the communication unit 200 until they are analyzed, and a data correction module 500 that post-processes the analysis results stored in the database 300. The database 300 may consist of a plurality of storage devices, or may include a meta database composed of metadata or a graph database (GDB), so that highly interconnected data sets can be explored more easily. The data correction module 500 may post-process data for which static and dynamic analysis are complete so that a risk level corresponding to the malware type is stored as associated data.

The anti-malware cloud server 10 according to the present invention may further include an application 600 that is installed on the user terminal 20 and provides an input/output interface. An analysis target file can be uploaded through the interface of the application 600 on the user terminal 20 and transmitted to the anti-malware cloud server 10, and when the malicious code detection system 100 receives the file, it can check whether the file contains malware. The malicious code detection system 100 includes an artificial neural network and can determine through AI analysis whether the analysis target file contains malware.

When multiple analysis target files are received, the memory unit 400 may hold them temporarily and pass them to the malicious code detection system 100 one after another. The database 300 may record identification information and history of the user terminal 20, and may keep the databases for the artificial neural network and malware information updated with the latest information. In addition to conventional signature-based static analysis, the malicious code detection system 100 may compare recorded malware information with the characteristics of the received analysis target file and use the artificial neural network's similarity judgment to detect even forged or altered malware.
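The memory unit's role as a temporary, first-in-first-out buffer can be sketched as follows. The byte-based capacity and the reject-on-overflow policy are assumptions; `usage_pct()` corresponds to the usage rate that the description compares against N₁ and N₂.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class MemoryUnit:
    """Temporary FIFO store for uploaded files awaiting analysis (capacity in bytes)."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.queue: Deque[Tuple[str, bytes]] = deque()

    def put(self, file_id: str, data: bytes) -> bool:
        """Queue a file; return False (assumed policy) if it would exceed capacity."""
        if self.used + len(data) > self.capacity:
            return False
        self.queue.append((file_id, data))
        self.used += len(data)
        return True

    def next_for_analysis(self) -> Optional[Tuple[str, bytes]]:
        """Hand the oldest file to the detection system, freeing its space."""
        if not self.queue:
            return None
        file_id, data = self.queue.popleft()
        self.used -= len(data)
        return file_id, data

    def usage_pct(self) -> float:
        return 100.0 * self.used / self.capacity
```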

FIGS. 4 and 5 relate to the malicious code detection system according to the present invention: FIG. 4 is a configuration diagram of the malicious code detection system, and FIG. 5 is a configuration diagram of the malware diagnosis unit.

Referring to FIG. 4, the malicious code detection system 100 according to the present invention may include a malware diagnosis unit 110, a neural network module 120, a data input module 130, and a data output module 140. When an analysis target file is received from the user terminal 20 through the communication unit 200, it may be input to the data input module 130. When temporary storage is needed, for example when there are many files to analyze or another file is already being analyzed, the analysis target file passes through the memory unit 400 before reaching the data input module 130, and the malware diagnosis unit 110 then diagnoses whether it contains malware.
The malware diagnosis unit 110 may be connected to the neural network module 120, which may include an artificial neural network such as a convolutional neural network (CNN), a recurrent neural network (RNN), or a long short-term memory (LSTM) network, or a hybrid network such as CNN-LSTM; more preferably, the neural network module 120 includes an LSTM neural network. The malware diagnosis unit 110 may be implemented using at least one of application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, micro-controllers, microprocessors, and other electrical units for performing functions. The diagnosis result of the malware diagnosis unit 110 is output through the data output module 140 and transmitted to the user terminal 20 through the communication unit 200, so the user can immediately check whether the file they submitted is malicious.
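For readers unfamiliar with LSTMs, the following pure-Python sketch shows the standard LSTM cell recurrence (input, forget, and output gates plus a candidate cell state) followed by a logistic output giving a malware probability. It is a generic textbook LSTM with random, untrained weights, not the specific model of FIGS. 10 and 11; the dimensions, seed, and forget-gate bias of 1.0 (a common initialization heuristic) are assumptions.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMClassifier:
    """One LSTM layer over a feature sequence, then a logistic unit giving P(malware).
    Weights are random and untrained; this only illustrates the forward pass."""

    GATES = ("i", "f", "o", "g")  # input, forget, output gates and candidate cell state

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        cols = input_dim + hidden_dim  # each gate sees the concatenation [x_t, h_{t-1}]
        self.hidden_dim = hidden_dim
        self.W = {gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                         for _ in range(hidden_dim)] for gate in self.GATES}
        # Forget-gate bias of 1.0 is a common initialization heuristic.
        self.b = {gate: [1.0 if gate == "f" else 0.0] * hidden_dim for gate in self.GATES}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.b_out = 0.0

    def _affine(self, gate: str, z: Vector) -> Vector:
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(self.W[gate], self.b[gate])]

    def predict_proba(self, sequence: Sequence[Vector]) -> float:
        h = [0.0] * self.hidden_dim  # hidden state
        c = [0.0] * self.hidden_dim  # cell state
        for x in sequence:
            z = list(x) + h
            i = [sigmoid(v) for v in self._affine("i", z)]
            f = [sigmoid(v) for v in self._affine("f", z)]
            o = [sigmoid(v) for v in self._affine("o", z)]
            g = [math.tanh(v) for v in self._affine("g", z)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]  # c_t = f*c + i*g
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]                # h_t = o*tanh(c_t)
        logit = sum(w * hk for w, hk in zip(self.w_out, h)) + self.b_out
        return sigmoid(logit)


model = TinyLSTMClassifier(input_dim=3, hidden_dim=4, seed=1)
sequence = [[0.1, 0.2, 0.3], [0.0, 0.5, 0.9], [0.4, 0.4, 0.4]]
print(round(model.predict_proba(sequence), 4))
```

In practice a deep learning framework's LSTM layer would be trained on labeled normal and malicious samples; the pure-Python version is only meant to make the gate arithmetic visible.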

The malicious code detection system 100 according to the present invention may further include a neural network learning unit 150 that trains the artificial neural network using the characteristics of input malware. The neural network learning unit 150 may be connected to the database 300, a collection server 30, and the like to exchange data, and may train the artificial neural network of the neural network module 120 based on the characteristics of normal files and malware collected by the collection server 30.

Referring to FIG. 5, the malware diagnosis unit 110 may include a static diagnosis unit 111 that diagnoses the data of the analysis target file through static analysis, and a dynamic diagnosis unit 112 that executes the analysis target file and diagnoses it through dynamic analysis. The static diagnosis unit 111 may detect malware by performing static analysis on the file through the neural network module 120. The dynamic diagnosis unit 112 may detect malware by executing and analyzing the file on at least one physical machine. Here, dynamic analysis refers to analyzing malicious code by collecting information on the behavior the file exhibits when it is executed on an operating system running in a virtual machine or real machine environment.

FIGS. 6 to 8 relate to the malicious code detection system according to the present invention: FIG. 6 shows the AI features used by the static diagnosis unit, FIG. 7 shows the libraries used by the static diagnosis unit, and FIG. 8 shows the configuration of the dynamic diagnosis unit.

Referring to FIG. 6, the case in which the analysis target file is a PE (Portable Executable) file executed on Windows is described in more detail. The static diagnosis unit 111 can use various attributes as AI features for malware detection. The usable attributes include at least one of: pe.has_configuration (whether the PE file has a Load Configuration), pe.has_debug (whether it has a debug section), pe.has_exceptions (whether it uses exceptions), pe.has_exports (whether it has exported symbols), pe.has_imports (whether it imports symbols), pe.has_nx (whether the NX (no-execute) flag is set), pe.has_relocations (whether it has relocation entries), pe.has_resources (whether it has resources), pe.has_rich_header (whether it has a Rich header), pe.has_signature (whether it is digitally signed), and pe.has_tls (whether it uses thread-local storage (TLS)).
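The flag attributes above can be turned into a fixed-order numeric vector for the model input. The following is a minimal Python sketch, assuming the PE file has already been parsed into an object that exposes these booleans as attributes (PE-parsing libraries such as LIEF expose similarly named properties, but exact names can differ between versions). `PE_FLAG_FEATURES` and `pe_flag_vector` are hypothetical names, not part of the described system.

```python
from types import SimpleNamespace
from typing import Any, List

# Boolean PE attributes named in the description, kept in a fixed order so
# every file maps its flags to the same vector positions.
PE_FLAG_FEATURES = (
    "has_configuration",
    "has_debug",
    "has_exceptions",
    "has_exports",
    "has_imports",
    "has_nx",
    "has_relocations",
    "has_resources",
    "has_rich_header",
    "has_signature",
    "has_tls",
)


def pe_flag_vector(pe: Any) -> List[float]:
    """Map each flag to 1.0 (true) or 0.0; attributes the object lacks count as 0.0."""
    return [1.0 if getattr(pe, name, False) else 0.0 for name in PE_FLAG_FEATURES]


# Hypothetical parsed file in which only imports and the NX flag are present.
example_vector = pe_flag_vector(SimpleNamespace(has_imports=True, has_nx=True))
```

Using `getattr` with a default keeps the vector length stable even when a parser does not report some attribute.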

For malware detection, the static diagnosis unit 111 may normalize the first 64 bytes at the entry point of the PE file and use them as a feature, and may normalize the counts of each byte value in the PE file and use the resulting byte histogram as a feature. As shown in FIG. 7, when a PE file composed of many libraries uses libraries that appear on a particular list, those libraries may be weighted and used as features. Some information about the PE sections may also be encoded and used as features, for example the average Shannon entropy of each section or the ratio of a section's size on disk to its size in memory. The static diagnosis unit 111 may also include a preprocessing step that first checks whether the file is obfuscated or packed and unpacks it if so. The dump file of the unpacked analysis target file, obtained through software de-armoring, can then be analyzed with tools such as Veratrace, IDA Pro, and Objdump. Mnemonics, API calls, instructions, and similar features are extracted from the file, and the frequency of each extracted feature is computed. Features are then derived from the frequency-vector table of each feature through normalization (for example, using z-scores) and discretization of the feature space.
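The byte-level features described above (entry-point bytes, byte histogram, Shannon entropy, disk-to-memory size ratio) and the z-score normalization step could look roughly like the sketch below. It assumes the caller has already converted the entry point to a file offset and extracted section sizes; all function names are illustrative, and dividing bytes by 255 is one possible normalization, not necessarily the one used in the invention.

```python
import math
from typing import List, Sequence


def entry_point_bytes(data: bytes, entry_offset: int, n: int = 64) -> List[float]:
    """First n bytes at the entry point (a file offset), scaled to [0, 1] and zero-padded."""
    window = data[entry_offset:entry_offset + n]
    scaled = [b / 255.0 for b in window]
    return scaled + [0.0] * (n - len(scaled))


def byte_histogram(data: bytes) -> List[float]:
    """256-bin histogram of byte values, normalized so the bins sum to 1."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    total = len(data) or 1  # avoid dividing by zero for empty input
    return [c / total for c in counts]


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data, at most 8.0."""
    return -sum(p * math.log2(p) for p in byte_histogram(data) if p > 0)


def disk_to_memory_ratio(raw_size: int, virtual_size: int) -> float:
    """Section size on disk divided by its size in memory (0.0 if the latter is 0)."""
    return raw_size / virtual_size if virtual_size else 0.0


def z_scores(values: Sequence[float]) -> List[float]:
    """Standardize a frequency vector (population std); constant vectors map to zeros."""
    if not values:
        return []
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]
```

Packed or encrypted sections tend to push the entropy value toward 8.0, which is why section entropy is a common static feature.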

Referring to FIG. 8, the dynamic diagnosis unit 112 may include a virtual machine analysis unit 112a that diagnoses the analysis target file through dynamic analysis in a plurality of virtual machine (VM) environments to detect malware. The virtual machine analysis unit 112a performs dynamic analysis of the file in the virtual machine environment and may store the analysis result in the database 300.

The dynamic diagnosis unit 112 may further include a real machine analysis unit 112b that diagnoses the analysis target file through dynamic analysis in at least one real machine environment to detect malware. If the file contains a bypass log for evading the virtual machine, the virtual machine analysis unit 112a transmits the file to the real machine analysis unit 112b, which performs dynamic analysis once more in the real machine environment, and the analysis result may be stored in the database 300. A bypass log here means a log used to evade commonly used VM-based dynamic analysis techniques. When a file contains such a log, the behavior information that can be collected in the VM environment is limited, so the dynamic analysis is repeated in a real machine environment.

The database 300 stores the results of the dynamic analysis performed by the virtual machine analysis unit 112a and the real machine analysis unit 112b, together with OSINT (Open Source Intelligence) information collected from external servers, so that the malware diagnosis unit 110 can perform correlation analysis using the stored dynamic analysis results and OSINT information. The correlation analysis covers the API call sequence, vulnerability exploit information, dropped-file information, dump information, malicious traffic information, and process information, and the resulting findings can be extracted.
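The VM-first, real-machine-fallback flow can be sketched as follows. The sandbox runners are hypothetical callables supplied by the caller, `evasion_detected` stands in for however the bypass log is detected, and a plain list stands in for database 300; none of these names come from the description.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DynamicResult:
    environment: str                # "vm" or "real"
    behaviors: List[str]            # observed behavior events (API calls, dropped files, ...)
    evasion_detected: bool = False  # True if the sample tried to evade the VM


def dynamic_diagnose(
    sample: bytes,
    run_in_vm: Callable[[bytes], DynamicResult],
    run_on_real_machine: Callable[[bytes], DynamicResult],
    store: List[DynamicResult],
) -> DynamicResult:
    """Analyze in a VM first; re-run on a real machine only if VM evasion is seen.

    Every result is appended to `store`, which stands in for database 300.
    """
    vm_result = run_in_vm(sample)
    store.append(vm_result)
    if not vm_result.evasion_detected:
        return vm_result
    real_result = run_on_real_machine(sample)
    store.append(real_result)
    return real_result


# Hypothetical stand-ins for the virtual and real machine analysis units.
def fake_vm(sample: bytes) -> DynamicResult:
    return DynamicResult("vm", ["vm_probe"], evasion_detected=b"vmcheck" in sample)


def fake_real(sample: bytes) -> DynamicResult:
    return DynamicResult("real", ["vm_probe", "CreateRemoteThread"])


evasive_log: List[DynamicResult] = []
evasive_result = dynamic_diagnose(b"payload vmcheck", fake_vm, fake_real, evasive_log)
```

Keeping both the VM and real-machine results in the store preserves the material the later correlation analysis can draw on.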

The virtual machine (VM) is a software implementation that emulates the computing environment of a real machine (PC, laptop, portable terminal, etc.) and provides an environment in which an operating system or application programs can run as they would on a real machine. Examples of virtual machines include VMware, VirtualBox, and QEMU/KVM, but are not necessarily limited thereto.

Some of the plurality of virtual machine analysis units 112a may run different operating systems (OSs) and collect information by executing the analysis target file on each OS. The different OSs may include Windows 7 (32-bit, 64-bit), Windows 8 (32-bit, 64-bit), Windows 10 (32-bit, 64-bit), Windows 11 (32-bit, 64-bit), Linux, Mac, Android, and so on. These are merely examples of operating systems currently in use, and dynamic analysis can of course be performed on any OS, including operating systems developed in the future. By performing dynamic analysis of the same file simultaneously or sequentially on different operating systems in a real machine environment, it is possible to determine which operating system is most vulnerable to the malware detected in the file.
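One way to run the same sample on several OS images in parallel and compare them is sketched below. The `analyze` callback is a hypothetical stand-in for a per-OS analysis unit, and ranking operating systems by the number of observed malicious behaviors is an assumption made for illustration; the description does not specify the vulnerability metric.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple


def most_vulnerable_os(
    sample: bytes,
    os_images: Sequence[str],
    analyze: Callable[[str, bytes], List[str]],
) -> Tuple[str, Dict[str, int]]:
    """Run one sample on several OS images in parallel and rank the results.

    `analyze(os_name, sample)` returns the malicious behaviors observed on that OS.
    The OS with the most observed behaviors is reported (ties go to the first listed).
    """
    if not os_images:
        raise ValueError("at least one OS image is required")
    with ThreadPoolExecutor(max_workers=len(os_images)) as pool:
        results = list(pool.map(lambda name: analyze(name, sample), os_images))
    counts = {name: len(events) for name, events in zip(os_images, results)}
    return max(counts, key=counts.get), counts


# Hypothetical analyzer: the sample fully detonates only on 32-bit Windows 7.
def fake_analyze(os_name: str, sample: bytes) -> List[str]:
    return ["persist", "inject", "exfiltrate"] if os_name == "windows7-32" else ["persist"]


worst_os, behavior_counts = most_vulnerable_os(
    b"sample", ["windows10-64", "windows7-32", "linux"], fake_analyze
)
```

`ThreadPoolExecutor.map` returns results in input order, so the counts line up with the OS names; running the images one after another would correspond to the sequential variant.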

FIGS. 9 to 16 relate to the training process of the malicious code detection system according to the present invention: FIG. 9 shows the neural network training process, FIGS. 10 and 11 show example neural network models, FIG. 12 shows the distribution of the training data, and FIGS. 13 to 16 show confusion matrices for various neural network models.

Referring to FIG. 9, the neural network learning unit 150 according to the present invention collects normal files and malicious files from the web through the collection server 30, analyzes the characteristics of each, records them in the database 300, and trains the neural network module 120 on that basis. The neural network learning unit 150 may extract, analyze, and record at least one characteristic among the header information, size information, packing information, imported APIs, exported APIs, file type, and file size of the normal and malicious files.

Referring to FIGS. 10 and 11, the neural network module 120 according to the present invention may, for example, consist of two layers of 70 neurons each, using the ReLU activation function with 30% dropout. The input size is set to 486 and the output size to 2; Adam is used as the optimizer and binary_crossentropy as the loss function. The batch size is 64 and the number of training epochs is 50, and an EarlyStopping callback is used so that training stops when the loss has not decreased for 5 consecutive epochs. Referring to FIG. 12, the data used for this provisionally designed neural network module 120 is evenly distributed, as shown. With a total of 139,979 training samples, the algorithm was validated on various types of artificial neural network and the accuracy of each was measured.
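The EarlyStopping rule described above (stop once the training loss has not improved for 5 consecutive epochs, with at most 50 epochs) can be illustrated in plain Python. In Keras, the corresponding setup would be along the lines of `EarlyStopping(monitor="loss", patience=5)` passed to `model.fit(..., batch_size=64, epochs=50, callbacks=[...])`; the helper below only approximates that callback (it ignores options such as `min_delta`) and works on a precomputed, hypothetical list of per-epoch losses.

```python
from typing import Sequence


def epochs_until_early_stop(
    losses: Sequence[float], patience: int = 5, max_epochs: int = 50
) -> int:
    """Return how many epochs run before stopping.

    Training stops after `patience` consecutive epochs without a new lowest
    loss, or after `max_epochs` (or when the loss history runs out).
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses[:max_epochs], start=1):
        if loss < best:
            best, wait = loss, 0  # new best loss: reset the patience counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return min(len(losses), max_epochs)
```

For example, a loss history whose minimum occurs at epoch 3 and never improves afterwards stops at epoch 8, five epochs later.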

As shown in FIG. 13, according to the confusion matrix of the CNN (Convolutional Neural Network) model, the training loss is 0.3237 with an accuracy of 0.8558, and the validation loss (val_loss) is 0.3204 with a validation accuracy (val_accuracy) of 0.8550. As shown in FIG. 14, according to the confusion matrix of the RNN (Recurrent Neural Network) model, the training loss is 0.5243 with an accuracy of 0.7558, and val_loss is 0.4844 with a val_accuracy of 0.7795.

As shown in FIG. 15, according to the confusion matrix of the LSTM (Long Short-Term Memory) model, the training loss is 0.0977 with an accuracy of 0.9669, and val_loss is 0.0943 with a val_accuracy of 0.9689. As shown in FIG. 16, according to the confusion matrix of the CNN-LSTM model, the training loss is 0.6382 with an accuracy of 0.5872, and val_loss is 0.6398 with a val_accuracy of 0.5894.
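The reported figures are accuracies, while each model is also summarized by a confusion matrix. The sketch below shows how accuracy and the malware-class precision, recall (detection rate), false-positive rate, and F1 score follow from the four cells of a binary confusion matrix. The counts in the example are hypothetical; the text does not give the matrix cells.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Metrics for the malware (positive) class of a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of malware actually detected
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "f1": f1,
    }


# Hypothetical counts, not taken from FIGS. 13 to 16.
example_metrics = binary_metrics(tp=8, fp=2, fn=0, tn=10)
```

Accuracy alone can hide a poor detection rate on imbalanced data, so recall and the false-positive rate are useful companions when comparing models like those above.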

As shown above, the LSTM model gives the best accuracy, making it the most suitable AI model for malware detection and the preferred choice for the neural network module 120.

FIGS. 17 to 20 relate to a malware analysis method according to the present invention: FIG. 17 shows the steps of the method, FIG. 18 shows the interface provided on the user terminal, FIG. 19 shows the risk levels stored for each malware, and FIG. 20 shows an example detection analysis report.

Referring to FIG. 17, in the malware analysis method according to the present invention, when an analysis target file is received from the user terminal 20 and an analysis request starts, the method proceeds through an analysis standby step (S100), a static analysis step (S200), a result recording step (S300), and an information processing step (S400), and provides information on the result to the user terminal 20. If the user terminal 20 also requests dynamic analysis, a dynamic analysis step (S210) may be further included. The analysis standby step (S100) temporarily stores the received file in the memory unit 400 before static or dynamic analysis. In the static analysis step (S200) and the dynamic analysis step (S210), static and dynamic analysis are performed by the static diagnosis unit 111 and the dynamic diagnosis unit 112, respectively. The result recording step (S300) records the analysis result in the database 300 and may include post-processing the information in the database 300 through the data correction module 500.

The information processing step (S400) processes the analysis result data so that the user can view it through the interface of the user terminal 20. The result data may include whether the file contains malware, the type of malware found, the risk level of the malware, and so on.
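The flow of steps S100 to S400 can be outlined as a small driver loop. This is a sketch only: the analyzer callables, the dictionary result format, and the `malicious` key are assumptions; a `deque` stands in for the waiting area in memory unit 400, and a list stands in for database 300.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class AnalysisRequest:
    file_name: str
    data: bytes
    dynamic_requested: bool = False


def run_pipeline(
    requests: Iterable[AnalysisRequest],
    static_analyze: Callable[[bytes], Dict],
    dynamic_analyze: Callable[[bytes], Dict],
    database: List[Dict],
) -> List[Dict]:
    """Process queued requests through S100 to S400 and return user-facing summaries."""
    waiting = deque(requests)                               # S100: analysis standby
    reports = []
    while waiting:
        req = waiting.popleft()
        results = {"static": static_analyze(req.data)}      # S200: static analysis
        if req.dynamic_requested:
            results["dynamic"] = dynamic_analyze(req.data)  # S210: optional dynamic analysis
        database.append({"file": req.file_name, **results})  # S300: record results
        reports.append({                                    # S400: shape for the UI
            "file": req.file_name,
            "malicious": any(r.get("malicious", False) for r in results.values()),
            "analyses": sorted(results),
        })
    return reports


# Hypothetical analyzers used only to exercise the flow.
def fake_static(data: bytes) -> Dict:
    return {"malicious": data.startswith(b"MZbad")}


def fake_dynamic(data: bytes) -> Dict:
    return {"malicious": b"inject" in data}


example_db: List[Dict] = []
example_reports = run_pipeline(
    [AnalysisRequest("a.exe", b"MZok"), AnalysisRequest("b.exe", b"MZok inject", True)],
    fake_static,
    fake_dynamic,
    example_db,
)
```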

As shown in FIG. 18, an interface connecting to the anti-malware cloud server 10 of the present invention may be provided on the user terminal 20 through an application or the like. Through this interface, the user can upload a file to be analyzed, choose either static or dynamic analysis for it, and set a threshold for threat notifications from the AI analysis. As shown in FIG. 19, the anti-malware cloud server 10 stores a risk level for each malware as related data and may display the risk level of malware found in the file on the user terminal 20. Alternatively, information on the artificial neural network of the neural network module 120 may be provided to the user terminal 20, so that the processor of the user terminal 20 can analyze malware immediately on the device based on the received neural network information.

FIG. 20 shows an example of a detection analysis report provided from the anti-malware cloud server 10 to the user terminal 20. The report may include the classification, detection start and end times, elapsed time, analysis machine, analysis OS, risk level, and so on. Because detailed information on the analysis is also provided, the user can identify more precise information.

FIG. 21 relates to the anti-malware cloud server according to the present invention and shows a database including a plurality of storage devices.

Referring to FIG. 21, the anti-malware cloud server 10 according to the present invention may include the malicious code detection system 100, which analyzes whether a received analysis target file contains malware as described above; a communication unit 200 connected to the user terminal 20 over a network for data communication; and a database 300 that records data in association with the malicious code detection system 100. The anti-malware cloud server 10 may further include a memory unit 400 that temporarily stores analysis target files received through the communication unit 200 until they are analyzed, and a data correction module 500 that post-processes the analysis results stored in the database 300. The database 300 may consist of a plurality of storage devices, including a group DB 310, a personal DB 320, and a relation DB 330.

In the group DB 310, a dataset may be built for each of a plurality of groups, and groups may be formed according to whether an in-house intranet has been established. More specifically, institutions, companies, local governments, schools, and the like that each operate a single intranet may be grouped, and different groups may use different networks. The group DB 310 may contain information related to each group, such as the company name and location, as well as files previously uploaded by the group and their diagnosis results. The diagnosis results preferably include the types of malware found in each file.

In the personal DB 320, a dataset may be built for each user from a plurality of user data. Users may be distinguished by IP address, device information, ID (identification number), and the like. Each user's dataset may include the user's personal information, files previously uploaded by the user, and their diagnosis results. The diagnosis results preferably include the types of malware found in each file.

The relation DB 330 may store information on the group to which each user belongs, as well as relationship data between groups. More specifically, users belonging to grouped institutions, companies, local governments, schools, and the like may be organized hierarchically so that each user is matched to the corresponding group. Many users may be matched to one group, and when malware is diagnosed in a file uploaded by one user, information on that malware may be stored for the group to which the user belongs. When another user belongs to the same group, the history of files uploaded from that user's terminal, together with the histories of the other users in the group, may be recorded as related data. Because groups using different intranets can also have various relationships, such as partners, subsidiaries, and customers, a function allowing two or more groups to designate each other as affiliates may be included.

For example, when group A, the parent company, enters its subsidiary, group B, as an affiliate, the affiliate relationship between groups A and B may be established upon receiving affiliate approval from a user terminal 20 authenticated as belonging to group B. When malware is found in a file uploaded from one user terminal 20, files uploaded from other user terminals 20 on the same intranet are classified into a first risk group for that malware, and files uploaded from user terminals 20 that are not on the same intranet but belong to an affiliate group are classified into a second risk group for that malware. The malicious code detection system 100 may then judge that the first risk group has a higher probability of infection than the second risk group.
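The first/second risk-group rule can be expressed over a user-to-group mapping and a symmetric affiliate relation. In this sketch the dictionaries stand in for the relation DB 330 and `approve_affiliation` models the request-and-approve step; all names and data are hypothetical.

```python
from typing import Dict, Optional, Set


def approve_affiliation(affiliates: Dict[str, Set[str]], requester: str, approver: str) -> None:
    """Record an approved affiliate link; the relation is stored in both directions."""
    affiliates.setdefault(requester, set()).add(approver)
    affiliates.setdefault(approver, set()).add(requester)


def classify_risk(
    uploader: str,
    infected_user: str,
    user_group: Dict[str, str],
    affiliates: Dict[str, Set[str]],
) -> Optional[int]:
    """1 = same group (intranet) as the infected user, 2 = affiliate group, None otherwise."""
    infected_group = user_group.get(infected_user)
    group = user_group.get(uploader)
    if infected_group is None or group is None:
        return None
    if group == infected_group:
        return 1
    if group in affiliates.get(infected_group, set()):
        return 2
    return None


# Hypothetical data: kim and lee share group A; group B is an approved affiliate of A.
users = {"kim": "A", "lee": "A", "park": "B", "choi": "C"}
links: Dict[str, Set[str]] = {}
approve_affiliation(links, "A", "B")
```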

The memory unit 400 has a given memory capacity, and when its usage relative to the maximum data capacity is high, a quick scan may be performed for the first and second risk groups. That is, when a large number of files are uploaded to the anti-malware cloud server 10 in a short time and usage of the memory unit 400 becomes high, a quick scan can be performed to allow fast processing and reduce the load. For example, when the usage of the memory unit 400 relative to its maximum data capacity exceeds N₁%, the files temporarily stored in the memory unit 400 may be processed through the following steps.

a) determining whether the user who uploaded the file belongs to a group;

b) if a group was matched in step a, searching for malware previously detected in that group;

c) if malware was previously detected in that group, statically diagnosing the file only for the previously detected malware.

N₁% may be set greater than 0% and less than 100%. When the memory usage of the memory unit 400 exceeds N₁%, files uploaded from user terminals 20 in the first risk group may be statically diagnosed only for the malware types previously found in the group to which they belong. The quick diagnosis result may be provided to the user terminal 20 together with a notice through the interface that a quick diagnosis was performed. For the first risk group, the static diagnosis may cover all malware types ever found in the group, or it may use information collected over a specific period, for example only the malware types found in the group within the last two years.

When the usage of the memory unit 400 is less than N₁% but greater than N₂%, both the first and the second risk groups may be diagnosed for the temporarily stored files through the following steps.

d) if no malware has previously been detected in the group, searching the relation DB for at least one related group associated with the group;

e) if malware has previously been detected in a related group found in step d, statically diagnosing the file only for that previously detected malware.

(where 100 > N₁ > N₂ > 0)

More preferably, N₂ is between 30 and 50 and N₁ is between 60 and 80. Accordingly, when usage of the memory unit 400 becomes excessively high and exceeds N₁%, a quick scan of the first risk group may be performed, and when usage is between N₂% and N₁%, a quick scan of the first and second risk groups may be performed. In the quick scan, files uploaded in sequence are selected as described above and transferred from the memory unit 400 to the malicious code detection system 100, and the scan result of the malicious code detection system 100 is provided to the user terminal 20. That is, if in step a the user who uploaded the file does not belong to any group, the malicious code detection system 100 may perform both static and dynamic diagnosis on the file; if a matching group exists, the static diagnosis unit 111 of the malicious code detection system 100 may scan for the first risk group, or for both the first and second risk groups, depending on the usage of the memory unit 400.
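The memory-usage tiers and steps a) to e) can be combined into one decision function. This is a sketch under several assumptions: the defaults N₁ = 70 and N₂ = 40 are simply picked from the preferred ranges; usage exactly equal to N₁ is treated as the middle tier; usage at or below N₂ is taken to mean a normal full diagnosis; and a file whose group (and, in the middle tier, related groups) has no prior detections also falls back to full diagnosis, since the description does not state these cases explicitly. The function name and data are hypothetical.

```python
from typing import Dict, Optional, Set


def plan_scan(
    usage_pct: float,
    group: Optional[str],
    group_malware: Dict[str, Set[str]],
    related_groups: Dict[str, Set[str]],
    n1: float = 70.0,
    n2: float = 40.0,
) -> Dict:
    """Choose how a queued file is scanned, given memory unit usage in percent.

    Returns {"mode": "full"} for normal static and dynamic diagnosis, or a quick
    static scan limited to malware already seen in the group (risk group 1)
    or in related groups (risk group 2).
    """
    if not 100 > n1 > n2 > 0:
        raise ValueError("thresholds must satisfy 100 > N1 > N2 > 0")
    if group is None or usage_pct <= n2:  # step a failed, or load is normal
        return {"mode": "full"}
    own = group_malware.get(group, set())  # step b
    if own:  # step c: first risk group
        return {"mode": "quick", "risk_group": 1, "targets": set(own)}
    if usage_pct <= n1:  # middle tier: steps d and e
        inherited: Set[str] = set()
        for related in related_groups.get(group, set()):
            inherited |= group_malware.get(related, set())
        if inherited:
            return {"mode": "quick", "risk_group": 2, "targets": inherited}
    return {"mode": "full"}  # assumption: no prior detections -> full diagnosis


# Hypothetical detection history and group relations.
history = {"A": {"trojan-x"}, "B": set(), "C": {"stealer-y"}}
relations = {"B": {"C"}}
```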

The embodiments according to the present invention described above may be implemented in the form of program instructions that can be executed by various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known to and usable by those skilled in the field of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code generated by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. A hardware device may be configured to operate as one or more software modules to perform the processing according to the present invention, and vice versa.

The specific implementations described in the present invention are examples and do not limit the scope of the present invention in any way. For brevity, descriptions of conventional electronic components, control systems, software, and other functional aspects of the systems may be omitted. In addition, the connecting lines or connecting members between the components shown in the drawings illustrate functional connections and/or physical or circuit connections; in an actual device they may be represented as various replaceable or additional functional, physical, or circuit connections. Furthermore, unless a component is specifically described as "essential," "important," or the like, it may not be strictly necessary for the application of the present invention.

Although the detailed description of the present invention has been given with reference to preferred embodiments, those skilled in the art or those having ordinary knowledge in the art will understand that the present invention can be variously modified and changed without departing from the spirit and technical scope of the present invention set forth in the claims below. Therefore, the technical scope of the present invention should not be limited to the contents of the detailed description but should be defined by the claims.

10: anti-malware cloud server
20: user terminal
21: personal user terminal
22: institutional user terminal
30: collection server
100: malware detection system
110: malware diagnosis unit
111: static diagnosis unit
112: dynamic diagnosis unit
112a: virtual machine analysis unit
112b: real machine analysis unit
120: neural network module
130: data input module
140: data output module
150: neural network learning unit
200: communication unit
300: database
310: group DB
320: personal DB
330: relationship DB
400: memory unit
500: data correction module
600: application

Claims

An anti-malware cloud server including a malicious code detection system, wherein
the malicious code detection system includes:
a data input module that receives an analysis target file;
a neural network module including a long short-term memory (LSTM) neural network;
a neural network learning unit that trains the LSTM neural network using characteristics of input malware; and
a malware diagnosis unit that diagnoses whether malware is included by inputting data of the analysis target file into the LSTM neural network;
the malware diagnosis unit includes
a static diagnosis unit that diagnoses the data of the analysis target file by static analysis, and a dynamic diagnosis unit that executes the analysis target file and diagnoses it by dynamic analysis;
the static diagnosis unit diagnoses whether malware is included by means of the LSTM neural network;
the dynamic diagnosis unit
includes a virtual machine analysis unit that diagnoses the analysis target file by dynamic analysis to detect malware in a plurality of virtual machine (VM) environments;
the anti-malware cloud server includes:
the malicious code detection system;
a communication unit that receives analysis target files from a user terminal over a network;
a memory unit that temporarily stores each received analysis target file until it is analyzed;
a database that stores the analysis results produced by the static diagnosis unit and the dynamic diagnosis unit, respectively; and
a data correction module that post-processes the analysis results stored in the database;
the data correction module
performs post-processing that stores, as associated data, a risk level according to the type of malware for data that has undergone static and dynamic analysis;
the database includes a plurality of storage devices;
the plurality of storage devices include:
a group DB in which a data set is built for each group from a plurality of group data; a personal DB in which a data set is built for each user from a plurality of user data; and a relationship DB that stores information on the groups to which each user belongs;
in the group DB, a plurality of groups, including at least one of institutions, corporations, government bodies, and schools, are classified by in-house intranet; and
relationship data between groups is stored in the relationship DB.
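Claim 1 states only that the static diagnosis unit diagnoses malware through an LSTM neural network; it does not specify the architecture. As a minimal pure-Python sketch, assuming a standard LSTM cell (input, forget, and output gates plus a candidate cell state) run over a sequence of feature vectors and followed by a hypothetical sigmoid head that turns the final hidden state into a malware score, the forward pass looks like this. The `params` layout and the scoring head are assumptions; a production system would use a deep-learning framework and learned weights.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Hypothetical parameter layout: gate name -> (W, U, b).
Params = Dict[str, Tuple[Matrix, Matrix, Vector]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def affine(w: Matrix, u: Matrix, b: Vector, x: Vector, h: Vector) -> Vector:
    """Compute W·x + U·h + b for one gate."""
    return [
        sum(wi * xi for wi, xi in zip(w_row, x))
        + sum(ui * hi for ui, hi in zip(u_row, h))
        + bi
        for w_row, u_row, bi in zip(w, u, b)
    ]


def lstm_step(params: Params, x: Vector, h: Vector, c: Vector):
    """One standard LSTM time step; returns (h_next, c_next)."""
    i = [sigmoid(v) for v in affine(*params["i"], x, h)]    # input gate
    f = [sigmoid(v) for v in affine(*params["f"], x, h)]    # forget gate
    o = [sigmoid(v) for v in affine(*params["o"], x, h)]    # output gate
    g = [math.tanh(v) for v in affine(*params["g"], x, h)]  # candidate cell
    c_next = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
    h_next = [oi * math.tanh(ci) for oi, ci in zip(o, c_next)]
    return h_next, c_next


def malware_score(params: Params, head_w: Vector, head_b: float,
                  sequence: Sequence[Vector], hidden: int) -> float:
    """Run the LSTM over a feature sequence and map the last h to [0, 1]."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in sequence:
        h, c = lstm_step(params, x, h, c)
    return sigmoid(sum(w * hi for w, hi in zip(head_w, h)) + head_b)
```

The forget gate is what lets the cell carry evidence from early tokens (for example, a packer marker) to the end of a long feature sequence, which is the usual argument for an LSTM over a plain recurrent network.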

(Deleted)

According to claim 1,
The virtual machine analysis unit,
The anti-malware cloud server, characterized in that at least some of the virtual machines run different operating systems (OSs), and information is collected by executing the analysis target file on each operating system.
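Running the same file on several operating systems matters because some malware only triggers its payload on a particular OS. A minimal sketch of that fan-out, assuming a hypothetical `run` callable that executes a file in a VM with a given OS image and returns the observed behaviours (a real system would drive a sandbox at this point); the behaviour names and the `suspicious` set are likewise illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class VmReport:
    os_name: str
    behaviours: List[str] = field(default_factory=list)


# Hypothetical runner: executes a sample inside a VM running `os_name`.
Runner = Callable[[str, bytes], VmReport]


def analyze_across_vms(sample: bytes, os_images: List[str], run: Runner,
                       suspicious: Set[str]) -> Dict[str, List[str]]:
    """Execute the sample on each OS image and keep suspicious behaviours."""
    findings: Dict[str, List[str]] = {}
    for os_name in os_images:
        report = run(os_name, sample)
        hits = [b for b in report.behaviours if b in suspicious]
        if hits:
            findings[os_name] = hits  # only OSs where something fired
    return findings
```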

According to claim 1,
The dynamic diagnosis unit,
Further includes a real machine analysis unit that diagnoses the analysis target file by dynamic analysis to detect malware in at least one real machine environment;
The virtual machine analysis unit
detects whether a detour log exists in the analysis target file; and
The real machine analysis unit
receives, from the virtual machine analysis unit, an analysis target file that includes a detour log for bypassing the virtual machine analysis unit.
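The hand-off between the two dynamic analysis units can be sketched as a routing step: files whose VM run left a detour log (evidence that the sample tried to evade the VM) are forwarded to the real machine analysis unit, while the rest keep their VM verdict. This is a minimal sketch; the `vm_log` field and the `DETOUR_MARKERS` prefixes are hypothetical, since the patent does not define the log format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AnalysisTarget:
    name: str
    # Hypothetical: log entries recorded while the file ran in the VM.
    vm_log: List[str]


# Hypothetical prefixes marking VM-evasion ("detour") behaviour.
DETOUR_MARKERS = ("vm_artifact_check", "sandbox_sleep_skip", "hypervisor_cpuid")


def has_detour_log(target: AnalysisTarget) -> bool:
    return any(entry.startswith(DETOUR_MARKERS) for entry in target.vm_log)


def route(targets: List[AnalysisTarget]
          ) -> Tuple[List[AnalysisTarget], List[AnalysisTarget]]:
    """Split targets into (VM verdict stands, forward to real machine)."""
    vm_done: List[AnalysisTarget] = []
    to_real_machine: List[AnalysisTarget] = []
    for t in targets:
        (to_real_machine if has_detour_log(t) else vm_done).append(t)
    return vm_done, to_real_machine
```

Only evasive samples consume real-machine time, which keeps the scarcer real hardware reserved for the cases the VMs cannot handle.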

According to claim 1,
The neural network learning unit,
collects information on normal files and on malicious files containing malware, analyzes the characteristics of the normal files and malicious files, and trains the LSTM neural network;
wherein the characteristics of the normal files and malicious files include at least one of header information, size and packing information, Import API, Export API, file type, and size.
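The claim lists the static features but not how they become LSTM input. One plausible approach, shown here as a sketch under stated assumptions, is to flatten each file's features into an ordered token sequence and map tokens to integer IDs with padding. The token naming scheme, the log₂ size bucket, the dict keys of `info`, and the reserved IDs (0 for padding, 1 for unknown) are all hypothetical choices.

```python
from typing import Dict, List


def feature_tokens(info: dict) -> List[str]:
    """Flatten static features into an ordered token sequence."""
    tokens = [f"type:{info['file_type']}",
              f"size_bucket:{info['size'].bit_length()}",  # log2-scale bucket
              f"packer:{info.get('packer') or 'none'}"]
    tokens += [f"header:{k}={v}" for k, v in sorted(info.get("header", {}).items())]
    tokens += [f"import:{api}" for api in info.get("imports", [])]
    tokens += [f"export:{api}" for api in info.get("exports", [])]
    return tokens


def build_vocab(corpus: List[List[str]]) -> Dict[str, int]:
    """Assign IDs in first-seen order; 0 = padding, 1 = unknown."""
    vocab: Dict[str, int] = {}
    for tokens in corpus:
        for t in tokens:
            vocab.setdefault(t, len(vocab) + 2)
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to IDs, truncate to max_len, then right-pad with 0."""
    ids = [vocab.get(t, 1) for t in tokens[:max_len]]
    return ids + [0] * (max_len - len(ids))
```

Bucketing the size by bit length keeps the vocabulary small while still separating tiny droppers from large installers.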

(Deleted)

According to claim 1,
The memory unit has a predetermined memory capacity, and
when the usage rate relative to the maximum data capacity of the memory unit is greater than N₁%,
The anti-malware cloud server, characterized in that the files temporarily stored in the memory unit are processed through the following steps:
a) determining whether there is a group to which the user who uploaded the file belongs;
b) if a matching group is found in step a), searching for malware previously detected in that group; and
c) if malware has previously been detected in that group, statically diagnosing the files for only that previously detected malware
(where 100 > N₁ > 0).

According to claim 10,
The anti-malware cloud server, characterized in that, when the usage rate relative to the maximum data capacity of the memory unit is greater than N₂% and less than N₁%, the files temporarily stored in the memory unit are processed through the following steps:
a) determining whether there is a group to which the user who uploaded the file belongs;
b) if a matching group is found in step a), searching for malware previously detected in that group;
c) if malware has previously been detected in that group, statically diagnosing the files for only that previously detected malware;
d) if no malware has previously been detected in that group, searching the relationship DB for at least one association group related to that group; and
e) if malware has previously been detected in a found association group, statically diagnosing the files for only that previously detected malware
(where 100 > N₁ > N₂ > 0).
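Steps a) to e) of the two memory-pressure claims can be sketched as one lookup that returns the set of malware a quick static scan should target. This follows the claims literally: association groups are consulted only when the user's own group has no previously detected malware. It is a minimal sketch; the in-memory dicts stand in for the relationship DB (`user_group`, `related`) and for stored detection results (`detected`), and returning `None` to mean "no quick scan applies, use the normal diagnosis path" is an assumption, since the claims leave that case open.

```python
from typing import Dict, List, Optional, Set


def signatures_to_scan(user: str,
                       user_group: Dict[str, str],
                       detected: Dict[str, Set[str]],
                       related: Dict[str, List[str]],
                       use_association_groups: bool) -> Optional[Set[str]]:
    """Return the malware set a quick static scan should target, or None.

    use_association_groups=False follows the N1 claim (steps a-c);
    True follows the N2-N1 claim (steps a-e).
    """
    group = user_group.get(user)          # step a: user -> group lookup
    if group is None:
        return None
    own = detected.get(group, set())      # step b: malware seen in the group
    if own:
        return set(own)                   # step c: scan only for those
    if not use_association_groups:
        return None
    found: Set[str] = set()
    for assoc in related.get(group, []):  # step d: association groups
        found |= detected.get(assoc, set())
    return found or None                  # step e: scan only for those
```

A caller would pass `use_association_groups=False` when usage exceeds N₁% and `True` when it lies between N₂% and N₁%.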

According to claim 11,
The anti-malware cloud server, wherein N₂ is at least 30 and at most 50, and N₁ is at least 60 and at most 80.