KR102213460B1

KR102213460B1 - System and method for generating software whitelist using machine learning

Info

Publication number: KR102213460B1
Application number: KR1020190030449A
Authority: KR
Inventors: 최승환; 김정민; 송영빈
Original assignee: 주식회사 위젯누리
Priority date: 2019-03-18
Filing date: 2019-03-18
Publication date: 2021-02-08
Also published as: KR102213460B9; KR20200115730A

Abstract

According to an embodiment of the present invention, a method for generating a software whitelist using machine learning, performed by a user terminal, comprises: (a) recognizing the software files contained in software stored on the user terminal and extracting attribute information from the software files; (b) calculating reliability information by applying a preset machine learning technique to the software files based on the attribute information; (c) calculating a grade for the software based on the reliability information and generating a whitelist based on the grade; and (d) updating the whitelist by providing it to another user terminal or by receiving a whitelist generated by another user terminal, wherein the attribute information includes static data, dynamic data, heuristic data, and social engineering data.

Description

System and method for generating a software whitelist using machine learning {SYSTEM AND METHOD FOR GENERATING SOFTWARE WHITELIST USING MACHINE LEARNING}

The present invention relates to a system and method for generating a whitelist used to recognize and authenticate safe software among the software running on a terminal device.

In general, software is registered on and managed through a whitelist by simple means such as the user's subjective judgment or a digital signature. That is, conventional whitelist registration adds software that is already in use or that an antivirus product has not flagged.

In particular, conventional whitelist registration has no objective indicators: software is judged from the user's own experience or by reputation, or the digital signature issued by an application software certification authority is checked and the software is scanned with an antivirus product, and only software showing no abnormality is considered for registration.

However, this approach depends heavily on the user, has low throughput, and requires expertise. In addition, new and variant malware that antivirus products do not detect may be registered, and some legitimate software carries no digital signature, so whitelist registration is difficult to make objective and to automate.

Blacklist-based malware detection has generally relied on signature-based and reputation-based techniques, which have failed to detect malware such as ransomware. Behavior-based detection, introduced to compensate, is also becoming less effective as behavior patterns diversify. The field is therefore moving toward whitelist-based approaches, but registering and managing a whitelist remains difficult.

In addition, simple whitelist-allow or blacklist-block policies can pose a serious security threat when malware is registered on the whitelist, or inconvenience users when legitimate software is registered on the blacklist.

The present invention addresses the problems of the prior art described above. It recognizes the files that constitute software, collects feature and behavior information from the recognized software files, extracts predefined features from the collected information, and then uses machine learning to cluster the software by reliability. Its object is to implement a system that, based on this reliability clustering, automatically registers software on the whitelist by grade and handles blocking and deletion.

However, the technical problem addressed by this embodiment is not limited to the one described above, and other technical problems may exist.

As a technical means of achieving the above object, an embodiment of the present invention may be a method for generating a software whitelist using machine learning, performed by a user terminal, the method comprising: (a) recognizing the software files contained in software stored on the user terminal and extracting attribute information from the software files; (b) calculating reliability information by applying a preset machine learning technique to the software files based on the attribute information; (c) calculating a grade for the software based on the reliability information and generating a whitelist based on the grade; and (d) updating the whitelist by providing it to another user terminal or by receiving a whitelist generated by another user terminal, wherein the attribute information includes static data, dynamic data, heuristic data, and social engineering data.

In addition, the preset machine learning technique may apply a combination of a deep learning CNN algorithm and the machine learning word2vec algorithm to the static data in the attribute information, apply the supervised random forest algorithm and the unsupervised GAN algorithm to the dynamic data, heuristic data, and social engineering data to produce result values, and cluster the result values with the k-means algorithm to calculate the reliability information.

In addition, the static data may include at least one of: whether the software to be verified has a digital signature, whether it has a file description, whether the file version is the latest, whether the product version is the latest, whether it has a product name, and whether it has copyright information. The dynamic data may include at least one of: process information produced when the software to be verified runs, memory state, installation path, execution and usage history of the software, registration status, and execution state information. The heuristic data may include at least one item of data on abnormal behavior, including the memory, storage capacity, CPU usage, network, file input/output, and thread usage of the software to be verified and the threshold increase rate. The social engineering data may include at least one of: whether the software to be verified has been launched at least once through an input device, whether the file name, product name, and copyright information are interpretable, whether a reference period of execution time has elapsed, whether the usage rate is at or above a reference value, and whether user authentication is at or above a reference percentage.

In addition, the clusters classified on the basis of the reliability information may be divided into a preset number of grades, with higher reliability information matched to a higher grade.

In addition, at least one of the following functions may be applied differentially according to the grade: software execution, network connection, file I/O, child process creation, and folder access rights, with a higher grade being granted more functions.

In addition, backup data may be created in an arbitrary storage area of the user terminal for software files whose registration on the whitelist is complete.

In addition, when a specific file belonging to software registered on the whitelist is modified, the backup data of that file may be compared with the modified file to calculate a comparison value. If the comparison value is below a preset value, the registered software is deleted from the whitelist; if it is at or above the preset value, the whitelist is updated.

In addition, for software removed from the whitelist, steps (a) to (b) may be performed again to update the whitelist.

In addition, the reliability information may also be provided to a whitelist sharing server, and reliability information generated by other user terminals may be received from the whitelist sharing server and applied to the machine learning technique.

In addition, a method of authenticating software after step (d) may further include: (e) detecting, among the software stored on the user terminal, software that has been executed as the software to be verified against the whitelist; and (f) determining whether the software to be verified is registered on the whitelist, wherein steps (a) to (d) are performed if it is not registered.
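Steps (e) and (f) can be illustrated with a minimal sketch. It assumes, purely for illustration, that the whitelist is a dictionary mapping a file hash to a grade and that steps (a) to (d) are wrapped in a caller-supplied function; the names `authenticate` and `evaluate` are hypothetical and do not come from the specification.

```python
def authenticate(executed_hash, whitelist, evaluate):
    """Return the grade of executed software, evaluating it first if unregistered.

    whitelist: dict mapping a file hash to a grade (assumed representation).
    evaluate:  hypothetical callable standing in for steps (a) to (d).
    """
    # Step (f): check whether the software to be verified is registered.
    if executed_hash not in whitelist:
        # Not registered: run steps (a) to (d) and record the resulting grade.
        whitelist[executed_hash] = evaluate(executed_hash)
    return whitelist[executed_hash]
```

Passing the evaluation step in as a function keeps the lookup logic independent of whichever machine learning technique is actually used.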

In addition, an embodiment may be a user terminal that generates a software whitelist using machine learning, comprising: a memory storing a program for generating a software whitelist using machine learning; and a processor that executes the program stored in the memory, wherein the processor recognizes the software files contained in software stored on the user terminal, extracts attribute information from the software files, calculates reliability information by applying a preset machine learning technique to the software files based on the attribute information, calculates a grade for the software based on the reliability information, generates a whitelist based on the grade, and updates the whitelist by providing it to another user terminal or by receiving a whitelist generated by another user terminal, and wherein the attribute information includes static data, dynamic data, heuristic data, and social engineering data.

In addition, an embodiment may be a computer-readable storage medium on which a program for performing the method for generating a software whitelist using machine learning is recorded.

The present invention recognizes the files that constitute software, collects feature and behavior information from the recognized software files, extracts predefined features from the collected information, and then uses machine learning to cluster the software by reliability. Based on this reliability clustering, a system can be implemented that automatically registers software on the whitelist by grade and handles blocking and deletion.

FIG. 1 is a diagram showing the configuration of a system for generating a software whitelist using machine learning according to an embodiment of the present invention.
FIG. 2 is a diagram showing the configuration of a user terminal according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a method for generating a software whitelist using machine learning according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the invention pertains can readily practice it. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. In the drawings, parts unrelated to the description are omitted for clarity, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only being "directly connected" but also being "electrically connected" with another element in between. Also, when a part is said to "include" a component, this means that it may further include other components, not that it excludes them, unless specifically stated otherwise.

The following embodiments are detailed descriptions intended to aid understanding of the present invention and do not limit its scope. Accordingly, inventions of the same scope that perform the same function as the present invention also fall within its scope.

FIG. 1 is a diagram showing the configuration of a system for generating a software whitelist using machine learning according to an embodiment of the present invention.

Referring to FIG. 1, the system 1 for generating a software whitelist using machine learning consists of a user terminal 100 and a server 200, which are interconnected through a communication network.

Here, the user terminal 100 is a user's terminal that generates a software whitelist using machine learning, and the server 200 is a device for sharing whitelists.

According to an embodiment of the present invention, in the system 1, the user terminal 100 recognizes the software files contained in the software it stores and extracts attribute information from those files.

It then calculates reliability information by applying a preset machine learning technique to the software files based on the attribute information, and determines a software grade based on the calculated reliability information.

Finally, it generates a whitelist based on the grade, and updates the whitelist stored on the user terminal 100 by sharing the whitelist with the server 200 or by receiving a whitelist generated by another user terminal 100.

Here, the attribute information is information that includes at least one of static data, dynamic data, heuristic data, and social engineering data, each described in detail below.

First, the static data may include at least one of: whether the software to be verified has a digital signature, whether it has a file description, whether the file version is the latest, whether the product version is the latest, whether it has a product name, and whether it has copyright information.

Second, the dynamic data may include at least one of: process information produced when the software to be verified runs, memory state, installation path, execution and usage history of the software, registration status, and execution state information.

Third, the heuristic data may include at least one item of data on abnormal behavior, including the memory, storage capacity, CPU usage, network, file input/output, and thread usage of the software to be verified and the threshold increase rate.

Fourth, the social engineering data may include at least one of: whether the software to be verified has been launched at least once through an input device, whether the file name, product name, and copyright information are interpretable, whether a reference period of execution time has elapsed, whether the usage rate is at or above a reference value, and whether user authentication is at or above a reference percentage.
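The four categories of attribute information can be pictured as a simple record structure. The sketch below is only one possible layout: every class name, field name, type, and unit is an assumption made for illustration, since the specification lists the items but does not define a data format.

```python
from dataclasses import dataclass, field, astuple


@dataclass
class StaticData:
    has_signature: bool = False
    has_description: bool = False
    file_version_latest: bool = False
    product_version_latest: bool = False
    has_product_name: bool = False
    has_copyright: bool = False


@dataclass
class DynamicData:
    process_info: dict = field(default_factory=dict)
    memory_state: str = ""
    install_path: str = ""
    execution_history: list = field(default_factory=list)
    registration_status: str = ""
    run_state: str = ""


@dataclass
class HeuristicData:
    # Units are hypothetical; the specification names the quantities only.
    memory_mb: float = 0.0
    capacity_mb: float = 0.0
    cpu_percent: float = 0.0
    network_kbps: float = 0.0
    file_io_ops: float = 0.0
    thread_count: int = 0
    threshold_increase_rate: float = 0.0


@dataclass
class SocialEngineeringData:
    launched_by_input_device: bool = False
    metadata_interpretable: bool = False
    reference_period_elapsed: bool = False
    usage_rate_above_reference: bool = False
    user_auth_above_reference: bool = False


@dataclass
class AttributeInfo:
    static: StaticData = field(default_factory=StaticData)
    dynamic: DynamicData = field(default_factory=DynamicData)
    heuristic: HeuristicData = field(default_factory=HeuristicData)
    social: SocialEngineeringData = field(default_factory=SocialEngineeringData)


def static_feature_vector(info):
    """Flatten the boolean static flags into a 0/1 list for a downstream model."""
    return [int(flag) for flag in astuple(info.static)]
```

Keeping the categories separate makes it straightforward to route each one to a different algorithm, which is how the specification treats them.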

In the whitelist, the grade serves as an indicator of how far usage rights can be granted when the user runs the software on the user terminal 100.

For example, based on the grade matched to the software, at least one of the following functions is provided differentially: software execution, network connection, file I/O, child process creation, and folder access rights. Functions other than those described in this specification may also be provided according to grade, so this list does not limit the scope of the present invention.

If the grade matched to the software is high, more functions may be applied cumulatively.

Conversely, the lower the grade, the more the application restricts the usage rights of the software, and if the software is determined to harm the user terminal 100, a function to block or delete it may be provided.
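The cumulative, grade-dependent granting of functions might look like the following sketch. The specification names the five functions but does not say which grade unlocks which one, so the `UNLOCKED_AT` table, the seven-grade scale, and all identifiers here are hypothetical.

```python
from enum import Flag, auto


class Capability(Flag):
    NONE = 0
    EXECUTE = auto()
    FILE_IO = auto()
    FOLDER_ACCESS = auto()
    NETWORK = auto()
    CHILD_PROCESS = auto()


# Hypothetical policy: the grade at which each function becomes available.
# Grade 1 unlocks nothing, so such software is effectively blocked.
UNLOCKED_AT = {
    2: Capability.EXECUTE,
    3: Capability.FILE_IO,
    4: Capability.FOLDER_ACCESS,
    5: Capability.NETWORK,
    6: Capability.CHILD_PROCESS,
}


def capabilities_for(grade):
    """Accumulate every capability whose unlock grade is at or below `grade`."""
    allowed = Capability.NONE
    for unlock_grade, capability in UNLOCKED_AT.items():
        if grade >= unlock_grade:
            allowed |= capability
    return allowed


def is_allowed(grade, capability):
    """True when software of this grade may use the given capability."""
    return capability in capabilities_for(grade)
```

A bit-flag type suits this rule because a higher grade simply carries a superset of the flags granted to every lower grade.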

The user terminal 100 registers software on the whitelist through a software whitelist generation application that uses machine learning.

The application may be built into the user terminal 100, or may be downloaded from an application distribution server and installed on the user terminal 100.

The plurality of user terminals 100 are communication terminals capable of using a terminal application in a wired or wireless communication environment. The user terminal 100 may be a user's portable terminal. Although FIG. 1 shows the user terminal 100 as a smartphone, which is one kind of portable terminal, the idea of the present invention is not limited to this, and any terminal on which the terminal application can be installed may be used without restriction.

More specifically, the user terminal 100 may include a handheld computing device (for example, a PDA or an email client), any form of mobile phone, or any form of other computing or communication platform, but the present invention is not limited to these.

The server 200 is a server for sharing whitelists; a whitelist set up on another user terminal 100 can be shared through the server 200.

Meanwhile, the communication network connects the user terminals 100 and the server 200. That is, it provides a connection path through which the user terminals 100 can access the server 200 and then send and receive data. The communication network may encompass, for example, wired networks such as LANs (Local Area Networks), WANs (Wide Area Networks), MANs (Metropolitan Area Networks), and ISDNs (Integrated Service Digital Networks), and wireless networks such as wireless LANs, CDMA, Bluetooth, and satellite communication, but the scope of the present invention is not limited to these.

FIG. 2 is a diagram showing the configuration of a user terminal according to an embodiment of the present invention.

Referring to FIG. 2, the user terminal 100 according to an embodiment of the present invention includes a communication module 110, a memory 120, and a processor 130.

In detail, the communication module 110 works with the communication network to provide the communication interface needed to exchange signals between the user terminal 100 and the server 200 in the form of packet data. The communication module 110 may also receive a data request from the server 200 and send data in response.

Here, the communication module 110 may be a device that includes the hardware and software needed to send and receive signals, such as control signals or data signals, over a wired or wireless connection with other network devices.

The memory 120 stores a program for performing the method for generating a software whitelist using machine learning. It also temporarily or permanently stores data processed by the processor 130. The memory 120 may include magnetic storage media or flash storage media, but the scope of the present invention is not limited to these.

The memory 120 may also store the whitelist and other information needed to generate it. For example, software files, the attribute information extracted from them, and information on the machine learning for which training has been performed may additionally be stored.

The details of the whitelist, software files, attribute information, and so on are as described above with reference to FIG. 1.

The processor 130 is a kind of central processing unit that controls the entire process of generating a software whitelist using machine learning. Each step performed by the processor 130 is described below with reference to FIG. 3.

Here, the processor 130 may include any kind of device capable of processing data. A "processor" may refer to, for example, a data processing device embedded in hardware that has circuitry physically structured to perform functions expressed as code or instructions contained in a program. Examples of such hardware-embedded data processing devices include a microprocessor, a central processing unit (CPU), a processor core, a multiprocessor, an ASIC (application-specific integrated circuit), and an FPGA (field programmable gate array), but the scope of the present invention is not limited to these.

FIG. 3 is a flowchart illustrating a method for generating a software whitelist using machine learning according to an embodiment of the present invention.

Referring to FIG. 3, to generate a software whitelist using machine learning, the user terminal 100 first recognizes the files of the software and extracts attribute information (S310).

In step S310, the software whitelist generation application recognizes the software files contained in the software stored in the memory 120 of the user terminal 100.
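The file recognition part of step S310 could be sketched as a directory walk that fingerprints each software file. The specification does not say how files are recognized, so the extension list, the use of SHA-256, and the function name below are all assumptions for illustration.

```python
import hashlib
from pathlib import Path

# Hypothetical set of extensions treated as "software files";
# the specification does not enumerate any.
SOFTWARE_SUFFIXES = {".exe", ".dll", ".sys"}


def recognize_software_files(install_dir):
    """Return {relative_path: sha256_hex} for each software file under install_dir."""
    root = Path(install_dir)
    found = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in SOFTWARE_SUFFIXES:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            found[path.relative_to(root).as_posix()] = digest
    return found
```

A content hash gives each recognized file a stable identity that later steps can use as a whitelist key.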

In an additional embodiment, the attribute information may be extracted by recognizing the software itself rather than the files that constitute it.

The machine learning is trained on the attribute information extracted in step S310, and reliability information for the software files is calculated (S320).

A different algorithm is applied to the attribute information extracted in step S310 depending on its type in order to calculate the reliability information for the software files.

For example, reliability information for static data is calculated by applying a combination of a deep learning CNN algorithm and the machine learning word2vec algorithm.

For dynamic data, heuristic data, and social engineering data, reliability information is calculated by running the supervised random forest algorithm and an unsupervised GAN.

The reliability result values produced by these different algorithms for the different kinds of attribute information are then clustered with the k-means algorithm to produce the final reliability information.
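The final clustering step can be illustrated with a small k-means (Lloyd's algorithm) over scalar values. This sketch assumes each software file has already been reduced to a single reliability score between 0 and 1; how the specification combines the outputs of the individual algorithms is not stated, and the function name and the deterministic initialization are choices made here for illustration.

```python
def kmeans_1d(values, k, iterations=50):
    """Cluster scalar scores with Lloyd's algorithm.

    Returns (centroids, labels), where labels[i] is the cluster index of values[i].
    """
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids evenly over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


# Hypothetical reliability scores for six software files.
scores = [0.1, 0.15, 0.2, 0.8, 0.85, 0.9]
centroids, labels = kmeans_1d(scores, k=2)
```

In practice a library implementation would normally replace this hand-written loop; the point here is only to show the assignment and update steps that group similar reliability values together.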

신뢰도 정보에 기초하여 소프트웨어의 등급 산출 및 화이트리스트를 생성한다(S330).Based on the reliability information, the software grade is calculated and a white list is generated (S330).

단계(S320)에서 산출된 신뢰도 정보에 기초하여 분류된 클러스터링에 의해 기 설정된 개수로 등급을 나누게 된다. The grade is divided by a preset number by clustering classified based on the reliability information calculated in step S320.

For example, if the clustering produces seven clusters, there are also seven grades.

The grades are assigned according to the reliability information: the higher the reliability information of a software file, the higher the grade matched to it.

As described above, the permissions granted differ by grade. For example, grade 1 may be designated risk observation, grade 2 alert, grade 3 caution, grade 4 normal, grade 5 safety, grade 6 security, and grade 7 protection.
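The mapping from clusters to grades can be sketched as follows. The sketch assumes that clusters are ranked by the mean reliability score of their members, which is one possible reading of "the higher the reliability information, the higher the grade". The names `GRADE_LABELS` and `grades_from_clusters` are hypothetical.

```python
from statistics import mean

# Grade labels as listed in the description (grade 1 is lowest, grade 7 is highest).
GRADE_LABELS = {
    1: "risk observation", 2: "alert", 3: "caution", 4: "normal",
    5: "safety", 6: "security", 7: "protection",
}

def grades_from_clusters(scores, cluster_ids):
    """Return {cluster_id: grade}, giving higher grades to clusters with higher mean score.

    Assumption: ranking by mean score; the patent does not fix the ranking rule.
    """
    members = {}
    for score, cid in zip(scores, cluster_ids):
        members.setdefault(cid, []).append(score)
    ranked = sorted(members, key=lambda cid: mean(members[cid]))
    return {cid: rank for rank, cid in enumerate(ranked, start=1)}
```

When the clustering yields seven clusters, the resulting grades run from 1 to 7 and can be looked up in `GRADE_LABELS`.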

The whitelist generated in step S330 is provided to the server 200 and shared with other user terminals 100 (S340).

Once the software has been registered in the whitelist, backup data containing information on the software files may be created in any storage area within the user terminal 100. This information may include information about the software itself, usage records, modification logs, and the like.

If, after step S340, a specific file belonging to software registered in the generated whitelist is modified, the backup data of that file is compared with the modified file.

A comparison value is then calculated to check whether the file falls outside the preset registration range of the whitelist. If the comparison value is less than a preset value, the software is deleted from the whitelist; conversely, if the comparison value is equal to or greater than the preset value, the whitelist is updated.
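The comparison against the backup data might look like the following sketch. The description does not say how the comparison value is computed, so the sketch assumes a byte-level similarity ratio from Python's `difflib.SequenceMatcher`. The threshold of 0.8, the whitelist layout, and the choice to "update" by refreshing the stored backup are likewise assumptions made for illustration.

```python
from difflib import SequenceMatcher

def review_modified_file(whitelist, software, modified, threshold=0.8):
    """Compare a modified file with its backup and delete or update the whitelist entry.

    Assumptions: whitelist maps a software name to {"grade": int, "backup": bytes},
    and the comparison value is a similarity ratio in [0, 1].
    """
    entry = whitelist[software]
    comparison_value = SequenceMatcher(None, entry["backup"], modified).ratio()
    if comparison_value < threshold:
        # Below the preset value: the software is removed from the whitelist.
        del whitelist[software]
        return "deleted"
    # At or above the preset value: the whitelist entry is updated.
    entry["backup"] = modified
    return "updated"
```

A small edit keeps the software registered, while a wholesale change to the file drops it from the whitelist.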

In an optional embodiment, steps S310 to S340 may be performed again for software that was modified and deleted from the whitelist, so that the whitelist is updated.

The first whitelist generated by the user terminal 100 may be provided to the server 200 and shared with other user terminals 100. Conversely, a second whitelist generated by another user terminal 100 may be received from the server 200 and used to update the first whitelist generated by the user terminal 100. If necessary, the whitelist may also be shared through direct communication between user terminals 100, without going through the server 200.

In an additional embodiment, the reliability information calculated from the attribute information may also be provided to the whitelist sharing server 200. The user terminal 100 can then receive, from the whitelist sharing server 200, reliability information generated by other user terminals 100 and apply it to the machine learning technique.

Meanwhile, in an additional embodiment, when the user terminal 100 compares the second whitelist received from another user terminal 100 with the first whitelist already stored in the user terminal 100, the same software may appear with different grades. This is because the grade may be calculated differently for each user depending on, for example, how often or how long the software is used. In this case, when the first whitelist is updated, the grade of the software in the first whitelist may be updated based on the higher of the grades registered for that software in the first and second whitelists. This prevents the problem of software that a user has been using frequently suddenly receiving a lower grade and becoming restricted.
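The "keep the higher grade" rule can be sketched in a few lines. The sketch assumes each whitelist is a mapping from a software identifier to an integer grade, and that software present only in the second whitelist is added to the result. The description does not spell out either point, and `merge_whitelists` is a hypothetical name.

```python
def merge_whitelists(first, second):
    """Update the first whitelist with the second, keeping the higher grade per software.

    Assumption: whitelists are dicts of {software_id: grade}.
    """
    merged = dict(first)
    for software, grade in second.items():
        merged[software] = max(grade, merged.get(software, grade))
    return merged
```

Because the function returns a new mapping, the caller can inspect the differences before replacing the stored first whitelist.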

In another additional embodiment, the static data, dynamic data, heuristic data, and social engineering data of the software mainly used on each user terminal 100 may be referenced to detect a propensity, that is, what kind of software is mainly used on each terminal. For example, even for otherwise identical user terminals 100, a terminal shared within a company will mainly run document-processing software, so it can be concluded to have an office propensity, whereas a terminal used in an ordinary home will mainly run games or video playback software, so it can be concluded to have an entertainment propensity. Based on this user propensity information, the whitelists generated by terminals with similar propensities can then be compared so that each terminal updates its whitelist. This is because the grade that terminals with similar propensities assign to a piece of software may differ from the grade that terminals with a different propensity assign to the same software.

After step S340 is completed, the application performs software authentication.

To perform authentication, software that is executed, among the software stored in the user terminal 100, is detected as the software to be verified against the whitelist.

It is then determined whether the software to be verified is registered in the whitelist, and a grade is assigned accordingly. If the software to be verified is not registered in the whitelist, steps S310 to S340 are performed to assign a grade to the software and register it in the whitelist.
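The authentication flow can be summarised in a short sketch. It assumes the whitelist is a mapping from a software identifier to its grade and that steps S310 to S340 are wrapped in a caller-supplied function, here the hypothetical `register` callback, which returns the grade it computed.

```python
def authenticate(executed_software, whitelist, register):
    """Return the grade of executed software, registering it first if it is unknown.

    Assumptions: whitelist is {software_id: grade}; register(software_id) stands in
    for steps S310 to S340 and returns the newly calculated grade.
    """
    if executed_software not in whitelist:
        # Not yet registered: run the full pipeline and store the result.
        whitelist[executed_software] = register(executed_software)
    return whitelist[executed_software]
```

Software that is already registered is answered from the whitelist without rerunning the machine learning pipeline.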

An embodiment of the present invention may also be implemented in the form of a recording medium containing computer-executable instructions, such as program modules executed by a computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and nonvolatile media as well as removable and non-removable media. A computer-readable medium may also include computer storage media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

Although the method and system of the present invention have been described in connection with specific embodiments, some or all of their components or operations may be implemented using a computer system having a general-purpose hardware architecture.

The foregoing description of the present invention is illustrative, and those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not limiting. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in combined form.

The scope of the present invention is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

1: Software whitelist generation system using machine learning
100: User terminal
200: Server

Claims

A method for generating a software whitelist using machine learning, performed by a user terminal, the method comprising:
(a) recognizing a software file included in software stored in the user terminal and extracting attribute information from the software file;
(b) calculating reliability information by performing a preset machine learning technique on the software file based on the attribute information;
(c) calculating a grade of the software based on the reliability information and generating a whitelist based on the grade; and
(d) providing the whitelist to another user terminal, or receiving a whitelist generated by another user terminal, to update the whitelist,
wherein, in step (d), the reliability information is further provided to a whitelist sharing server, and reliability information generated by another user terminal is received from the whitelist sharing server and applied to the machine learning technique,
wherein, after step (d), a propensity indicating what kind of software is used on the user terminal is calculated based on the attribute information, and the whitelists calculated by user terminals having mutually corresponding propensities are compared to update the whitelist, and
wherein the attribute information includes static data, dynamic data, heuristic data, and social engineering data.

The method of claim 1,
wherein the preset machine learning technique:
applies a combination of a deep learning CNN algorithm and a machine learning word2vec algorithm to the static data among the attribute information,
calculates result values by applying a supervised random forest algorithm and an unsupervised GAN algorithm to the dynamic data, the heuristic data, and the social engineering data, and
calculates the reliability information by clustering the result values with a k-means algorithm.

The method of claim 1,
wherein the static data includes at least one of: the presence or absence of a digital signature of the software to be verified, the presence of a file description, the latest version of the file, the latest version of the product, the presence of a product name, and the presence of copyright,
wherein the dynamic data includes at least one of: process information calculated when the software to be verified is executed, a memory state, an installation path, an execution and use history of the software to be verified, a registration status, and execution state information of the software to be verified,
wherein the heuristic data includes at least one item of data on abnormal behavior of the software to be verified, including memory, capacity, CPU share, network, file input/output, thread usage, and a threshold increase rate, and
wherein the social engineering data includes at least one of: whether the software to be verified has been executed at least once via an input device, whether the file name, product name, and copyright information can be interpreted, whether the execution time has elapsed, whether the usage rate is at or above a reference value, and whether user authentication at or above a reference percentage has been obtained.

The method of claim 1,
wherein the software is divided into a preset number of grades by clustering based on the reliability information, and
wherein, in step (c), a higher grade is matched as the reliability information is higher.

The method of claim 4,
wherein at least one of the permissions for software execution, network connection, file I/O, child process creation, and folder access is applied differentially based on the grade, and
wherein more functions are permitted as the grade is higher.

The method of claim 1,
wherein step (d) comprises generating backup data, in an arbitrary storage area in the user terminal, for the software file whose registration in the whitelist has been completed.

The method of claim 6,
wherein, when a specific file included in the software file registered in the whitelist is modified, a comparison value is calculated by comparing the backup data of the specific file with the modified specific file, and
wherein the software registered in the whitelist is deleted when the comparison value is less than a preset value, and the whitelist is updated when the comparison value is equal to or greater than the preset value.

The method of claim 7,
wherein the whitelist is updated by performing steps (a) to (b) again for the software deleted from the whitelist.

(Deleted)

The method of claim 1, further comprising, as a method of authenticating software after step (d):
(e) detecting software that is executed, among the software stored in the user terminal, as software to be verified against the whitelist; and
(f) determining whether the software to be verified is registered in the whitelist,
wherein steps (a) to (d) are performed if the software to be verified is not registered in the whitelist.

A user terminal for generating a software whitelist using machine learning, comprising:
a memory in which a program for generating a software whitelist using machine learning is stored; and
a processor that executes the program stored in the memory,
wherein the processor recognizes a software file included in software stored in the user terminal, extracts attribute information from the software file, calculates reliability information by performing a preset machine learning technique on the software file based on the attribute information, calculates a grade of the software based on the reliability information, generates a whitelist based on the grade, shares the whitelist with a preset whitelist sharing server, and receives a whitelist generated by another user terminal to update the whitelist,
wherein, in the process of updating the whitelist, the reliability information is further provided to the whitelist sharing server, and reliability information generated by another user terminal is received from the whitelist sharing server and applied to the machine learning technique,
wherein, after the whitelist is updated, a propensity indicating what kind of software is used on the user terminal is calculated based on the attribute information, and the whitelists calculated by user terminals having mutually corresponding propensities are compared to update the whitelist of each user terminal, and
wherein the attribute information includes static data, dynamic data, heuristic data, and social engineering data.

A computer-readable storage medium on which a program for performing the method for generating a software whitelist using machine learning according to claim 1 is recorded.