KR101867299B1

KR101867299B1 - Method and apparatus for determining information leakage risk

Info

Publication number: KR101867299B1
Application number: KR1020160134837A
Authority: KR
Inventors: 김명호; 이예슬
Original assignee: 숭실대학교산학협력단
Priority date: 2016-08-10
Filing date: 2016-10-18
Publication date: 2018-06-14
Also published as: KR20180018238A

Abstract

The present invention discloses a method and apparatus for determining information leakage risk. According to one aspect of the invention, a method for determining information leakage risk, performed in an information leakage risk determination apparatus using machine learning, includes: generating a machine learning model that includes risk values for per-user events by training, with a machine learning algorithm, on log data that includes security event information and user information collected through a security solution; and, when new data is input, determining a risk value for information leakage using the trained machine learning model.

Description

Method and apparatus for determining information leakage risk

The present invention relates to a method and apparatus for determining information leakage risk and, more particularly, to a method and apparatus that train on log data using machine learning and, based on the training result, automatically calculate the risk of information leakage when new data is input.

Most information-related security incidents that currently occur within enterprises originate internally.

Enterprises deploy security solutions to prevent information leakage. However, as the number of separately deployed leakage-blocking systems grows and analysis tasks diversify, the limits of management staffing and integrated monitoring are becoming apparent, and tracking the many leakage attempts and paths is increasingly difficult. To address these problems, solutions have been built that consolidate leakage-related logs and analyze and monitor patterns of who attempted to leak which information and how. In these existing solutions, however, an administrator calculates the risk of information leakage directly, through scenario-based analysis and statistical analysis of the various leakage paths, and internal leakage attempts are monitored and detected according to the calculated risk. The related art therefore cannot effectively detect information leakage in advance, because of the growing number of management points and the limits of fragmentary monitoring and response, and management costs increase.

Korean Patent Laid-Open Publication No. 2008-0029602 (published April 3, 2008)

The present invention has been proposed to solve the above problems. Its object is to provide a method and apparatus for determining information leakage risk that train on log data using machine learning, automatically calculate the risk of information leakage from the training result when new data is input, and thereby detect information leakage.

Other objects and advantages of the present invention can be understood from the following description and will become more apparent from an embodiment of the invention. It will also be readily appreciated that the objects and advantages of the invention can be realized by the means set forth in the claims and combinations thereof.

To achieve the above object, a method for determining information leakage risk according to one aspect of the present invention, performed in an information leakage risk determination apparatus using machine learning, includes: generating a machine learning model that includes risk values for per-user events by training, with a machine learning algorithm, on log data that includes security event information and user information collected through a security solution; and, when new data is input, determining a risk value for information leakage using the trained machine learning model.

In the step of generating the machine learning model, information including the risk values for the per-user events may be transmitted to an administrator terminal, and, when a retraining request message is received from the administrator terminal, the machine learning model may be regenerated by retraining with the machine learning algorithm based on the information included in the retraining request message.

The information included in the retraining request message may include a risk value recalculated by the administrator. In that case, regenerating the machine learning model by retraining based on the information included in the retraining request message means retraining with the machine learning algorithm based on the recalculated risk value to regenerate the machine learning model.

In the step of determining the risk value, the risk value for the new data may be calculated by comparing the security event information and the user information included in the input new data with the security event information and the user information included in the machine learning model, respectively.

In calculating the risk value for the new data by this comparison, a plurality of entries, each including a risk value for a per-user event, may be selected from the machine learning model in descending order of similarity of their security event information to that of the new data. The user information in the selected entries is then compared with the user information in the new data, and the risk value of the single entry with the highest similarity is output as the risk value for the per-user event of the new data.

Alternatively, a plurality of entries, each including a risk value for a per-user event, may be selected from the machine learning model in descending order of similarity of their user information to that of the new data. The security event information in the selected entries is then compared with the security event information in the new data, and the risk value of the single entry with the highest similarity is output as the risk value for the per-user event of the new data.

To achieve the above object, an information leakage risk determination apparatus according to another aspect of the present invention includes: a machine learning model generation unit that generates a machine learning model including risk values for per-user events by training, with a machine learning algorithm, on log data that includes security event information and user information collected through a security solution; and a risk value determination unit that, when new data is input, determines a risk value for information leakage using the trained machine learning model.

The machine learning model generation unit may transmit information including the risk values for the per-user events to an administrator terminal and, when a retraining request message is received from the administrator terminal, regenerate the machine learning model by retraining with the machine learning algorithm based on the information included in the retraining request message.

The risk value determination unit may calculate the risk value for the new data by comparing the security event information and the user information included in the input new data with the security event information and the user information included in the machine learning model, respectively.

According to one aspect of the present invention, risk values are calculated through a comprehensive, machine-learning-based analysis of diverse leakage-related events, so that information leakage can be detected more efficiently.

In addition, a risk value for information leakage can be calculated automatically whenever new data is input, which improves the efficiency of security management.

The effects obtainable from the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those of ordinary skill in the art to which the invention pertains from the following description.

The following drawings attached to this specification illustrate preferred embodiments of the present invention and, together with the detailed description, serve to further the understanding of the technical idea of the invention. The invention should therefore not be construed as limited to the matters shown in the drawings.
FIG. 1 is a diagram illustrating the concept of machine learning.
FIG. 2 is a diagram illustrating the schematic configuration of a system according to an embodiment of the present invention.
FIG. 3 is a block diagram schematically illustrating the configuration of the information leakage risk determination apparatus of FIG. 2.
FIG. 4 is a diagram illustrating security event information according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating user information according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating risk value information for per-user events according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating a method for determining information leakage risk according to an embodiment of the present invention.

The above objects, features, and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, so that those of ordinary skill in the art to which the invention pertains can readily practice its technical idea. In describing the invention, detailed descriptions of related known techniques are omitted where they would unnecessarily obscure the gist of the invention. A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as "... unit" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

FIG. 1 is a diagram illustrating the concept of machine learning.

Machine learning is a method of implementing, by machine, the learning ability that is an intelligent function inherent to humans. It makes it possible to develop algorithms and techniques that improve their own performance from data obtained through interaction with the environment. In other words, even when no person explicitly specifies the logic, the computer learns from data and uses what it has learned to solve problems automatically.

As shown in FIG. 1, machine learning may be a process of finding a relationship (a machine learning model) by having a machine learning algorithm learn the information contained in given data (training data). Here, the training data may be data supplied from outside for the purpose of learning. The machine learning algorithm may be a neural network algorithm. In general, neural networks are useful for analyzing complex problems that cannot be solved mathematically, such as changes in weather over time; they can be applied to a wide range of problems and give good results on complex ones; and, because they analyze through learning, they give relatively accurate results compared with earlier statistical analysis methods. They also have the advantages of short analysis time, low computational cost, and effectiveness in pattern recognition, prediction, and classification.

The training data is learned through the neural network algorithm, and the result may be fed back into the machine learning model. Thereafter, when new data (test data) is input to the trained machine learning model, a result of evaluating the new data on the basis of the trained model can be produced.
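The train-then-evaluate loop described above can be sketched as follows. The specification says only that the algorithm "may be a neural network algorithm", so this example uses the simplest possible stand-in, a single sigmoid neuron trained by gradient descent. The feature encoding, the training values, and the names `train`, `predict`, `TRAINING_DATA`, and `RISK_LABELS` are all assumptions made for illustration, not details from the patent.

```python
import math


def sigmoid(z: float) -> float:
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=2000, lr=0.5):
    """Fit one sigmoid neuron with per-sample gradient descent.

    samples: list of equal-length numeric feature vectors
    labels:  target risk values in [0, 1]
    Returns (weights, bias), which plays the role of the trained model.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the cross-entropy loss w.r.t. the pre-activation
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(model, x):
    """Evaluate new data against the trained model; the result lies in (0, 1)."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical binary features: [decrypt_attempt, blocked_upload, retiring_user]
TRAINING_DATA = [[1, 1, 1], [1, 0, 1], [0, 1, 0], [0, 0, 0], [0, 0, 1]]
RISK_LABELS = [0.9, 0.8, 0.5, 0.1, 0.2]  # illustrative values only

MODEL = train(TRAINING_DATA, RISK_LABELS)
```

Calling `predict(MODEL, [1, 1, 1])` then gives a value closer to 1 than `predict(MODEL, [0, 0, 0])`. A practical system would presumably use a multi-layer network and a richer encoding of events and users.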

The types of machine learning are as follows.

1) Supervised learning

: A form of learning in which a "teacher" (supervisor) provides the output values in advance during training. It is mainly suited to problems such as recognition, classification, diagnosis, and prediction.

2) Unsupervised learning

: A form of learning carried out without information about the output values (without a teacher). It is suited to problems that require clustering, density estimation, dimensionality reduction, feature extraction, and the like.

3) Reinforcement learning

: Learning which action is optimal to take in a given state. Each time an action is taken, a reward is given by the external environment, and learning proceeds in the direction that maximizes the reward.

Many types of machine learning exist besides those described above. Because they are well known, a more detailed description is omitted.

FIG. 2 is a diagram illustrating the schematic configuration of a system according to an embodiment of the present invention, FIG. 3 is a block diagram schematically illustrating the configuration of the information leakage risk determination apparatus of FIG. 2, FIG. 4 is a diagram illustrating security event information according to an embodiment of the present invention, FIG. 5 is a diagram illustrating user information according to an embodiment of the present invention, and FIG. 6 is a diagram illustrating risk value information for per-user events according to an embodiment of the present invention.

In describing this embodiment, the information leakage risk determination apparatus 300 may apply the basic concept of machine learning described above to learn from log data and generate a machine learning model.

Referring to FIG. 2, the information leakage risk determination system according to this embodiment may include a user terminal 100, a security solution 200, and an information leakage risk determination apparatus 300.

The user terminal 100 may be a terminal that a user (for example, an employee, where the system is deployed in an enterprise) possesses in order to access systems within the enterprise. In this embodiment, the user terminal 100 is described as a computer connected over a network to the systems within the enterprise. The invention is not limited to this, however, and the user terminal 100 may be any device that is connected to the enterprise systems over a network and allows the user to access them. When the user terminal 100 accesses a system within the enterprise and performs some action, it may generate new data including user information and security event information, and the new data may be transmitted to the information leakage risk determination apparatus 300.

The security solution 200 may collect log data and transmit the collected log data to the information leakage risk determination apparatus 300. The log data may include security event information and user information. As shown in FIG. 4, the security event information may include information on the actions of users accessing systems within the enterprise. For example, the user action information may describe an attempt to remove the encryption from confidential information, an upload of a file after connecting to a blocked web storage service, the printing of an encrypted file, and the like. As shown in FIG. 5, the user information may include an employee number, an employee name, the IP address of the terminal owned by the user, the MAC address of that terminal, security level information, and the like.
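One possible shape for a log record carrying the fields listed above is sketched below. The specification names the fields but not their types or encoding, so the class names, field names, types, and example values here are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserInfo:
    """User information fields named in the description (types are assumed)."""
    employee_number: str
    employee_name: str
    ip_address: str
    mac_address: str
    security_level: int


@dataclass(frozen=True)
class SecurityEvent:
    """A user action reported by the security solution."""
    action: str  # e.g. "decrypt_attempt", "blocked_webhard_upload", "print_encrypted_file"


@dataclass(frozen=True)
class LogRecord:
    """One log entry: who did what."""
    user: UserInfo
    event: SecurityEvent


def make_example_record() -> LogRecord:
    """Build a record with made-up values (documentation-range IP address)."""
    user = UserInfo(
        employee_number="E-0001",
        employee_name="Example Employee",
        ip_address="192.0.2.10",
        mac_address="00:00:5E:00:53:01",
        security_level=2,
    )
    return LogRecord(user=user, event=SecurityEvent(action="decrypt_attempt"))
```

Freezing the dataclasses makes records hashable, which is convenient if they are later used as keys when mapping records to risk values.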

The information leakage risk determination apparatus 300 may receive log data from the security solution 200 and generate a machine learning model by training on the received log data with a machine learning algorithm. The machine learning algorithm may be a neural network algorithm. As shown in FIG. 6, the generated machine learning model may include risk value information for per-user events.

When the information leakage risk determination apparatus 300 receives new data from the user terminal 100, it may calculate a risk value for the new data using the generated machine learning model. The calculated risk value ranges from 0 to 1: the closer it is to 0, the more normal the action (the user's action has little relevance to information leakage), and the closer it is to 1, the more dangerous (the user's action is highly relevant to information leakage).

When the information leakage risk determination apparatus 300 receives a retraining request message from the administrator terminal 400, it may regenerate the machine learning model by retraining with the machine learning algorithm based on the information included in the message. The retraining request message may be transmitted to the information leakage risk determination apparatus 300 once a risk value has been recalculated at the administrator's discretion. The information leakage risk determination apparatus 300 may also include a database in which it stores the security event information and user information contained in the log data received from the security solution 200. The risk value information for per-user events included in the generated machine learning model is mapped to the security event information and user information of the log data held in the database, and may be referenced when determining a risk value for newly input data.

When no risk value has been calculated in the database for information identical to the security event information and/or user information included in the received new data, the information leakage risk determination apparatus 300 may estimate a risk value by referring to the information most similar to that security event information and/or user information. For example, when the security event information in the received new data is "a person scheduled to retire deletes an encrypted file", the apparatus may consult the database in which the security event information and user information of the log data are mapped. If no exactly matching security event information exists but similar security event information does (for example, "a person scheduled to retire moves an encrypted file" or "an unauthorized person deletes an encrypted file"), the apparatus may estimate the risk value from the risk values of the security events whose function is most similar to that of the received event. The estimated risk value may be the average of the risk values of those most similar security events.
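The averaging fallback described above can be sketched as follows. The specification does not say how similarity between security events is measured, so this example assumes a simple word-overlap (Jaccard) score over event descriptions; `jaccard`, `estimate_event_risk`, `top_k`, and the risk values in `KNOWN_RISKS` are all hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]; an assumed stand-in for 'functional similarity'."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def estimate_event_risk(new_event: str, known_risks: dict, top_k: int = 2) -> float:
    """Return the stored risk for an exact match, else the mean risk of the top_k most similar events."""
    if not known_risks:
        raise ValueError("no known events to compare against")
    if new_event in known_risks:
        return known_risks[new_event]
    ranked = sorted(known_risks, key=lambda e: jaccard(new_event, e), reverse=True)
    nearest = ranked[:top_k]
    return sum(known_risks[e] for e in nearest) / len(nearest)


# Illustrative database contents; the risk values are made up.
KNOWN_RISKS = {
    "retiring employee moves encrypted file": 0.8,
    "unauthorized user deletes encrypted file": 0.9,
    "employee prints public document": 0.1,
}

ESTIMATE = estimate_event_risk("retiring employee deletes encrypted file", KNOWN_RISKS)
```

With these sample values, the two nearest events are the "moves" and "unauthorized user deletes" entries, so `ESTIMATE` is the mean of 0.8 and 0.9, that is, approximately 0.85.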

Likewise, when the user information included in the received new data does not exist in the database that maps the security event information and user information of the log data and is consulted for risk value calculation, the information leakage risk determination apparatus 300 may estimate a risk value from the most similar user information. For example, when the user information in the received new data is a "name", the apparatus may consult the database in which the security event information and user information of the log data are mapped. If no risk value corresponding to that user information (for example, the "name") exists in the database, the apparatus may estimate the risk value for that user information from the risk value corresponding to the security level assigned to users of the same rank. The information leakage risk determination apparatus 300 may transmit the estimated risk value to the administrator terminal, and, when a retraining request message is received from the administrator terminal, it may repeat the process described above to recalculate the risk value.

The information leakage risk determination apparatus 300 is described in more detail below with reference to FIG. 3.

Referring to FIG. 3, the information leakage risk determination apparatus 300 may include a machine learning model generation unit 310 and a risk value determination unit 330.

The machine learning model generation unit 310 may train on the log data collected through the security solution 200 using a machine learning algorithm and generate a machine learning model. The log data may include the security event information and user information collected through the security solution 200. The generated machine learning model may include risk values for per-user events.

The machine learning model generation unit 310 may transmit information including the risk values for user-specific events to the administrator terminal 400. When it receives a re-learning request message from the administrator terminal 400, the machine learning model generation unit 310 may re-learn with the machine learning algorithm on the basis of the information contained in that message and regenerate the machine learning model. The information contained in the re-learning request message may include risk values recalculated by the administrator. When the administrator receives, through the terminal, the information including the risk values for user-specific events from the machine learning model generation unit 310, the administrator can review it and recalculate the risk values. The recalculation may be performed at the administrator's discretion. For example, if the administrator judges that the risk value for a user-specific event is too high or too low, the administrator can recalculate it to an appropriate value. In that case, the machine learning model generation unit 310 may re-learn with the machine learning algorithm (a neural network algorithm) on the basis of the recalculated risk values and regenerate the machine learning model.
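The re-learning exchange can be sketched as follows. This is an illustration under stated assumptions rather than the format used by the apparatus: the training set is modeled as a dictionary keyed by a (user, event) pair, the request message is a dictionary with a hypothetical `recalculated_risks` field, and `train_fn` is whatever training routine the system uses.

```python
def apply_admin_overrides(training_set, overrides):
    """Return a copy of the training set with admin-recalculated risk labels.

    training_set: dict mapping (user_key, event_key) -> risk value
    overrides:    dict of the same shape, taken from the re-learning request
    """
    updated = dict(training_set)  # leave the original labels untouched
    updated.update(overrides)
    return updated


def handle_relearn_request(training_set, request, train_fn):
    """Merge the recalculated risk values and regenerate the model."""
    # "recalculated_risks" is a hypothetical field name for this sketch.
    updated = apply_admin_overrides(training_set, request["recalculated_risks"])
    return updated, train_fn(updated)
```

Keeping the original training set unmodified makes it possible to compare the model before and after the administrator's correction.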

The risk value determination unit 330 determines a risk value for information leakage using the learned machine learning model. For example, when new data is input from the user terminal 100, the risk value determination unit 330 can determine the risk value for information leakage using the learned machine learning model. The risk value determination unit 330 may calculate the risk value for the new data by comparing the security event information and the user information included in the new data with the security event information and the user information included in the machine learning model, respectively. For example, the risk value determination unit 330 may select from the machine learning model, in descending order of similarity, a plurality of pieces of information whose security event information is similar to that of the new data, each piece including a risk value for a user-specific event. It may then compare the user information in the selected pieces with the user information in the new data and take the risk value contained in the single most similar piece as the risk value for the user-specific event of the new data. Alternatively, the risk value determination unit 330 may first select, in descending order of similarity, a plurality of pieces of information whose user information is similar to that of the new data, then compare the security event information in the selected pieces with the security event information in the new data, and take the risk value contained in the single most similar piece as the risk value for the user-specific event of the new data.
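The two-stage selection described above can be sketched as follows. The specification does not state how similarity is measured, so this sketch assumes Jaccard similarity over sets of attribute values; the function names, the entry layout, and the shortlist size `k` are likewise hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two attribute sets (an assumed measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def estimate_risk(entries, new_user, new_event, first="event", k=3):
    """Two-stage lookup over model entries.

    entries: list of (user_info, event_info, risk) tuples
    first:   "event" shortlists by event similarity and then picks by user
             similarity; "user" applies the two criteria in the other order
    """
    if not entries:
        raise ValueError("model has no entries")
    if first == "event":
        stage1 = lambda e: jaccard(e[1], new_event)
        stage2 = lambda e: jaccard(e[0], new_user)
    elif first == "user":
        stage1 = lambda e: jaccard(e[0], new_user)
        stage2 = lambda e: jaccard(e[1], new_event)
    else:
        raise ValueError("first must be 'event' or 'user'")
    # Stage 1: keep the k entries most similar on the first criterion.
    shortlist = sorted(entries, key=stage1, reverse=True)[:k]
    # Stage 2: among those, take the single best match on the second criterion.
    best = max(shortlist, key=stage2)
    return best[2]
```

With `first="event"` the function follows the first variant in the text, and with `first="user"` the second.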

Hereinafter, the information leakage risk determination method performed in the information leakage risk determination apparatus 300 according to the present embodiment will be described.

FIG. 7 is a flowchart illustrating an information leakage risk determination method according to an embodiment of the present invention.

Referring to FIG. 7, the information leakage risk determination method according to the present embodiment includes a machine learning model generation step and a risk value determination step.

In the machine learning model generation step, the log data collected through the security solution 200 is learned using a machine learning algorithm, and a machine learning model is generated (410). The log data may include security event information and user information collected through the security solution 200, and the generated machine learning model may include a risk value for each user-specific event. The information leakage risk determination apparatus 300 may transmit information including the risk values for user-specific events to the administrator terminal 400. When it receives a re-learning request message from the administrator terminal 400, the information leakage risk determination apparatus 300 may re-learn with the machine learning algorithm on the basis of the information contained in that message and regenerate the machine learning model (430, 450). The information contained in the re-learning request message may include risk values recalculated by the administrator. When the administrator receives, through the terminal, the information including the risk values for user-specific events from the information leakage risk determination apparatus 300, the administrator can review it and recalculate the risk values. The recalculation may be performed at the administrator's discretion. For example, if the administrator judges that the risk value for a user-specific event is too high or too low, the administrator can recalculate it to an appropriate value. In that case, the information leakage risk determination apparatus 300 may re-learn with the machine learning algorithm (a neural network algorithm) on the basis of the recalculated risk values and regenerate the machine learning model.
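The order of the steps can be summarized in a short sketch. It assumes the training routine, the administrator review, and the risk assessment are supplied as callables (`train_fn`, `review_fn`, `assess_fn`, all hypothetical names) and that the log data is a dictionary of labeled entries. The mapping of reference numerals to code lines is an interpretation of the description, not a statement of what FIG. 7 shows.

```python
def run_method(log_data, train_fn, review_fn, assess_fn, new_data):
    """Generate a model, optionally re-learn on request, then assess new data."""
    model = train_fn(log_data)  # model generation (410)

    # The administrator reviews the reported risk values; review_fn returns
    # recalculated values, or None when no re-learning is requested.
    recalculated = review_fn(model)
    if recalculated:  # re-learning on request (430, 450)
        log_data = {**log_data, **recalculated}
        model = train_fn(log_data)

    return assess_fn(model, new_data)  # risk value determination (470)
```

Passing the three stages as callables keeps the control flow independent of any particular learning algorithm or message format.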

In the risk value determination step, a risk value for information leakage is determined using the learned machine learning model (470). For example, when new data is input from the user terminal 100, the information leakage risk determination apparatus 300 can determine the risk value for information leakage using the learned machine learning model. The information leakage risk determination apparatus 300 may calculate the risk value for the new data by comparing the security event information and the user information included in the new data with the security event information and the user information included in the machine learning model, respectively. For example, the information leakage risk determination apparatus 300 may select from the machine learning model, in descending order of similarity, a plurality of pieces of information whose security event information is similar to that of the new data, each piece including a risk value for a user-specific event. It may then compare the user information in the selected pieces with the user information in the new data and take the risk value contained in the single most similar piece as the risk value for the user-specific event of the new data. Alternatively, the information leakage risk determination apparatus 300 may first select, in descending order of similarity, a plurality of pieces of information whose user information is similar to that of the new data, then compare the security event information in the selected pieces with the security event information in the new data, and take the risk value contained in the single most similar piece as the risk value for the user-specific event of the new data.

As described above, the present invention uses machine learning to calculate a risk value through a comprehensive analysis of various events related to information leakage, and can therefore detect information leakage more efficiently.

The methods according to embodiments of the present invention may be implemented as an application, or in the form of program instructions executable by various computer components, and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules to perform the processing according to the present invention, and vice versa.

Although this specification contains many features, those features should not be construed as limiting the scope of the invention or of the claims. Features described in separate embodiments of this specification may be combined and implemented in a single embodiment. Conversely, various features described in a single embodiment may be implemented separately in various embodiments or in any suitable combination.

Although the operations are described in a particular order in the drawings, this should not be understood to mean that the operations must be performed in the particular order shown or in sequential order, or that all of the described operations must be performed, in order to obtain the desired result. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments. The application components and systems described above can generally be integrated into a single software product or packaged into multiple software products.

The present invention described above may be variously substituted, modified, and changed by those of ordinary skill in the art to which the invention pertains without departing from its technical spirit, and is therefore not limited by the foregoing embodiments and the accompanying drawings.

300: Information leakage risk determination apparatus
310: Machine learning model generation unit
330: Risk value determination unit

Claims

A method for determining an information leakage risk in an information leakage risk determination apparatus using machine learning, the method comprising:
generating a machine learning model including risk values for user-specific events by learning, with a machine learning algorithm, log data that includes security event information and user information collected through a security solution;
transmitting information including the risk values for the user-specific events to an administrator terminal;
when a re-learning request message including a risk value recalculated by an administrator is received from the administrator terminal, re-learning with the machine learning algorithm on the basis of the recalculated risk value included in the re-learning request message and regenerating the machine learning model; and
determining a risk value for information leakage using the learned machine learning model when new data is input,
wherein, in the step of determining the risk value,
the apparatus further includes a database storing the security event information and the user information included in the log data input from the security solution, and, when no risk value has been calculated in the database for information identical to at least one of the security event information and the user information included in the input new data, the risk value is estimated and calculated by referring to the information having the highest similarity to at least one of the security event information and the user information included in the new data.

delete

The method according to claim 1,
wherein, in the step of determining the risk value,
the security event information and the user information included in the input new data are compared with the security event information and the user information included in the machine learning model, respectively, to calculate a risk value for the new data.

[Claim 5 is abandoned upon payment of registration fee.]

5. The method of claim 4,
wherein the step of comparing the security event information and the user information included in the input new data with the security event information and the user information included in the machine learning model, respectively, to calculate a risk value for the new data comprises:
selecting from the machine learning model, in descending order of similarity, a plurality of pieces of information that have similar security event information and each include a risk value for a user-specific event; comparing the user information included in the selected pieces of information with the user information included in the new data; and calculating the risk value included in the one piece of information having the highest similarity as the risk value for the user-specific event of the new data.

[Claim 6 is abandoned upon payment of the registration fee.]

6. The method of claim 4,
wherein the step of comparing the security event information and the user information included in the input new data with the security event information and the user information included in the machine learning model, respectively, to calculate a risk value for the new data comprises:
selecting from the machine learning model, in descending order of similarity, a plurality of pieces of information that have similar user information and each include a risk value for a user-specific event; comparing the security event information included in the selected pieces of information with the security event information included in the new data; and calculating the risk value included in the one piece of information having the highest similarity as the risk value for the user-specific event of the new data.

An information leakage risk determination apparatus comprising: a machine learning model generation unit that generates a machine learning model including risk values for user-specific events by learning, with a machine learning algorithm, log data that includes security event information and user information collected through a security solution; and
a risk value determination unit that determines a risk value for information leakage using the learned machine learning model when new data is input,
wherein the apparatus further comprises a database storing the security event information and the user information included in the log data input from the security solution, and, when no risk value has been calculated in the database for information identical to at least one of the security event information and the user information included in the input new data, the risk value determination unit estimates and calculates the risk value by referring to the information having the highest similarity to at least one of the security event information and the user information included in the new data, and
wherein the machine learning model generation unit:
transmits information including the risk values for the user-specific events to an administrator terminal, and,
when a re-learning request message including a risk value recalculated by an administrator is received from the administrator terminal, re-learns with the machine learning algorithm on the basis of the recalculated risk value included in the re-learning request message and regenerates the machine learning model.

delete

10. The apparatus of claim 7,
wherein the risk value determination unit
compares the security event information and the user information included in the input new data with the security event information and the user information included in the machine learning model, respectively, to calculate a risk value for the new data.

[Claim 11 is abandoned upon payment of the registration fee.]

11. The apparatus of claim 10,
wherein, in comparing the security event information and the user information included in the input new data with the security event information and the user information included in the machine learning model, respectively, to calculate a risk value for the new data, the risk value determination unit
selects from the machine learning model, in descending order of similarity, a plurality of pieces of information that have similar security event information and each include a risk value for a user-specific event, compares the user information included in the selected pieces of information with the user information included in the new data, and calculates the risk value included in the one piece of information having the highest similarity as the risk value for the user-specific event of the new data.

[Claim 12 is abandoned upon payment of the registration fee.]

12. The apparatus of claim 10,
wherein, in comparing the security event information and the user information included in the input new data with the security event information and the user information included in the machine learning model, respectively, to calculate a risk value for the new data, the risk value determination unit
selects from the machine learning model, in descending order of similarity, a plurality of pieces of information that have similar user information and each include a risk value for a user-specific event, compares the security event information included in the selected pieces of information with the security event information included in the new data, and calculates the risk value included in the one piece of information having the highest similarity as the risk value for the user-specific event of the new data.