KR100961992B1

KR100961992B1 - Method and Apparatus of cyber criminal activity analysis using markov chain and Recording medium using it

Info

Publication number: KR100961992B1
Application number: KR1020080018496A
Authority: KR
Inventors: 인호; 김도훈
Original assignee: 고려대학교 산학협력단
Priority date: 2008-02-28
Filing date: 2008-02-28
Publication date: 2010-06-08
Also published as: KR20090093143A

Abstract

Disclosed are a method of analyzing cyber criminal behavior using a Markov chain, an apparatus therefor, and a recording medium on which the method is recorded.

The method of analyzing cyber criminal behavior using a Markov chain according to the present invention includes: defining forensic behaviors of a user based on the user's behavior data stored in a server and a log server, selecting among the defined forensic behaviors, and constructing a transition matrix for the forensic behaviors based on evidence generated by the selected forensic behaviors; setting priorities of the forensic behaviors by using a Markov chain with the constructed transition matrix to compute a final probability vector value for prioritizing the forensic behaviors; resetting the priorities by removing noise pages for the forensic behaviors from the set priorities; verifying the reliability of the priorities by performing a Monte Carlo simulation based on the reset priorities; and storing scenarios for the verified forensic behaviors in a database.

According to the present invention, analyzing user-based forensic evidence through logical causal relationships improves the reliability of future behavior-based, that is, scenario-based, threat detection. Various threat scenarios are compiled into a database, and analyzing the behavior in existing log data against them identifies a variety of forensic behaviors. User scenarios for threats that may occur in the future can be analyzed in advance so that threats are handled proactively, and the false positive rate of threat detection can be reduced.

Description

Method and Apparatus of cyber criminal activity analysis using markov chain and Recording medium using it

The present invention relates to digital forensic evidence analysis, and more particularly to a method of analyzing cyber criminal behavior using a Markov chain, an apparatus therefor, and a recording medium on which the method is recorded. The invention analyzes digital forensic evidence with time taken into account for the various metadata an intruder leaves behind, and applies a Markov chain analysis to establish the causal relationships among forensic behaviors over time, so that cyber criminal behavior by an intruder can be handled proactively and the intruder's attacks can be detected accurately.

With the rapid development of information and communication technology and the widespread use of information devices, the way of life of every economic actor, including governments, companies, and individuals, is changing greatly. The systems of the economy and society as a whole are changing fundamentally, and society is being reorganized into a knowledge and information society in which knowledge and information drive social development.

However, as informatization progresses, its adverse effects are emerging as serious social problems: illegal intrusion into, paralysis of, and destruction of information systems; invasion of privacy and misuse of personal information; crime over the Internet; fraudulent use of cryptographic technology; erosion of the safety and reliability of electronic transactions; infringement of intellectual property rights; and the circulation of harmful information.

As the damage from adverse effects such as information leakage and tampering grows, the need for information protection is increasing not only in national and public institutions but also in the private sector.

Because the emphasis has shifted from past security centered on confidentiality to comprehensive security that also covers availability and integrity, the concept and scope of information protection are expanding.

An organization that cannot detect security threats effectively can never respond to incidents smoothly.

In practice, response methods based on automation technology are necessary, but depending on the situation, the judgment of experts may take on particular importance.

Conventional forensic procedures are network-based. In particular, the attacker is traced back through forensic analysis that uses the intrusion decisions made by tools such as intrusion detection systems and firewalls. Because this approach does not incorporate the concept of user behavior, it cannot accurately estimate an intruder's attack patterns across various scenarios, and because it relies solely on attacker traceback, it is difficult to prevent cyber criminal behavior in advance.

Accordingly, the first problem to be solved by the present invention is to provide a method of analyzing cyber criminal behavior using a Markov chain that introduces a forensic concept of user behavior, estimates the various criminal behaviors carried out after an intrusion, and constructs an optimal behavior scenario for them, thereby improving reliability with respect to forensic behaviors.

The second problem to be solved by the present invention is to provide an apparatus for analyzing cyber criminal behavior using a Markov chain to which the above method is applied.

The third problem to be solved by the present invention is to provide a recording medium on which a program is recorded that enables a computer to carry out the above method of analyzing cyber criminal behavior using a Markov chain.

To solve the first problem, the present invention provides a method of analyzing cyber criminal behavior using a Markov chain, comprising: defining forensic behaviors of a user based on the user's behavior data stored in a server and a log server, selecting among the defined forensic behaviors, and constructing a transition matrix for the forensic behaviors based on evidence generated by the selected forensic behaviors; setting priorities of the forensic behaviors by using a Markov chain with the constructed transition matrix to compute a final probability vector value for prioritizing the forensic behaviors; resetting the priorities by removing noise pages for the forensic behaviors from the set priorities; verifying the reliability of the priorities by performing a Monte Carlo simulation based on the reset priorities; and storing scenarios for the verified forensic behaviors in a database.

The defined forensic behaviors are selected based on mutual access information that includes the user, a log server storing the user's access information, and information on applications executed by the user.

The transition matrix is constructed from a pairwise matrix formed according to the mutual access information.

In addition, the step of setting the priorities sets the priorities of the forensic behaviors by assigning a weight to each forensic behavior, the forensic behaviors being classified into priority-based, habit-based, and semi-habit-based forensic behaviors.

The noise pages are removed by singular value decomposition (SVD).

The step of constructing the transition matrix includes performing data mining on the classified forensic behaviors, and the classified forensic behaviors are stored in the form of a hierarchical XML or AXML document.

The final probability vector value is constructed by an equation defined in terms of an initial value predetermined by the data mining, a transition matrix based on the user's behavior data, and the total number of forensic behavior data items performed by the user. (The symbols and the equation are images in the original publication and are not reproduced here.)

In addition, the step of storing scenarios for the verified forensic behaviors in the database includes maintaining a separate database for each user and, when a user's log data changes, deleting from and updating that database in real time.
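
As an illustration of keeping a separate database per user and updating it when that user's log data changes, the following is a minimal sketch using Python's standard sqlite3 module. The table layout, the function names (user_db, store_scenario, delete_scenario, scenarios), and the sample scenarios and priority values are all assumptions made for this example; the source does not specify a storage engine or schema.

```python
import sqlite3

# Hypothetical layout: one in-memory SQLite database per user.
_databases = {}


def user_db(user_id):
    """Return the database for a user, creating it on first use."""
    if user_id not in _databases:
        conn = sqlite3.connect(":memory:")
        conn.execute(
            "CREATE TABLE scenario (name TEXT PRIMARY KEY, priority REAL NOT NULL)"
        )
        _databases[user_id] = conn
    return _databases[user_id]


def store_scenario(user_id, name, priority):
    """Insert a verified scenario, or update its priority if it already exists."""
    conn = user_db(user_id)
    conn.execute(
        "INSERT OR REPLACE INTO scenario (name, priority) VALUES (?, ?)",
        (name, priority),
    )
    conn.commit()


def delete_scenario(user_id, name):
    """Remove a scenario that the user's current log data no longer supports."""
    conn = user_db(user_id)
    conn.execute("DELETE FROM scenario WHERE name = ?", (name,))
    conn.commit()


def scenarios(user_id):
    """List a user's scenarios, highest priority first."""
    rows = user_db(user_id).execute(
        "SELECT name, priority FROM scenario ORDER BY priority DESC"
    )
    return rows.fetchall()


# Made-up sample data: the third call models a log change that updates a
# stored scenario, and the fourth models a log change that deletes one.
store_scenario("user_a", "download -> install -> run", 0.42)
store_scenario("user_a", "browse -> search log server", 0.17)
store_scenario("user_a", "download -> install -> run", 0.55)
delete_scenario("user_a", "browse -> search log server")
store_scenario("user_b", "browse -> download", 0.30)
```

Each user's scenarios live in their own connection, so a change to one user's log data never touches another user's stored scenarios.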

To solve the second problem, the present invention provides an apparatus for analyzing cyber criminal behavior using a Markov chain, comprising: a forensic behavior definition unit that defines forensic behaviors of a user and selects among the defined forensic behaviors; a transition matrix construction unit that constructs a transition matrix for the forensic behaviors based on evidence generated by the selected forensic behaviors; a priority setting unit that sets priorities of the forensic behaviors by using a Markov chain with the constructed transition matrix to compute a final probability vector value for prioritizing the forensic behaviors; a noise page removal unit that resets the priorities by removing noise pages for the forensic behaviors from the set priorities; a verification unit that verifies the reliability of the priorities by performing a Monte Carlo simulation based on the reset priorities; and a database that stores scenarios for the verified forensic behaviors.

Here, the forensic behavior definition unit selects forensic behaviors based on mutual access information that includes the user, a log server storing the user's access information, and information on applications executed by the user.

In addition, the priority setting unit sets the priorities of the forensic behaviors by assigning a weight to each forensic behavior, the forensic behaviors being classified into priority-based, habit-based, and semi-habit-based forensic behaviors.

The noise page removal unit removes the noise pages by singular value decomposition.

The transition matrix construction unit includes a data mining module that performs data mining on the classified forensic behaviors, and the classified forensic behaviors are stored in the form of a hierarchical XML or AXML document.

The final probability vector value is constructed by an equation defined in terms of an initial value predetermined by the data mining and a transition matrix based on the user's behavior data. (The symbols and the equation are images in the original publication and are not reproduced here.)

To solve the third problem, the present invention provides a recording medium on which a program is recorded that enables a computer to carry out the above method of analyzing cyber criminal behavior using a Markov chain.

The present invention introduces a forensic concept based on user behavior rather than on the conventional host or network, estimates the various behavior patterns a user exhibits after intruding, analyzes the optimal forensic behavior scenario, verifies it, and stores it in a database.

To this end, the present invention is based on the Markov chain methodology in order to establish the causal relationships among forensic behaviors, and introduces a noise removal algorithm to overcome the error rate inherent in a probabilistic approach, so that forensic behaviors can be identified with that error removed.

Moreover, the present invention builds a database in order to assemble the best forensic behavior scenarios, and improves the reliability of those scenarios through verification by Monte Carlo simulation.
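
One way to picture verification by Monte Carlo simulation is to simulate a long random walk over the behaviors and check that the observed visit frequencies agree with the probabilities computed from the transition matrix. The sketch below does this in plain Python. The 3x3 matrix, the function names, the step count, and the random seed are assumptions chosen for illustration; this section of the source does not describe how its simulation is set up.

```python
import random


def stationary(P, iters=1000):
    """Approximate the long-run distribution by repeatedly applying P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def simulate_visit_frequencies(P, steps, seed=0):
    """Random-walk over the chain and return the fraction of visits per state."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    n = len(P)
    visits = [0] * n
    state = 0
    for _ in range(steps):
        r = rng.random()
        cumulative = 0.0
        nxt = n - 1  # fallback guards against floating-point round-off
        for j in range(n):
            cumulative += P[state][j]
            if r < cumulative:
                nxt = j
                break
        state = nxt
        visits[state] += 1
    return [v / steps for v in visits]


# Hypothetical row-stochastic transition matrix over three behaviors.
P = [
    [0.1, 0.6, 0.3],
    [0.4, 0.2, 0.4],
    [0.5, 0.3, 0.2],
]

expected = stationary(P)
observed = simulate_visit_frequencies(P, steps=100_000)
max_gap = max(abs(e - o) for e, o in zip(expected, observed))
```

A small max_gap suggests the computed priorities are consistent with simulated behavior; a large one would point to a problem in the matrix or in the computation.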

Indeed, an organization that cannot detect security threats effectively can never respond to incidents smoothly.

Response methods based on automation technology are important, but depending on the situation, the judgment of experts may take on particular importance.

It is therefore necessary to analyze the attacker's behavior in order to infer the next behavior in advance or to find patterns.

That is, unauthorized, abnormal, and illegal events are recorded by end users, detected by system administrators, confirmed by IDS alarms, or discovered by many other alerting systems.

Digital forensic evidence obtained from such automated processes can serve as important information for investigative analysis.

The aim is for the evidence to answer the six questions of who, what, when, where, how, and why, and to serve as solid evidence.

Accordingly, the following kinds of evidence should be collected.

1. System date and time

2. Log list of recently connected users

3. Time and date stamps of the entire file system

4. Recently run applications (including installations)

5. Recently opened sockets

6. Applications listening on open sockets

7. Recent connections to other systems, or the network topology

Based on such evidence, evidence of activity is collected through host-based evidence, network-based evidence, and non-technical investigation methods.

Among these, the present invention uses the non-technical investigation method to construct a cyber criminal behavior analysis model for forensic behavior.

To this end, various evidence is collected within the system, such as document tracking information (word-processor documents, emails, web pages, and so on) and timestamps found in document files or web page caches.

Digital forensics has recently attracted attention and a variety of analysis methodologies have been proposed, but most of them analyze forensic behavior through information recorded on the system, such as on a hard disk.

For such fragmentary evidence to be credible, the behaviors through which the evidence arises and is expressed need to be described.

Accordingly, the core of the present invention is to use such evidence to establish the causal relationships among the various forensic behaviors that can take place within the system, and thereby to single out the behaviors that carry the greatest weight for an intruder.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

The embodiments illustrated below may be modified into various other forms, and the scope of the present invention is not limited to them. The embodiments are provided to explain the present invention more completely to those of ordinary skill in the art.

FIG. 1 is a flowchart of the method of analyzing cyber criminal behavior using a Markov chain according to the present invention.

Referring to FIG. 1, first, forensic behaviors of a user are defined based on the user's behavior data stored in a server and a log server, and the defined forensic behaviors are selected (step 110).

This is the initial data mining, which computes the component values needed to construct a transition matrix over the user and each forensic behavior.

First, the intruding user or user of interest is defined, and that user's forensic behaviors are classified, such as running a web browser, searching a log server, downloading software, installing a particular application, and running a particular application.

As can be seen in FIG. 2, the user (210) may run a web browser (220), search the log server (230), download particular software (240), install a particular application (250), and run the application (260).

In the paths shown in FIG. 2, a behavior may occur on its own, or a particular behavior may be reached through several different paths.

For the multiple paths that accompany a particular behavior shown in FIG. 2, the present invention sets a priority for each forensic behavior in the scenarios in which that behavior occurs, and builds a database of forensic behaviors on that basis.

Data mining for classifying forensic behaviors can be divided into a preparation stage, a collection stage, and an analysis stage, as shown in Table 1 below.

Through this data mining, the forensic behaviors are classified and the transition matrix is constructed.

Next, a transition matrix for the forensic behaviors is constructed based on the evidence generated by the selected forensic behaviors (step 120).

The defined forensic behaviors may be selected based on mutual access information that includes the user, a log server storing the user's access information, and information on applications executed by the user, and the transition matrix may be constructed from a pairwise matrix formed according to the mutual access information.

In the present invention, a pairwise matrix is constructed for the following reason. Forensic behaviors can each be carried out independently, but in scenario-based forensic behavior each behavior may be carried out in conjunction with another. A pairwise matrix is therefore constructed to derive the relationship between each pair of behaviors, and the transition matrix is generated from it.
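
The step from pairwise relationships to a transition matrix can be sketched as counting how often one behavior is immediately followed by another and then normalizing each row so that it sums to 1. The behavior names below echo FIG. 2, but the sample sessions, the function name, and the choice to give a behavior with no observed successor a uniform row are assumptions made for this example, not details taken from the source.

```python
def transition_matrix(sessions, behaviors):
    """Count consecutive behavior pairs, then row-normalize the counts."""
    index = {b: i for i, b in enumerate(behaviors)}
    n = len(behaviors)
    counts = [[0] * n for _ in range(n)]  # the pairwise count matrix
    for session in sessions:
        for prev, nxt in zip(session, session[1:]):
            counts[index[prev]][index[nxt]] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # Assumption: no observed successor, so spread probability evenly.
            matrix.append([1.0 / n] * n)
        else:
            matrix.append([c / total for c in row])
    return matrix


BEHAVIORS = [
    "browse_web",
    "search_log_server",
    "download_software",
    "install_application",
    "run_application",
]

# Made-up ordered behavior sequences for one user.
SESSIONS = [
    ["browse_web", "download_software", "install_application", "run_application"],
    ["browse_web", "search_log_server"],
    ["browse_web", "download_software", "run_application"],
]

P = transition_matrix(SESSIONS, BEHAVIORS)
```

Each row of P is then a probability distribution over the behavior that follows the row's behavior, which is the form a Markov chain computation expects.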

In addition, data mining may be performed on the selected forensic behaviors in order to construct the transition matrix, and the selected forensic behaviors may be stored in the form of a hierarchical XML or AXML document.

Because forensic behaviors can be classified hierarchically, they can naturally be stored as XML or AXML documents, which have a hierarchical document structure.
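
As a rough picture of hierarchical storage, the sketch below serializes a small behavior hierarchy to plain XML with Python's standard xml.etree.ElementTree module and parses it back. The element names (user, behavior), the attribute names, and the sample hierarchy are assumptions for illustration; the source does not give a schema, and AXML is not covered here.

```python
import xml.etree.ElementTree as ET


def behaviors_to_xml(user_id, behaviors):
    """Serialize a nested list of behavior dictionaries to an XML string."""
    root = ET.Element("user", {"id": user_id})

    def add(parent, node):
        elem = ET.SubElement(parent, "behavior", {"name": node["name"]})
        for child in node.get("children", []):
            add(elem, child)

    for node in behaviors:
        add(root, node)
    return ET.tostring(root, encoding="unicode")


# Hypothetical hierarchy: each behavior nests the behaviors that follow from it.
HIERARCHY = [
    {"name": "browse_web", "children": [
        {"name": "download_software", "children": [
            {"name": "install_application", "children": [
                {"name": "run_application"},
            ]},
        ]},
    ]},
    {"name": "search_log_server"},
]

document = behaviors_to_xml("user_a", HIERARCHY)
parsed = ET.fromstring(document)
names_in_order = [b.get("name") for b in parsed.iter("behavior")]
```

Nesting the elements keeps the parent and child relationships between behaviors explicit in the stored document.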

Meanwhile, as shown in FIG. 3, the user's behaviors derived from the generated evidence can each be represented as a node.

FIG. 3 shows the interactions among the nodes associated with forensic behaviors.

As in FIG. 3, for each behavior performed by the user, the priorities of the forensic behaviors can be defined successively on the basis of the Markov chain.

Based on FIG. 3, a transition matrix for the forensic behaviors is constructed in order to compute a priority value for each forensic behavior.

To construct the transition matrix, the present invention uses a transition matrix that is a hyperlink matrix built from pairs of forensic behaviors, and can construct the final probability vector value of the Markov computation over the full set of forensic behaviors.

Constructing the final probability vector value of the Markov computation from the transition matrix gives Equation 1 below.

The final probability vector value is thus computed by Equation 1 from an initial value predetermined by the data mining, the transition matrix based on the user's behavior data, and the total number of forensic behavior data items performed by the user. (The symbols and Equation 1 are images in the original publication and are not reproduced here.)

The initial value is confined to a bounded range (the range is not reproduced here). In the present invention, the power method commonly used in data mining is applied, and the initial value is set to 0.85. That is, when finding the largest eigenvalue of a large matrix, the initial value can be set to 0.85.
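
The power method with a parameter of 0.85 can be sketched as follows. Because Equation 1 is not reproduced in this text, the update rule here is an assumption: it is the damped power iteration commonly applied to hyperlink matrices, with alpha playing the role of the value 0.85. The function name, the tolerance, and the 3x3 matrix are likewise illustrative only.

```python
def final_probability_vector(P, alpha=0.85, tol=1e-12, max_iter=1000):
    """Damped power iteration over a row-stochastic transition matrix P.

    Assumed update (not taken from the source):
        pi_next[j] = (1 - alpha) / n + alpha * sum_i pi[i] * P[i][j]
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [
            (1.0 - alpha) / n + alpha * sum(pi[i] * P[i][j] for i in range(n))
            for j in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(new, pi))
        pi = new
        if delta < tol:
            break
    return pi


# Hypothetical transition matrix: behaviors 0 and 2 both lead to behavior 1.
P = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]

pi = final_probability_vector(P)

# Behaviors ordered from highest to lowest probability.
ranking = sorted(range(len(pi)), key=lambda j: -pi[j])
```

Because every row of P sums to 1, each update preserves the total probability, so the entries of pi sum to 1 and can be compared directly as priorities.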

Furthermore, normalizing Equation 1 over the forensic behaviors allows the matrix of final probability vector values to be written as a column vector, as given by Equation 2 below. (The symbols and Equation 2 are images in the original publication and are not reproduced here.)

As Equation 2 shows, the priorities of the forensic behaviors are computed as probabilities, so the probabilities over all forensic behaviors sum to 1.

In addition, along with constructing the transition matrix, weights can be assigned to the links between forensic behaviors.

In scenario-based forensic behavior, the behaviors can be classified according to habituality into priority-based, habit-based, and semi-habit-based forensic behaviors.

That is, a priority-based forensic behavior is one from which habit is excluded; a habit-based forensic behavior is one that is evidently known from the user; and a semi-habit-based forensic behavior falls in the category between the two.

A weight is thus assigned to each of these forensic behaviors, and from the weights the priorities of the forensic behaviors can be computed according to the transition matrix.

Equations 3 to 5 below give the weighting formulas for the respective kinds of forensic behavior.

Equation 3 concerns the priority-based forensic behavior result vector. This vector corresponds to the inverse matrix, understood as the reciprocal, of the transition matrix. The behavior result vector that takes the priority-based forensic behaviors into account can, given their dependencies, be expressed as the reciprocal of the absolute value of the result vector. (The symbols for these quantities are images in the original publication and are not reproduced here.)

Equation 4 concerns the habit-based forensic behavior result vector. For this vector, the total of the weight value vectors for each habit within the forensic behaviors of the set of web states, each of which represents one behavior, is inversely proportional to the weight value vector for the preceding habit. For the behavior result vector that takes the habit-based forensic behaviors into account, given their dependencies, the reciprocal of the absolute value of the total sum of the products of the result vector and the weight value matrix can be expressed as an inverse proportion. (The symbols for these quantities, including the quantity to which the expression is inversely proportional, are images in the original publication and are not reproduced here.)

Equation 5 concerns the semi-habit-based forensic behavior result vector. This vector corresponds to the inverse matrix, understood as the reciprocal, of the transition matrix. For the behavior result vector that takes the semi-habit-based forensic behaviors into account, given their dependencies, the absolute value of the total sum of the products of the result vector and the weight value matrix for each habit can be expressed as inversely proportional to the weight value matrix for each semi-habit. (The symbols for these quantities are images in the original publication and are not reproduced here.)

Next, the priorities of the forensic behaviors are set by using a Markov chain with the constructed transition matrix to compute the final probability vector value for prioritizing the forensic behaviors (step 130).

Using the transition matrix generated in step 120, a Markov chain is applied to the forensic behaviors, which occur successively in time, to compute the final probability vector value for prioritizing them.

The final probability vector value for prioritizing each forensic behavior is computed from the weights obtained according to Equations 3 to 5 and the transition matrix generated in step 120.

실제로 포렌식 행위는 다양한 행위를 수반하기에 그 관계성이 복잡하며, 상기 포렌식 행위의 관계성은 도 3에 도시된 바와 같다.In practice, the forensic behavior is complicated because it involves various behaviors, and the forensic behavior is shown in FIG. 3.

The user's overall forensic behavior in FIG. 3 can be expressed as a group, which in turn can be represented by the node elements that correspond to the individual forensic behaviors.

This means that the individual situations are interrelated, that is, they remain mutually dependent.

As FIG. 3 shows, this satisfies the Markov assumption, under which a preceding action directly influences the action that follows it. The mutual dependencies can therefore be represented by a transition matrix, as shown in Equations 1 and 2.

In other words, the core of the present invention, which relies on the Markov property, is the construction of the transition matrix, and this follows the same logic as forensic behaviors that stand in a dependent relationship.

Therefore, applying the Markov chain ultimately yields the inference vector of the final forensic behavior, as given by Equation 2 above.
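The prioritization in step 130 can be illustrated with a minimal sketch. Equations 1 and 2 are not reproduced in this text, so the sketch assumes the standard Markov chain recursion, in which an initial probability vector is multiplied by the n-th power of a row-stochastic transition matrix. The behavior names, the probabilities, and the helper names `final_probability_vector` and `rank_behaviors` are hypothetical and chosen only for illustration.

```python
import numpy as np

def final_probability_vector(x0, P, n_steps):
    """Propagate the initial distribution x0 through n_steps of the chain.

    Assumption: P is row-stochastic (each row sums to 1) and the update is
    the textbook recursion x_n = x_0 P^n, not necessarily the patent's
    exact Equation 1.
    """
    x0 = np.asarray(x0, dtype=float)
    P = np.asarray(P, dtype=float)
    if not np.allclose(P.sum(axis=1), 1.0):
        raise ValueError("each row of the transition matrix must sum to 1")
    return x0 @ np.linalg.matrix_power(P, n_steps)

def rank_behaviors(names, probabilities):
    """Order behaviors from highest to lowest final probability."""
    order = np.argsort(-np.asarray(probabilities), kind="stable")
    return [names[i] for i in order]

# Hypothetical forensic behaviors and transition probabilities.
names = ["login", "file_access", "log_deletion"]
P = np.array([[0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5],
              [0.3, 0.3, 0.4]])
x0 = np.array([1.0, 0.0, 0.0])   # the user starts with a login

x = final_probability_vector(x0, P, n_steps=10)
ranking = rank_behaviors(names, x)
print(dict(zip(names, np.round(x, 4))), ranking)
```

Ranking by the magnitude of the vector elements mirrors the wording of the claims, which set the priority according to the size of the elements of the final probability vector.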

Next, noise pages for the forensic behaviors are removed from the set priority, and the priority is reset (step 140).

To improve the accuracy of the final probability vector calculated with the Markov chain, the present invention applies a Noise Page Elimination Algorithm (NPEA), which corrects small errors that may arise in the probability computation.

That is, the algorithm removes probabilistic errors in the final probability vector that can stem from relationships in the data before it has been trained through learning. Removing noise pages produces a cleanly refined scenario path of forensic behaviors.

Noise page removal is performed as follows.

First, a matrix is defined within the root, where the root represents the entire set of actions performed by the user. Each element of this matrix is given by Equation 6 below.

The overall size of this matrix is expressed by its row and column dimensions.

The size of a page newly opened by the user is likewise expressed as a matrix, and the elements of that matrix are given by Equation 7 below.

The noise page elimination algorithm according to the present invention is based on singular value decomposition (SVD).

Singular value decomposition generalizes the spectral theory of matrices to arbitrary rectangular matrices. By the spectral theorem, an orthogonally diagonalizable square matrix can be decomposed into a diagonal matrix built from its eigenvalues.

The singular value decomposition is defined as follows.

Suppose a matrix is a rectangular matrix whose elements belong to the set of real or complex numbers. Then the matrix can be written as the product of three matrices, as in Equation 8 below.

In Equation 8, the first factor is a unitary matrix; the second factor is a matrix with non-negative numbers on its diagonal and zeros everywhere else; and the third factor is the conjugate transpose of a unitary matrix.

Expressing a matrix as the product of three matrices in this way is called the singular value decomposition of the matrix.

Furthermore, when there exist two vectors that satisfy the condition of Equation 9 below for the matrix, the associated non-negative real number is called a singular value.

The two vectors that satisfy the condition of Equation 9 are called the left singular vector and the right singular vector, respectively.
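The definition above can be checked numerically. The sketch below uses NumPy's `numpy.linalg.svd` and assumes the standard textbook notation, M = U Σ V* with M v = σ u and M* u = σ v, because Equations 8 and 9 are not reproduced in this text. The example matrix is arbitrary.

```python
import numpy as np

# Arbitrary 2 x 3 real matrix used only for illustration.
M = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# U is 2 x 2, Vh is 3 x 3 (the conjugate transpose of V),
# and s holds the singular values in descending order.
U, s, Vh = np.linalg.svd(M, full_matrices=True)

# Rebuild the rectangular "diagonal" factor: non-negative values on the
# diagonal, zeros everywhere else.
Sigma = np.zeros(M.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
reconstructed = U @ Sigma @ Vh

# First left/right singular vector pair (real case, so no conjugation needed).
u, v, sigma = U[:, 0], Vh[0, :], s[0]

print(np.allclose(reconstructed, M))     # product of the three factors
print(np.allclose(M @ v, sigma * u))     # M v = sigma u
print(np.allclose(M.T @ u, sigma * v))   # M* u = sigma v
```

All three checks hold for any real matrix, since they restate the defining properties of the decomposition.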

Next, a Monte Carlo simulation is performed on the reset priority to verify its reliability (step 150).

The Monte Carlo simulation further increases confidence in the priority of the forensic behaviors from which noise has been removed by the process above.

The Monte Carlo simulation can be carried out by a generally known method, in the following five steps.

1. Generate the parametric function used to perform the Monte Carlo simulation.

2. Generate the random numbers to be input to the parametric function. The random numbers correspond to the input data used to measure the priority. A further feature of this Monte Carlo simulation is that each random number differs from that input data by no more than a threshold preset by the administrator.

3. Store the model measurement result produced from the random numbers.

4. Repeat steps 2 and 3.

5. Analyze the results of the repeated runs using histograms, statistical processing, and confidence intervals, and output the final result, that is, the priority value.
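The five steps can be sketched as follows. This is only an illustration under stated assumptions: the parametric function is taken to be the Markov chain propagation itself, the random inputs are bounded perturbations of a hypothetical transition matrix, and the names `model`, `perturb`, and `monte_carlo` do not come from the patent.

```python
import numpy as np

def model(P, x0, n_steps=10):
    """Step 1: the function under test (assumed: x0 propagated through P)."""
    return x0 @ np.linalg.matrix_power(P, n_steps)

def perturb(P, threshold, rng):
    """Step 2: random input close to the original data.

    Each entry is shifted by at most `threshold` and rows are renormalized,
    so the final deviation can differ slightly from the raw shift.
    """
    noise = rng.uniform(-threshold, threshold, size=P.shape)
    Q = np.clip(P + noise, 1e-12, None)
    return Q / Q.sum(axis=1, keepdims=True)

def monte_carlo(P, x0, threshold=0.02, n_trials=2000, seed=0):
    rng = np.random.default_rng(seed)
    # Steps 3 and 4: store the result of every repeated trial.
    results = np.array([model(perturb(P, threshold, rng), x0)
                        for _ in range(n_trials)])
    # Step 5: summary statistics, a 95% interval, and a histogram.
    mean = results.mean(axis=0)
    low, high = np.percentile(results, [2.5, 97.5], axis=0)
    hist, edges = np.histogram(results[:, 0], bins=20)
    return mean, low, high, hist, edges

# Hypothetical transition matrix and initial distribution.
P = np.array([[0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5],
              [0.3, 0.3, 0.4]])
x0 = np.array([1.0, 0.0, 0.0])

mean, low, high, hist, edges = monte_carlo(P, x0)
for i in range(len(mean)):
    print(f"state {i}: mean={mean[i]:.4f}, 95% interval=[{low[i]:.4f}, {high[i]:.4f}]")
```

If the intervals of two behaviors do not overlap, their relative priority is stable under the assumed perturbation size. Overlapping intervals suggest that the ordering should be treated with caution.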

Finally, the scenario for the verified forensic behavior is stored in the database (step 160).

By verifying the priority of the forensic behaviors with the Monte Carlo simulation and storing the scenarios for the verified forensic behaviors in the database, forensic behaviors that accompany a later intrusion by the user can be dealt with. The priority-based forensic behavior scenarios stored in the database can also be provided to other organizations so that they can respond to cyber risks in a general-purpose manner.

To this end, the present invention maintains a separate database for each user. When a user's log data changes, entries in the database are deleted and modified in real time, so that scenarios for the user's forensic behavior are always generated from up-to-date data.

FIG. 4 is a schematic flowchart of the noise page elimination algorithm applied in step 140 of FIG. 1.

Referring to FIG. 4, singular value decomposition is performed on the matrix to produce its singular values, which are non-negative real numbers (step 441). Multiple singular values can be produced in this step.

Next, a parameter is selected on the basis of the generated singular values according to Equation 10 below (step 442).

Next, using the parameter selected in step 442 and the three matrices of the decomposition described above, an approximate (similar) matrix corresponding to that parameter is generated according to Equation 11 below (step 443).

Next, from the unitary matrix used in step 443, the coordinate vector of the forensic behavior in the corresponding dimension is generated according to Equation 12 below (step 444).

Next, a similar matrix is computed, according to Equation 13 below, from the matrix generated for the page newly opened by the user (step 445).

In Equation 13, the vector term denotes each column vector of that matrix.

Next, the similarity between the two matrices is measured (step 446), and it is determined whether the measured similarity is greater than a threshold preset by the administrator (step 447). If the similarity is greater than the threshold, the user's forensic behavior is kept (step 448). If it is not, the user's forensic behavior is removed (step 449), and the noise page elimination algorithm ends.

The comparison between the similarity measured in step 447 and the threshold is given by Equation 14 below.
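Steps 441 to 449 can be sketched as follows. Equations 10 to 14 are not reproduced in this text, so every concrete formula here is an assumption borrowed from latent semantic indexing: the parameter is a rank chosen by a cumulative-energy rule, new pages are projected by "folding in", and similarity is the cosine measure. The function names, the layout of `A` (one column per known page) and `B` (one column per newly opened page), and the data are hypothetical.

```python
import numpy as np

def choose_rank(singular_values, energy=0.9):
    """Step 442 (assumed rule): smallest k keeping `energy` of the squared mass."""
    squared = singular_values ** 2
    cumulative = np.cumsum(squared) / squared.sum()
    k = int(np.searchsorted(cumulative, energy) + 1)
    return min(k, len(singular_values))

def filter_noise_pages(A, B, threshold=0.5, energy=0.9):
    """Return a boolean mask: True keeps a new page, False marks it as noise."""
    # Step 441: singular value decomposition of the known-page matrix.
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    k = choose_rank(s, energy)
    # Step 443: truncated factors that define the rank-k approximation.
    U_k, s_k, V_k = U[:, :k], s[:k], Vh[:k, :].T
    # Step 444: k-dimensional coordinates of the known pages (one row each).
    known = V_k
    # Step 445: fold each column of B into the same k-dimensional space.
    new = (B.T @ U_k) / s_k
    keep = []
    for row in new:
        # Step 446: cosine similarity against every known page.
        denom = np.linalg.norm(known, axis=1) * np.linalg.norm(row) + 1e-12
        similarity = (known @ row / denom).max()
        # Steps 447 to 449: keep only pages whose similarity exceeds the threshold.
        keep.append(bool(similarity > threshold))
    return np.array(keep)

# Hypothetical data: rows are features, columns are pages.
A = np.array([[1.0, 2.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
B = np.array([[1.0, 1.0],
              [1.0, -1.0],
              [0.0, 1.0],
              [0.0, -1.0]])

keep = filter_noise_pages(A, B)
print(keep)
```

In this example the first new page matches a known page exactly and is kept, while the second lies outside the space spanned by the known pages and is removed as noise.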

FIG. 5 shows a forensic behavior scenario from which noise has been removed by the noise page elimination algorithm applied in the present invention.

In FIG. 5, block 510 uses arrows to depict a forensic behavior scenario based on the priorities set by the Markov chain, and it contains an unnecessary noise page 511.

When the noise page elimination algorithm is applied to block 510, the noise-free forensic behavior scenario shown in block 520 is obtained.

FIG. 6 is a block diagram of an apparatus for analyzing cyber criminal behavior using a Markov chain according to the present invention.

Detailed description of the parts of FIG. 6 that overlap with the description above is omitted.

Referring to FIG. 6, the apparatus for analyzing cyber criminal behavior using a Markov chain according to the present invention may comprise a forensic behavior definition unit 610, a transition matrix construction unit 620, a priority setting unit 630, a noise page removal unit 640, a verification unit 650, and a database 660.

The forensic behavior definition unit 610 defines the user's forensic behaviors and selects among the defined forensic behaviors. The transition matrix construction unit 620 constructs a transition matrix for the forensic behaviors based on the evidence generated from the forensic behaviors selected by the forensic behavior definition unit 610.

The forensic behavior definition unit 610 may select and define forensic behaviors based on mutual access information that covers the user, the log server storing the user's access information, and information on the applications executed by the user.

The transition matrix is built from a dual matrix constructed from the mutual access information. Rather than occurring independently, forensic behaviors may form pairs that are linked to each other in time sequence, so the transition matrix of the forensic behaviors can be constructed from the dual matrix.
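One way to picture a transition matrix built from time-ordered pairs of behaviors is sketched below. This is an assumption for illustration rather than the construction defined by the patent: consecutive behaviors in a log are counted as pairs, and each row of counts is normalized into probabilities. The function name `transition_matrix_from_log` and the sample log are hypothetical.

```python
import numpy as np

def transition_matrix_from_log(sequence, states):
    """Estimate transition probabilities from consecutive behavior pairs."""
    index = {state: i for i, state in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    # Count every (current behavior, next behavior) pair in time order.
    for current, following in zip(sequence, sequence[1:]):
        counts[index[current], index[following]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed transitions fall back to a uniform distribution.
    safe_sums = np.where(row_sums == 0, 1.0, row_sums)
    return np.where(row_sums > 0, counts / safe_sums, 1.0 / len(states))

states = ["login", "file_access", "log_deletion"]
log = ["login", "file_access", "login", "log_deletion", "login", "file_access"]

P = transition_matrix_from_log(log, states)
print(P)
```

Each row of the result sums to 1, so the matrix can be used directly as the transition matrix of a Markov chain.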

The transition matrix construction unit 620 includes a data mining module (not shown) that performs data mining on the selected forensic behaviors, and the selected forensic behaviors may be stored as a hierarchical document in XML or AXML.
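A hierarchical XML document for the selected forensic behaviors might look like the output of the sketch below. The patent does not specify a schema, so the element names (`user`, `behavior`, `timestamp`, `evidence`), the attributes, and the sample records are all hypothetical. The sketch uses Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

def behaviors_to_xml(user_id, behaviors):
    """Serialize a user's selected forensic behaviors as nested XML."""
    root = ET.Element("user", attrib={"id": user_id})
    for record in behaviors:
        node = ET.SubElement(root, "behavior", attrib={"type": record["type"]})
        ET.SubElement(node, "timestamp").text = record["timestamp"]
        ET.SubElement(node, "evidence").text = record["evidence"]
    return ET.tostring(root, encoding="unicode")

# Hypothetical records for one user.
records = [
    {"type": "login", "timestamp": "2008-02-28T09:00:00", "evidence": "auth.log"},
    {"type": "file_access", "timestamp": "2008-02-28T09:05:00", "evidence": "access.log"},
]

xml_text = behaviors_to_xml("user-001", records)
print(xml_text)
```

Keeping one `user` element per user fits the per-user database described in the specification, since a change in one user's log data only touches that user's subtree.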

The final probability vector can, of course, be obtained from Equation 1 above, given the initial value determined in advance by the data mining, the transition matrix based on the user's behavior data, and the total number of forensic behavior data items performed by the user.

The priority setting unit 630 applies a Markov chain to the constructed transition matrix to calculate the final probability vector used to prioritize the forensic behaviors, and thereby sets their priority. The noise page removal unit 640 then removes noise pages for the forensic behaviors from the set priority by singular value decomposition and resets the priority.

The priority setting unit 630 may set the priority by assigning a weight to each forensic behavior. The forensic behaviors may be classified into priority-based, habit-based, and anti-habit-based forensic behaviors, as described above.

The verification unit 650 then performs a Monte Carlo simulation on the reset priority to verify its reliability, and the scenario for the verified forensic behavior is finally stored in the database 660.

The method for analyzing cyber criminal behavior using a Markov chain according to the present invention can be implemented in software. When implemented in software, the constituent means of the present invention are code segments that perform the necessary tasks. The program or code segments may be stored on a processor-readable medium, or transmitted as a computer data signal combined with a carrier wave over a transmission medium or communication network.

The computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. Examples include ROM, RAM, CD-ROM, DVD±ROM, DVD-RAM, magnetic tape, floppy disks, hard disks, and optical data storage devices. The computer-readable recording medium may also be distributed over computer devices connected by a network, so that computer-readable code is stored and executed in a distributed manner.

The present invention has been described with reference to an embodiment shown in the drawings, but this embodiment is merely illustrative, and those of ordinary skill in the art will understand that various modifications and variations of the embodiment are possible.

Such modifications, however, should be regarded as falling within the technical protection scope of the present invention.

Accordingly, the true technical protection scope of the present invention should be determined by the technical spirit of the appended claims.

FIG. 2 schematically shows the structure of each of the user's actions in cyberspace as applied to the present invention.

Claims

A method for analyzing cyber criminal behavior using a Markov chain, comprising:

defining the forensic behavior of a user based on the user's behavior data stored in a server and a log server, selecting the defined forensic behavior, and, based on the evidence generated by the selected forensic behavior, constructing a transition matrix composed of the occurrence probabilities of the forensic behavior;

substituting the constructed transition matrix into a Markov chain representing the state change of the forensic behavior over time, calculating a final probability vector giving the final probabilities of the predetermined forensic behaviors, and setting the priority of the forensic behavior according to the magnitudes of the elements of the final probability vector;

resetting the priority by removing noise pages for the forensic behavior from the set priority;

verifying the reliability of the priority by performing a Monte Carlo simulation based on the reset priority; and

storing the scenario for the verified forensic behavior in a database.

The method of claim 1,

wherein the defined forensic behavior is selected based on mutual access information that includes the user, a log server storing the user's access information, and information on applications executed by the user.

The method of claim 2,

wherein the transition matrix is composed of a dual matrix constructed according to the mutual access information.

The method of claim 1,

wherein setting the priority of the forensic behavior comprises generating a different weight vector according to the classification of the forensic behavior, substituting the generated weight vector into the Markov chain to calculate the final probability vector as a column vector, and setting the priority of the forensic behavior according to the magnitudes of the elements of the calculated column vector, and

wherein the forensic behavior is classified into priority-based forensic behavior, habit-based forensic behavior, and anti-habit-based forensic behavior.

The method of claim 1,

wherein the noise page is removed by singular value decomposition.

The method of claim 1,

wherein constructing the transition matrix comprises performing data mining on the selected forensic behavior, and

wherein the selected forensic behavior is stored as a hierarchical document in XML or AXML.

The method of claim 6,

wherein the final probability vector is given by Equation 1 below, in terms of the initial value determined in advance by the data mining, the transition matrix based on the user's behavior data, and the total number of forensic behavior data items performed by the user.

(1)

The method of claim 1,

wherein storing the scenario for the verified forensic behavior in the database comprises maintaining a separate database for each user and, when a user's log data changes, deleting and modifying entries in the database in real time.

A computer-readable recording medium storing a program that causes a computer to perform the method of any one of claims 1 to 8.

An apparatus for analyzing cyber criminal behavior using a Markov chain, comprising:

a forensic behavior definition unit that defines the forensic behavior of a user and selects the defined forensic behavior;

a transition matrix construction unit that constructs, based on the evidence generated by the selected forensic behavior, a transition matrix composed of the probabilities that a forensic behavior at the current time is followed by a forensic behavior at the next time;

a priority setting unit that substitutes the constructed transition matrix into a Markov chain representing the state change of the forensic behavior over time, calculates a final probability vector giving the final probabilities of the predetermined forensic behaviors, and sets the priority of the forensic behavior according to the magnitudes of the elements of the final probability vector;

a noise page removal unit that resets the priority by removing noise pages for the forensic behavior from the set priority;

a verification unit that verifies the reliability of the priority by performing a Monte Carlo simulation based on the reset priority; and

a database that stores the scenario for the verified forensic behavior.

The apparatus of claim 10,

wherein the forensic behavior definition unit selects the forensic behavior based on mutual access information that includes the user, a log server storing the user's access information, and information on applications executed by the user.

The apparatus of claim 11,

wherein the transition matrix is composed of a dual matrix constructed according to the mutual access information.

The apparatus of claim 10,

wherein the priority setting unit calculates the final probability vector as a column vector, and

wherein the forensic behavior is classified into priority-based forensic behavior, habit-based forensic behavior, and anti-habit-based forensic behavior.

The apparatus of claim 10,

wherein the noise page removal unit removes the noise page by singular value decomposition.

The apparatus of claim 10,

wherein the transition matrix construction unit includes a data mining module that performs data mining on the selected forensic behavior, and

wherein the selected forensic behavior is stored as a hierarchical document in XML or AXML.

The apparatus of claim 15,

wherein the final probability vector is given by Equation 2 below, in terms of the transition matrix based on the user's behavior data and the total number of forensic behavior data items performed by the user.

(2)