KR20230009306A

KR20230009306A - Method and apparatus for detecting malicious communication based session

Info

Publication number: KR20230009306A
Application number: KR1020220081831A
Authority: KR
Inventors: 김혁준
Original assignee: (주)나루씨큐리티
Priority date: 2021-07-08
Filing date: 2022-07-04
Publication date: 2023-01-17

Abstract

A method of operating a malicious communication detection device comprises: collecting session data on connections between managed computing devices and external servers; grouping the session data of each fixed period by identical source and destination, and calculating statistical index values for each session data group; extracting communication indicators for the corresponding source and destination from the statistical index values of each session data group; and, using the communication indicators of each source-destination pair, detecting a source and destination as an infected computing device and an attacker server involved in malicious communication. Threats from malicious communication can thus be detected and responded to quickly.

Description

Session-based malicious communication detection method and apparatus {METHOD AND APPARATUS FOR DETECTING MALICIOUS COMMUNICATION BASED SESSION}

The present disclosure relates to the detection of malicious communication that bypasses information protection systems.

An attack through a command and control channel remains in the target network for months, carrying out activities such as information gathering, attack tool downloads, attacks on the internal network, system destruction, and information exfiltration. Because most target networks are protected by firewalls, intrusion detection devices, and the like, an attacker can secure an initial intrusion path in various ways, such as sending e-mail with malicious code attached, causing malicious files to be downloaded without the user's knowledge during web browsing, or attacking the computing resources of the vendor of an installed program.

Meanwhile, the command and control channel is not maintained continuously. Instead, an internal network device infected by the attacker periodically attempts to connect to the attacker located outside, creating the channel each time. To evade detection by the internal network's security administrator, the infected device can connect periodically to the external attacker in various ways, such as non-standard protocol communication over standard ports, use of non-standard ports, and use of external cloud services.

Signature-based detection, which inspects packet payloads, is the main technique used to detect such internal network attacks, but the packet analysis it requires cannot keep pace with the network's communication speed. Moreover, for zero-day and other unknown attacks, no signature information has been identified, so the threat cannot be detected.

To address this, Korean Registered Patent KR10-1464367 proposes a technique for detecting a command and control channel based on session creation frequency, but session creation frequency alone is of limited use in detecting the diverse forms of recent malicious communication.

The present disclosure provides a session-based method and apparatus for detecting malicious communication.

The present disclosure provides a method and apparatus that extract communication indicators, including persistence, concealment, accessibility, signaling, directionality, and scalability, from session data and use them to detect malicious communication.

A method of operating a malicious communication detection device according to one embodiment includes: collecting session data on connections between managed computing devices and external servers; grouping the session data of each fixed period by identical source and destination, and calculating statistical index values for each resulting session data group; extracting communication indicators for the corresponding source and destination using the statistical index values of each session data group; and, using the communication indicators of each source-destination pair, detecting a given source and destination as an infected computing device and an attacker server involved in malicious communication.

The communication indicators may include at least two of persistence, concealment, accessibility, signaling, directionality, and scalability.

Persistence is a communication indicator used to detect periodically created sessions based on the number of time intervals in which a session exists, and may be calculated from the number of timestamps and the connection duration among the statistical index values.

Concealment is a communication indicator used to detect sessions that are created infrequently and maintained for a long time, and may be calculated from the number of sessions among the statistical index values.

Accessibility is a communication indicator used to detect backdoor-type sessions that keep a session alive with minimal communication, and may be calculated from the sum of connection durations relative to the sum of source and destination transmitted packets among the statistical index values.

Signaling is a communication indicator used to detect beacon-type sessions that keep a session alive with minimal communication, and may be calculated from the sum of source and destination transmitted bytes relative to the sum of source and destination transmitted packets among the statistical index values.

Directionality is a communication indicator used to detect sessions used for information exfiltration based on the volume of outbound data, and may be calculated from the difference between the sum of source transmitted bytes and the sum of destination transmitted bytes among the statistical index values.

Scalability is a communication indicator used to detect the spread of infection within the internal network based on growth over time in the number of sources connecting to a particular destination, and may be calculated from the number of source IP addresses connected to each destination.

The detecting step may detect an infected computing device and an attacker server involved in malicious communication by comparing the persistence, accessibility, and signaling values among the communication indicators for a given source and destination with reference values.

The detecting step may detect an infected computing device and an attacker server involved in data exfiltration communication by comparing the concealment and directionality values among the communication indicators for a given source and destination with reference values.
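The two comparisons against reference values described above can be sketched as a small rule table. This is only an illustration: the text does not state the reference values or the direction of each comparison, so the names `C2_RULE`, `EXFIL_RULE`, `matches`, and `classify`, the operators, and every number below are assumptions.

```python
import operator

# Hypothetical rule tables: indicator name -> (comparison, reference value).
# Both the comparison directions and the numbers are illustrative assumptions.
C2_RULE = {
    "persistence": (operator.ge, 10),
    "accessibility": (operator.ge, 30.0),
    "signaling": (operator.le, 100.0),
}
EXFIL_RULE = {
    "concealment": (operator.le, 3),
    "directionality": (operator.ge, 1_000_000),
}


def matches(indicators, rule):
    """Return True when every indicator named in the rule satisfies its comparison."""
    return all(compare(indicators[name], ref) for name, (compare, ref) in rule.items())


def classify(indicators):
    """Label one source-destination pair from its communication indicator values."""
    labels = []
    if matches(indicators, C2_RULE):
        labels.append("malicious_communication")
    if matches(indicators, EXFIL_RULE):
        labels.append("data_exfiltration")
    return labels


pair = {"persistence": 12, "accessibility": 60.0, "signaling": 80.0,
        "concealment": 40, "directionality": 2_000}
print(classify(pair))
```

Keeping the rules as data makes it easy to swap in whatever reference values and comparison directions a deployment actually uses.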

The statistical index values may include the number of timestamps, the sum of connection durations, the sum of source transmitted bytes, the sum of destination transmitted bytes, the sum of source transmitted packets, the sum of destination transmitted packets, and the number of sessions.

A method of operating a malicious communication detection device according to another embodiment includes: collecting session data of a source and a destination; calculating, from the session data, statistical index values including the number of timestamps, the sum of connection durations, the sum of source transmitted bytes, the sum of destination transmitted bytes, the sum of source transmitted packets, the sum of destination transmitted packets, and the number of sessions; extracting, using the statistical index values, communication indicators used for detecting malicious communication; and, using the communication indicators, detecting the source and the destination as an infected computing device and an attacker server.

The detecting step may determine whether the source and the destination are the infected computing device and the attacker server involved in malicious communication by comparing the persistence, accessibility, and signaling values among the communication indicators with reference values. Persistence is a communication indicator used to detect periodically created sessions based on the number of time intervals in which a session exists, and may be calculated from the number of timestamps and the connection duration. Accessibility is a communication indicator used to detect backdoor-type sessions that keep a session alive with minimal communication, and may be calculated from the sum of connection durations relative to the sum of source transmitted packets and destination transmitted packets. Signaling is a communication indicator used to detect beacon-type sessions that keep a session alive with minimal communication, and may be calculated from the sum of source transmitted bytes and destination transmitted bytes relative to the sum of source transmitted packets and destination transmitted packets.

The detecting step may determine whether the source and the destination are the infected computing device and the attacker server involved in data exfiltration communication by comparing the concealment and directionality values among the communication indicators with reference values. Concealment is a communication indicator used to detect sessions that are created infrequently and maintained for a long time, and may be calculated from the number of sessions. Directionality is a communication indicator used to detect sessions used for information exfiltration based on the volume of outbound data, and may be calculated from the difference between the sum of source transmitted bytes and the sum of destination transmitted bytes.

The method may further include detecting the spread of infection within the internal network by the destination when the number of sources connecting to the destination increases over time.
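A minimal sketch of this spread check follows. It assumes session records are dictionaries with `ts`, `src_ip`, and `dst_ip` keys, that the collection period is one day, and that "increases" means the latest period has more distinct sources than the earliest. None of these choices is specified in the text, and the function names are hypothetical.

```python
from collections import defaultdict


def sources_per_destination(sessions, period_seconds=86400):
    """Count distinct source IPs per destination IP in each collection period."""
    seen = defaultdict(lambda: defaultdict(set))
    for s in sessions:
        period = int(s["ts"] // period_seconds)
        seen[s["dst_ip"]][period].add(s["src_ip"])
    return {dst: {p: len(srcs) for p, srcs in periods.items()}
            for dst, periods in seen.items()}


def is_spreading(counts_by_period):
    """Assumed test for an increase: latest period has more sources than the earliest."""
    ordered = [counts_by_period[p] for p in sorted(counts_by_period)]
    return len(ordered) >= 2 and ordered[-1] > ordered[0]


DAY = 86400
sessions = [
    {"ts": 0 * DAY + 10, "src_ip": "10.0.0.5", "dst_ip": "203.0.113.7"},
    {"ts": 1 * DAY + 10, "src_ip": "10.0.0.5", "dst_ip": "203.0.113.7"},
    {"ts": 1 * DAY + 20, "src_ip": "10.0.0.6", "dst_ip": "203.0.113.7"},
    {"ts": 2 * DAY + 10, "src_ip": "10.0.0.5", "dst_ip": "203.0.113.7"},
    {"ts": 2 * DAY + 15, "src_ip": "10.0.0.6", "dst_ip": "203.0.113.7"},
    {"ts": 2 * DAY + 30, "src_ip": "10.0.0.7", "dst_ip": "203.0.113.7"},
    {"ts": 0 * DAY + 5, "src_ip": "10.0.0.8", "dst_ip": "198.51.100.2"},
    {"ts": 2 * DAY + 5, "src_ip": "10.0.0.8", "dst_ip": "198.51.100.2"},
]
counts = sources_per_destination(sessions)
flagged = sorted(dst for dst, c in counts.items() if is_spreading(c))
print(flagged)
```

In this made-up data the first destination is reached by one, then two, then three internal hosts on successive days, while the second is only ever reached by a single host.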

According to an embodiment, signs of threats from malicious communication can be detected and responded to quickly by analyzing, on a session basis, statistical changes in internal (East-West) and external (North-South) communication in an on-premises or cloud network.

According to an embodiment, additional damage can be minimized by tracing detected malicious communication back to its origin and responding to it.

According to an embodiment, signs of zero-day and unknown threats can be detected in advance through communication indicators including persistence, concealment, accessibility, signaling, directionality, and scalability.

According to an embodiment, the collected communication indicators can be used to train an artificial intelligence model that distinguishes malicious from normal behavior based on the correlations among the communication indicators.

FIG. 1 is a diagram schematically illustrating a network environment related to a malicious communication detection device according to an embodiment.
FIG. 2 is a diagram illustrating communication characteristics related to the persistence communication indicator according to an embodiment.
FIG. 3 is a diagram illustrating communication characteristics related to the concealment communication indicator according to an embodiment.
FIG. 4 is a diagram illustrating communication characteristics related to the signaling communication indicator according to an embodiment.
FIG. 5 is a diagram illustrating communication characteristics related to the directionality communication indicator according to an embodiment.
FIG. 6 is a flowchart of a malicious communication detection method according to an embodiment.

Hereinafter, embodiments of the present disclosure are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the disclosure pertains can readily carry them out. The present disclosure may, however, be implemented in many different forms and is not limited to the embodiments described here. To describe the disclosure clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

In the description, reference numerals and names are given for convenience of explanation, and the devices are not necessarily limited by those reference numerals or names.

In the description, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as "... unit", "... device", and "module" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

FIG. 1 is a diagram schematically illustrating a network environment related to a malicious communication detection device according to an embodiment, FIG. 2 is a diagram illustrating communication characteristics related to the persistence communication indicator according to an embodiment, FIG. 3 is a diagram illustrating communication characteristics related to the concealment communication indicator according to an embodiment, FIG. 4 is a diagram illustrating communication characteristics related to the signaling communication indicator according to an embodiment, and FIG. 5 is a diagram illustrating communication characteristics related to the directionality communication indicator according to an embodiment.

Referring to FIG. 1, the malicious communication detection device (hereinafter simply the detection device) 100 analyzes communication indicators that represent statistical changes in internal (East-West) and external (North-South) communication, based on sessions created in an on-premises or cloud network, and detects malicious communication using those indicators.

The detection device 100 collects session data on connections between computing devices 200-1, 200-2, 200-3, ... deployed on premises or in the cloud and external servers 300 and 400. Here, a session means a connection created for data exchange between two communicating devices. Session data may include a timestamp, source information, destination information, connection duration, transmitted and received data sizes, and the like. Source information includes the source IP address, source port, and so on. Destination information includes the destination IP address, destination port, and so on. Transmitted and received data sizes may include the number of bytes sent by the source and by the destination, or the number of packets sent by the source and by the destination. Session data may be obtained by combining the network packets generated during a connection established between a source and a destination.

As shown in Table 1, session data may include items obtained from network packets.

[Table 1]

feature | HTTP log | Info
--- | --- | ---
Timestamp | ts | Time of the first packet
Connection identifier | uid | Unique identifier of the connection
Source IP address (src_ip) | id.orig_h | Source IP address
Source port (src_port) | id.orig_p | Source port
Destination IP address (dst_ip) | id.resp_h | Destination IP address
Destination port (dst_port) | id.resp_p | Destination port
Connection duration | duration | How long the connection lasted
Source transmitted bytes (src_bytes) | orig_ip_bytes | Number of IP-level bytes that the originator sent
Destination transmitted bytes (dst_bytes) | resp_ip_bytes | Number of IP-level bytes that the responder sent
Source transmitted packets (src_pkts) | orig_pkts | Number of packets that the originator sent
Destination transmitted packets (dst_pkts) | resp_pkts | Number of packets that the responder sent
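The field mapping in Table 1 can be expressed as a small record type. The sketch below assumes each log entry arrives as a dictionary keyed by the log field names in the table; the `Session` class, `FIELD_MAP`, `to_session`, and the sample values are hypothetical and serve only to illustrate the renaming.

```python
from dataclasses import dataclass

# Log field name from Table 1 -> short name used in the description.
FIELD_MAP = {
    "ts": "ts",
    "uid": "uid",
    "id.orig_h": "src_ip",
    "id.orig_p": "src_port",
    "id.resp_h": "dst_ip",
    "id.resp_p": "dst_port",
    "duration": "duration",
    "orig_ip_bytes": "src_bytes",
    "resp_ip_bytes": "dst_bytes",
    "orig_pkts": "src_pkts",
    "resp_pkts": "dst_pkts",
}


@dataclass(frozen=True)
class Session:
    ts: float        # time of the first packet
    uid: str         # unique identifier of the connection
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    duration: float  # how long the connection lasted, in seconds
    src_bytes: int
    dst_bytes: int
    src_pkts: int
    dst_pkts: int


def to_session(log_record):
    """Rename the Table 1 log fields and drop any field the table does not list."""
    renamed = {FIELD_MAP[k]: v for k, v in log_record.items() if k in FIELD_MAP}
    return Session(**renamed)


example = to_session({
    "ts": 1625700000.0, "uid": "C1a2b3",
    "id.orig_h": "10.0.0.5", "id.orig_p": 49152,
    "id.resp_h": "203.0.113.7", "id.resp_p": 443,
    "duration": 1.5, "orig_ip_bytes": 840, "resp_ip_bytes": 1200,
    "orig_pkts": 6, "resp_pkts": 5,
    "proto": "tcp",  # extra field, ignored because Table 1 does not list it
})
print(example.src_ip, example.dst_ip, example.dst_port)
```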

The detection device 100 aggregates session data over a fixed period, groups session data with the same source and destination into one session data group, and then calculates statistical index values for each session data group identified by its source and destination. The detection device 100 may identify the sessions in the session data that share the same source and destination and, as shown in Table 2, calculate statistical index values for each source-destination session data group. The statistical indices may include the number of timestamps, the sum of connection durations, the sum of source transmitted bytes, the sum of destination transmitted bytes, the sum of source transmitted packets, the sum of destination transmitted packets, and the number of sessions. Using the source IP address together with the destination IP address and destination port, the detection device 100 may group session data with the same source and destination as analysis data for the same connection.

[Table 2]

feature | Info
--- | ---
Number of timestamps (ts_unique) | Number of unique values in the 'ts' column for each group
Sum of connection durations (duration_sum) | Sum of the 'duration' column for each group
Sum of source transmitted bytes (src_bytes_sum) | Sum of the 'src_bytes' column for each group
Sum of destination transmitted bytes (dst_bytes_sum) | Sum of the 'dst_bytes' column for each group
Sum of source transmitted packets (src_pkts_sum) | Sum of the 'src_pkts' column for each group
Sum of destination transmitted packets (dst_pkts_sum) | Sum of the 'dst_pkts' column for each group
Number of sessions (count) | Number of sessions
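The grouping and the Table 2 statistics can be sketched in plain Python. The sketch assumes each session is a dictionary using the short field names from the description, and groups by source IP address, destination IP address, and destination port as the text describes. The function name `summarize_groups` and the sample data are hypothetical.

```python
from collections import defaultdict


def summarize_groups(sessions):
    """Group sessions by (src_ip, dst_ip, dst_port) and compute the Table 2 values."""
    groups = defaultdict(list)
    for s in sessions:
        groups[(s["src_ip"], s["dst_ip"], s["dst_port"])].append(s)

    stats = {}
    for key, rows in groups.items():
        stats[key] = {
            "ts_unique": len({r["ts"] for r in rows}),
            "duration_sum": sum(r["duration"] for r in rows),
            "src_bytes_sum": sum(r["src_bytes"] for r in rows),
            "dst_bytes_sum": sum(r["dst_bytes"] for r in rows),
            "src_pkts_sum": sum(r["src_pkts"] for r in rows),
            "dst_pkts_sum": sum(r["dst_pkts"] for r in rows),
            "count": len(rows),
        }
    return stats


sessions = [
    {"ts": 100.0, "src_ip": "10.0.0.5", "dst_ip": "203.0.113.7", "dst_port": 443,
     "duration": 2.0, "src_bytes": 500, "dst_bytes": 300, "src_pkts": 5, "dst_pkts": 4},
    {"ts": 3700.0, "src_ip": "10.0.0.5", "dst_ip": "203.0.113.7", "dst_port": 443,
     "duration": 3.0, "src_bytes": 700, "dst_bytes": 100, "src_pkts": 6, "dst_pkts": 2},
    {"ts": 200.0, "src_ip": "10.0.0.6", "dst_ip": "198.51.100.2", "dst_port": 80,
     "duration": 1.0, "src_bytes": 50, "dst_bytes": 900, "src_pkts": 1, "dst_pkts": 3},
]
stats = summarize_groups(sessions)
print(stats[("10.0.0.5", "203.0.113.7", 443)])
```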

Using the statistical index values of each source-destination session data group, the detection device 100 extracts communication indicators for each session data group, including persistence, concealment, accessibility, signaling, directionality, and scalability. The detection device 100 uses the communication indicators to determine whether the source and destination of a session data group are engaged in malicious communication, and can thereby detect the infected computing device and attacker server involved. Malicious communication refers to sessions established for illegitimate purposes such as targeted attacks, botnets, unknown malware, insider threats, and data exfiltration. The communication indicators, including persistence, concealment, accessibility, signaling, directionality, and scalability, may be defined as shown in Table 3.

[Table 3]

communication indicator | role
--- | ---
Persistence | Detects periodically created sessions based on the number of time intervals (BIN) in which a session exists
Concealment | Detects infrequently created, long-lived sessions based on the number of sessions
Accessibility | Detects backdoor-type sessions that keep a session alive with minimal communication
Signaling | Detects beacon-type sessions that keep a session alive with minimal communication
Directionality | Detects sessions used for information exfiltration based on the volume of outbound data
Scalability | Detects the spread of infection within the internal network based on growth over time in the number of sources connecting to a particular destination
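Three of these indicators are described elsewhere in the text in terms of simple arithmetic on the Table 2 statistics: accessibility as connection duration relative to packets, signaling as bytes relative to packets, and directionality as a byte difference. The sketch below takes the plainest reading of those descriptions, a ratio or a subtraction. The exact formulas are not given in this passage, so the function `ratio_indicators` and its handling of zero packets are assumptions.

```python
def ratio_indicators(stats):
    """Derive accessibility, signaling, and directionality from Table 2 statistics.

    Assumed reading: "relative to" is a division and "difference" is source minus
    destination. Groups with no packets get None for the two ratios.
    """
    total_pkts = stats["src_pkts_sum"] + stats["dst_pkts_sum"]
    total_bytes = stats["src_bytes_sum"] + stats["dst_bytes_sum"]
    if total_pkts == 0:
        accessibility = signaling = None
    else:
        accessibility = stats["duration_sum"] / total_pkts  # seconds per packet
        signaling = total_bytes / total_pkts                # bytes per packet
    directionality = stats["src_bytes_sum"] - stats["dst_bytes_sum"]
    return {"accessibility": accessibility,
            "signaling": signaling,
            "directionality": directionality}


group = {"duration_sum": 3600.0, "src_pkts_sum": 30, "dst_pkts_sum": 30,
         "src_bytes_sum": 9000, "dst_bytes_sum": 3000}
print(ratio_indicators(group))
```

Under this reading, a long-lived connection carrying few packets yields a high accessibility value, small packets yield a low signaling value, and a positive directionality means the internal source sent more bytes than it received.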

Among the computing devices 200-1, 200-2, 200-3, ..., a computing device 200-1 involved in malicious communication may be called an infected computing device. Among the external servers 300 and 400, a server 300 involved in malicious communication is called an attacker server, which may also be called a command and control (C&C) server.

The computing devices 200-1, 200-2, 200-3, ... are assumed to be protected by an information protection system such as a firewall or an intrusion detection device. An attacker located outside must therefore create a command and control channel that connects periodically with an internal device in order to reach the internal network protected by the information protection system. To do so, the attacker may infect one of the computing devices, 200-1, through e-mail or a program with malicious code attached. The infected computing device 200-1 connects periodically to the attacker server 300, receives commands, and performs operations according to those commands. The detection device 100 can identify the characteristics of a command and control channel from the communication indicators it derives from the session data of each session data group.

The way the detection device 100 calculates the persistence, concealment, accessibility, signaling, directionality, and scalability communication indicators is described in detail below.

The persistence communication indicator is described first. Persistence is a communication indicator for detecting sessions that are created periodically to transmit status or check for commands.

Referring to FIG. 2, until it receives a command from the attacker server 300, the infected computing device 200-1 periodically connects to the attacker server 300 to transmit its status or check for commands. To identify this continuous, mechanical network communication between the infected computing device 200-1 and the attacker server 300, the detection device 100 may use the session data of each session data group to calculate a communication indicator representing the persistence of the connection between the corresponding source and destination.

From the session data of a fixed period (for example, 24 hours or 12 hours) grouped by identical source and destination, the detection device 100 may calculate the number of time intervals in which a session exists (hereinafter the number of session time intervals, or BIN) based on the timestamps and connection durations, and use it as the persistence communication indicator. The detection device 100 may calculate the number of session time intervals using the number of timestamps (ts_unique) and the connection duration (duration) among the statistical index values in Table 2. If the time interval is shorter than the attacker's connection period, there may be time intervals in which no command and control channel is created even though the attacker is connecting periodically. The time interval is therefore set so that periodically created command and control channels can be detected, for example to a time sufficiently longer than the channel's connection period, such as 1 hour or 30 minutes. In this description, the time interval is assumed to be 1 hour.

The detection device 100 may treat a time interval as containing a session if a session is created at least once during it, and accumulate the number of time intervals in which a session was created over the fixed period. Because the timestamp of the first packet generated for a session connection shows whether a session was created in each time interval, the detection device 100 can calculate the number of session time intervals (BIN) from the number of timestamps (ts_unique). For example, if the number of timestamps (ts_unique) in 12 hours of session data between the infected computing device 200-1 and the attacker server 300 is 12, the number of session time intervals (BIN) is 12, and this can be used as the persistence value.
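The counting rule above, in which an interval counts once no matter how many sessions start in it, can be sketched as follows. The sketch assumes timestamps are in seconds and that intervals are aligned to multiples of the interval length. The function name `count_session_bins` is hypothetical.

```python
def count_session_bins(timestamps, bin_seconds=3600):
    """Count the time intervals in which at least one session was created."""
    return len({int(ts // bin_seconds) for ts in timestamps})


# One session start per hour over 12 hours, as in the example in the text.
beacon = [hour * 3600 + 120 for hour in range(12)]
print(count_session_bins(beacon))

# Several sessions inside the same hour still count as a single interval.
burst = [10, 500, 3000, 3599]
print(count_session_bins(burst))
```

With a 1-hour interval, the first call covers all 12 intervals, while the burst of four sessions occupies only one.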

Meanwhile, there may be sessions whose connection duration is equal to or longer than the time interval (1 hour). Because a timestamp marks only the time at which a session starts, the number of time intervals in which a session exists (BIN) may not be calculated accurately from the number of timestamps (ts_unique) alone. Accordingly, as in Equation 1, the detection device 100 may calculate an offset as the quotient of each session's connection duration divided by the time interval, and add the offset to the number of timestamps (ts_unique), thereby correcting the count for sessions that last longer than one time interval. For example, in 12 hours of session data between the computing device 200-2 and the general external server 400, if three sessions are created and two of them have a connection duration of 1 hour or longer, those two sessions yield an offset of 2. The number of time intervals in which a session exists (BIN) for the connection between the computing device 200-2 and the general external server 400 may therefore be calculated as 3 + 2 = 5.

[Equation 1]

Persistence value = number of session time intervals (BIN) = number of timestamps (ts_unique) + offset

In this way, the detection device 100 may calculate the persistence communication indicator of a source and destination using the number of session time intervals (BIN). The closer the persistence value is to the total number of time intervals, the more persistent and frequent the connection, so the source and destination of that connection may be presumed to be an infected computing device and an attacker server.
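As a rough sketch of Equation 1, the offset can be computed as the sum of the integer quotients of each session's duration by the time interval. The function name `persistence` and the use of seconds as the unit are assumptions made for illustration only.

```python
def persistence(ts_unique, durations, bin_seconds=3600):
    """Persistence value (BIN) = ts_unique + offset, following Equation 1.

    durations: connection duration of each session in seconds (assumed unit).
    """
    # Each full time interval covered by a session adds one to the offset.
    offset = sum(int(d // bin_seconds) for d in durations)
    return ts_unique + offset


# Three sessions, two of which last at least one hour (but under two hours).
print(persistence(3, [4000, 5400, 120]))  # 5
```

With these inputs the result matches the example in the description, where three sessions and an offset of 2 give a BIN of 5.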

Next, the concealment communication indicator is described. Concealment is a communication indicator for detecting command-execution sessions that are kept open for a long time through tunneling or the like.

Referring to FIG. 3, the infected computing device 200-1 is connected to the attacker server 300 through tunneling, for example by a RAT program, and performs operations according to the attacker's commands. The infected computing device 200-1 keeps the connected session open while it performs the commanded operation. Consequently, compared with the number of sessions generated between a normal server and client, the infected computing device 200-1 and the attacker server 300 create sessions relatively infrequently. To identify the behavior of the infected computing device 200-1 executing commands from the attacker server 300, the detection device 100 may use the session data of each session data group to calculate a communication indicator representing the concealment of the connection between the corresponding source and destination.

From session data of a certain period grouped by the same source and destination, the detection device 100 may use the number of created sessions relative to the number of session time intervals (BIN) as the concealment communication indicator. The concealment value may be calculated, for example as in Equation 2, by dividing the number of sessions by the number of session time intervals (BIN), and the number of time intervals may be squared to give it more weight. The count value among the statistical indicator values of Table 2 may be used as the number of sessions.

[Equation 2]

Concealment value = number of sessions / BIN

In this way, the detection device 100 may calculate the concealment communication indicator of a source and destination using the number of created sessions relative to the number of session time intervals (BIN). If the concealment value is smaller than a reference value, the computing device may be presumed to be performing operations according to an attacker's commands.
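Equation 2 and the optional squared weighting mentioned in the description might be sketched as below. The function name `concealment` and the `squared` flag are illustrative assumptions; the description does not say how a zero BIN should be handled, so this sketch simply rejects it.

```python
def concealment(session_count, bin_count, squared=False):
    """Concealment value = number of sessions / BIN (Equation 2).

    squared=True divides by BIN squared, the optional weighting mentioned
    in the description.
    """
    if bin_count <= 0:
        raise ValueError("bin_count must be positive")
    denominator = bin_count ** 2 if squared else bin_count
    return session_count / denominator


# One long-lived session spanning 20 intervals: few sessions per interval.
print(concealment(1, 20))
# Many short sessions in every interval, as ordinary traffic might produce.
print(concealment(48, 24))
```

A small value means few sessions were created relative to the time the connection was active, which is the pattern the description associates with long-lived command sessions.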

Next, the accessibility communication indicator is described. Accessibility is a communication indicator for detecting backdoor-type sessions, which keep a session alive with minimal communication in order to bypass information protection systems such as those based on stateful inspection.

A stateful-inspection-based information protection system monitors the communication state between clients and servers, builds and manages a connection table, and performs fine-grained traffic control. A backdoor-type session may be used so that the attacker server 300 can bypass this system and connect to the infected computing device 200-1. After establishing a session with the attacker server 300, the infected computing device 200-1 uses the backdoor-type session to perform only the minimal communication needed to keep the session alive, so that the information protection system does not block it. Here, a backdoor-type session refers to the link for reverse connection that the infected computing device 200-1 forms with the attacker server 300 through tunneling or the like.

From session data of a certain period grouped by the same source and destination, the detection device 100 may use the connection duration relative to the number of packets as the accessibility communication indicator. If a session stays connected for a long time but few packets are transmitted, the detection device 100 may presume that the endpoints are the infected computing device 200-1 and the attacker server 300 bypassing the information protection system with a backdoor-type session.

As in Equation 3, the detection device 100 may use, as the accessibility communication indicator, the sum of connection durations (duration_sum) divided by the sum of source transmitted packets (src_pkts_sum) and destination transmitted packets (dst_pkts_sum). In Equation 3, the sum of connection durations (duration_sum) may also be added to the denominator to normalize the accessibility value.

[Equation 3]

Accessibility value = duration_sum / (duration_sum + src_pkts_sum + dst_pkts_sum)

In this way, the detection device 100 may calculate the accessibility communication indicator of a source and destination using the sum of connection durations (duration_sum), the sum of source transmitted packets (src_pkts_sum), and the sum of destination transmitted packets (dst_pkts_sum). If the accessibility value calculated by Equation 3 is close to a reference value (for example, 1), few packets were transmitted, so the source and destination may be presumed to be keeping a session alive with minimal communication in order to bypass the information protection system.
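The normalized form of Equation 3 can be illustrated with a short sketch. The function name `accessibility`, the use of seconds for durations, and returning 0.0 for an empty group are assumptions that the description does not specify.

```python
def accessibility(duration_sum, src_pkts_sum, dst_pkts_sum):
    """Accessibility = duration_sum / (duration_sum + src_pkts_sum + dst_pkts_sum)."""
    denominator = duration_sum + src_pkts_sum + dst_pkts_sum
    # An empty group has no meaningful value; 0.0 is an arbitrary choice here.
    return duration_sum / denominator if denominator else 0.0


# Ten hours connected with only 1,200 packets: value close to 1.
print(round(accessibility(36000, 600, 600), 3))   # 0.968
# One minute connected with 10,000 packets: value close to 0.
print(round(accessibility(60, 4000, 6000), 3))    # 0.006
```

Because the duration appears in both numerator and denominator, the value stays between 0 and 1 and approaches 1 as the packet count becomes small relative to the connected time.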

Next, the signaling communication indicator is described. Signaling is a communication indicator for detecting beacon-type sessions used to create a command-and-control channel while bypassing the information protection system. Here, a beacon-type session means a session that transmits a small amount of data.

Referring to FIG. 4, a normal server and client create a session in order to transfer data, so they fill each packet with as much data as the prescribed format allows. In contrast, the infected computing device 200-1 and the attacker server 300 send packets carrying only a small amount of data in order to keep the session alive.

From session data of a certain period grouped by the same source and destination, the detection device 100 may use the amount of transmitted data relative to the number of transmitted packets as the signaling communication indicator. Through the signaling communication indicator, the detection device 100 may identify sessions that communicate with minimal data in order to keep the session alive.

As in Equation 4, the detection device 100 may use, as the signaling communication indicator, the sum of source and destination transmitted bytes (src_bytes_sum + dst_bytes_sum) divided by the sum of source and destination transmitted packets (src_pkts_sum + dst_pkts_sum). Because the MTU limits the maximum size of a packet to 1500 bytes, the signaling value may also be normalized by dividing the result of Equation 4 by 1500.

[Equation 4]

Signaling value = (src_bytes_sum + dst_bytes_sum) / (src_pkts_sum + dst_pkts_sum)

In this way, the detection device 100 may calculate the signaling communication indicator of a source and destination using the amount of data transmitted relative to the number of packets. If the signaling value is smaller than a reference value (a value close to 0), the source and destination may be presumed to be keeping a session alive with minimal communication in order to bypass the information protection system.
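A minimal sketch of Equation 4 with the optional division by 1500 follows. The function name `signaling`, the `normalize` flag, and the zero-packet handling are assumptions added for illustration.

```python
MTU_BYTES = 1500  # maximum packet size cited in the description


def signaling(src_bytes_sum, dst_bytes_sum, src_pkts_sum, dst_pkts_sum,
              normalize=True):
    """Average bytes per packet (Equation 4), optionally divided by 1500."""
    packets = src_pkts_sum + dst_pkts_sum
    if packets == 0:
        return 0.0  # arbitrary choice for a group with no packets
    bytes_per_packet = (src_bytes_sum + dst_bytes_sum) / packets
    return bytes_per_packet / MTU_BYTES if normalize else bytes_per_packet


# 200 small keep-alive packets carrying 13,500 bytes in total.
print(signaling(6000, 7500, 100, 100, normalize=False))  # 67.5
# The same traffic after normalization is close to 0.
print(round(signaling(6000, 7500, 100, 100), 3))         # 0.045
```

Normalizing by the MTU puts the value on a 0 to 1 scale, which makes a "close to 0" reference value easier to interpret.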

Next, the directionality communication indicator is described. Directionality is a communication indicator that detects signs of information leakage by checking the direction of data flow based on the amounts of inbound and outbound data transmitted through sessions.

Referring to FIG. 5, in the data flow between a normal server and client, the client mostly sends requests to the server, so the amount of inbound data is larger than the amount of outbound data. In contrast, the infected computing device 200-1 operates to send internal data to the attacker server 300, so when data exfiltration occurs on the infected computing device 200-1, the amount of outbound data increases and the direction of the data flow reverses toward outbound.

The detection device 100 measures the direction of data flow based on the amounts of inbound and outbound data transferred. From session data of a certain period grouped by the same source and destination, the detection device 100 may calculate the directionality communication indicator using the sum of source transmitted bytes (src_bytes_sum) and the sum of destination transmitted bytes (dst_bytes_sum). As in Equation 5, the detection device 100 may use, as the directionality communication indicator, the difference between the sum of source transmitted bytes (src_bytes_sum) and the sum of destination transmitted bytes (dst_bytes_sum) divided by the sum of all transmitted bytes.

[Equation 5]

Directionality value = (src_bytes_sum - dst_bytes_sum) / (src_bytes_sum + dst_bytes_sum)

In this way, if the amount of data sent by the source (outbound) is larger than the amount of data sent by the destination (inbound), the detection device 100 may determine that the data flow differs from the purpose of a normal server and presume that the connection is suspected of data exfiltration.
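Equation 5 can be sketched as a signed ratio in the range from -1 to 1. The function name `directionality` and the zero-traffic handling are illustrative assumptions.

```python
def directionality(src_bytes_sum, dst_bytes_sum):
    """(src_bytes_sum - dst_bytes_sum) / (src_bytes_sum + dst_bytes_sum)."""
    total = src_bytes_sum + dst_bytes_sum
    # No traffic means no direction; 0.0 is an arbitrary neutral value.
    return (src_bytes_sum - dst_bytes_sum) / total if total else 0.0


# Typical client behavior: small requests, large responses (negative value).
print(directionality(200, 800))
# Mostly outbound traffic, as data exfiltration would produce (positive value).
print(directionality(900, 100))
```

A positive value means the source sent more than it received, which corresponds to the outbound reversal the description treats as a sign of possible data exfiltration.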

Next, the scalability communication indicator is described. Scalability is a communication indicator for detecting servers that pose a latent threat capable of affecting many hosts in the internal network.

A computing device 200-1 infected with malicious code or the like connects to the attacker server 300 and attempts further infections within the internal network, which enlarges the scale of the infection. As the infection spreads through the internal network, the number of sessions connecting to a specific destination increases.

For each destination identified in the session data by a destination IP address and port, the detection device 100 may count the number of connected source IP addresses and use that count as the scalability communication indicator. If the scalability value of a specific destination increases over time, this may be detected as the spread of an infection in the internal network.

In this way, the detection device 100 may use the scalability value to identify a server to which internal computing devices are connected, presume that the identified server can affect many hosts in the internal network, and determine how far the infection has spread among the internal computing devices.
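Counting distinct sources per destination might look like the sketch below. The tuple layout `(src_ip, dst_ip, dst_port)`, the function name `scalability`, and the example addresses are assumptions for illustration.

```python
from collections import defaultdict


def scalability(sessions):
    """Map each (dst_ip, dst_port) to the number of distinct source IPs."""
    sources = defaultdict(set)
    for src_ip, dst_ip, dst_port in sessions:
        sources[(dst_ip, dst_port)].add(src_ip)
    return {dst: len(srcs) for dst, srcs in sources.items()}


day1 = [("10.0.0.5", "203.0.113.9", 443)]
day2 = day1 + [("10.0.0.6", "203.0.113.9", 443),
               ("10.0.0.7", "203.0.113.9", 443)]

dst = ("203.0.113.9", 443)
# The count for this destination grows from one day to the next.
print(scalability(day1)[dst], scalability(day2)[dst])  # 1 3
```

Comparing the per-destination counts between consecutive periods, as in the last line, is one simple way to observe the increase over time that the description associates with spreading infection.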

For example, the detection device 100 may use session data groups grouped by source and destination for each date to calculate the communication indicators for each source and destination pair, as shown in Table 4.

| Date | src_ip | dst_ip | dst_port | Persistence (BIN) | Signaling | Accessibility | Directionality | Concealment |
|------|--------|--------|----------|-------------------|-----------|---------------|----------------|-------------|
| 0101 | a | A | 1604 | 17 | 0.045 | 0.858 | 0.163 | 0.0526 |
| 0101 | b | B | 81 | 18 | 0.032 | 0.833 | -0.252 | 0.1875 |
| 0101 | c | C | 4782 | 24 | 0.053 | 0.152 | 0.215 | 1293 |
| 0102 | a | A | 1604 | 21 | 0.045 | 0.858 | 0.159 | 0.043 |
| 0102 | b | B | 81 | 21 | 0.032 | 0.833 | -0.253 | 0.043 |
| 0102 | c | C | 4782 | 24 | 0.053 | 0.158 | 0.215 | 1515 |

For source a to destination A and for source b to destination B, the persistence value (BIN) is above a certain level, the signaling value is close to 0, the accessibility value is close to 1, and the concealment value is close to 0, so the detection device 100 may presume that these are infected computing devices and attacker servers communicating minimally for the purpose of keeping a session alive. For source c to destination C, the persistence value is close to the total number of time intervals and the signaling value is close to 0, so the detection device 100 may presume that the infected source c periodically connects to destination C to report its status or check for commands until it receives a command from destination C.
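The reading of Table 4 given above can be mimicked with simple threshold comparisons. The patent does not state reference values, so every threshold below, the assumption of 24 one-hour intervals per day, the function name `classify`, and the two labels are hypothetical choices made only to reproduce the interpretation in the text.

```python
# Rows for date 0101 from Table 4:
# (src, dst, persistence BIN, signaling, accessibility, concealment)
ROWS = [
    ("a", "A", 17, 0.045, 0.858, 0.0526),
    ("b", "B", 18, 0.032, 0.833, 0.1875),
    ("c", "C", 24, 0.053, 0.152, 1293),
]


def classify(row, total_bins=24, min_persistence=0.7, max_signaling=0.1,
             min_accessibility=0.8, max_concealment=0.2):
    """Label a source-destination pair using hypothetical reference values."""
    _, _, bins, sig, acc, con = row
    persistent = bins / total_bins >= min_persistence
    small_packets = sig <= max_signaling
    if persistent and small_packets and acc >= min_accessibility \
            and con <= max_concealment:
        return "session-keeping"   # long-lived session with minimal traffic
    if persistent and small_packets:
        return "periodic-beacon"   # frequent short check-ins
    return "unflagged"


for row in ROWS:
    print(row[0], row[1], classify(row))
```

Under these assumed thresholds, pairs a-A and b-B fall into the first category and pair c-C into the second, mirroring the two presumptions described above. In practice the reference values would need to be tuned for the monitored network.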

FIG. 6 is a flowchart of a malicious communication detection method according to an embodiment.

Referring to FIG. 6, the detection device 100 collects session data of connections between computing devices and external servers (S110).

The detection device 100 groups the session data of each period (for example, one day) by the same source and destination and calculates statistical indicator values for each grouped session data group (S120). Using the source IP address together with the destination IP address and destination port, the detection device 100 may group all session data of the period into analysis data for each connection that shares the same source and destination. From the analysis data of each connection, the detection device 100 may calculate statistical indicator values including the number of timestamps, the sum of connection durations, the sum of source transmitted bytes, the sum of destination transmitted bytes, the sum of source transmitted packets, the sum of destination transmitted packets, and the number of sessions.
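Step S120 can be sketched as a single pass over session records. The input field names (`ts`, `duration`, `src_bytes`, and so on), the function name `group_statistics`, and the interpretation of ts_unique as the number of distinct one-hour intervals containing a session start are assumptions; the output keys reuse the statistic names that appear in the description.

```python
from collections import defaultdict

FIELDS = ("duration", "src_bytes", "dst_bytes", "src_pkts", "dst_pkts")


def group_statistics(sessions, bin_seconds=3600):
    """Group sessions by (src_ip, dst_ip, dst_port) and sum their statistics."""
    acc = defaultdict(
        lambda: {"bins": set(), "count": 0, **{f + "_sum": 0 for f in FIELDS}})
    for s in sessions:
        g = acc[(s["src_ip"], s["dst_ip"], s["dst_port"])]
        g["bins"].add(int(s["ts"]) // bin_seconds)  # interval of session start
        g["count"] += 1
        for f in FIELDS:
            g[f + "_sum"] += s[f]
    stats = {}
    for key, g in acc.items():
        row = {k: v for k, v in g.items() if k != "bins"}
        row["ts_unique"] = len(g["bins"])
        stats[key] = row
    return stats


base = {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.9", "dst_port": 443}
sample = [
    {**base, "ts": 0, "duration": 900, "src_bytes": 400, "dst_bytes": 300,
     "src_pkts": 6, "dst_pkts": 5},
    {**base, "ts": 3700, "duration": 3600, "src_bytes": 500, "dst_bytes": 200,
     "src_pkts": 7, "dst_pkts": 4},
    {**base, "src_ip": "10.0.0.6", "ts": 100, "duration": 30, "src_bytes": 80,
     "dst_bytes": 90, "src_pkts": 2, "dst_pkts": 2},
]
stats = group_statistics(sample)
print(stats[("10.0.0.5", "203.0.113.9", 443)])
```

The resulting per-group dictionaries hold the values that the communication indicators are calculated from in the next step.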

The detection device 100 extracts communication indicators for the corresponding source and destination using the statistical indicator values of each session data group (S130). The communication indicators may include persistence, concealment, accessibility, signaling, directionality, and scalability. Here, scalability may be calculated per destination as the number of sources connected to it.

Using the communication indicators of each source and destination pair, the detection device 100 detects an arbitrary source and destination as an infected computing device and an attacker server involved in malicious communication (S140). The detection device 100 may examine how the communication indicators change over time to detect whether an arbitrary source and destination are an infected computing device and an attacker server involved in malicious communication. The detection device 100 may also use the communication indicators to detect the type of malicious communication. The detection device 100 may detect malicious communication using a single communication indicator that shows a distinct characteristic of malicious communication, or by combining a plurality of communication indicators. In this way, the detection device 100 may detect whether an attacker server has infected a computing device in the internal network in order to bypass the information protection system and whether the infected computing device has established a session with the attacker server.

As described above, according to the embodiment, statistical changes in internal (East-West) and external (North-South) communication are analyzed on a session basis in an on-premises or cloud network, so that signs of threat from malicious communication can be detected and responded to quickly.

The embodiments of the present disclosure described above are not implemented only through a device and a method; they may also be implemented through a program that realizes functions corresponding to the configuration of the embodiments, or through a recording medium on which the program is recorded.

Although the embodiments of the present disclosure have been described in detail above, the scope of the present disclosure is not limited thereto. Various modifications and improvements made by those skilled in the art using the basic concept of the present disclosure as defined in the following claims also fall within the scope of the present disclosure.

Claims

An operation method of a malicious communication detection device, the method comprising:
collecting session data of connections between managed computing devices and external servers;
grouping the session data of each certain period by the same source and destination, and calculating statistical indicator values for each grouped session data group;
extracting communication indicators for the corresponding source and destination using the statistical indicator values of each session data group; and
detecting an arbitrary source and destination as an infected computing device and an attacker server involved in malicious communication, using the communication indicators of each source and destination pair.

The operation method of claim 1,
wherein the communication indicators include two or more of persistence, concealment, accessibility, signaling, directionality, and scalability.

The operation method of claim 2,
wherein the persistence is a communication indicator used to detect periodically created sessions based on the number of time intervals in which a session exists, and is calculated based on the number of timestamps and the connection duration among the statistical indicator values.

The operation method of claim 2,
wherein the concealment is a communication indicator used to detect sessions that are created infrequently and maintained for a long time, and is calculated based on the number of sessions among the statistical indicator values.

The operation method of claim 2,
wherein the accessibility is a communication indicator used to detect a backdoor-type session that keeps a session alive with minimal communication, and is calculated based on the sum of connection durations relative to the sum of source and destination transmitted packets among the statistical indicator values.

The operation method of claim 2,
wherein the signaling is a communication indicator used to detect a beacon-type session that keeps a session alive with minimal communication, and is calculated based on the sum of source and destination transmitted bytes relative to the sum of source and destination transmitted packets among the statistical indicator values.

The operation method of claim 2,
wherein the directionality is a communication indicator used to detect a session used for information leakage based on the amount of outbound data, and is calculated based on the difference between the sum of source transmitted bytes and the sum of destination transmitted bytes among the statistical indicator values.

The operation method of claim 2,
wherein the scalability is a communication indicator used to detect the spread of an internal network infection based on an increase over time in the number of sources connected to a specific destination, and is calculated based on the number of source IP addresses connected to each destination.

The operation method of claim 1,
wherein the detecting comprises
detecting an infected computing device and an attacker server involved in the malicious communication by comparing a persistence value, an accessibility value, and a signaling value among the communication indicators for an arbitrary source and destination with reference values.

The operation method of claim 1,
wherein the detecting comprises
detecting an infected computing device and an attacker server involved in data exfiltration communication by comparing a concealment value and a directionality value among the communication indicators for an arbitrary source and destination with reference values.

The operation method of claim 1,
wherein the statistical indicator values
include the number of timestamps, the sum of connection durations, the sum of source transmitted bytes, the sum of destination transmitted bytes, the sum of source transmitted packets, the sum of destination transmitted packets, and the number of sessions.

An operation method of a malicious communication detection device, the method comprising:
collecting session data of a source and a destination;
calculating, based on the session data, statistical indicator values including the number of timestamps, the sum of connection durations, the sum of source transmitted bytes, the sum of destination transmitted bytes, the sum of source transmitted packets, the sum of destination transmitted packets, and the number of sessions;
extracting communication indicators used for detecting malicious communication using the statistical indicator values; and
detecting the source and the destination as an infected computing device and an attacker server using the communication indicators.

The operation method of claim 12,
wherein the detecting comprises
detecting whether the source and the destination are the infected computing device and the attacker server involved in malicious communication by comparing a persistence value, an accessibility value, and a signaling value among the communication indicators with reference values,
wherein the persistence is a communication indicator used to detect periodically created sessions based on the number of time intervals in which a session exists, and is calculated based on the number of timestamps and the connection duration,
the accessibility is a communication indicator used to detect a backdoor-type session that keeps a session alive with minimal communication, and is calculated based on the sum of connection durations relative to the sum of the source transmitted packets and the destination transmitted packets, and
the signaling is a communication indicator used to detect a beacon-type session that keeps a session alive with minimal communication, and is calculated based on the sum of the source transmitted bytes and the destination transmitted bytes relative to the sum of the source transmitted packets and the destination transmitted packets.

The operation method of claim 12,
wherein the detecting comprises
detecting whether the source and the destination are the infected computing device and the attacker server involved in data exfiltration communication by comparing a concealment value and a directionality value among the communication indicators with reference values,
wherein the concealment is a communication indicator used to detect sessions that are created infrequently and maintained for a long time, and is calculated based on the number of sessions, and
the directionality is a communication indicator used to detect a session used for information leakage based on the amount of outbound data, and is calculated based on the difference between the sum of source transmitted bytes and the sum of destination transmitted bytes.

The operation method of claim 12,
further comprising detecting the spread of an internal network infection caused by the destination when the number of sources connected to the destination increases over time.