KR101124615B1

KR101124615B1 - Apparatus and method of searching group activity malicious code

Info

Publication number: KR101124615B1
Application number: KR1020080006826A
Authority: KR
Inventors: Heejo Lee (이희조); Hyunsang Choi (최현상)
Original assignee: Korea University Research and Business Foundation (고려대학교 산학협력단)
Priority date: 2008-01-22
Filing date: 2008-01-22
Publication date: 2012-03-19
Also published as: KR20090080841A

Abstract

The present invention relates to a method of searching for group-activity malware based on the group activities of hosts infected with malware. The method proceeds in four steps:

1. Collect traffic data from the target network.
2. Examine the traffic data and create a plurality of host groups according to group activities (common activities occurring on different hosts) and the objects of those activities.
3. Measure the similarity between the host groups, which indicates how closely the hosts belonging to each group match.
4. Use that similarity to find group-activity malware.

Keywords: botnet, group activity, group-activity malware

Description

Group-activity malware detection method and apparatus {APPARATUS AND METHOD OF SEARCHING GROUP ACTIVITY MALICIOUS CODE}

The present invention relates to a method of searching for group-activity malware. More particularly, it relates to a method of searching for group-activity malware based on the group activities of hosts infected with malware.

As Internet technology developed and its user population grew, much of the Internet's infrastructure was deployed rapidly without careful attention to security. Many vulnerabilities were exposed as a result. Groups seeking to exploit them have emerged, and cyberterrorism against large numbers of unspecified victims occurs frequently.

These attacks are becoming increasingly sophisticated and organized, and attackers are building professional networks for economic gain. Attackers earn money by carrying out illegal activities on others' behalf, and they want their own existence and location to be hard to trace. To this end, they devised techniques for controlling large numbers of computers, and the botnet emerged for this purpose.

An attacker installs bots on thousands to tens of thousands of computers and controls them simultaneously over the network, making them carry out malicious activities. As botnets grow, the types and methods of attack that use them are diversifying, and the resulting damage is becoming more severe. Botnets have become one of the biggest issues in cybersecurity.

Most recent malicious attacks, such as distributed denial of service (DDoS), spamming, and pharming, are carried out by botnets. A malicious hacker infects a personal computer (PC), turning it into a bot that can be controlled at will. The hacker then performs malicious activities by controlling thousands to tens of thousands of such infected PCs simultaneously over the network. A botnet is a network of thousands to hundreds of thousands of bots, remotely controlled by a bot master who has the authority to command and control them.

PCs are infected with bots through many different routes. A user may click an executable in a spam e-mail, a worm carrying the infection code may infect vulnerable PCs, or the infection may spread through an instant messenger. Moreover, infections often use rootkits, which makes it difficult to confirm whether a PC is infected.

The bot master mainly uses Internet Relay Chat (IRC) channels to deliver commands to, and control, the botnet. An infected PC automatically downloads the latest bot code, updates itself, and connects to the command-and-control server. Botnets mainly use TCP ports 6665 to 6669 and carry out many kinds of cyberattacks, such as DDoS attacks, phishing, pharming, and spamming. Most current DDoS attacks and spam e-mail have been reported to originate from botnets, causing serious damage.

Botnets have caused significant damage to businesses. Malicious bots scan the network for targets to infect, which greatly increases traffic. The network becomes very slow or fails, leaving businesses almost unable to work. Botnets are now also exploited for other security attacks, so the risk and damage are growing even further.

Botnets are used to distribute malware and spyware and to send large volumes of spam. DDoS attacks carried out through botnets are especially damaging.

Botnet attacks have proven very powerful. Examples include:

- the DDoS attack on the world's 13 root domain name system (DNS) servers in February 2007;
- attacks on domestic adult service providers, such as video chat services, which began increasing in the second half of 2006;
- the attack on a game-item brokerage site in October 2007.

Botnets are widespread both in Korea and abroad. The Korea Information Security Agency (KISA) compiles statistics on PCs in Korea infected with bots using a DNS sinkhole. When a PC tries to connect to a bot control server, the DNS server returns a designated IP address instead of the real one, which blocks the connection to the bot control server.

As of October 2007, the domestic malicious-bot infection rate reached 11.7%. (Source: KISA, Internet Incident Trends and Analysis Monthly Report, October 2007. The report defines the domestic infection rate as the share of bot-infected PCs in Korea among all estimated bot-infected PCs worldwide.)

Vint Cerf, a Google vice president known as a father of the Internet, spoke at the World Economic Forum in Davos, Switzerland, in January 2007. He likened the spread of botnets to a global pandemic. He stated that 100 to 150 million of the 600 million PCs connected to the Internet were being used in botnets. He added that the users of these computers are mostly victims who do not want to participate in a botnet.

Although statistics and estimates vary, a very large number of PCs worldwide are clearly infected with bots and act as zombie PCs in botnets.

Several bot detection techniques have recently been proposed to counter botnets. However, these techniques have not yet progressed beyond conceptual proposals.

The technical problem addressed by the present invention is to provide an efficient method of searching for group-activity malware.

To achieve this object, one aspect of the present invention provides a method of searching for group-activity malware. The method performs the following steps:

1. Collect traffic data of a target network.
2. Examine the traffic data and create a plurality of host groups according to group activities (common activities occurring on different hosts) and the objects of those activities.
3. Measure the similarity between the host groups, which indicates how closely the hosts belonging to each group match.
4. Find group-activity malware using the similarity.

To achieve this object, another aspect of the present invention provides an apparatus for searching for group-activity malware. The apparatus includes:

- first means for collecting traffic data of a target network;
- second means for examining the traffic data collected by the first means and creating a plurality of host groups according to group activities and the objects of those activities;
- third means for measuring a similarity that indicates how closely the hosts belonging to a first group and a second group among the plurality of host groups match, and for finding group-activity malware using that similarity.

The first group and the second group come from adjacent unit times and performed the same group activity on the same object.

As described above, the present invention searches for group-activity malware based on group activities. This improves both the accuracy and the detection rate of the search.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings, so that those of ordinary skill in the art can readily carry out the invention. However, the present invention may be embodied in many different forms and is not limited to the embodiments described here. In the drawings, parts unrelated to the description are omitted for clarity. Like reference numerals designate like parts throughout the specification.

Throughout the specification, when a part is said to "include" a component, it may further include other components rather than excluding them, unless specifically stated otherwise. Terms such as "...unit" and "...er" (for example, "collector" or "classifier") denote a unit that performs at least one function or operation. Such a unit may be implemented in hardware, software, or a combination of hardware and software.

First, the behavior sequence and characteristics of a botnet are described with reference to FIG. 1. FIG. 1 is a diagram illustrating the life cycle of a botnet.

A bot is a host infected with group-activity malware. A botnet is a network of thousands to hundreds of thousands of bots, remotely controlled by a bot master who has the authority to command and control them.

As shown in FIG. 1, a botnet infects vulnerable hosts using the attacker's own bot malware (S110). Infection methods vary and include e-mail attachments, instant messengers, and Internet worms.

Next, the infected host downloads the actual bot code (S120). The bot code is not delivered directly in step S110 for two reasons: the bot code is large, and the attacker spreading the bot frequently updates it to the latest version.

The infected host then performs a DNS lookup (S130). It executes the downloaded bot code, which contains the domain name of the command-and-control (C&C) channel server. The host sends this domain name to DNS and queries DNS for the IP address of the C&C channel server. Almost all bots perform this DNS lookup.

The C&C channel server is the channel server through which the bot master controls the bots. The bot code on the infected host uses the server's domain name to connect to the C&C channel automatically.

The infected host connects to the C&C channel server using the IP address returned by DNS and joins the C&C channel (S140). In general, an Internet Relay Chat (IRC) server is used as the C&C channel server.

The bot master connects to the C&C channel server, controls the bot hosts through it, and sends them malicious attack commands (S150, S160). The bot hosts that receive these commands then carry out malicious activities such as DDoS, spamming, and theft of personal information.

A botnet usually stays connected to an IRC channel server and waits to execute the bot master's attack commands. Botnet activities include connecting to the channel server, waiting on it, and carrying out attacks. These activities are group activities, performed collectively by a group, and this distinguishes them from the activities of normal hosts.

A group activity is a common activity that occurs on different hosts within a unit time. The group-activity characteristics of a botnet are described below, following the stages of the botnet life cycle described above.

A botnet exhibits group activity when it searches for vulnerable hosts and spreads bot malware through worms. Hosts belonging to a botnet carry out host scanning, port scanning, and worm propagation, and group-activity characteristics can be observed in this process.

Downloading the bot code onto infected hosts is also a group activity.

The botnet's member hosts query DNS during the lookup for the C&C channel server. These DNS queries are also a group activity.

Member hosts that have joined the C&C channel periodically exchange IRC Ping and Pong messages to report their connection status. This exchange is also a group activity.

When botnet hosts carry out attacks, their malicious activities, such as DDoS and spamming, also show group-activity characteristics.

Normal group activities differ from botnet group activities. For DNS lookups, the differences are shown in Table 1.

Table 1. DNS query characteristics of botnets and normal users

| | Source IPs querying the domain name | Behavior pattern | DNS type |
|---|---|---|---|
| Botnet DNS queries | Fixed-size group (botnet members) | Group activity appears intermittently | Mostly DDNS (dynamic DNS) |
| Normal DNS queries | Normal users | Non-group activity appears randomly and continuously | Mostly ordinary DNS |

Next, a group-activity malware search apparatus according to an embodiment of the present invention is described with reference to FIG. 2. FIG. 2 is a structural diagram of the apparatus.

As shown in FIG. 2, the group-activity malware search apparatus according to the embodiment includes:

- a data collector 210;
- a group activity classifier 220;
- a similarity analyzer 230;
- a botnet reporter 240.

The data collector 210 is connected to sensors and gathers the data they transmit over the network. A sensor collects TCP and UDP traffic data. It either stores the data before forwarding it or delivers it to the data collector 210 in real time. TCP traffic data includes IRC traffic data (for example, TCP port 6667), and UDP traffic data includes DNS traffic data (for example, UDP port 53).

The more locations at which sensors are deployed, the more effective botnet detection becomes. Sensors are usually placed in front of the main router or the secondary DNS (cache server) of a given network. The collected data has a structure suitable for the data collector 210 to receive and is delivered over a UDP or TCP socket.
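The patent does not define the record format the sensors forward. The following is a minimal sketch under that caveat. `TrafficRecord` and `traffic_kind` are hypothetical names. The sketch shows how a collector might tag each record as IRC or DNS traffic using only the protocol and destination port named above. Labeling by port alone is a simplification: it misses IRC running on non-standard ports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrafficRecord:
    """Hypothetical per-packet summary forwarded by a sensor."""
    timestamp: float  # capture time in seconds
    src_ip: str
    dst_ip: str
    protocol: str     # "tcp" or "udp"
    dst_port: int


# Ports named in the text: IRC on TCP 6667 (botnets mainly use 6665-6669)
# and DNS on UDP 53.
IRC_PORTS = range(6665, 6670)
DNS_PORT = 53


def traffic_kind(rec: TrafficRecord) -> str:
    """Label a record 'irc', 'dns', or 'other' by protocol and destination port only."""
    if rec.protocol == "tcp" and rec.dst_port in IRC_PORTS:
        return "irc"
    if rec.protocol == "udp" and rec.dst_port == DNS_PORT:
        return "dns"
    return "other"
```

For example, a TCP record to port 6667 is labeled `"irc"`, and a UDP record to port 53 is labeled `"dns"`.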

The group activity classifier 220 examines the data received from the data collector 210 and creates host groups according to the group activities in the traffic. It stores the groups in a data structure and passes the list of hosts belonging to each group to the similarity analyzer 230. That is, the classifier finds host groups by object and group activity and stores each group's host list in the data structure.

A linked list or a hash table can be used as the data structure:

- A linked list uses less memory but makes similarity measurement slow.
- A hash table uses more memory but makes similarity measurement fast.
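The hash-table option can be sketched as follows, assuming a hypothetical event tuple `(timestamp, host, object, action)`. Each key is a (unit-time window, object, action) triple, and each value is a set of hosts. Python sets are hash-based, so intersecting two groups during the similarity step takes, on average, time proportional to the smaller group. Intersecting two unsorted linked lists would need a nested scan. The minimum-size condition for forming a group (Equation 1) is not applied here.

```python
from collections import defaultdict


def build_host_groups(events, unit_time):
    """Group hosts by (window, object, action) in a hash table.

    events: iterable of (timestamp, host, obj, action) tuples (hypothetical layout).
    unit_time: length of the unit time t, in the same units as timestamp.
    Returns {(window_index, obj, action): set_of_hosts}.
    """
    groups = defaultdict(set)
    for ts, host, obj, action in events:
        window = int(ts // unit_time)  # which unit-time slot the event falls in
        groups[(window, obj, action)].add(host)
    return dict(groups)
```

With `unit_time=60`, DNS queries for the same domain at t=0 and t=5 fall into window 0, while a query at t=65 falls into window 1.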

The similarity analyzer 230 selects, from the groups received from the group activity classifier 220, two groups from adjacent unit times that performed the same group activity on the same object. It computes how closely the hosts in the two groups match, which gives the similarity between them. The analyzer then acts on the measured similarity S:

- If S exceeds the similarity threshold, the hosts belonging to at least one of the two groups are judged to be a botnet. The list of botnet hosts and the botnet's object are sent to the botnet reporter 240.
- If λ_s - δ < S < λ_s, the hosts belonging to at least one of the two groups are classified as a suspicious group. The suspicious group's host list and object are sent to the botnet reporter 240.

Here, λ_s is the similarity threshold and δ is an arbitrary margin-of-error limit on the similarity.

The botnet reporter 240 stores the list of botnet hosts and the botnet's object, received from the similarity analyzer 230, in a database. To improve detection accuracy, it may also perform a correlation analysis between groups and objects. In addition, the botnet reporter 240 adds the object of any group judged suspicious to the blacklist of the group activity classifier 220.
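The reporter's two duties, persisting confirmed botnet hosts and feeding suspicious objects back to the classifier, can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch. The table name `botnet_hosts`, its columns, and the in-memory blacklist set are hypothetical choices, not part of the patent. The correlation analysis mentioned above is not modeled.

```python
import sqlite3


def open_report_store(path=":memory:"):
    """Open a store with a hypothetical table of (object, host) botnet findings."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS botnet_hosts ("
        " obj TEXT NOT NULL, host TEXT NOT NULL, UNIQUE (obj, host))"
    )
    return db


def report_botnet(db, obj, hosts):
    """Record every host judged to be in a botnet that acted on `obj`."""
    db.executemany(
        "INSERT OR IGNORE INTO botnet_hosts (obj, host) VALUES (?, ?)",
        [(obj, h) for h in sorted(hosts)],
    )
    db.commit()


def report_suspicious(blacklist, obj):
    """Feed a suspicious group's object back to the classifier's blacklist."""
    blacklist.add(obj)
```

The `UNIQUE` constraint with `INSERT OR IGNORE` keeps repeated detections of the same host and object from piling up duplicate rows.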

Hereinafter, a method of searching for group-activity malware according to an embodiment of the present invention is described with reference to FIG. 3. FIG. 3 is a flowchart of the method.

As shown in FIG. 3, the data collector 210 collects data through the sensors and passes it to the group activity classifier 220 (S310).

The group activity classifier 220 creates host groups from the data received from the data collector 210 (S320).

Suppose that n_t hosts in the target network perform activity A on object O during a unit time t. If Equation 1 is satisfied, those n_t hosts are formed into a group G(O, A).

Here, N is the total number of hosts in the target network, and λ is an arbitrary threshold.

Therefore, the elements needed to create a host group are n, λ, t, and O.

Let b be the size of the botnet to be detected, that is, the number of bots belonging to it. The probability P_b that a host in the target network belongs to that botnet is then b/N, and λ is set to P_b.
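As a worked illustration of how λ follows from the target botnet size, assume a hypothetical network of N = 20,000 hosts and a target botnet of b = 200 bots. These numbers are chosen only for arithmetic and do not appear in the patent.

```latex
\lambda = P_b = \frac{b}{N} = \frac{200}{20\,000} = 0.01
```

Choosing a smaller target b lowers λ, which lets smaller botnets meet the grouping condition.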

The unit time t and the object O are determined for each situation in which group activity appears, as follows.

For DNS traffic observed during the DNS lookup or during C&C channel migration, the object O is the domain name (DN) in the DNS queries that bot hosts send to the domain name server. The unit time t is determined from DNS time-to-live (TTL) values, as in Equation 2.

Min(TTL of DDNS) < t < Max(TTL of normal DNS)    (Equation 2)

C&C traffic has two components: watchdog traffic, which occurs continuously, and command-and-control traffic, which occurs intermittently.

For watchdog traffic, the object O is either the destination IP address of the TCP traffic or the IRC Ping/Pong. The destination of the TCP traffic is the IRC server. IRC Ping/Pong messages usually occur every 90 seconds. The unit time t is determined from the Ping/Pong interval, as in Equation 3.

Min(Ping/Pong duration) < t < Max(Ping/Pong duration)    (Equation 3)

For intermittent command-and-control traffic, the object O is the source IP address of the TCP traffic, which is the IRC server. The period at which command-and-control traffic occurs cannot be known in advance. Therefore, the number of command-and-control packets over an arbitrary period is evaluated to determine the distribution the traffic follows, and the unit time t is determined from that distribution.

Attack traffic includes DDoS attack traffic and spamming traffic. For attack traffic, the object O is the destination IP address of the TCP, UDP, or ICMP (Internet Control Message Protocol) traffic, which is the attack target (victim). The period at which attack traffic occurs cannot be known in advance. Therefore, the number of attack packets over an arbitrary period is evaluated to determine the distribution the attack traffic follows, and the unit time t is determined from that distribution.
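The patent does not say which statistic of the observed distribution sets t for command-and-control or attack traffic. The sketch below uses one plausible rule, which is entirely hypothetical: collect the inter-arrival gaps between packets toward the same object, then take a high nearest-rank percentile of those gaps. Most consecutive packets from the same activity then fall within one unit time. The function names and the default 95th percentile are assumptions.

```python
import math


def inter_arrival_gaps(timestamps):
    """Gaps between consecutive packet timestamps, after sorting."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def unit_time_from_gaps(timestamps, percentile=95.0):
    """Pick t as the nearest-rank percentile of inter-arrival gaps (hypothetical rule)."""
    gaps = inter_arrival_gaps(timestamps)
    if not gaps:
        raise ValueError("need at least two timestamps")
    gaps.sort()
    # Nearest-rank method: the smallest gap with at least `percentile`% of gaps <= it.
    rank = max(1, math.ceil(percentile / 100 * len(gaps)))
    return gaps[rank - 1]
```

For packets at times 0, 10, 20, 30, and 100 s, the gaps are 10, 10, 10, and 70 s. The 95th percentile picks t = 70 s, and the median picks t = 10 s. The percentile choice is what trades off sensitivity against grouping unrelated bursts.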

For update traffic, the object O is the destination IP address or domain name of the TCP traffic, which is the update server. The unit time t is determined from the botnet's code-update period, as in Equation 4.

Min(Update duration) < t < Max(Update duration)    (Equation 4)
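Equations 2 to 4 share one shape: t must lie strictly between a minimum of one set of observed durations and a maximum of another. The following is a minimal sketch of that shared bound. Taking the midpoint of the interval is a hypothetical choice; the patent only gives the bounds.

- For Equation 2, `lower_samples` holds DDNS TTLs and `upper_samples` holds normal DNS TTLs.
- For Equations 3 and 4, both arguments hold the same observed Ping/Pong or update durations.

```python
def unit_time_bounds(lower_samples, upper_samples):
    """Return (min(lower_samples), max(upper_samples)), the open interval for t.

    Eq. 2: lower = observed DDNS TTLs, upper = observed normal DNS TTLs.
    Eq. 3/4: lower = upper = observed Ping/Pong or update durations.
    """
    lo, hi = min(lower_samples), max(upper_samples)
    if lo >= hi:
        # e.g. all Ping/Pong durations identical: no t satisfies lo < t < hi
        raise ValueError("empty interval: no valid unit time t")
    return lo, hi


def pick_unit_time(lower_samples, upper_samples):
    """Choose t inside the interval; the midpoint is a hypothetical choice."""
    lo, hi = unit_time_bounds(lower_samples, upper_samples)
    return (lo + hi) / 2
```

With Ping/Pong durations of 85, 90, and 95 s (around the usual 90 s cycle), the bounds are (85, 95) and the midpoint is 90 s.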

When creating host groups, a blacklist and a whitelist are used to improve performance.

The blacklist records the objects of groups that were not detected as botnets but were classified as suspicious groups. When host groups are created, blacklisted data is processed first.

The whitelist lists well-known, legitimate domains and IP addresses, in order to reduce data processing. Group activities are not classified for whitelisted objects.
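The two lists described above can be applied before grouping as sketched here, under the assumption that objects (domain names or IP addresses) are plain strings and the lists are Python sets. The function name `prioritize_objects` is hypothetical. The function drops whitelisted objects entirely and moves blacklisted ones to the front of the processing order.

```python
def prioritize_objects(objects, blacklist, whitelist):
    """Drop whitelisted objects; return blacklisted ones first, then the rest.

    Relative order within each part is preserved.
    """
    kept = [o for o in objects if o not in whitelist]  # skip well-known benign targets
    hot = [o for o in kept if o in blacklist]           # previously suspicious: process first
    rest = [o for o in kept if o not in blacklist]
    return hot + rest
```

Using sets for both lists keeps each membership check constant-time on average, which matters when the list of objects is large.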

The similarity analyzer 230 measures the similarity between the created groups (S330).

The similarity analyzer 230 selects two groups, from two unit times, that performed the same group activity on the same object. It measures their similarity by computing how closely their hosts match. Alternatively, several similarities may be computed and their average used.

That is, let group G₁(O, A) be the hosts that performed group activity A on object O during unit time t₁, and let group G₂(O, A) be the hosts that performed group activity A on object O during unit time t₂. The similarity is computed from how many hosts belong to both G₁(O, A) and G₂(O, A). The unit times t₁ and t₂ may or may not be adjacent.

Alternatively, let group G₃(O, A) be the hosts that performed group activity A on object O during unit time t₃. The similarities between G₁(O, A) and G₂(O, A), between G₂(O, A) and G₃(O, A), and between G₁(O, A) and G₃(O, A) may each be computed, and their average used.

The similarity between group G₁(O, A) and group G₂(O, A) can be obtained using Equation 5.

Here, |G| is the number of hosts belonging to group G, with |G| ≠ 0. G₁ ∩ G₂ denotes the group of hosts that belong to both G₁(O, A) and G₂(O, A).
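The exact form of Equation 5 is not shown in this text, so the sketch below assumes one natural form. It is built only from the quantities defined above, |G₁|, |G₂|, and |G₁ ∩ G₂|, and it requires |G| ≠ 0: S = ½(|G₁ ∩ G₂|/|G₁| + |G₁ ∩ G₂|/|G₂|). This value is 1 when the groups are identical and 0 when they share no hosts, matching the 0-to-1 scale described below. The helper `mean_pairwise_similarity` implements the averaging variant for three or more groups (G₁, G₂, G₃).

```python
from itertools import combinations


def similarity(g1, g2):
    """Assumed Equation 5: mean of |G1∩G2|/|G1| and |G1∩G2|/|G2|.

    g1, g2: non-empty sets of host identifiers (|G| != 0).
    """
    if not g1 or not g2:
        raise ValueError("groups must be non-empty (|G| != 0)")
    common = len(g1 & g2)  # hosts present in both unit times
    return 0.5 * (common / len(g1) + common / len(g2))


def mean_pairwise_similarity(groups):
    """Average the similarity over every pair of groups (e.g. G1-G2, G2-G3, G1-G3)."""
    pairs = list(combinations(groups, 2))
    if not pairs:
        raise ValueError("need at least two groups")
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)
```

Two 4-host groups sharing 3 hosts score 0.75. Averaging both ratios, rather than using only one, keeps the measure symmetric when the groups differ in size.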

If the similarity obtained from Equation 5 exceeds the similarity threshold λ_s, the similarity analyzer 230 judges the hosts belonging to group G₁(O, A) or group G₂(O, A) to be a botnet (S340).

The closer the similarity is to 1, the more likely the group is a botnet. The closer it is to 0, the more likely it is a normal group. In general, normal groups have similarity values between 0 and 0.3.

If the similarity obtained through Equation 5 is greater than λ_s−δ and less than λ_s, the similarity analyzer 230 treats the hosts belonging to both G₁(O,A) and G₂(O,A) as a suspect group. It then adds that group's action target to the blacklist. Here δ is a freely chosen error-margin limit for the similarity.
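The Python sketch below outlines the decision rule of step S340 together with the suspect band. It rests on several assumptions:

- The threshold values in the example (λ_s = 0.7 or 0.8, δ = 0.1) are hypothetical, because the text gives no value for λ_s.
- Reading "G₁(O,A) or G₂(O,A)" as the union of the two groups is an interpretation.
- The text does not say what happens when the similarity equals λ_s exactly, so the sketch treats that case as normal.
- The function name `judge` is illustrative.

```python
from typing import Hashable, Set, Tuple


def similarity(g1: Set[Hashable], g2: Set[Hashable]) -> float:
    """Equation 5: mean of |G1∩G2|/|G1| and |G1∩G2|/|G2| (both groups non-empty)."""
    common = len(g1 & g2)
    return (common / len(g1) + common / len(g2)) / 2


def judge(g1: Set[Hashable], g2: Set[Hashable], target: str,
          lambda_s: float, delta: float,
          blacklist: Set[str]) -> Tuple[str, Set[Hashable]]:
    """Classify two groups that share action target O and group action A."""
    sim = similarity(g1, g2)
    if sim > lambda_s:
        # Step S340: hosts of G1 or G2 are judged botnet members (union: an assumption)
        return "botnet", g1 | g2
    if lambda_s - delta < sim < lambda_s:
        # Suspect group = hosts in both groups; their common action target is blacklisted
        blacklist.add(target)
        return "suspect", g1 & g2
    # At or below lambda_s - delta, or exactly lambda_s (case not specified in the text)
    return "normal", set()


g1 = {"h1", "h2", "h3", "h4"}
g2 = {"h1", "h2", "h3", "h5"}  # similarity = 0.75
blacklist: Set[str] = set()
print(judge(g1, g2, "bot.example.net", 0.7, 0.1, blacklist)[0])  # botnet
print(judge(g1, g2, "bot.example.net", 0.8, 0.1, blacklist)[0])  # suspect
print(blacklist)  # {'bot.example.net'}
```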

Embodiments of the present invention are not limited to the apparatus and/or method described above. They may also be implemented through a program that realizes the functions corresponding to the configuration of the embodiments, a recording medium on which the program is recorded, and the like. Those skilled in the art can readily carry out such implementations from the description of the embodiments above.

Although embodiments of the present invention have been described in detail above, the scope of the present invention is not limited thereto. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention, as defined in the following claims, also fall within the scope of the present invention.

FIG. 1 is a diagram illustrating the life cycle of a botnet.

FIG. 2 is a structural diagram of a collective behavior malware search apparatus according to an embodiment of the present invention.

FIG. 3 is a flowchart of a collective behavior malware search method according to an embodiment of the present invention.

Claims

Collecting traffic data transmitted and received on the search target network;

Reviewing the traffic data and creating a plurality of host groups based on two things: a group action, which is an action performed in the same way by different hosts per unit time with respect to predetermined traffic data; and an action target, which is the information, among the information constituting the traffic data, toward which the group action is directed;

Measuring a similarity between the plurality of host groups, indicating how closely the hosts belonging to each of the host groups match; and

Using the similarity to find collective behavior malware,

The group action is an action of querying DNS during a DNS (domain name server) lookup process,

The traffic data is DNS traffic, and the action target is a domain name of a DNS query,

Collective behavior malware detection method.

Collecting traffic data transmitted and received on the search target network;

Using the similarity to find collective behavior malware,

The group action is the periodic exchange of the ping and pong messages that IRC (Internet Relay Chat) uses to let hosts joined to a C&C channel announce their connection status,

The traffic data is watchdog traffic, and the action target is an IP (internet protocol) address of the traffic data,

Collective behavior malware detection method.

Collecting traffic data transmitted and received on the search target network;

Using the similarity to find collective behavior malware,

The group action is a distributed denial of service attack,

The traffic data is attack traffic, and the action target is an IP (internet protocol) address of a destination of the traffic data,

Collective behavior malware detection method.

The method according to any one of claims 1 to 3,

Wherein creating the plurality of host groups comprises:

Counting the number of hosts that performed the same group action on the same action target during a unit time, where the action target is information related to predetermined traffic data on the search target network; and

If the number of hosts divided by the total number of hosts belonging to the search target network is equal to or greater than a threshold, forming the hosts that performed the same group action into one group.
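The Python sketch below illustrates the group-creation steps of the claim above. It assumes non-overlapping, fixed-length unit-time windows, and traffic already reduced to (host, group action, action target, timestamp) records. The record layout, the function name `build_host_groups`, and the sample domains are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical record layout: (host, group action A, action target O, timestamp in seconds)
Event = Tuple[str, str, str, float]
# Group key: (unit-time index, action target O, group action A)
GroupKey = Tuple[int, str, str]


def build_host_groups(events: Iterable[Event], unit_time: float,
                      total_hosts: int,
                      threshold: float) -> Dict[GroupKey, FrozenSet[str]]:
    """Form one host group per (unit time, O, A) whose host share meets the threshold."""
    hosts_by_key: Dict[GroupKey, Set[str]] = defaultdict(set)
    for host, action, target, ts in events:
        window = int(ts // unit_time)  # fixed, non-overlapping windows (assumption)
        hosts_by_key[(window, target, action)].add(host)  # a set counts each host once
    return {
        key: frozenset(hosts)
        for key, hosts in hosts_by_key.items()
        if len(hosts) / total_hosts >= threshold  # host count / total hosts >= threshold
    }


events = [
    ("h1", "dns_query", "c2.example.org", 5.0),
    ("h2", "dns_query", "c2.example.org", 10.0),
    ("h3", "dns_query", "c2.example.org", 50.0),
    ("h1", "dns_query", "c2.example.org", 55.0),   # repeat from h1: still one host
    ("h4", "dns_query", "news.example.com", 20.0),
    ("h1", "dns_query", "c2.example.org", 70.0),   # falls into the next unit time
]
groups = build_host_groups(events, unit_time=60.0, total_hosts=10, threshold=0.2)
# Only (0, "c2.example.org", "dns_query") qualifies: 3 of 10 hosts, i.e. 0.3 >= 0.2
print(groups)
```

Each resulting key corresponds to one G(O,A) for one unit time. These groups are the sets that the similarity step then compares.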

The method of claim 4,

Wherein the threshold is the probability that the collective behavior malware exists in the search target network.

The method of claim 4,

If the traffic data is DNS (Domain Name Server) traffic, the unit time is determined according to DNS TTL (time to live).

The method of claim 6,

Wherein the unit time is greater than the minimum TTL (time to live) of dynamic DNS and less than the maximum TTL of normal DNS,

Collective behavior malware detection method.

The method of claim 4,

Wherein, if the traffic data is watchdog traffic, the unit time is greater than the minimum ping-pong period and less than the maximum ping-pong period,

Collective behavior malware detection method.
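The unit-time bounds in the DNS TTL claim and the watchdog claim above both reduce to an open-interval check. The Python sketch below shows that check. The function name and all TTL and ping-pong period values are hypothetical observations chosen only for illustration.

```python
from typing import Sequence


def unit_time_in_range(unit_time: float,
                       lower_samples: Sequence[float],
                       upper_samples: Sequence[float]) -> bool:
    """Return True if min(lower_samples) < unit_time < max(upper_samples)."""
    if not lower_samples or not upper_samples:
        raise ValueError("need at least one observation on each side")
    return min(lower_samples) < unit_time < max(upper_samples)


# DNS traffic: lower bound from dynamic-DNS TTLs, upper bound from normal-DNS TTLs
dynamic_ttls = [60, 300]     # hypothetical, seconds
normal_ttls = [3600, 86400]  # hypothetical, seconds
print(unit_time_in_range(600, dynamic_ttls, normal_ttls))  # True

# Watchdog traffic: both bounds come from observed ping-pong periods
ping_pong_periods = [90, 120, 180]  # hypothetical, seconds
print(unit_time_in_range(150, ping_pong_periods, ping_pong_periods))  # True
print(unit_time_in_range(200, ping_pong_periods, ping_pong_periods))  # False
```

The comparisons are strict because the claims say "greater than" and "less than", so a unit time equal to either bound is rejected.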

The method of claim 4,

Wherein, if the traffic data is attack traffic, the unit time is determined based on the distribution that the attack traffic follows,

Collective behavior malware detection method.

The method according to any one of claims 1 to 3,

Wherein the similarity is half the sum of two ratios: (i) the number of hosts belonging to both a first group and a second group among the plurality of host groups, divided by the number of hosts belonging to the first group; and (ii) the number of hosts belonging to both the first group and the second group, divided by the number of hosts belonging to the second group,

And the first group and the second group are groups that performed the same group action on the same action target,

Collective behavior malware detection method.

The method of claim 10,

Further comprising determining that the hosts belonging to the first group or the second group are hosts infected with the collective behavior malware when the similarity exceeds a predetermined similarity threshold,

Collective behavior malware detection method.

The method of claim 11,

Determining the hosts belonging to the first group or the second group to be a suspect group when the similarity is greater than the similarity threshold minus the similarity error-range limit and less than the similarity threshold; and

Adding the common action target of the first group and the second group to the blacklist.

The method of claim 12,

Wherein the step of creating the plurality of host groups uses the blacklist,

Collective behavior malware detection method.

First means for collecting traffic data transmitted and received on a search target network;

Second means for reviewing the traffic data collected by the first means and creating a plurality of host groups based on two things: a group action, which is an action performed in the same way by different hosts per unit time with respect to predetermined traffic data; and an action target, which is the information, among the information constituting the traffic data, toward which the group action is directed; and

Third means for measuring a similarity indicating how closely the hosts belonging to a first group and a second group among the plurality of host groups match, and for using the similarity to find collective behavior malware,

Wherein the first group and the second group are groups that performed the same group action on the same action target,

Wherein the group action is one of the following:

- host scanning;
- port scanning;
- worm code propagation;
- bot code downloading;
- querying DNS during the DNS (domain name server) lookup process;
- the periodic exchange of the ping and pong messages that IRC (Internet Relay Chat) uses to let hosts joined to a Command and Control (C&C) channel announce their channel connection status;
- a distributed denial of service attack; and
- spamming,

If the traffic data is DNS traffic, the action target is a domain name of a DNS query,

When the traffic data is attack traffic, the action target is an IP (internet protocol) address of the destination of the traffic data,

If the traffic data is command and control traffic, the action target is the source IP address of the traffic data,

Collective behavior malware search device.

The device of claim 14,

Wherein each host group is a group of hosts that performed the same group action during the same unit time, the proportion of such hosts being equal to or greater than a threshold,

Collective behavior malware detection device.

The device of claim 15,

Wherein the similarity is half the sum of two ratios: the number of hosts belonging to both the first group and the second group divided by the number of hosts belonging to the first group; and the number of hosts belonging to both groups divided by the number of hosts belonging to the second group,

Collective behavior malware detection device.

The device of claim 16,

Wherein the third means determines the hosts belonging to the first group or the second group to be hosts infected with the collective behavior malware when the similarity exceeds a predetermined similarity threshold.