KR20100098242A

KR20100098242A - System and method for spamming botnet by detecting and intercepting botnet

Info

Publication number: KR20100098242A
Application number: KR1020090017304A
Authority: KR
Inventors: 강재우; 최재훈; 신한준; 석원식; 송동수
Original assignee: (주)다우기술
Priority date: 2009-02-27
Filing date: 2009-02-27
Publication date: 2010-09-06
Also published as: KR101048159B1

Abstract

PURPOSE: A system and a method for detecting and blocking botnets are provided that group the IPs sharing the same target URL in bulk mail traffic into a list, so that the botnet corresponding to the list can be easily identified. CONSTITUTION: A storage unit (102) stores IP lists matched with botnet identification information. An IP filtering unit (103) loads the IP list of a specific botnet and filters mail traffic whose IP is included in that list. A detection unit (101) compares an e-mail log with the botnet identification information and, if botnet activity above a preset level is detected, causes the corresponding IP list to be loaded into the IP filtering unit.

Description

System and method for detecting and blocking spamming botnets

The present invention relates to a botnet detection and blocking system that can block the mass sending of spam by botnets in advance on the basis of IP addresses while effectively determining the membership size of a botnet. More specifically, it relates to a botnet detection and blocking system that builds a list of IPs based on the hash value of the URL that a botnet uses as its common target, filters on the basis of that list, and at the same time makes the membership size of the botnet easy to determine.

Botnets are emerging as a worldwide problem in computer science. A botnet is a set of ordinary users' PCs infected with malicious code (compromised PCs, or bots) that are connected over a network and carry out a coordinated action simultaneously. As shown in FIG. 1, its defining characteristic is that millions of infected PCs (bots) are manipulated at will by a small number of controllers (bot-masters). Because millions of PCs can be operated at the will of an individual, the potential for abuse is very high and the destructive power is enormous. In practice, botnets are used mainly for mass spam mailing, distributed denial-of-service (DDoS) attacks, and the spread of malicious code, and they inflict heavy damage on online services now that Internet use is widespread worldwide.

Although botnets cause harm in many areas, catching them is far from easy, because of two core characteristics: distribution and user dependency. If one or a few PCs perform the same (malicious) task repeatedly, finding them is not very difficult. But if millions of PCs each perform the same task only once or twice, identifying who is performing it becomes very hard; it may not even be apparent that such a malicious task is under way. A further point is that the bots mobilized for a task are the personal PCs of ordinary users. They do not take part voluntarily, so a bot participates only if the computer happens to be switched on at the moment the controller issues a command. In short, no one can predict when each bot will be on (participating) or off (not participating).

Because botnets are so widespread, irregular, and hard to predict, and because their adverse effect on the Internet environment is so large, they have recently become a hot issue in computer science.

Research on botnets remains active across fields such as networking, security, and data mining. However, that research has not yet clearly established the exact size, behavioral patterns, or evolution patterns of botnets.

Early research focused mostly on network packet flow. It included the IRC infiltration approach, which entered the C&C (Command and Control) server through which the controller issues commands to the botnet in order to observe the message delivery scheme and the size of the botnet, and studies that examined botnet characteristics using external traces such as DNS lookup information.

However, research on network traffic alone was insufficient to determine botnet behavior patterns accurately, because the structure of the Internet makes it hard to learn the exact contents of the packets exchanged between PCs (that is, between the bot-master and the bots). In practice, it is impossible to open and inspect every packet exchanged by PCs at the network level.

Recently, botnet research has been shifting from network traffic toward spam mail log analysis. With mail logs, the content of a message is easier to inspect than with network packets, and the final sender and the recipient of a message can be identified accurately. This makes the collective movement of IPs clearer, and the similarity of mail content can be used to determine reliably whether IPs belong to a particular botnet.

The first study using e-mail logs looked for botnets through matching target URLs. Regrettably, it did not explain the role of the IPs that it identified as pivot points. Another study searched for botnets by clustering the sending characteristics of spam mail, but it analyzed only IP activity patterns without establishing botnet membership through mail content, and so made no provision for an IP infected by multiple botnets or for two or more botnets being active at the same time.

Botnet research using mail logs has since become more active and advanced. One study tried to assess the influence of dynamic IPs on mail log analysis, but it was limited to data constrained by an aggressive sampling rate and so had little practical value. Other work extracted specific features of mail and introduced the concept of a signature to detect botnets, or proposed a spam filtering system called AutoRE. That system raised its filtering rate by using regular expressions that group mails with similar URL forms, but it was limited in searching for similar URLs.

Although botnet studies using mail logs are thus being actively pursued, the detailed behavior patterns of botnets as a whole remain poorly understood.

Accordingly, noting that botnets are mobilized to send large volumes of spam mail, an object of the present invention is to provide a system and method that extract the necessary information from a large spam mail log to distinguish a specific botnet easily, and that use IPs and hashes to block the activity of the IPs belonging to that botnet.

Another object of the present invention is to provide a system and method that find spam mails sent by botnets within a large mail log, identify the similar sending behavior of the final senders (bots) to establish the membership of each botnet, and thereby effectively block the activity of each botnet.

A further object of the present invention is to provide a system and method that find the concrete size and spam sending pattern of each botnet and that update themselves as a botnet expands or disappears over time, so that such changes can be handled easily.

Yet another object of the present invention is to provide a system and method that, even when a specific botnet sends spam using URL variations to route around a web page link, can readily detect this, record the activity of that botnet, and block it.

To achieve the above objects, a botnet behavior detection and blocking system according to the present invention, which is connected to at least one mail server, extracts e-mail logs from mail traffic received as a real-time stream, and blocks spam mail sent by botnets, may comprise: a storage unit that stores IP lists matched with botnet identification information; an IP filtering unit that loads the IP list of a specific botnet from the storage unit and filters mail traffic whose IP is included in that IP list; and a detection unit that extracts the e-mail log, compares it with the botnet identification information in the storage unit, and, when botnet activity above a preset level is detected, causes the IP list matched with that botnet identification information to be loaded from the storage unit into the IP filtering unit.

Here, the botnet identification information may include the URL targeted by the botnet. The detection unit computes a P-value (the equation is not reproduced in this text) and, when the P-value is at or below a preset threshold, causes the IP list matched with the botnet identification information to be loaded from the storage unit into the IP filtering unit, where TN is the number of valid IPs measured during the unit time, BN is the number of IPs in the IP list, IN is the number of all IPs observed in the input stream per unit time, and HIT is the number of IPs among IN that are included in BN.

In addition, the detection unit may compute the P-value in real time and, when it returns to or above the preset threshold, control the IP filtering unit so that mail traffic from the IPs of that specific botnet is no longer filtered.
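The load and release behavior described above can be sketched as a simple threshold rule applied to a sequence of P-values, one per unit time. This is only an illustration: the function name `filter_events` is hypothetical, and treating a P-value exactly equal to the threshold as "active" is an assumption, since the text uses "at or below" for loading and "at or above" for release.

```python
def filter_events(p_values, threshold):
    """Return (time_index, action) pairs for one botnet's IP list.

    Assumption: the list is loaded while p <= threshold and released
    as soon as p rises above the threshold again.
    """
    events = []
    loaded = False
    for t, p in enumerate(p_values):
        active = p <= threshold
        if active and not loaded:
            events.append((t, "load"))      # botnet activity detected
        elif not active and loaded:
            events.append((t, "unload"))    # activity has subsided
        loaded = active
    return events
```

For example, `filter_events([0.4, 0.01, 0.001, 0.3], 0.05)` reports a load at index 1 and an unload at index 3.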

Meanwhile, the botnet detection and blocking system according to the present invention may further include: a hash sampler unit that extracts sample hashes, each including a URL, from the logs of mail traffic filtered by the IP filtering unit; and a hash filtering unit that compares the sample hashes extracted by the hash sampler unit with the hashes of mail traffic not filtered by the IP filtering unit, filters that traffic when they match, and updates the IP list loaded in the IP filtering unit by adding the IP of the previously unfiltered traffic.

In this case, the IP filtering unit may update the IP list stored in the storage unit that corresponds to the IP list updated by the hash filtering unit.

In addition, the hash filtering unit may update the botnet identification information stored in the storage unit that corresponds to the IP list loaded in the IP filtering unit by adding information about the sample hash.
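The feedback between the IP filtering unit, the hash sampler unit, and the hash filtering unit can be sketched as follows. This is a minimal batch approximation of what the text describes as a streaming process; the function name `hash_feedback` and the representation of a message as an `(ip, url_hash)` pair are assumptions made for illustration.

```python
def hash_feedback(ip_list, known_hashes, messages):
    """Filter messages by IP, then by sampled hash, updating both sets in place.

    ip_list      -- set of IPs loaded for one botnet (mutated)
    known_hashes -- set of hashes in that botnet's identification info (mutated)
    messages     -- iterable of (ip, url_hash) pairs
    Returns the messages that pass both filters.
    """
    sample_hashes = set()
    remaining = []
    # Stage 1: IP filter. Hashes of filtered traffic become samples.
    for ip, url_hash in messages:
        if ip in ip_list:
            sample_hashes.add(url_hash)
        else:
            remaining.append((ip, url_hash))
    # Stage 2: hash filter. A match blocks the message and registers its IP.
    passed = []
    for ip, url_hash in remaining:
        if url_hash in sample_hashes:
            ip_list.add(ip)
        else:
            passed.append((ip, url_hash))
    # The sampled hashes are added to the stored identification information.
    known_hashes |= sample_hashes
    return passed
```

An IP that was not yet registered but sends a message with a sampled hash is both blocked and added to the list, which is the self-updating behavior the text attributes to the two filters.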

In addition, the botnet detection and blocking system according to the present invention may further include a temporary hash storage unit that extracts hashes including URLs from the logs of mail traffic not filtered by the hash filtering unit, groups IPs having the same hash to form vectors, computes the degree of association between vectors using KL divergence (KL_Divergence), and, when the degree of association is at or below a preset threshold, merges the IPs and hashes of those vectors to generate new botnet identification information and a new IP list, which it stores in the storage unit.
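The first part of this step, grouping IPs that share a hash, can be sketched as below. The name `group_ips_by_hash` and the `(ip, url_hash)` log-entry shape are illustrative assumptions; the text does not specify the internal layout of a "vector".

```python
from collections import defaultdict


def group_ips_by_hash(log_entries):
    """Map each URL hash to the set of IPs that sent mail containing it."""
    groups = defaultdict(set)
    for ip, url_hash in log_entries:
        groups[url_hash].add(ip)
    return dict(groups)
```

Each resulting group is a candidate for comparison with the other groups; groups judged sufficiently associated would then be merged into one new botnet entry.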

Here, the temporary hash storage unit converts, for two different vectors, the distribution of the number of IPs that sent spam mail per unit time, the distribution of the number of mails, and the IP histogram into probability distributions. For each kind of distribution, the distributions of the two vectors are substituted as p and q into D(p||q) + D(q||p), and the resulting KL_Divergence values are summed over the three kinds to give the degree of association, where D(p||q) is defined by an equation that is not reproduced in this text.
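As a sketch of this computation, the code below uses the standard definition of KL divergence, D(p||q) = Σ p(i) log(p(i)/q(i)), and sums the symmetric form D(p||q) + D(q||p) over the three kinds of distribution. The dictionary keys, the function names, and the small smoothing constant `eps` (added so that empty bins do not cause division by zero) are assumptions and do not come from the source.

```python
from math import log


def to_distribution(counts, eps=1e-9):
    """Normalize raw counts into a probability distribution (eps-smoothed)."""
    smoothed = [c + eps for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]


def kl(p, q):
    """Standard KL divergence D(p||q) for distributions of equal length."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q))


def association(vec_a, vec_b):
    """Sum of D(p||q) + D(q||p) over the three distributions of two vectors.

    Each vector is a dict with count lists under 'ip_counts', 'mail_counts'
    and 'ip_hist' (hypothetical keys); matching lists must have equal length.
    """
    score = 0.0
    for key in ("ip_counts", "mail_counts", "ip_hist"):
        p = to_distribution(vec_a[key])
        q = to_distribution(vec_b[key])
        score += kl(p, q) + kl(q, p)
    return score
```

A lower score means the two vectors behave more alike, which is why the text merges vectors whose degree of association is at or below the threshold.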

In addition, when mail traffic having IPs included in the new IP list continues to arrive from the hash filtering unit and the P-value (the equation is not reproduced in this text) is at or below a preset threshold, the temporary hash storage unit may store the new IP list and botnet identification information in the storage unit, where TN is the number of valid IPs measured during the unit time, BN is the number of IPs in the IP list, IN is the number of all IPs observed in the input stream per unit time, and HIT is the number of IPs among IN that are included in BN.

Meanwhile, the detection unit may assign a risk to each IP in the IP list stored in the storage unit, decrease the risk exponentially depending on whether the IP continues to send mail traffic, and delete from the IP list any IP whose risk falls below a preset threshold. The risk is given by equations that are not reproduced in this text, in which R_{i,t} is the risk of the i-th IP at time t, i is the index of an IP included in the IP list, t is time, and H is the point (the half-life) that bounds the 95% confidence interval of the distribution of intervals between spam mails sent by all IPs stored in the IP list.
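Because the risk equations are not available in this text, the following is only one plausible form of an exponentially decaying risk with half-life H: the risk of an IP is reset to 1 whenever it sends mail and halves for every H time units of silence. The class name `RiskTable`, the reset value, and the default threshold are assumptions made for illustration.

```python
class RiskTable:
    """Track a decaying risk per IP and prune IPs that fall below a threshold."""

    def __init__(self, half_life, threshold=0.05):
        self.half_life = half_life      # H, supplied by the caller
        self.threshold = threshold      # assumed pruning threshold
        self._last_seen = {}            # ip -> time of last spam mail

    def observe(self, ip, now):
        """Record that ip sent mail at time now (risk resets to 1)."""
        self._last_seen[ip] = now

    def risk(self, ip, now):
        """Assumed form: risk = 0.5 ** (elapsed / H)."""
        return 0.5 ** ((now - self._last_seen[ip]) / self.half_life)

    def prune(self, now):
        """Remove and return the IPs whose risk is below the threshold."""
        removed = [ip for ip in self._last_seen if self.risk(ip, now) < self.threshold]
        for ip in removed:
            del self._last_seen[ip]
        return removed
```

With `half_life=10`, an IP that has been silent for 10 time units has risk 0.5, and after 50 time units its risk (0.03125) is below a 0.05 threshold, so it would be removed from the list.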

A botnet behavior detection and blocking system according to another embodiment of the present invention, which is connected to at least one mail server, extracts e-mail logs from mail traffic received as a real-time stream, and blocks spam mail sent by botnets, may comprise: a storage unit that stores IP lists matched with botnet identification information; a detection unit that extracts the e-mail log and compares it with the botnet identification information; and a temporary hash storage unit that, when the comparison by the detection unit shows botnet activity above a preset level, receives the mail traffic and extracts hashes including URLs, groups the IPs of subsequently received mail traffic that share the same hash to form vectors, computes the degree of association between different vectors using KL divergence (KL_Divergence), and, when the degree of association is at or below a preset threshold, merges the vectors and stores the resulting new IP list, together with botnet identification information including the hash, in the storage unit.

Here, mail traffic having IPs included in the new IP list is continuously transmitted from the hash filtering unit to the temporary hash storage unit ...

In addition, the temporary hash storage unit may assign a risk to each IP in the new IP list, decrease the risk exponentially depending on whether the IP continues to send mail traffic, and delete from the IP list any IP whose risk falls below a preset threshold. The risk is given by equations that are not reproduced in this text, in which R_{i,t} is the risk of the i-th IP at time t, i is the index of an IP included in the IP list, t is time, and H is the point (the half-life) that bounds the 95% confidence interval of the distribution of intervals between spam mails sent by all IPs stored in the IP list.

A method according to yet another embodiment of the present invention for blocking spam mail sent by botnets, by extracting e-mail logs from mail traffic received as a real-time stream from at least one connected mail server, may comprise: a first step in which a detection unit, which detects whether a botnet is active, compares a hash contained in the e-mail log with preset botnet identification information; a second step in which, when the comparison in the first step shows botnet activity above a preset level, the detection unit controls a storage unit, which stores IP lists matched with the botnet identification information, so that the IP list is loaded into an IP filtering unit, which filters mail traffic by IP; a third step in which the detection unit passes the mail traffic to the IP filtering unit; and a fourth step in which the IP filtering unit compares the IP of the mail traffic against the IP list to decide whether to filter the mail traffic.

Here, the botnet identification information may include hash information for the URL targeted by the botnet, and the first step may further include a step in which, when the hash matches the botnet identification information, the detection unit computes a P-value (the equation is not reproduced in this text) and, when the P-value is at or below a preset threshold, causes the IP list matched with the botnet identification information to be loaded from the storage unit into the IP filtering unit, where TN is the number of valid IPs measured during the unit time, BN is the number of IPs in the IP list, IN is the number of all IPs observed in the input stream per unit time, and HIT is the number of IPs among IN that are included in BN.

In addition, in the fourth step, when the P-value returns to or above the preset threshold, the IP filtering unit, under the control of the detection unit, may stop filtering mail traffic from the IPs of the specific botnet.

In addition, the botnet detection and blocking method may further comprise: a fifth step in which a hash sampler unit, which extracts hashes from logs, extracts sample hashes including URLs from the logs of mail traffic filtered by the IP filtering unit; and a sixth step in which a hash filtering unit, which filters on the basis of the sample hashes extracted in the fifth step, compares the hashes of mail traffic not filtered by the IP filtering unit with the sample hashes, filters that traffic when they match, and updates the IP list loaded in the IP filtering unit by adding the IP of the previously unfiltered traffic.

Here, the method may further comprise a seventh step in which the IP filtering unit updates the IP list stored in the storage unit that corresponds to the IP list updated by the hash filtering unit in the sixth step.

In addition, the method may further comprise an eighth step in which the hash filtering unit of the sixth step updates the botnet identification information stored in the storage unit that corresponds to the IP list loaded in the IP filtering unit by adding information about the sample hash.

Meanwhile, the method may further comprise a ninth step in which a temporary hash storage unit, which creates new IP lists and botnet identification information in the storage unit, extracts hashes including URLs from the logs of mail traffic not filtered by the hash filtering unit in the sixth step, groups IPs having the same hash to form vectors, computes the degree of association between vectors using KL divergence (KL_Divergence), and, when the degree of association is at or below a preset threshold, merges the IPs and hashes of those vectors to generate new botnet identification information and a new IP list, which it stores in the storage unit.

Here, the ninth step may further include a step in which the temporary hash storage unit converts, for two different vectors, the distribution of the number of IPs that sent spam mail per unit time, the distribution of the number of mails, and the IP histogram into probability distributions, substitutes the distributions of the same kind from the two vectors as p and q into D(p||q) + D(q||p), and sums the resulting KL_Divergence values over the three kinds to obtain the degree of association, where D(p||q) is defined by an equation that is not reproduced in this text.

In addition, the ninth step may further include a tenth step in which, when mail traffic having IPs included in the new IP list continues to arrive from the hash filtering unit and the P-value (the equation is not reproduced in this text) is at or below a preset threshold, the temporary hash storage unit stores the new IP list and botnet identification information in the storage unit, where TN is the number of valid IPs measured during the unit time, BN is the number of IPs in the IP list, IN is the number of all IPs observed in the input stream per unit time, and HIT is the number of IPs among IN that are included in BN.

The present invention groups IPs that share the same target URL into a list for the bulk mail traffic sent by a botnet, without inspecting the content of each message, so that the botnet corresponding to the list can be easily distinguished. Because botnets are identified on the basis of IPs, computational complexity is reduced and botnets can be blocked quickly.

In addition, for IPs that belong to the botnet but are not filtered because they are not yet registered, the present invention extracts the URLs targeted by the IPs in the list and uses them to filter further IPs belonging to the botnet, so that bots that join the botnet later can also be filtered and blocked.

Furthermore, some IPs go unfiltered because the botnet uses URL variations that link to the same web page as a registered target URL but do not match the hash of that target URL. For these IPs as well, the present invention extracts hash values and compares them with the hashes of the URLs sent by the unfiltered IPs. Through multiple filtering stages based on hashes as well as IPs, the membership of the botnet can be readily determined.

In addition, by providing a feedback loop between the filtering unit that uses IPs and the filtering unit that uses hash values, the present invention adds to the list those IPs that were not registered but are caught during filtering, and thus provides a self-learning system that can respond in real time to the spread of a botnet.

In addition, because filtering is performed using IPs or hash values that include URLs, the present invention can greatly reduce the complexity of the system configuration compared with existing systems, and can thereby significantly reduce the cost of building the system.

The botnet detection and blocking system according to the present invention is connected to at least one mail server and is proposed to determine whether mail traffic received by the mail server is spam sent by a botnet.

The mail traffic is stream data, generated and transmitted continuously in real time. By analyzing the mail traffic in real time, the present invention can determine whether a botnet is active and can effectively filter and block the IPs that belong to the botnet and act as bots.

As described above, the botnet detection and blocking system according to the present invention focuses on stream data. As shown in FIG. 2, mail logs are extracted in real time from the mail traffic entering the Input Log Stream and are processed by the system. Mail traffic whose log stream is not blocked is judged not to be botnet spam and either passes out of the system or is handed to another spam filtering system connected to this system for a spam determination.

Referring to FIG. 2, the botnet detection and blocking system according to the present invention may comprise: a storage unit (BN Repository) (102) that stores IP lists matched with botnet identification information including hashes; an IP filtering unit (IP Filter) (103) that loads the IP list of a specific botnet from the storage unit and filters mail traffic whose IP is included in that list; and a detection unit (Out Breaking Detector) (101) that extracts the e-mail log, compares it with the botnet identification information, and, when they match, causes the IP list matched with that botnet identification information to be loaded from the storage unit into the IP filtering unit.
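The relationship between the three units of FIG. 2 can be sketched as below. This is a simplified illustration rather than the patented implementation: the class and method names are hypothetical, log entries are reduced to an `(ip, url_hash)` pair, and the detector loads a list on a single hash match, without the P-value test described elsewhere in the text.

```python
from dataclasses import dataclass


@dataclass
class BotnetRecord:
    url_hashes: set   # botnet identification information
    ips: set          # IP list matched with it


class Repository:
    """Storage unit (102): botnet id -> record."""

    def __init__(self):
        self.records = {}

    def find_by_hash(self, url_hash):
        for botnet_id, record in self.records.items():
            if url_hash in record.url_hashes:
                return botnet_id
        return None


class IPFilter:
    """IP filtering unit (103): holds the currently loaded IP lists."""

    def __init__(self):
        self.loaded = {}

    def load(self, botnet_id, ips):
        self.loaded[botnet_id] = set(ips)

    def blocks(self, ip):
        return any(ip in ips for ips in self.loaded.values())


class Detector:
    """Detection unit (101): compares log entries with stored identification info."""

    def __init__(self, repo, ip_filter):
        self.repo = repo
        self.ip_filter = ip_filter

    def on_log(self, ip, url_hash):
        botnet_id = self.repo.find_by_hash(url_hash)
        if botnet_id is not None:
            # Match found: load that botnet's IP list into the filter.
            self.ip_filter.load(botnet_id, self.repo.records[botnet_id].ips)
        return self.ip_filter.blocks(ip)
```

Once a list is loaded, later mail from any IP on it is blocked even if that message carries a hash the repository has not seen.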

The detection unit (101) is the module that catches collective IP behavior: it determines whether IPs spamming with a particular hash value are collectively sending spam in bulk. In other words, it detects whether a specific botnet has begun a mass mailing of spam containing a specific hash value.

The specific hash value may be a hash of the target URL that the spam mail points to, and may be included in the botnet identification information. The detection unit (101) may obtain the botnet identification information from another module that filters spam, or may register as botnet identification information a hash sample containing a URL extracted from the mail traffic, as described below.

Meanwhile, the detection unit (101) can decide whether a mass mailing has started by means of a P-value. The P-value is a reference quantity used in statistics: it stays above a certain level while behavior follows a normal (naturally occurring) pattern, and drops below a set level when behavior becomes abnormal (artificially manipulated).

The P-value is calculated by Equation 1 below.

TN: the total number of valid IPs

BN: the number of IPs in the botnet concerned

IN: the number of all IPs observed in the input stream per unit time

HIT: the number of IPs among IN that belong to BN

The P-value expresses the probability that, when as many IPs as arrived in the current unit time are drawn from the valid IPs the system holds, the number of drawn IPs belonging to a specific botnet equals the number actually found in the log of the mail traffic on the input stream. In other words, it is the criterion for judging whether the number of IPs of a specific botnet observed in the input stream during a unit time could plausibly occur naturally. If it could, the botnet is judged to be inactive; if it could not, the IPs are acting simultaneously and collectively under artificial control, and the botnet is judged to be active.
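Equation 1 itself is not reproduced in this text. One reading consistent with the description (drawing IN IPs from TN valid IPs, of which BN belong to the botnet, and asking how likely HIT botnet IPs are) is a hypergeometric probability. The Python sketch below computes the upper-tail form, P(X >= HIT); that choice is an assumption, and the function name `botnet_p_value` and the example numbers are illustrative only.

```python
from math import comb


def botnet_p_value(tn: int, bn: int, in_: int, hit: int) -> float:
    """Upper-tail hypergeometric probability P(X >= hit).

    Assumed reading of Equation 1: draw `in_` IPs without replacement
    from `tn` valid IPs, `bn` of which belong to the botnet.
    """
    if not (0 <= bn <= tn and 0 <= in_ <= tn and hit >= 0):
        raise ValueError("need 0 <= bn <= tn, 0 <= in_ <= tn and hit >= 0")
    max_hits = min(bn, in_)
    total = comb(tn, in_)
    # math.comb(n, k) returns 0 when k > n, so impossible terms drop out.
    tail = sum(
        comb(bn, k) * comb(tn - bn, in_ - k)
        for k in range(hit, max_hits + 1)
    )
    return tail / total
```

With 1,000 valid IPs, a 50-IP botnet, and 100 IPs seen in the unit time, about 5 hits are expected by chance. Under this reading, `botnet_p_value(1000, 50, 100, 40)` is vanishingly small, which the detection unit would treat as botnet activity.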

When the detection unit finds a botnet whose activity has been detected, the IP list (A) of that botnet is loaded from the storage unit into the IP filtering unit (103). Once IP list (A) is loaded, the IP filtering unit (103) filters mail traffic on the basis of the IPs in IP list (A).

Thereafter, both the mail traffic filtered by the IP filtering unit (103) and the mail traffic that was not filtered may go through an additional filtering step based on hash values. By extracting the hash of the URL that the mail traffic targets, this step counters URL variation and filters, on the basis of the hash, mail traffic that the IP filtering unit (103) let through because its IP was not registered in IP list (A).

This hash-based filtering may be carried out by a hash sampler unit (hash sampler), which extracts a sample hash containing a URL from the log of the mail traffic filtered by the IP filtering unit, and a hash filtering unit (hash filter), which compares the sample hash extracted by the hash sampler unit with the hash of mail traffic not filtered by the IP filtering unit, filters the traffic on a match, and updates the IP list loaded in the IP filtering unit by adding the IP of that traffic.

The hash sampler unit is what catches URL variation. URL variation is the practice whereby a botnet, to avoid being caught easily by existing spam filtering systems, splits its target URL into several variants and spreads them across the spam it sends. For example, suppose a botnet wants to advertise the site www.aaa.com. If every spam mail carried the address www.aaa.com and that URL were blacklisted, all of the advertising would become useless. To avoid this, the botnet creates dozens of URL variations (abc.aaa.com, abd.aaa.com, acd.aaa.com, and so on) in addition to the address of the actual website (www.aaa.com) and uses web page redirection to send visitors to the intended site.

Accordingly, to catch the botnet's URL variation technique, the hash sampler unit extracts a sample hash from mail traffic whose IP was filtered by the IP filtering unit and sends it to the hash filtering unit, so that mail traffic whose IP is not on the IP list, and was therefore not filtered by the IP filtering unit, can also be filtered.

First, the basic operation of the hash sampler unit (104) and the hash filtering unit (105) is described in detail with reference to FIG. 3(A). Because the IP list (110) held in the IP filtering unit (103) contains no entry for IP B, IP B is not filtered. However, IP B carries the same URL as IP A, which is on the IP list (110), so it is very likely to belong to the same botnet.

The IP filtering unit (103) therefore filters the mail traffic containing IP A and passes it to the hash sampler unit (104), which extracts the hash containing the URL (www.ABC.com) from the mail log of that traffic and passes it to the hash filtering unit (105).

Using the hash information received from the hash sampler unit (104), the hash filtering unit (105) then filters IP B, whose traffic contains a hash of the same URL (www.ABC.com), and can thereby block the mail traffic sent by IP B.

The hash filtering unit (105) then adds IP B to the IP list (110) in the IP filtering unit (103) (①), so that mail traffic subsequently sent by IP B can be filtered at the IP filtering unit (103). In addition, the IP filtering unit (103) updates the IP list (111) stored in the storage unit (102) on the basis of the IP list (110) updated by the hash filtering unit (105) (②). The next time the botnet corresponding to IP list (111), which now includes IP B, becomes active, mail traffic from IP B is filtered at the IP filtering unit (103). This shortens the processing path and, because the system itself updates IP list (111) in real time, blocks botnet activity efficiently.
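The interaction of FIG. 3(A) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the text does not name a hash function, so SHA-256 is a stand-in, and the class `FilterPipeline`, its attributes, and the (ip, url) record layout are hypothetical.

```python
import hashlib


def url_hash(url: str) -> str:
    # Assumption: SHA-256 stands in for the unspecified hash function.
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


class FilterPipeline:
    def __init__(self, stored_ip_list):
        self.repository_ips = set(stored_ip_list)  # IP list (111) in the storage unit
        self.loaded_ips = set(stored_ip_list)      # IP list (110) in the IP filtering unit
        self.sample_hashes = set()                 # hashes held by the hash filtering unit

    def process(self, ip: str, url: str) -> str:
        h = url_hash(url)
        if ip in self.loaded_ips:
            # IP filter hit: the hash sampler forwards the URL hash.
            self.sample_hashes.add(h)
            return "blocked_by_ip"
        if h in self.sample_hashes:
            # Hash filter hit: register the new IP, steps (1) and (2).
            self.loaded_ips.add(ip)
            self.repository_ips.add(ip)
            return "blocked_by_hash"
        return "passed"
```

For example, with `192.0.2.1` as IP A on the list, a mail from `192.0.2.2` (IP B) carrying the same URL is blocked by hash and IP B joins both lists; any later mail from IP B is then blocked at the IP filter.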

Meanwhile, as shown in FIG. 3(B), the URL variation described above can be blocked by the hash sampler unit (104) and the hash filtering unit (105). When the bot-master sends spam through a bot with IP C, which is already registered in the IP list (110), and changes IP A's URL (www.ABC.com) to AAA.ABC.com, IP C is filtered by the IP filtering unit (103).

The hash sampler unit (104) can then, as described above, extract the URL (AAA.ABC.com), a variant of the existing URL, from the traffic of IP C, and registers this variant URL in the hash filtering unit (105). IP D is not filtered by the IP filtering unit (103) and, because its URL differs from that of IP A, would otherwise not be filtered by the hash filtering unit (105) either; with the variant URL (AAA.ABC.com) registered, mail traffic from IP D can now be filtered effectively. In addition, the hash filtering unit (105) registers IP D, like IP C, in the IP list (110) of the IP filtering unit (103) (①), so that mail traffic later sent by IP D is filtered, and the IP filtering unit (103) updates the IP list (111) of the storage unit (102) on the basis of the IP list (110) now containing IP D (②). The system thus extends itself to cover diverse URL variations, which makes the filtering effective.

The hash filtering unit may also generate botnet identification information from the hash containing the URL received from the hash sampler unit and store it in the storage unit matched to the IP list containing the filtered IPs. The detection unit can then use this botnet identification information to detect, at an early stage, mail traffic generated through a variant URL.

Through this process, not only the IP set initially detected as collective activity and judged to be a botnet, but also later-sending IPs that advertise the same content as the initial set, are continually added to the IP list and bound into a single membership, so that botnet membership can be identified more accurately. Furthermore, because the hash sampler unit extracts hash values containing URLs for all such IPs and updates the hash filtering unit with them, variant URLs can also be filtered effectively.

In addition, through the hash sampler unit and the hash filtering unit, the IP filtering unit re-checks, by way of a feedback process, whether the mail traffic sent by every IP originates from the botnet, and IPs not yet registered are added to the IP list. In this way the IP membership of a botnet that is active in real time can be found and blocked more reliably.

Meanwhile, the detection unit (101) of FIG. 2 may have the storage unit separately store and manage those IP lists, among the IP lists judged to be botnets through filtering by the IP filtering unit and the hash filtering unit, whose activity is judged to have ended because, for a preset time, they have not been updated in the storage unit by the IP filtering unit or have not been loaded into the IP filtering unit.

The detection unit may assign a risk level, which changes over time, to each IP in every IP list that is stored and managed. The risk level can be regarded as a botnet activity index: a high risk level indicates that the IP may take part in botnet activity again now or in the near future, and a low risk level means it is unlikely to be active in future. When the risk level falls below a certain threshold, the detection unit may judge that the IP no longer takes part in botnet activity and delete it from the IP list stored in the storage unit.

The risk level of the botnet is computed with a half-life and decreases exponentially over time. The half-life is derived from the sending patterns of all IPs stored in the IP list: in the distribution of the inter-arrival times between spam mails sent by the spamming IPs stored in the IP list, the point lying beyond 2 SD (standard deviations) is taken as the half-life.

The risk level may be calculated as in Equation 2 below.

[Equation 2: the formula is not reproduced in this text.]

Here R_{i,t} is the risk level of IP number i at time t, i is the index of an IP in the IP list, t is time, and H is the point (the half-life) that bounds the 95% confidence interval of the distribution of spam inter-arrival times over all IPs stored in the IP list.
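Because Equation 2 is not available in this text, the following Python sketch only illustrates the stated behavior: a risk level that halves every H time units, with H estimated from spam inter-arrival times. The decay form R0 * 0.5^(elapsed / H), the use of mean + 2 SD as "the point beyond 2 SD", and all function names are assumptions.

```python
from statistics import mean, pstdev


def estimate_half_life(inter_arrival_times):
    """Assumed estimator: mean plus two standard deviations of the gaps."""
    return mean(inter_arrival_times) + 2 * pstdev(inter_arrival_times)


def decayed_risk(initial_risk, elapsed, half_life):
    """Risk after `elapsed` time units without activity (assumed exponential form)."""
    return initial_risk * 0.5 ** (elapsed / half_life)


def prune(risks, threshold):
    """Keep only the IPs whose risk is still at or above the threshold."""
    return {ip: r for ip, r in risks.items() if r >= threshold}
```

Under these assumptions an IP that stays silent for one half-life keeps half of its risk, and `prune` removes it once the value drops below the chosen threshold.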

An IP deleted according to Equation 2 may be caught again by the detection unit as a botnet-active IP through the filtering process described above, in which case the detection unit assigns it a fresh risk level and tracks that risk level using the half-life.

Meanwhile, mail traffic that does not go through filtering by the IP filtering unit and hash filtering unit because the P-value condition was not met at the detection unit, or that is not filtered by either the IP filtering unit or the hash filtering unit, may be stored in the temporary hash storage unit.

As described above, botnet activity is recognized when, at the detection unit, the mail log of the botnet's mail traffic matches the botnet identification information or the P-value condition is met. If, because the botnet is at an early stage of activity, there is no botnet identification information or the P-value condition is not met, the mail traffic concerned is stored in the temporary hash storage unit.

That is, the temporary hash storage unit can generate new botnet identification information and IP lists from the early activity of a botnet. This process is described in detail with reference to FIG. 4.

Referring to FIG. 4, the temporary hash storage unit (106) extracts hashes from the mail traffic that was not filtered by the detection unit (101) or by the system's filtering process, gathers the IPs that send the same hash, and manages each group as a vector. As illustrated, the temporary hash storage unit (106) may group IP-set A, which sends hash A, and IP-set B, which sends hash B, to generate vector A and vector B.
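The grouping step can be sketched in a few lines of Python. The function name `build_vectors` and the (ip, hash_value) record layout are assumptions made for illustration; the text does not specify how a vector is represented.

```python
from collections import defaultdict


def build_vectors(records):
    """Group sender IPs by the hash they sent.

    `records` is an iterable of (ip, hash_value) pairs taken from
    unfiltered mail logs; the pair layout is an assumption.
    """
    vectors = defaultdict(set)
    for ip, hash_value in records:
        vectors[hash_value].add(ip)
    return dict(vectors)
```

Each key then plays the role of hash A or hash B, and each value the role of IP-set A or IP-set B.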

Here, through the URL variation described above, the URLs contained in the hashes of vector A and vector B may point to the same web page, in which case vector A and vector B are very likely to belong to the same botnet. The association between vector A and vector B therefore needs to be examined to determine whether they belong to the same botnet, and if they do, the two vectors need to be merged.

To judge the association between vector A and vector B, the temporary hash storage unit (106) may quantify their degree of association using KL_Divergence. For this purpose it takes, for each of vector A and vector B, the distribution of the number of IPs that generated mail traffic per unit time, the distribution of the number of mails, and the IP histogram, and converts each into a probability distribution. With these three probability distributions it can find associated vectors: vectors of similar IP scale that sent similar volumes of mail from similar numbers of IPs over the same period. This method is used because the many bots controlled by the same bot-master receive their commands and start acting at the same time, so that even if they are ordered to advertise different target URLs, their sending times and their per-hour sending patterns are extremely similar.

Accordingly, for each of the three kinds of probability distribution, the temporary hash storage unit (106) takes the distribution of that kind from vector A as p and from vector B as q and substitutes them into Equation 3 below to obtain the KL_Divergence value.

KL_Divergence is not symmetric. Therefore, the symmetrized sum D(p||q) + D(q||p) is used as the KL_Divergence value for the probability distributions of vector A and vector B.

The KL_Divergence value obtained in this way is 0 when the two probability distributions are identical and grows as they differ. The KL_Divergence values obtained for the three probability distributions are therefore combined; if the result is at or below a preset threshold, vector A and vector B are judged to be highly associated and may be merged. Conversely, if the result is above the preset value, vector A and vector B are judged to be unrelated vectors belonging to different botnets, and their IP sets are kept separate.
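Equation 3 is not reproduced in this text; the Python sketch below uses the standard discrete Kullback-Leibler divergence, D(p||q) = sum of p_i * log(p_i / q_i), and the symmetrized sum described above. Combining the three values by summation follows the wording of the claims. The `eps` smoothing, the requirement that paired histograms have equal length, and all function names are assumptions added for illustration.

```python
from math import log


def to_distribution(counts, eps=1e-9):
    """Normalize non-negative counts to probabilities.

    `eps` smoothing is an assumption added so that empty bins do not
    produce log(0) or division by zero.
    """
    smoothed = [c + eps for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]


def kl(p, q):
    """Standard Kullback-Leibler divergence D(p||q) for discrete distributions."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    """D(p||q) + D(q||p), as used in the text."""
    return kl(p, q) + kl(q, p)


def should_merge(features_a, features_b, threshold):
    """Sum the symmetric KL over paired count histograms and compare to the threshold."""
    score = sum(
        symmetric_kl(to_distribution(a), to_distribution(b))
        for a, b in zip(features_a, features_b)
    )
    return score <= threshold
```

Here `features_a` and `features_b` would each hold three histograms (IP counts per unit time, mail counts per unit time, and the IP histogram); identical histograms give a score of 0 and therefore merge at any non-negative threshold.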

Thereafter, if vector A and vector B have been merged, the temporary hash storage unit (106) may combine the IPs and hashes contained in the two vectors to generate an IP list and botnet identification information. If, on the other hand, vectors A and B are unrelated and remain independent, each keeps its own IP set and its own botnet identification information including its hash.

In this way, the temporary hash storage unit (106) computes the degree of association between all stored vectors and, as described above, either merges them or keeps them as they are, generating new IP lists and botnet identification information. Thereafter, if traffic from the IPs on a new IP list keeps arriving and the P-value condition is met, the temporary hash storage unit (106) uploads the new IP list and botnet identification information to the storage unit (102), so that the new botnet is filtered under the control of the detection unit. If the IPs on the new IP list are not seen again, the risk level is applied and they may be deleted.

FIG. 5 is an overall flowchart of the botnet detection and blocking system according to the present invention. Referring to FIG. 5, the detection unit (101), on receiving mail traffic over the Input Log Stream, extracts a hash from the mail log of the traffic and searches the storage unit (102) for matching botnet identification information (①). If matching botnet identification information exists, the corresponding IP list is loaded into the IP filtering unit (103), which extracts the IP from the mail log and filters the mail traffic if that IP is on the list (②).

Mail traffic containing a filtered IP is sent to the hash sampler unit (104), which may extract a hash sample containing the URL and send it to the hash filtering unit (105) (③). If the IP targets several URLs, the hash sampler unit (104) may extract several hashes and send the information on all of those URLs to the hash filtering unit (105).

The hash filtering unit (105) compares the hash received from the hash sampler unit (104) with the hash of mail traffic not filtered by the IP filtering unit (103) and filters the traffic on a match (④). The IP of mail traffic filtered by the hash filtering unit (105) may be added to the IP list loaded in the IP filtering unit (103), and the IP filtering unit (103) may upload the updated IP list to the storage unit (102) to update the IP list held there (⑤).

Thereafter, mail traffic that is not filtered even by the hash filtering unit (105) is sent to the temporary hash storage unit (106) if its mail log contains a hash (⑥).

The temporary hash storage unit (106) receives in real time the mail traffic not filtered even by the hash filtering unit, groups together mail traffic whose hashes match or are associated through hash variation, and generates a new IP list and botnet identification information, thereby registering a newly emerged botnet.

Thereafter, if mail traffic containing IPs stored in that IP list continues to arrive at the temporary hash storage unit (106) and the P-value condition is met, the temporary hash storage unit (106) may upload the IP list and botnet identification information to the storage unit (102), registering them so that the new botnet can be detected (⑦).

Meanwhile, mail traffic that is not filtered by the hash filtering unit (105) and has no hash is judged not to have been sent by a botnet and may be sent to the contents filter (107). Mail traffic sent to the contents filter (107) either has no target URL or, even with a target URL, has been judged to be bulk mail from a small number of IPs rather than spam sent collectively by distributed IPs, which is the hallmark of a botnet. Such mail traffic is judged for spam by the contents filter (107), an existing spam filtering technique, and is classified as normal mail if it passes that as well.
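The routing decisions of FIG. 5 for a single mail-log record can be summarized in a short Python sketch. It is a simplification under stated assumptions: the function `route`, its parameters, and the returned stage labels are hypothetical, and the detection step (①) is taken to have already loaded `loaded_ips`.

```python
from typing import Optional, Set


def route(
    ip: str,
    url_hash: Optional[str],
    loaded_ips: Set[str],
    sample_hashes: Set[str],
) -> str:
    """Return the stage that handles one mail-log record (labels are illustrative)."""
    if ip in loaded_ips:
        return "ip_filter"             # step 2: blocked by IP
    if url_hash is not None and url_hash in sample_hashes:
        return "hash_filter"           # step 4: blocked by hash
    if url_hash is not None:
        return "temporary_hash_store"  # step 6: candidate for a new botnet
    return "contents_filter"           # no hash: conventional spam filtering
```

A record with an unknown IP and an unknown hash therefore lands in the temporary hash store, while a record with no URL hash at all goes to the contents filter.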

FIG. 1 is a diagram showing the configuration of a botnet.

FIG. 2 is a block diagram of the botnet detection and blocking system according to the present invention.

FIG. 3 is a diagram showing the filtering process of the botnet detection and blocking system according to the present invention.

FIG. 4 is a diagram showing how the botnet detection and blocking system according to the present invention generates a botnet to be filtered.

FIG. 5 is a diagram showing the overall flow of the botnet detection and blocking system according to the present invention.

*** Description of reference numerals for the main parts of the drawings ***

101: detection unit

102: storage unit

103: IP filtering unit

104: hash sampler unit

105: hash filtering unit

106: temporary hash storage unit

107: contents filter

110: IP list of the IP filtering unit

111: IP list of the storage unit

Claims

A botnet detection and blocking system that is connected to at least one mail server and extracts email logs from mail traffic received as a real-time stream in order to block spam mail sent by a botnet, the system comprising:

a storage unit that stores an IP list matched to botnet identification information;

an IP filtering unit that loads the IP list of a specific botnet from the storage unit and filters mail traffic containing an IP on that IP list; and

a detection unit that extracts the email log, compares it with the botnet identification information in the storage unit, and, when botnet activity is detected, controls the loading of the IP list matched to that botnet identification information from the storage unit into the IP filtering unit.

The botnet detection and blocking system according to claim 1,

wherein the botnet identification information includes a URL targeted by the botnet.

The botnet detection and blocking system according to claim 1,

wherein the detection unit loads the IP list matched to the botnet identification information from the storage unit into the IP filtering unit when a P-value computed from TN, BN, IN, and HIT is at or below a preset threshold, where TN is the number of valid IPs measured during a unit time, BN is the number of IPs on the IP list, IN is the number of all IPs observed in the input stream per unit time, and HIT is the number of IPs among IN that are included in BN.

The botnet detection and blocking system according to claim 3,

wherein the detection unit calculates the P-value in real time and controls the IP filtering unit's filtering of mail traffic from IPs included in the specific botnet when the P-value returns to or above the preset threshold.

The botnet detection and blocking system according to claim 1, further comprising:

a hash sampler unit that extracts a sample hash containing a URL from the log of mail traffic filtered by the IP filtering unit; and

a hash filtering unit that compares the sample hash extracted by the hash sampler unit with the hash of mail traffic not filtered by the IP filtering unit, filters that traffic on a match, and updates the IP list loaded in the IP filtering unit by adding the IP of that traffic.

The botnet detection and blocking system according to claim 5,

wherein the IP filtering unit updates the IP list stored in the storage unit in accordance with the IP list updated by the hash filtering unit.

The system according to claim 5,

wherein the hash filtering unit adds the information about the sample hash to the botnet identification information stored in the storage unit that corresponds to the IP list loaded in the IP filtering unit.

The system according to claim 5,

further comprising a temporary hash storage unit that extracts hashes including a URL from the log of mail traffic left unfiltered by the hash filtering unit, generates a vector for each hash by grouping the IPs that share that hash, calculates the degree of association between the vectors through KL_Divergence, and, when the degree of association is below the threshold value, merges the IPs and hashes of those vectors into one to generate new botnet identification information and a new IP list and updates the storage unit with them.
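The grouping and merging described in this claim can be sketched as below. The record layout, the function names, and the greedy merge order are assumptions made for illustration. The `divergence` argument is a placeholder for the KL_Divergence-based degree of association; `jaccard_distance` is only a stand-in so the example runs and is not the measure named in the claims.

```python
from collections import defaultdict


def group_ips_by_hash(log):
    """Build one 'vector' per URL hash: the set of IPs that sent that hash."""
    vectors = defaultdict(set)
    for rec in log:
        vectors[rec["hash"]].add(rec["ip"])
    return dict(vectors)


def jaccard_distance(a, b):
    """Stand-in dissimilarity between two IP sets (not the patent's measure)."""
    return 1 - len(a & b) / len(a | b)


def merge_associated(vectors, divergence, threshold):
    """Greedily merge vectors whose divergence is at or below the threshold.

    Returns a list of (hashes, ips) pairs; each pair is a candidate botnet
    with its identification hashes and its IP list.
    """
    clusters = []
    for h, ips in vectors.items():
        for hashes, members in clusters:
            if divergence(members, ips) <= threshold:
                hashes.add(h)
                members |= ips
                break
        else:
            # No existing cluster is close enough: start a new one.
            clusters.append(({h}, set(ips)))
    return clusters
```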

The system according to claim 8,

wherein the temporary hash storage unit converts the distribution of the number of IPs per unit time, the distribution of the number of mails, and the IP histogram of each of two different vectors into probability distributions, substitutes the probability distributions of the same kind from the two vectors as p and q into D(p || q) + D(q || p), and obtains the degree of association by summing the resulting KL_Divergence values over the probability distributions,

Botnet detection and blocking system, characterized in that.
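The symmetric form D(p || q) + D(q || p) summed over several feature distributions can be sketched as follows. The feature names, the dictionary layout, and the smoothing constant `eps` are assumptions; the claims do not say how empty bins are handled.

```python
from math import log


def normalize(counts, eps=1e-9):
    """Turn raw counts into a probability distribution.

    eps is an assumed smoothing constant that keeps every bin non-zero so
    the logarithms below stay finite; the claims do not mention smoothing.
    """
    smoothed = [c + eps for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]


def kl(p, q):
    """D(p || q) for two distributions over the same bins."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    """D(p || q) + D(q || p)."""
    return kl(p, q) + kl(q, p)


def association(features_a, features_b):
    """Sum symmetric KL over matching feature histograms of two vectors."""
    return sum(
        symmetric_kl(normalize(features_a[name]), normalize(features_b[name]))
        for name in features_a
    )
```

A value near zero means the two vectors behave alike across all features, which is the condition for merging them into one botnet.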

The system according to claim 8,

wherein the temporary hash storage unit continuously receives, from the hash filtering unit, mail traffic having an IP included in the new IP list,

and updates the storage unit with the new IP list and botnet identification information when the P-Value is less than or equal to a predetermined threshold, the P-Value being determined from TN, BN, IN, and HIT, where TN is the number of effective IPs measured during a unit time, BN is the number of IPs in the IP list, IN is the number of all IPs observed in the input stream per unit time, and HIT is the number of IPs among IN that are included in BN.

The system according to claim 1,

wherein the detection unit assigns a risk to each IP of the IP list stored in the storage unit, exponentially decreases the risk depending on whether mail traffic sent by that IP continues, and deletes from the IP list any IP whose risk falls below a preset threshold.

The system of claim 11,

The risk is

,

where R_{i,t} is the risk that IP i has at time t, i is the index of an IP included in the IP list, t is time, and H (the half-life) is the point at the 95% confidence level of the distribution of intervals between spam emails sent by all IPs stored in the IP list.
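The risk formula itself is not reproduced in the claim text, so the sketch below rests on two assumptions: that the risk halves for every H of silence from an IP, and that H is the nearest-rank point below which 95% of the observed inter-spam intervals fall. All function names are hypothetical.

```python
import math


def half_life_from_intervals(intervals, coverage=0.95):
    """Nearest-rank point below which `coverage` of the intervals fall."""
    ordered = sorted(intervals)
    rank = max(1, math.ceil(coverage * len(ordered)))
    return ordered[rank - 1]


def decayed_risk(risk, idle_time, half_life):
    """Assumed decay: the risk halves every `half_life` of silence."""
    return risk * 0.5 ** (idle_time / half_life)


def prune(ip_risk, last_seen, now, half_life, threshold):
    """Drop IPs whose decayed risk has fallen below the threshold."""
    kept = {}
    for ip, risk in ip_risk.items():
        current = decayed_risk(risk, now - last_seen[ip], half_life)
        if current >= threshold:
            kept[ip] = current
    return kept
```

Tying H to the observed interval distribution means an IP is only forgotten after it has been quiet for longer than bots on the list normally are.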

A detection unit that extracts the email log and compares it with the botnet identification information; and

A temporary hash storage unit that, when the comparison by the detection unit detects botnet activity above a predetermined level, receives the mail traffic, extracts hashes including a URL, generates a vector for each hash by grouping the IPs of the received mail traffic that share that hash, calculates the degree of association between different vectors through KL_Divergence, and, when the degree of association is less than or equal to a preset threshold, updates the storage unit with a new IP list generated by merging the vectors and with botnet identification information including the hashes,

Botnet detection and blocking system comprising the above.

14. The system of claim 13,

wherein the temporary hash storage unit assigns a risk to each IP of the new IP list, exponentially decreases the risk depending on whether mail traffic sent by that IP continues, and deletes from the IP list any IP whose risk falls below a predetermined threshold.

The system according to claim 15,

The risk is

,

A method of blocking spam mails by botnets by extracting an email log from mail traffic received in a real time stream in connection with at least one mail server,

A first step of detecting whether a botnet is active by comparing a hash included in the email log with preset botnet identification information;

A second step in which, when the first step detects botnet activity above a predetermined level from the comparison of the hash with the botnet identification information, the detection unit controls a storage unit that stores an IP list matching the botnet identification information so that the IP list is loaded into an IP filtering unit that filters the mail traffic corresponding to the IPs of the list;

A third step of transmitting, by the detection unit, the mail traffic to the IP filtering unit; And

A fourth step in which the IP filtering unit determines whether to filter the mail traffic by comparing the IP of the mail traffic against the IP list;

Botnet detection and blocking method consisting of the above steps.
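The four steps of the method can be walked through with a toy class. Everything here is a hypothetical illustration: the class and method names, the record layout, and in particular the use of a simple match count as the "predetermined level" of botnet activity, which the dependent claims instead tie to a P-Value.

```python
class BotnetFilter:
    """Toy walk-through of the four steps; all names are hypothetical."""

    def __init__(self, storage, activity_level):
        self.storage = storage            # {url_hash: set of IPs}
        self.activity_level = activity_level
        self.loaded = set()               # IP list held by the IP filter

    def detect(self, log):
        """Step 1: count log entries per known botnet hash."""
        counts = {}
        for rec in log:
            if rec["hash"] in self.storage:
                counts[rec["hash"]] = counts.get(rec["hash"], 0) + 1
        return [h for h, n in counts.items() if n >= self.activity_level]

    def load(self, active_hashes):
        """Step 2: load the matching IP lists into the IP filter."""
        for h in active_hashes:
            self.loaded |= self.storage[h]

    def filter(self, log):
        """Steps 3 and 4: pass traffic to the filter and split it by IP."""
        blocked = [rec for rec in log if rec["ip"] in self.loaded]
        passed = [rec for rec in log if rec["ip"] not in self.loaded]
        return blocked, passed

    def process(self, log):
        self.load(self.detect(log))
        return self.filter(log)
```

Note that once a list is loaded, mail from a listed IP is blocked even if that particular message carries an unknown hash, which reflects the IP-based blocking the method describes.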

The method according to claim 17,

The botnet detection and blocking method, characterized in that the botnet identification information is hash information for the URL targeted by the botnet.

The method according to claim 17,

In the first step, if the hash matches the botnet identification information, the detection unit loads the IP list matching the botnet identification information from the storage unit into the IP filtering unit when the P-Value is less than or equal to a predetermined threshold value, the P-Value being determined from TN, BN, IN, and HIT, where TN is the number of effective IPs measured during a unit time, BN is the number of IPs in the IP list, IN is the number of all IPs observed in the input stream per unit time, and HIT is the number of IPs among IN that are included in BN.

The method of claim 19,

The botnet detection and blocking method, wherein in the fourth step, when the P-Value returns to the predetermined threshold value or above, the IP filtering unit, under the control of the detection unit, does not filter the mail traffic of IPs included in the specific botnet.

The method according to claim 17,

A fifth step in which a hash sampler unit extracts a sample hash containing a URL from the log of mail traffic filtered by the IP filtering unit; and

A sixth step in which a hash filtering unit compares the hashes of the unfiltered mail traffic with the sample hash extracted in the fifth step and, on a match, filters that mail traffic and updates the IP list loaded in the IP filtering unit by adding the IP of the unfiltered mail traffic;

The botnet detection and blocking method further comprising.

23. The method of claim 21,

The botnet detection and blocking method further comprising a seventh step of updating the IP list stored in the storage unit to match the IP list updated by the hash filtering unit in the sixth step.

23. The method of claim 21,

The botnet detection and blocking method further comprising an eighth step in which the hash filtering unit adds the information about the sample hash to the botnet identification information stored in the storage unit that corresponds to the IP list loaded in the IP filtering unit.

23. The method of claim 21,

The botnet detection and blocking method further comprising a ninth step in which, in the sixth step, a temporary hash storage unit, which generates new IP lists and botnet identification information for the storage unit, extracts hashes including a URL from the log of mail traffic left unfiltered by the hash filtering unit, generates vectors by grouping the IPs that share the same hash, calculates the degree of association between the vectors through KL_Divergence, and, when the degree of association is less than a preset threshold, merges the IPs and hashes of those vectors into one to generate new botnet identification information and a new IP list and updates the storage unit with them.

27. The method of claim 24,

In the ninth step, the temporary hash storage unit converts the distribution of the number of IPs that sent spam mail per unit time, the distribution of the number of mails, and the IP histogram of each of two different vectors into probability distributions, substitutes the probability distributions of the same kind from the two vectors as p and q into D(p || q) + D(q || p), and obtains the degree of association by summing the resulting KL_Divergence values over the probability distributions,

Botnet detection and blocking method, characterized in that.

27. The method of claim 24,

In the ninth step, the temporary hash storage unit continuously receives, from the hash filtering unit, mail traffic having an IP included in the new IP list,

and the botnet detection and blocking method further comprises a tenth step of updating the storage unit with the new IP list and botnet identification information when the P-Value is less than or equal to a preset threshold value, the P-Value being determined from TN, BN, IN, and HIT, where TN is the number of effective IPs measured during a unit time, BN is the number of IPs in the IP list, IN is the number of all IPs observed in the input stream per unit time, and HIT is the number of IPs among IN that are included in BN.