TWI634769B

TWI634769B - Method for detecting domain name transformation botnet through proxy server log

Info

Publication number: TWI634769B
Application number: TW105130289A
Authority: TW
Inventors: 鄭棕翰; 陳建智; 周國森; 黃秀娟; 施君熹
Original assignee: 中華電信股份有限公司
Priority date: 2016-09-20
Filing date: 2016-09-20
Publication date: 2018-09-01
Also published as: TW201815142A

Abstract

The invention relates to a method for detecting a domain flux botnet through proxy server logs. A website analysis module performs the following steps: accessing proxy server log data; filtering out routine user-agent connections from the log data according to a node filtering algorithm; grouping the remaining records according to a website grouping algorithm to obtain the sets of associated external websites in the log content; obtaining the connection topology of the external websites and using a connection feature algorithm to determine whether each set matches the connection characteristics of domain flux; and filtering out, from the remaining records, the domains connected to by normal applications or content delivery networks (CDNs), the remaining domains being judged to be a domain flux botnet.

Description

Method for detecting domain name transformation botnet through proxy server log

The present invention relates to a method for detecting malicious networks through proxy logs, and more particularly to a method for detecting a domain flux botnet through proxy server logs.

Organizing and analyzing proxy server log data is the foundation of modern information security control. With the arrival of the big data era, extracting valuable information from massive volumes of log data has become a major challenge for practitioners in the field. A method that can single out clusters of malicious domains, and further pinpoint the compromised hosts inside an enterprise, would be of clear benefit to organizations that require rigorous security control.

For example, Republic of China (Taiwan) Patent Publication No. I455546 discloses a method and system for detecting malicious domains that use rapidly changing (fast-flux) domain techniques. It mainly relies on information contained in router data, such as router host names and network address autonomous system numbers, and judges whether a domain is malicious based on whether specific parts of the router host names are identical or whether the network packet transmission time exceeds a preset check value. Its accuracy is limited, however, and it may mistakenly report normal applications as malicious domains.

Beyond the prior art above, techniques have also been developed that query web search engines and WHOIS sites to obtain the set of domains related to a domain name, identify suspicious websites that return only a few search results, and then judge, from the set of related domains and the number of search results, whether a domain name belongs to a suspected botnet relay station. Such techniques still cannot accurately single out domain flux botnets. When applied to a botnet, domain flux is a technique that uses existing DNS services or a domain generation algorithm to associate multiple domain names with the same IP address, thereby evading Uniform Resource Locator (URL) detection.

In short, the various malicious website protection methods built on proxy server log data still leave considerable room for improvement as technology evolves, and each can be refined to address specific problems. A screening method aimed at domain flux botnets is what practitioners in the field urgently need.

The present invention proposes a method for detecting a domain flux botnet through proxy server logs. To raise the survival rate of a botnet, attackers frequently use domain flux so that the botnet is not easily discovered and blocked. However, because every connection a malicious program makes to specific external websites is recorded in detail in the proxy server's log data, the idea of the invention is to analyze that log data and, from the result of taking the union of domains, obtain the merged sets whose connection behavior matches domain flux, thereby identifying likely domain flux botnets.

In the method of the present invention for detecting a domain flux botnet through proxy server logs, a website analysis module performs a series of steps. First, the website analysis module accesses the proxy server log data and, according to a node filtering algorithm, filters the connections represented by the user-agent information in the log data, so as to remove log content belonging to connections with routine websites. The node filtering algorithm uses the node degree with respect to user agents as the feature for filtering out well-known websites. Specifically, this step uses a MapReduce architecture with each external website's destination URL as the key and the user agent as the value, which efficiently yields a list of the number of different user agents that connected to each external website. Filtering out the websites with a large degree in this list gives a first-pass removal of relatively well-known websites. The invention retains external websites whose degree in long-term traffic data is below a preset threshold (for example, 10). The rationale is that, besides removing well-known websites, an external website reached by only a very small number of user agents usually serves a special purpose, and such websites have a high probability of being malicious relay stations.
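
The degree filter described above can be sketched in a few lines of Python. This is a minimal, single-process illustration rather than the patent's MapReduce implementation; the function name `low_degree_sites`, the record layout `(client_ip, dest_url, user_agent)`, and the sample host names are assumptions made for the example. Only the threshold of 10 comes from the text.

```python
from collections import defaultdict

def low_degree_sites(records, threshold=10):
    """Return destination URLs reached by fewer than `threshold` distinct user agents."""
    agents_by_url = defaultdict(set)
    # Map/reduce in miniature: key = destination URL, value = user agent.
    for _client_ip, dest_url, user_agent in records:
        agents_by_url[dest_url].add(user_agent)
    # Degree of a site = number of distinct user agents that connected to it.
    return {url for url, agents in agents_by_url.items() if len(agents) < threshold}

# Hypothetical records: (client_ip, dest_url, user_agent)
records = [("10.0.0.1", "www.popular.example", f"UA-{i}") for i in range(25)]
records += [
    ("10.0.0.2", "cnc1.example.net", "BotAgent/1.0"),
    ("10.0.0.3", "cnc1.example.net", "BotAgent/1.0"),
]
kept = low_degree_sites(records)  # only the rarely contacted site is retained
```

Collecting user agents into a set per URL mirrors the key-value grouping a reducer would perform, so the same logic should carry over to a distributed job.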

Next, according to a website grouping algorithm, the website analysis module takes the connection records remaining after the previous filtering step and matches external websites whose connection records share the same client Internet Protocol address (Client IP) and user agent, grouping them to find the sets of associated external websites in the log content. The purpose of this step is to establish associations among external websites from the traffic information in the logs: all external websites contacted by the same program are treated as related. Specifically, external websites whose connection records share the same Client IP and user agent are placed in the same set, which indicates that these websites are related. Because the feature used to build the sets (Client IP plus user agent) can be tested by an exact match on log fields, the grouping algorithm can also be implemented with the key-value architecture of MapReduce.
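
As a rough illustration of this exact-match grouping, the sketch below builds one set of destination URLs per `(Client IP, user agent)` pair. The function name `sites_by_program` and the sample records are hypothetical, and a plain dictionary stands in for the MapReduce key-value stage.

```python
from collections import defaultdict

def sites_by_program(records):
    """Group destination URLs by the exact (client_ip, user_agent) pair that contacted them."""
    groups = defaultdict(set)
    for client_ip, dest_url, user_agent in records:
        # Key = (Client IP, user agent); value = destination URL.
        groups[(client_ip, user_agent)].add(dest_url)
    return dict(groups)

# Hypothetical records: (client_ip, dest_url, user_agent)
records = [
    ("10.0.0.5", "cnc1.example.net", "BotAgent/1.0"),
    ("10.0.0.5", "cnc2.example.net", "BotAgent/1.0"),
    ("10.0.0.5", "intranet.example.org", "Browser/5.0"),
]
groups = sites_by_program(records)
```

Each resulting set is a candidate group of related websites; sets that share a URL are merged in the following step.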

The website grouping algorithm can perform the grouping with a union-find approach. After the website analysis module builds a number of sets from the connection records of the remaining log content, it performs the following steps: treating the individual connection records in each set as elements, judging sets to intersect if the external websites of their connection records share the same Client IP and user agent, and merging the sets whose elements intersect; deleting any set whose number of elements exceeds a preset threshold; judging any set that does not intersect the remaining sets to be an independent set; and repeating the above three steps until every set has been judged independent.
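
One common way to realize this kind of merging is a disjoint-set (union-find) structure. The sketch below is an assumption-laden illustration, not the patent's implementation: it merges any sets that share an element and, as a simplification, applies the size threshold once after merging instead of during each pass. The name `merge_overlapping` and the default `max_size` are hypothetical.

```python
def merge_overlapping(sets, max_size=1000):
    """Merge sets that share any element; drop merged sets larger than `max_size`."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for s in sets:
        items = list(s)
        for item in items:
            union(items[0], item)  # also registers single-element sets

    merged = {}
    for x in list(parent):
        merged.setdefault(find(x), set()).add(x)
    # Simplification: the size threshold is applied once, after all merges.
    return [group for group in merged.values() if len(group) <= max_size]

result = merge_overlapping([{"a", "b"}, {"b", "c"}, {"d"}, {"e", "f"}])
```

Here `{"a", "b"}` and `{"b", "c"}` share `"b"` and collapse into one group, while the other two sets stay independent.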

Following the above step, the website analysis module obtains the connection records of each Client IP in the log content and the connection topology between the Client IPs and the external websites they connect to. Once the preceding steps have classified the connection records by disjoint sets of URLs, the connection records of one set connect only to the domains of that set, and they can be understood as network behavior produced by programs with the same purpose. This step obtains the connection topology of Client IPs and external URLs together with the Client IPs' connection information in the proxy server logs. For example, a group ID can be derived for each disjoint set by hashing its URL strings; combined with the connection information recorded for each URL, MapReduce's Multiple Input mechanism, with the URL as the key, yields the set of connection records and the connection topology graph for each group ID.
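
The hashing and join described above might look like the following sketch. The source does not say which hash function is used, so the truncated SHA-256 digest here is purely an assumption, as are the names `group_id` and `records_by_group`; an in-memory dictionary join stands in for MapReduce's Multiple Input mechanism.

```python
import hashlib
from collections import defaultdict

def group_id(urls):
    """Derive a stable ID for a URL set (assumed scheme: truncated SHA-256 of sorted URLs)."""
    joined = "\n".join(sorted(urls))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:8]

def records_by_group(url_sets, records):
    """Attach each connection record to the group ID of its destination URL."""
    url_to_gid = {url: group_id(s) for s in url_sets for url in s}
    grouped = defaultdict(list)
    for record in records:
        _client_ip, dest_url, _user_agent = record
        gid = url_to_gid.get(dest_url)
        if gid is not None:  # records for URLs outside every set are ignored
            grouped[gid].append(record)
    return dict(grouped)

# Hypothetical disjoint sets and records: (client_ip, dest_url, user_agent)
url_sets = [{"cnc1.example.net", "cnc2.example.net"}, {"cnc4.example.net"}]
records = [
    ("10.0.0.5", "cnc1.example.net", "BotAgent/1.0"),
    ("10.0.0.6", "cnc2.example.net", "BotAgent/1.0"),
    ("10.0.0.7", "cnc4.example.net", "OtherBot/2.0"),
    ("10.0.0.8", "unrelated.example.org", "Browser/5.0"),
]
grouped = records_by_group(url_sets, records)
```

Sorting the URLs before hashing makes the ID independent of set iteration order, which matters if the ID is recomputed on different workers.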

Next, the website analysis module presents the relationship between the external websites and their associated sets as a connection topology and uses a connection feature algorithm to determine whether each set matches the connection characteristics of domain flux. Here, the connection characteristic of domain flux is that a small number of Client IPs, using the same user agent, connect to many different external domains. In practice, the connection feature algorithm computes the ratio of the number of destination URLs to the number of Client IPs in a set; if the ratio exceeds a preset threshold, the websites in the set exhibit the domain-changing behavior characteristic of domain flux.
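
The ratio test can be written directly. In this sketch the function name `looks_like_domain_flux` and the default threshold of 5.0 are assumptions; the source only says that the threshold is preset.

```python
def looks_like_domain_flux(group_records, ratio_threshold=5.0):
    """True if (distinct destination URLs) / (distinct Client IPs) exceeds the threshold."""
    urls = {dest_url for _ip, dest_url, _ua in group_records}
    client_ips = {ip for ip, _dest_url, _ua in group_records}
    if not client_ips:
        return False  # an empty group cannot match
    return len(urls) / len(client_ips) > ratio_threshold

# Hypothetical group: one client contacting nine different hosts with one user agent.
flux_group = [("10.0.0.5", f"host{i}.example.net", "BotAgent/1.0") for i in range(9)]
# Hypothetical group: two clients, two hosts.
plain_group = [
    ("10.0.0.6", "a.example.org", "Browser/5.0"),
    ("10.0.0.7", "b.example.org", "Browser/5.0"),
]
```

With these inputs the first group has a ratio of 9 and the second a ratio of 1, so only the first exceeds the assumed threshold.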

Finally, the website analysis module takes the sets that match the domain flux connection characteristics and filters out the domains connected to by normal network applications (for example, antivirus software), content delivery networks (CDNs), and the like. The website analysis module then judges the remaining domains to be a domain flux botnet and raises an alert.

The above is the method of the present invention for detecting a domain flux botnet through proxy server logs. Using the characteristics a domain flux botnet is likely to exhibit, it can mine massive, long-term log data for external websites suspected of belonging to a domain flux botnet, so that further preventive measures can be taken.

A‧‧‧computer

B‧‧‧computer

C‧‧‧computer

E‧‧‧computer

F‧‧‧computer

G‧‧‧computer

H‧‧‧computer

1‧‧‧attacker

2‧‧‧attacker

10‧‧‧cluster

11‧‧‧relay station

12‧‧‧relay station

13‧‧‧relay station

20‧‧‧cluster

21‧‧‧relay station

22‧‧‧relay station

23‧‧‧relay station

30‧‧‧relay station

S201~S205‧‧‧method steps

FIG. 1 is a schematic diagram of a scenario in which the present invention detects a domain flux botnet through proxy server logs.

FIG. 2 is a flowchart of the method of the present invention for grouping external websites through proxy logs.

FIG. 3 is an example of a statistical list of the number of user-agent connections according to the present invention.

FIG. 4 is an example of converting simple log data into sets of external websites according to the present invention.

FIG. 5 is an example of tracing network connection groups back from the sets according to the present invention.

FIG. 6 is an example of a network behavior topology graph of domain flux according to the present invention.

FIG. 7 is an example of antivirus software exhibiting connection behavior in a domain flux manner.

FIG. 8 is an example of proxy server log data.

FIG. 9 is a schematic diagram of an embodiment in which a domain flux botnet is identified by the method of the present invention.

FIG. 10 is a schematic diagram of an embodiment in which a domain flux botnet is identified by the method of the present invention.

To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.

FIG. 1 is a schematic diagram of a scenario in which the present invention detects a domain flux botnet through proxy server logs. Once an attacker plants malware on a victim host, the host becomes a zombie computer (bot) that can carry out malicious acts such as stealing confidential or sensitive data. To raise the survival rate of a botnet, attackers today use domain flux to evade inspection; however, because every connection the malware makes to specific external websites is recorded in the proxy server log data, the present invention can still uncover it. In FIG. 1, attacker 1 uses malware to turn computers A, B, and C in the target enterprise's internal network into bots. The malware on computers A, B, and C connects to relay stations 11, 12, and 13, while computer C also connects to a well-known website during normal use. Attacker 2 likewise uses malware to turn computers D, E, and H in the target enterprise's internal network into bots; the malware on those computers connects to relay stations 21, 22, and 23, while computer D also connects to a well-known website during normal use. Computers C and D thus connect both to relay stations and to the well-known website. The well-known website 30 is connected to by computers C, D, E, F, and G, which run normal applications. Because the computers and relay stations are connected in a crisscross fashion and the botnet uses domain flux to evade ordinary inspection, the method of the present invention is needed to efficiently trace, from the records of bot computers A, B, E, and H in the proxy server log data, the domains suspected of being botnet relay stations.

Next, FIG. 2 is a flowchart of the method of the present invention for grouping external websites through proxy logs. First, step S201 filters out well-known websites by user-agent degree. Through a MapReduce architecture, the website analysis module obtains a list of the number of different user agents that connected to each external website in the logs; filtering out the websites whose degree is too large removes the well-known websites first. FIG. 3 shows an example of such a statistical list: most websites (92.77%) have records of being reached by only a single user agent, whereas a well-known website such as www.google.com was reached by 22,825 different user agents. This observation shows that checking how many different user agents connected to each external website provides an initial screen for well-known websites. In the example of FIG. 1, well-known website 30 would be filtered out in step S201.

Continuing with FIG. 2, step S202 uses union-find to obtain the associated sets of external websites, essentially converting the proxy server log data into the form of sets of external websites. The website analysis module places external websites whose connection records share the same Client IP and user agent into the same set, indicating that these websites are related; this can be implemented with the key-value architecture of MapReduce. FIG. 4 shows an example of converting simple log data into sets of external websites: the Client IP and user agent serve as the key and the destination URL (Dest Url) as the value for the union-find. For the sample log data, the union-find yields two disjoint sets, "CnC1,CnC2,CnC3" and "CnC4,CnC5,CnC6". The detailed procedure for the union-find is described in a later paragraph.

Next, step S203 traces network connection groups back from the sets. The website analysis module groups the connection records by the mutually disjoint URL sets; the connection records in one group connect only to the domains of the same set, and they represent network behavior produced by programs with the same purpose. In this step, the website analysis module obtains the connection topology between Client IPs and external URLs, together with the Client IPs' connection information in the proxy server logs. As shown in the example of FIG. 5, the website analysis module hashes the two disjoint sets "CnC1,CnC2,CnC3" and "CnC4,CnC5,CnC6" to obtain their Group-IDs, assigns each destination URL CnC1 to CnC6 the Group-ID of the set it belongs to, and combines this with the connection information recorded for CnC1 to CnC6. Using MapReduce's Multiple Input with the URL as the key, it obtains the group of connection records and the connection topology graph for each Group-ID, as shown in the lower part of the figure.

Step S204 then determines whether the behavior matches the domain flux characteristics. The website analysis module presents each connection group as a network connection topology graph and determines whether the group matches the connection characteristics of domain flux, namely a small number of Client IPs connecting to a large number of external URLs. FIG. 6 shows an example of a network behavior topology graph of domain flux: within a single set, only a few Client IPs use the same user agent to connect to many different domains. In the figure, the Client IP 10.107.56.20 uses the same user agent to connect to domains beginning with sp-install, c-sp-storage, sp-download, sp-alive, sp-setting, sp-storage, Orbtr-install, spms-download, c-api.sec, and so on, which is a typical instance of the domain flux connection characteristic.

Step S205 then filters out domains with legitimate behavior. After taking the sets that match the domain flux connection characteristics, the website analysis module filters out the domains connected to by normal network applications (for example, antivirus software), CDNs, and the like. FIG. 7 shows an example of antivirus software connecting in a domain flux manner. Such known, normal application behavior is not the botnet behavior the invention aims to detect and is excluded in this step; only the remaining domains are judged to be a domain flux botnet. In the example of FIG. 7, this normal application can be filtered out by adding a regular expression for "iavs9x.u.avast.com" to the filter list.
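
A filter list of regular expressions, as mentioned for step S205, could be applied as in the sketch below. Only the host name `iavs9x.u.avast.com` comes from the text; the exact pattern (which also matches subdomains), the names `ALLOW_PATTERNS` and `drop_legitimate`, and the sample domains are assumptions.

```python
import re

# Assumed pattern: the named host itself or any subdomain of it.
ALLOW_PATTERNS = [re.compile(r"(^|\.)iavs9x\.u\.avast\.com$")]

def drop_legitimate(domains, patterns=ALLOW_PATTERNS):
    """Remove domains that match any pattern on the filter list of known-good applications."""
    return [d for d in domains if not any(p.search(d) for p in patterns)]

# Hypothetical candidate domains from a set that matched the domain flux feature.
candidates = [
    "iavs9x.u.avast.com",
    "m1234.iavs9x.u.avast.com",
    "sp-install.example.net",
]
suspects = drop_legitimate(candidates)
```

Anchoring the pattern at the end of the string avoids accidentally allowing a malicious host that merely contains the trusted name as a prefix.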

Next, FIG. 8 shows an example of proxy server log data. Each line represents one log record with information such as the timestamp at which it was created (Timestamp), client IP (Client IP), destination URL (Dest Url), destination port (Dest Port), and user agent (User-agent), and may additionally include bytes sent (Sent Byte), bytes received (Receive Byte), method (Method), path (Path), and so on. The invention mainly uses only the client IP, destination URL, and user-agent information to detect domain flux botnets. The node degree is used to judge whether a website is well known; a low degree indicates an external website reached by only a few user agents, which has a higher probability of being a malicious relay station.
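
To make the field selection concrete, the sketch below projects a log line onto the three fields the method relies on. The on-disk format of FIG. 8 is not given in the text, so the tab-separated layout and field order assumed here (Timestamp, Client IP, Dest Url, Dest Port, User-agent, then optional extras) are hypothetical, as are the names `Conn` and `parse_line`.

```python
from typing import NamedTuple, Optional

class Conn(NamedTuple):
    client_ip: str
    dest_url: str
    user_agent: str

def parse_line(line: str) -> Optional[Conn]:
    """Parse one assumed tab-separated log line; return None if it is too short."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return None
    # Assumed order: Timestamp, Client IP, Dest Url, Dest Port, User-agent, extras...
    _timestamp, client_ip, dest_url, _dest_port, user_agent = fields[:5]
    return Conn(client_ip, dest_url, user_agent)

sample = "2016-09-20T00:00:00\t10.0.0.5\tcnc1.example.net\t80\tBotAgent/1.0\t512\t1024\tGET\t/"
conn = parse_line(sample)
```

Dropping the unused columns early keeps the later grouping stages small, which is helpful when the logs span a long period.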

In step S202, the website grouping algorithm merges the intersecting sets among the external website sets to produce disjoint sets. The union-find procedure is as follows: (1) taking the elements of the sets as the unit, find the intersections among the sets and merge intersecting sets into one (sets whose size exceeds the preset threshold are filtered out in this step); (2) determine which sets are disjoint (have no intersection with any other set) and set them aside as independent, returning the remaining sets to step (1); (3) repeat steps (1) and (2) until every set has been set aside. Taking FIG. 4 as an example, under the MapReduce architecture, with the Client IP and user agent as the key and the destination URL as the value, Items 1 to 8 first cluster into four sets: Items 1 and 2 yield the values CnC1,CnC2; Items 3 and 4 yield CnC2,CnC3; Items 5 and 6 yield CnC4,CnC5; and Items 7 and 8 yield CnC4,CnC6. These four sets serve as the input to the further union, whose final result is the two disjoint URL sets "CnC1,CnC2,CnC3" and "CnC4,CnC5,CnC6". Then, as shown in the example of FIG. 5, step S203 obtains the connection topology graphs of the two sets.
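
The three-step loop can be followed literally on the four input sets of FIG. 4. The sketch below is one possible reading of the procedure, not the patent's code; the function name `disjoint_sets` and the default `max_size` are assumptions.

```python
def disjoint_sets(input_sets, max_size=1000):
    """Iteratively merge intersecting sets until every surviving set is independent."""
    pending = [set(s) for s in input_sets]
    independent = []
    while pending:
        # Step (1): merge each set into the first already-merged set it intersects.
        merged = []
        for s in pending:
            for m in merged:
                if m & s:
                    m |= s
                    break
            else:
                merged.append(set(s))
        # Still step (1): filter out sets that grew past the size threshold.
        merged = [m for m in merged if len(m) <= max_size]
        # Step (2): sets intersecting no other set become independent.
        pending = []
        for i, m in enumerate(merged):
            if any(m & other for j, other in enumerate(merged) if j != i):
                pending.append(m)  # step (3): goes back through step (1)
            else:
                independent.append(m)
    return independent

fig4_sets = [{"CnC1", "CnC2"}, {"CnC2", "CnC3"}, {"CnC4", "CnC5"}, {"CnC4", "CnC6"}]
result = disjoint_sets(fig4_sets)
```

For the FIG. 4 input, a single pass merges the first two and the last two sets, giving the two disjoint sets named in the text.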

The invention identifies a botnet from a connection topology in which the same Client IP connects to many different destination URLs, a one-to-many relationship that matches the domain flux connection characteristics. For example, one criterion is whether the number of destination URLs in a set divided by the number of client IPs exceeds a preset threshold; if so, the set is judged to exhibit the domain-changing behavior characteristic of domain flux.

Finally, FIG. 9 and FIG. 10 show an embodiment in which a domain flux botnet is identified by the method of the present invention. After the process shown in FIG. 2, the resulting group 739663648 matches the characteristics of a domain flux botnet set. The domains in this group were then looked up on the VirusTotal website; as shown in FIG. 10, every domain in the group obtained by the invention is registered as a malicious URL, which shows that the method of the present invention for detecting a domain flux botnet through proxy server logs is indeed effective.

The above detailed description specifically describes preferred embodiments of the present invention. These embodiments are not intended to limit the patent scope of the invention; any equivalent implementation or modification that does not depart from the technical spirit of the invention shall fall within the patent scope of this case.

In summary, the present invention is innovative in its technical concept and provides several effects beyond the reach of the prior art. It fully satisfies the statutory requirements of novelty and inventive step for an invention patent. This application is therefore filed in accordance with the law, and the Office is respectfully requested to approve it to encourage invention.

Claims

A method for detecting a domain flux botnet through proxy server logs, in which a website analysis module performs the following steps: accessing at least one item of log data stored by an external proxy server; filtering, according to a node filtering algorithm, the connections represented by the user-agent information in each item of log data, so as to filter out log content belonging to connections with routine websites; matching, according to a website grouping algorithm and within the connection records of the remaining log content, external websites whose connection records share the same client Internet Protocol address (Client IP) and user agent, and grouping them to obtain the sets of associated external websites in the log content; obtaining the connection records of the Client IPs in the log content and the connection topology between the Client IPs and the external websites they connect to; presenting the relationship between the external websites and the associated sets through the connection topology, and determining with a connection feature algorithm whether each set matches the connection characteristics of domain flux, wherein the connection characteristics of domain flux refer to Client IPs connecting to domains using the same user agent; and taking the sets that match the domain flux connection characteristics and filtering out the domains connected to by normal network applications or content delivery networks (CDNs), the remaining domains being judged to be a domain flux botnet.

The method for detecting a domain flux botnet through proxy server logs according to claim 1, wherein in the node filtering algorithm the website analysis module, over the long-term log data, uses the destination Uniform Resource Locator (URL) as the key and the user-agent information as the value and computes, with a MapReduce architecture, a list of the number of different user agents that connected to each external website in the log data; the number of different user agents that connected to an external website equals the degree of its node, the size of the degree determines whether the external website represented by the node is a well-known website, and only external websites whose degree is less than a preset threshold are retained.

The method for detecting a domain flux botnet through proxy server logs according to claim 1, wherein the website grouping algorithm performs the grouping with a union-find approach, in which, after the website analysis module builds a number of sets from the connection records of the remaining log content, the following steps are performed: treating the individual connection records in each set as elements, judging sets to intersect if the external websites of their connection records share the same Client IP and user agent, and merging the sets whose elements intersect; deleting any set whose number of elements exceeds a preset threshold; judging any set that does not intersect the remaining sets to be an independent set; and repeating the above three steps until every set has been judged independent.

The method for detecting a domain flux botnet through proxy server logs according to claim 1, wherein the connection feature algorithm computes the ratio of the number of destination URLs to the number of Client IPs in a set, and a ratio exceeding a preset threshold indicates that the websites in the set exhibit the domain-changing behavior characteristic of domain flux.