TWI636680B

TWI636680B - System and method for detecting suspicious domain names based on semi-passive domain name server

Info

Publication number: TWI636680B
Application number: TW105141332A
Authority: TW
Inventors: 鄭棕翰; 陳奕明; 趙健智; 朱奕叡; 邱裕婷
Original assignee: 中華電信股份有限公司 (Chunghwa Telecom Co., Ltd.)
Priority date: 2016-12-14
Filing date: 2016-12-14
Publication date: 2018-09-21
Also published as: TW201822505A

Abstract

The present invention relates to a system and method for detecting suspicious domain names. A semi-passive domain name server (DNS) detection module performs active and passive detection simultaneously on known or unknown domain names to obtain DNS records. A feature extraction module then parses those records with a domain feature algorithm that includes new domain features, and the results are used for machine learning to generate classification rules. Finally, a suspicious domain name determination module uses the classification rules to decide whether an unknown domain is suspicious.

Description

System and method for detecting suspicious domain names based on semi-passive domain name server

The present invention relates to a system and method for detecting suspicious domain names. More particularly, it relates to a system and method that build semi-passive DNS records through both active and passive channels in order to identify suspicious domain names.

Among prior techniques that detect malicious domains through DNS resolution, Republic of China (Taiwan) Patent No. I455546 parses router information, including router host names and the autonomous system numbers of network addresses. It then decides whether a domain is malicious based on two tests: whether specific parts of the router host names match, or whether the network packet transmission time exceeds a preset check value. Because this method uses packet transmission time as the basis for assessing maliciousness, it is suitable only for comparing well-known domain names against malicious ones. It is prone to misjudging less well-known domain names.

In addition, U.S. Patent No. 20120198549 describes a method that detects malicious domain names using passive DNS. The method relies on cooperation with ISPs and special organizations, such as entities that operate Authority Domain Name Servers (ADNS), to obtain passive DNS records with rich samples. It then uses the resource records in those passive DNS records to distinguish benign domain names from malicious ones.

However, the method depends on ISPs and requires cooperation with organizations that operate ADNS. Moreover, although it uses a large number of domain features, these are in fact derived from only four attributes: timestamp, domain name, resource record type, and returned IP. As a result, it remains prone to misjudging CDN domain names and fast-flux domain names.

In view of these shortcomings, techniques for detecting malicious domains through DNS resolution still need improvement, both in having fewer prerequisites and in achieving better accuracy.

The system of the present invention for detecting suspicious domain names based on a semi-passive DNS comprises at least the following three modules.

The invention comprises a semi-passive DNS detection module that receives an input domain list, which may contain multiple domains labeled White, Botnet, or APT.

The semi-passive DNS detection module includes an active sensing submodule. This submodule queries the domains in the input list by automatically triggering the DNS mechanism, in order to find the domain names of interest.

The module also includes a passive sensing submodule. This submodule monitors DNS cache packets for the domains in the list and parses the DNS packets sent by clients.

After both sensing submodules have processed the domain list, the semi-passive DNS detection module generates a semi-passive DNS record from the results. In other words, the module obtains the record data it needs for the domain list through two sensing submodules, one active and one passive.
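The following sketch shows one way the two channels could feed a single log, with each resource record tagged by the channel it came from. It is a minimal sketch under stated assumptions, not the patent's implementation. The `SpdnsRecord` layout is hypothetical. The `resolve` callback stands in for a real resolver, and the pre-captured `observed` tuples stand in for the passive sensor's packet capture. The `rounds` parameter is also illustrative.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# A resource record as (rr_type, value, ttl); hypothetical layout.
RR = Tuple[str, str, int]


@dataclass
class SpdnsRecord:
    """One line of a semi-passive DNS log (hypothetical schema)."""
    timestamp: float
    domain: str
    rr_type: str
    value: str
    ttl: int
    channel: str  # "active" = our own query, "passive" = observed client traffic


def active_probe(domains: Iterable[str],
                 resolve: Callable[[str], List[RR]],
                 rounds: int = 3) -> List[SpdnsRecord]:
    """Active channel: query every listed domain `rounds` times and log each answer."""
    domains = list(domains)  # allow re-iteration across rounds
    log: List[SpdnsRecord] = []
    for _ in range(rounds):
        for domain in domains:
            ts = time.time()  # one timestamp per response, shared by all its RRs
            for rr_type, value, ttl in resolve(domain):
                log.append(SpdnsRecord(ts, domain, rr_type, value, ttl, "active"))
    return log


def passive_observe(domains: Iterable[str],
                    observed: Iterable[Tuple[float, str, str, str, int]]) -> List[SpdnsRecord]:
    """Passive channel: keep observed (ts, domain, type, value, ttl) rows for listed domains."""
    wanted = set(domains)
    return [SpdnsRecord(ts, d, t, v, ttl, "passive")
            for ts, d, t, v, ttl in observed if d in wanted]


def semi_passive_log(domains, resolve, observed, rounds: int = 3) -> List[SpdnsRecord]:
    """Merge both channels into one time-ordered semi-passive DNS log."""
    domains = list(domains)
    log = active_probe(domains, resolve, rounds) + passive_observe(domains, observed)
    return sorted(log, key=lambda r: r.timestamp)
```

Keeping the channel tag on every row lets later analysis compare what the active probes saw with what real clients actually received.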

The invention further comprises a feature extraction module, which parses DNS records for iterative machine learning. It has three submodules:

- A receiving submodule receives the semi-passive DNS records generated by the semi-passive DNS detection module.
- A feature vector submodule processes each semi-passive DNS record with a domain feature algorithm, which assesses the overall suspiciousness of a domain, to produce a feature vector.
- A feature training submodule performs machine training on those feature vectors to generate or refine domain classification rules.

Finally, the invention comprises a suspicious domain name determination module, which uses the machine learning results of the feature extraction module to decide whether a domain name is suspicious. It includes a vector receiving submodule, which receives the feature vectors generated by the feature extraction module. It also includes a vector classification submodule, which applies the domain classification rules produced by the feature training submodule to the feature vector of each semi-passive DNS record for the domain list, and so determines whether each domain name is suspicious.

In short, the invention works in three stages:

- The semi-passive DNS detection module filters the domain list both actively and passively to generate semi-passive DNS records.
- The feature extraction module receives these records, computes feature vectors, and trains iteratively to derive and refine domain classification rules from those vectors.
- The suspicious domain name determination module applies the refined rules to the feature vectors of subsequently obtained semi-passive DNS records to assess the potential risk of each domain name.

The domain feature algorithm mentioned above computes multiple features, forming a feature vector, for each sample in the semi-passive DNS records. It includes several feature computations from the prior art. Unlike the prior art, it also includes the following four feature computations, which identify suspicious domains more effectively:

1. DNS Server IP Abuse: detects suspected malicious domains that look normal but actually borrow the IP address of a DNS server. This kind of IP misuse is currently known to be one of the techniques APT domains may use while lying dormant. The domain's IP address is pointed at another DNS server, which causes systems that automatically detect NX-type domain names to misjudge the domain and makes reverse tracing harder.

2. Very Large TTL: domain names with higher TTL values are more likely to be benign. Statistics show that only benign domain names have a notable chance of a TTL above a threshold of 22,000 seconds. APT and Botnet domain names rarely or almost never have such high TTL values.

3. Meta Attribute Process Time: domain names whose feature extraction takes longer are more likely to be benign. Most prior techniques develop features related only to the domain name itself. They do not consider that structural differences among benign, botnet, and APT domain names indirectly affect how costly each is to analyze. The experimental data of the invention confirm the pattern for time spent on feature extraction: benign domain names take the longest on average, botnet domain names come next, and APT domain names take the least time.

4. Meta Attribute Compression: domains whose forward resolution returns more IP addresses at the same point in time are more likely to be benign. For benign domain names, the forward resolution result typically returns several IP addresses to the DNS client at the same point in time, with timestamps taken to the sixth decimal place. Botnet and APT domain names do not show this pattern, because benign domain names generally use a content delivery network (CDN) architecture to keep their services available. The number of IP addresses returned at the same point in time can therefore serve as a basis for distinguishing benign domain names from malicious ones.
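The four new features can be approximated from a per-domain log as sketched below. This is a hedged illustration only. The row layout and the `KNOWN_DNS_SERVER_IPS` set (a few well-known public resolver addresses picked for the example) are assumptions. So are the exact definitions, such as a binary flag for DNS Server IP Abuse and a maximum per-timestamp count for Meta Attribute Compression. The text above fixes only the 22,000-second TTL threshold and the six-decimal timestamp granularity. The process-time value here times only the other three new features, whereas the patent measures the cost of the whole feature extraction.

```python
import time
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

# One log row for a single domain: (timestamp_seconds, rr_type, value, ttl). Hypothetical layout.
Row = Tuple[float, str, str, int]

LARGE_TTL_THRESHOLD = 22000  # seconds, the threshold given in the text

# Illustrative only: a few well-known public resolver IPs. A real system would need
# a maintained list of DNS server addresses.
KNOWN_DNS_SERVER_IPS: Set[str] = {"8.8.8.8", "8.8.4.4", "1.1.1.1", "9.9.9.9"}


def dns_server_ip_abuse(rows: Sequence[Row], dns_ips: Set[str] = KNOWN_DNS_SERVER_IPS) -> int:
    """1 if any A record of the domain points at a known DNS server IP, else 0."""
    return int(any(rr_type == "A" and value in dns_ips
                   for _ts, rr_type, value, _ttl in rows))


def very_large_ttl(rows: Sequence[Row], threshold: int = LARGE_TTL_THRESHOLD) -> int:
    """1 if any record's TTL exceeds the threshold, else 0."""
    return int(any(ttl > threshold for _ts, _type, _value, ttl in rows))


def meta_compression(rows: Sequence[Row]) -> int:
    """Largest number of distinct A-record IPs sharing one timestamp (6 decimal places)."""
    ips_at: Dict[float, Set[str]] = defaultdict(set)
    for ts, rr_type, value, _ttl in rows:
        if rr_type == "A":
            ips_at[round(ts, 6)].add(value)
    return max((len(ips) for ips in ips_at.values()), default=0)


def new_features(rows: Sequence[Row]) -> List[float]:
    """[DNS Server IP Abuse, Very Large TTL, Meta Attribute Process Time, Meta Attribute Compression]."""
    start = time.perf_counter()
    abuse = dns_server_ip_abuse(rows)
    large_ttl = very_large_ttl(rows)
    compression = meta_compression(rows)
    elapsed = time.perf_counter() - start  # proxy for the domain's feature-extraction cost
    return [abuse, large_ttl, elapsed, compression]
```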

Corresponding to the system above, the method of the invention for detecting suspicious domain names based on a semi-passive DNS comprises the following steps:
1. Input a domain list into a semi-passive DNS detection module.
2. Query the domain names of interest in the domain list with an active sensing submodule of the semi-passive DNS detection module, which automatically triggers the DNS mechanism.
3. Monitor the DNS cache packets for the domains in the list with a passive sensing submodule of the semi-passive DNS detection module, and parse the DNS packets sent by clients.
4. Generate, with the semi-passive DNS detection module, a semi-passive DNS record from the results of processing the domain list with the active and passive sensing submodules.
5. Receive the semi-passive DNS record with a receiving submodule of a feature extraction module.
6. Generate a feature vector with a feature vector submodule of the feature extraction module, by processing the semi-passive DNS record with a domain feature algorithm that assesses the overall suspiciousness of a domain.
7. Generate or refine domain classification rules with a feature training submodule of the feature extraction module, by performing machine training on the feature vectors produced by the feature vector submodule.
8. Receive the feature vector with a vector receiving submodule of a suspicious domain name determination module.
9. Determine whether the domain name is suspicious with a vector classification submodule of the suspicious domain name determination module, by classifying the feature vector of the semi-passive DNS record according to the domain classification rules generated by the feature training submodule.
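The sketch below wires the nine steps together with placeholder callables to make the data flow concrete. Everything here is hypothetical:

- `collect` stands for the semi-passive DNS detection module.
- `extract` stands for the feature vector submodule.
- `train` stands for the feature training submodule.
- `classify` stands for the vector classification submodule.

The split into a labeled training dictionary and a list of unknown domains follows the flow of FIG. 2 rather than anything the steps themselves prescribe.

```python
from typing import Callable, Dict, List, Sequence


def run_pipeline(training: Dict[str, str],
                 unknown: Sequence[str],
                 collect: Callable[[str], list],
                 extract: Callable[[list], List[float]],
                 train: Callable[[List[List[float]], List[str]], object],
                 classify: Callable[[object, List[float]], str]) -> Dict[str, str]:
    """Hypothetical wiring of steps 1-9; returns a verdict per unknown domain."""
    # Steps 1-4: semi-passive DNS records for each labeled (known) domain
    logs = {domain: collect(domain) for domain in training}
    # Steps 5-6: one feature vector per domain
    X = [extract(logs[domain]) for domain in training]
    y = [training[domain] for domain in training]
    # Step 7: derive the domain classification rules
    rules = train(X, y)
    # Steps 8-9: collect, featurize, and classify each unknown domain
    return {domain: classify(rules, extract(collect(domain))) for domain in unknown}
```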

In summary, the system and method of the invention detect suspicious domain names based on a semi-passive DNS. They do so effectively by analyzing new types of domain features and performing machine training.

A‧‧‧Computer

B‧‧‧Computer

C‧‧‧Computer

D‧‧‧Computer

E‧‧‧Computer

F‧‧‧Computer

G‧‧‧Computer

H‧‧‧Computer

S‧‧‧Global DNS server

1‧‧‧Attacker

11‧‧‧Relay station

12‧‧‧Relay station

13‧‧‧Relay station

2‧‧‧Attacker

21‧‧‧Relay station

22‧‧‧Relay station

23‧‧‧Relay station

30‧‧‧Normal server

4‧‧‧DNS server

40‧‧‧Enterprise intranet

5‧‧‧Semi-passive DNS detection module

50‧‧‧Active sensing submodule

501‧‧‧Known domain list

502‧‧‧Unknown domain list

51‧‧‧Passive sensing submodule

510‧‧‧DNS cache

6‧‧‧Feature extraction module

60‧‧‧Feature vector submodule

61‧‧‧Feature training submodule

610‧‧‧Domain classification rules

7‧‧‧Suspicious domain name determination module

70‧‧‧Vector receiving submodule

71‧‧‧Vector classification submodule

710‧‧‧Suspiciousness of unknown domains

S201~S210‧‧‧Steps

FIG. 1 is a schematic diagram of an application scenario of the system of the present invention for detecting suspicious domain names based on a semi-passive DNS.

FIG. 2 is a flowchart of the steps of the method of the present invention for detecting suspicious domain names based on a semi-passive DNS.

FIG. 3 is a first schematic diagram of the data collection architecture of the semi-passive DNS suspicious domain name detection system of the present invention.

FIG. 4 is a second schematic diagram of the data collection architecture of the semi-passive DNS suspicious domain name detection system of the present invention.

FIG. 5 is a schematic diagram of the system architecture of the semi-passive DNS suspicious domain name detection system of the present invention.

FIG. 6 is a schematic diagram of an example record produced when the semi-passive DNS of the present invention detects suspicious domain names.

FIG. 7 is a first schematic diagram of an example risk assessment for suspicious domain names detected with the semi-passive DNS of the present invention.

FIG. 8 is a second schematic diagram of an example risk assessment for suspicious domain names detected with the semi-passive DNS of the present invention.

FIG. 9 is a third schematic diagram of an example risk assessment for suspicious domain names detected with the semi-passive DNS of the present invention.

FIG. 10 is a first schematic diagram of example implementation results of the semi-passive DNS suspicious domain name detection system of the present invention.

FIG. 11 is a second schematic diagram of example implementation results of the semi-passive DNS suspicious domain name detection system of the present invention.

FIG. 12 is a schematic diagram of an example feature ranking in the domain feature algorithm of the present invention.

The invention is described further below through embodiments, with reference to the drawings. FIG. 1 is a schematic diagram of an application scenario of the system of the invention for detecting suspicious domain names based on a semi-passive DNS.

After an attacker sets up Command and Control (C&C) infrastructure, the attacker can redirect user hosts' connections to the C&C in one of two ways: by controlling DNS server 4, or by implanting malware in victim hosts to turn them into zombie computers (bots). Users may also reach the C&C by mistake. Either way, the attacker can then carry out malicious activity. The present invention is a system for analyzing whether particular domains are malicious.

In FIG. 1, attacker 1 uses malware to turn computers A, B, and C in the target enterprise intranet into bots. The malware on computers A, B, and C connects to relay stations 11, 12, and 13, and computer C also connects to normal server 30. Likewise, attacker 2 uses malware to turn computers D, E, and H into bots, and computer D also connects to normal server 30. Computers C and D therefore connect both to relay stations and to legitimate websites. Normal server 30 is also connected to computers C, D, F, and G, which run normal applications.

The computers and relay stations each belong to their own domains. The system and method of the invention therefore take as input a list of known or unknown suspicious domains to which computers in enterprise intranet 40 may connect. They perform continuous machine learning on this list to construct a classification method and finally identify possibly malicious domains.

FIG. 2 is a flowchart of the steps of the method of the invention for detecting suspicious domain names based on a semi-passive DNS. The training phase runs as follows:

- Step S201: a list of domains known to be White, Botnet, or APT is collected as training samples.
- Step S202: the collected samples are input into the semi-passive DNS detection module to generate semi-passive DNS records (log records).
- Step S203: the feature extraction module generates feature vectors for the known domains. After receiving the semi-passive DNS records of the training samples, it extracts each sample domain's feature vector using the twenty-seven domain features of the domain feature algorithm.
- Step S204: machine learning is performed on the feature vectors of the training samples.
- Step S205: the result is a set of domain classification rules covering White, Botnet, and APT domains, which the suspicious domain name determination module then uses.
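The patent does not say which learning algorithm produces the domain classification rules in step S204. Purely as a stand-in, the sketch below uses a nearest-centroid model, in which the "rules" are one mean feature vector per class (White, Botnet, APT). The function name is hypothetical. The sketch assumes the 27 features have already been scaled to comparable ranges; otherwise large-valued features such as TTL would dominate any distance computed later.

```python
from typing import Dict, List, Sequence


def train_centroids(X: Sequence[Sequence[float]], y: Sequence[str]) -> Dict[str, List[float]]:
    """Stand-in 'domain classification rules': the mean feature vector of each class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}
```

A nearest-centroid model is used here only because it is short and easy to inspect. Any supervised classifier that accepts fixed-length vectors could fill the same role.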

The detection phase then proceeds as follows:

- Step S206: the unknown domain data of interest is input.
- Step S207: the semi-passive DNS detection module generates semi-passive DNS records (log records) for the unknown domains.
- Step S208: these records are input into the feature extraction module to produce the corresponding feature vectors.
- Step S209: the feature vectors of the unknown domains are sent to the suspicious domain name determination module, which holds the domain classification rules. For each domain, it computes a vector of three scores measuring how similar the domain is to benign, Botnet, and APT domain names.
- Step S210: the suspiciousness of each unknown domain is assessed from the values in that vector.
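Step S209 produces a three-way similarity vector for each unknown domain. Continuing the nearest-centroid stand-in (an assumption, not the patent's stated classifier), one simple way to obtain such a vector is a softmax over negative Euclidean distances to the class centroids. The names `similarity_vector` and `suspiciousness` are hypothetical. Treating one minus the White score as a risk score is likewise only one possible reading of step S210.

```python
import math
from typing import Dict, List, Sequence


def similarity_vector(vec: Sequence[float],
                      centroids: Dict[str, List[float]]) -> Dict[str, float]:
    """Softmax over negative Euclidean distances: a larger value means closer to that class."""
    dists = {label: math.dist(vec, c) for label, c in centroids.items()}
    weights = {label: math.exp(-d) for label, d in dists.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def suspiciousness(sim: Dict[str, float]) -> float:
    """Hypothetical risk score: the share of similarity not assigned to the White class."""
    return 1.0 - sim.get("White", 0.0)
```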

FIG. 3 is a first schematic diagram of the data collection architecture of the semi-passive DNS suspicious domain name detection system of the invention. The semi-passive DNS (SPDNS) detection proposed by the invention has an architecture very similar to passive DNS (PDNS) detection. The main difference is that SPDNS detection performs active detection through active sensing submodule 50, in addition to monitoring through passive sensing submodule 51. Conventional passive DNS detection has no active component.

Active sensing submodule 50 automatically triggers the DNS mechanism and queries the domain names of interest, and the packets generated in the process constitute the domain information of interest. Passive sensing submodule 51 also differs from a conventional passive DNS sensor: it not only monitors the packets of DNS cache 510 but also parses the DNS packets sent by DNS clients. Once the IP address of global DNS server S is configured, the semi-passive domain detection architecture is complete.

The data collection architecture of the invention obtains the resource records (RR) of domain names from two channels at the same time:

- The passive channel: the interaction between DNS cache 510 and DNS server S.
- The active channel: the interaction between active sensing submodule 50 and DNS server S.

This configuration lowers the barrier to obtaining DNS logs and increases the diversity of domain information in the logs. Over a period of time, the semi-passive detection mechanism repeatedly queries the DNS server for each domain name. The resource record information for that domain name is thereby captured in passive sensing submodule 51 and written to a semi-passive DNS record (log record), which is then sent for feature extraction.
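The passive sensing submodule has to parse DNS packets to recover resource records. As a minimal, standard-library-only sketch of that parsing step (not the patent's sensor), the code below extracts A records from a raw DNS response in the RFC 1035 wire format, including compressed names. Packet capture, TCP transport, EDNS, and record types other than A are deliberately left out. The function names are hypothetical.

```python
import ipaddress
import struct
from typing import List, Optional, Tuple


def read_name(msg: bytes, offset: int) -> Tuple[str, int]:
    """Decode a (possibly compressed) domain name; return it and the offset just past it."""
    labels: List[str] = []
    end: Optional[int] = None  # offset after the name at its original position
    hops = 0
    while True:
        length = msg[offset]
        if length & 0xC0 == 0xC0:  # two-byte compression pointer
            if end is None:
                end = offset + 2
            offset = ((length & 0x3F) << 8) | msg[offset + 1]
            hops += 1
            if hops > 32:
                raise ValueError("compression pointer loop")
            continue
        if length == 0:  # root label terminates the name
            if end is None:
                end = offset + 1
            return ".".join(labels), end
        labels.append(msg[offset + 1:offset + 1 + length].decode("ascii"))
        offset += 1 + length


def parse_a_records(msg: bytes) -> List[Tuple[str, str, int]]:
    """Return (name, IPv4 address, TTL) for every A record in the answer section."""
    _id, _flags, qdcount, ancount, _nscount, _arcount = struct.unpack("!6H", msg[:12])
    offset = 12
    for _ in range(qdcount):  # skip the question section
        _qname, offset = read_name(msg, offset)
        offset += 4  # QTYPE + QCLASS
    records = []
    for _ in range(ancount):
        name, offset = read_name(msg, offset)
        rtype, _rclass, ttl, rdlength = struct.unpack("!HHIH", msg[offset:offset + 10])
        offset += 10
        rdata = msg[offset:offset + rdlength]
        offset += rdlength
        if rtype == 1 and rdlength == 4:  # TYPE A carries a 4-byte IPv4 address
            records.append((name, str(ipaddress.IPv4Address(rdata)), ttl))
    return records
```

Several A records for one name in a single response are exactly the input the Meta Attribute Compression feature counts.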

FIG. 4 is a second schematic diagram of the data collection architecture of the semi-passive DNS suspicious domain name detection system of the invention. Unknown domain list 502 is input into semi-passive DNS detection module 5, which interacts with the connected global DNS server S. The module outputs semi-passive DNS records 511 for the unknown domain list, and feature vector computation and classification are then performed on these records.

FIG. 5 shows the system architecture of the semi-passive DNS suspicious domain name detection system of the invention. The overall system is composed of semi-passive DNS detection module 5, feature extraction module 6, and suspicious domain name determination module 7.

Semi-passive DNS detection module 5 includes active sensing submodule 50, which forms the active channel, and passive sensing submodule 51. Passive sensing submodule 51 obtains the semi-passive DNS records from two interactions: that of active sensing submodule 50 with the DNS server, and that of DNS cache 510 with the DNS server.

In feature extraction module 6, feature vector submodule 60 receives the semi-passive DNS records and computes each domain's feature vector with the domain feature algorithm. Feature training submodule 61 then performs machine training to generate or refine domain classification rules 610.

When known domain list 501 is input into semi-passive DNS detection module 5, the resulting semi-passive DNS records for the known domains are input into feature extraction module 6 and used for training to refine domain classification rules 610.

When unknown domain list 502 is input into semi-passive DNS detection module 5, the resulting semi-passive DNS records for the unknown domains are input into feature extraction module 6. There, feature vector submodule 60 computes each domain's feature vector with the domain feature algorithm.

The resulting vectors are sent to suspicious domain name determination module 7. Its vector receiving submodule 70 receives them, and its vector classification submodule 71 classifies them with domain classification rules 610 to produce suspiciousness information 710 for the unknown domains. The feature vectors of the unknown domains are also used for training to refine domain classification rules 610.
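The last sentence above says that the unknown domains' feature vectors are also fed back to refine the rules. Under the nearest-centroid stand-in (an assumption, not the patent's stated method), that feedback could be a running-mean update of the class centroid. The function name is hypothetical. The text does not say which label to use: the classifier's own prediction or an analyst-confirmed one. Feeding back unconfirmed predictions risks reinforcing the model's own mistakes.

```python
from typing import Dict, List, Sequence


def refine_centroids(centroids: Dict[str, List[float]],
                     counts: Dict[str, int],
                     vec: Sequence[float],
                     label: str) -> None:
    """Fold one newly labeled vector into its class centroid as a running mean (in place)."""
    n = counts.get(label, 0) + 1
    old = centroids.get(label, [0.0] * len(vec))
    # new_mean = old_mean + (x - old_mean) / n
    centroids[label] = [c + (v - c) / n for c, v in zip(old, vec)]
    counts[label] = n
```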

FIG. 6 is a schematic diagram of an example record produced by the semi-passive DNS detection of the invention. It shows the output obtained when known domain list 501, collected for training, is input into semi-passive DNS detection module 5. Known domain list 501 consists of many domains known to be White, Botnet, or APT. The samples in this embodiment were selected as follows:

1. White domain names: this embodiment took the Alexa top 11,000 domain names. It also randomly selected 1,000 domain names from each of three Alexa rank ranges (90,000 to 100,000, 100,000 to 300,000, and 300,000 to 500,000), for a total of 14,000 training samples.

2. Botnet domain names: to keep the malicious samples diverse and current, this embodiment extracts about 10,000 domains as malicious samples from the malicious domains published on the public Internet blacklist DNSBH between January 1, 2016 and March 14, 2016.

3. APT domain names: this embodiment selects 340 domains that were still active from the list of APT incidents published online by the security company Kaspersky, and uses them as APT training samples.
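The three sample-selection rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's actual tooling: it assumes the Alexa ranking is available as a list ordered by rank, that the DNSBH blacklist is a list of (domain, listing date) pairs, and that the liveness check for APT domains is supplied as a callable. All function and constant names are hypothetical, and the rank ranges are treated as half-open (for example, ranks 90,000 to 99,999) so that adjacent ranges do not share a boundary rank, which is also an assumption.

```python
import random
from datetime import date

# Hypothetical parameters mirroring the White sampling rule described above.
WHITE_TOP_N = 11_000
# Rank ranges treated as half-open [lo, hi) so adjacent ranges do not overlap (assumption).
WHITE_RANGES = [(90_000, 100_000), (100_000, 300_000), (300_000, 500_000)]
PER_RANGE = 1_000


def sample_white(ranked, rng):
    """ranked[0] is the rank-1 Alexa domain; returns 11,000 + 3 x 1,000 domains."""
    picked = list(ranked[:WHITE_TOP_N])
    for lo, hi in WHITE_RANGES:
        pool = ranked[lo - 1:hi - 1]  # ranks lo .. hi-1
        picked.extend(rng.sample(pool, PER_RANGE))
    return picked


def sample_botnet(blacklist, rng, start=date(2016, 1, 1), end=date(2016, 3, 14), k=10_000):
    """blacklist: (domain, listing_date) pairs; keep entries listed in the window, then sample."""
    in_window = [d for d, listed in blacklist if start <= listed <= end]
    return rng.sample(in_window, min(k, len(in_window)))


def sample_apt(apt_domains, is_active):
    """Keep only APT domains that a (hypothetical) liveness check reports as still active."""
    return [d for d in apt_domains if is_active(d)]


def build_training_set(ranked, blacklist, apt_domains, is_active, seed=0):
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    labeled = [(d, "White") for d in sample_white(ranked, rng)]
    labeled += [(d, "Botnet") for d in sample_botnet(blacklist, rng)]
    labeled += [(d, "APT") for d in sample_apt(apt_domains, is_active)]
    return labeled
```

With the real lists as input, `build_training_set` would yield the 14,000 White samples together with up to 10,000 Botnet samples and the active APT domains.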

Next, the semi-passive domain name server record of each domain in the known domain list 501 is collected by the semi-passive domain name server detection module 5. The active sensing sub-module 50 of module 5 controls the DNS cache 510 to automatically trigger the DNS mechanism and query each input domain name, obtaining information on the domains of interest, while the passive sensing sub-module 51 monitors and parses DNS packets to produce the semi-passive domain name server records. The result is shown in the example record of FIG. 6.
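A minimal sketch of how active probing and passive capture could both feed one semi-passive record per domain. The record fields, the `resolve` callable standing in for the DNS-cache-triggered lookup, and all class names are hypothetical; a real passive sensor would parse captured DNS packets rather than receive ready-made `Observation` objects.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """One DNS response as seen on the wire (hypothetical, pre-parsed form)."""
    domain: str
    ips: tuple
    ttl: int
    timestamp: float


@dataclass
class SemiPassiveRecord:
    """Aggregated per-domain record built from all observed responses."""
    domain: str
    ips: set = field(default_factory=set)
    ttls: list = field(default_factory=list)
    first_seen: float = float("inf")
    last_seen: float = float("-inf")
    responses: int = 0


class SemiPassiveCollector:
    def __init__(self, resolve):
        # `resolve` stands in for the cache-triggered lookup: domain -> Observation.
        self.resolve = resolve
        self.records = {}

    def active_probe(self, domains):
        """Active side: trigger lookups for the domains of interest."""
        for d in domains:
            self.passive_observe(self.resolve(d))

    def passive_observe(self, obs):
        """Passive side: fold one captured response into the domain's record."""
        rec = self.records.setdefault(obs.domain, SemiPassiveRecord(obs.domain))
        rec.ips.update(obs.ips)
        rec.ttls.append(obs.ttl)
        rec.first_seen = min(rec.first_seen, obs.timestamp)
        rec.last_seen = max(rec.last_seen, obs.timestamp)
        rec.responses += 1
```

Routing the actively triggered responses through the same `passive_observe` path is a design choice of this sketch: it keeps one record format regardless of whether a lookup was provoked or merely overheard.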

The collected semi-passive domain name server records are then sent to the feature extraction module 6, where the domain feature algorithm computes twenty-seven features for each sample to form its feature vector, and the processed feature vectors are used for machine learning to build the domain classification rules. The twenty-seven features of this embodiment are listed in FIG. 12, where the marked entries are the four novel domain features proposed by the present invention. To verify how effective these novel features are, two separate experiments were run and the features were ranked. In the first experiment, only the twenty-three features used in the prior art were used, and the classifier's accuracy in separating APT, Botnet, and White domains was only 89.9208%. In the second experiment (the embodiment of the present invention), the twenty-three existing features plus the four novel features were used, and the accuracy rose to 97.6398%. The features were then ranked by usefulness with the Gain Ratio feature evaluation method; the ranking is shown in FIG. 12. The ranking shows that the proposed DNS Server IP Abuse feature (abuse of the domain name server's IP address), Meta Attribute Process Time feature (high execution time), and Meta Attribute Compression feature (large number of returned results) are all more effective than the existing domain features, while the Very Large TTL feature (high TTL value) is of average effectiveness. Both the experiments and the feature ranking confirm that the four novel features proposed by the present invention are quite effective at increasing classification accuracy.
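The Gain Ratio criterion used for this ranking is the standard information-gain ratio: the information gain of a feature divided by its split information. The sketch below computes it for features that have already been discretized; the discretization step, the function names, and the toy data are assumptions, since the embodiment does not state which tool or binning was used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature, labels):
    """Information gain of a discrete feature divided by its split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    # A constant feature has zero split information; score it as useless.
    return gain / split_info if split_info > 0 else 0.0


def rank_features(columns, labels):
    """columns: feature name -> list of discretized values; highest ratio first."""
    scores = {name: gain_ratio(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Dividing by split information penalizes features that fragment the samples into many small groups, which plain information gain would otherwise favor.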

When the unknown domain list 502 is input to the semi-passive domain name server detection module 5, the semi-passive domain name server records corresponding to the unknown domains are obtained and input to the feature extraction module 6, which computes each domain's feature vector. The suspicious domain name determination module 7 then outputs three probability values expressing how suspicious each domain is, from which the information on the degree of suspicion 710 of the unknown domain is determined. The three probability values express how similar the unknown domain is to the White, Botnet, and APT classes, respectively; the results are shown in FIGS. 7, 8, and 9. The three groups of domains obtained in FIGS. 7, 8, and 9 were then each input to a prior-art passive domain name detection system, and its outputs were verified on the VirusTotal website. The verification results for the latter two groups are shown in FIGS. 10 and 11 and confirm that the domains the present invention classified as Botnet and APT are indeed malicious.
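One way to turn the three probability values into a verdict is sketched below. It assumes the classifier's outputs are available as a mapping from class name to probability; taking the most probable class as the label and using 1 minus P(White) as a suspicion score are illustrative choices, not rules stated by the embodiment.

```python
CLASSES = ("White", "Botnet", "APT")


def verdict(probs):
    """probs: class name -> probability from the classifier (hypothetical interface)."""
    total = sum(probs[c] for c in CLASSES)
    norm = {c: probs[c] / total for c in CLASSES}  # renormalize in case of rounding
    label = max(CLASSES, key=norm.get)             # most probable class
    suspicion = 1.0 - norm["White"]                # illustrative score: mass not on White
    return label, suspicion, label != "White"
```

For example, `verdict({"White": 0.1, "Botnet": 0.7, "APT": 0.2})` labels the domain Botnet, flags it as suspicious, and gives a suspicion score of about 0.9.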

The description and verification of the above embodiments show that the system and method of the present invention for detecting suspicious domain names based on a semi-passive domain name server is a highly effective technique for detecting suspicious domain names.

In summary, the present invention is technically innovative, offers several advantages that the prior art lacks, and fully meets the statutory requirements of novelty and inventive step for an invention patent. This patent application is therefore filed in accordance with the law, and the Office is respectfully requested to grant it.

Claims

A system for detecting suspicious domain names based on a semi-passive domain name server, comprising: a semi-passive domain name server detection module, which receives as input a list of domains of high interest, wherein the semi-passive domain name server detection module includes an active sensing sub-module configured to automatically trigger the domain name server mechanism to query the domain names of interest in the domain list; the semi-passive domain name server detection module includes a passive sensing sub-module configured to monitor domain name server cache packets for the domains in the domain list and to parse the domain name server packets sent by clients; and after the active sensing sub-module and the passive sensing sub-module have processed the domain list, the semi-passive domain name server detection module generates semi-passive domain name server records from the result; a feature extraction module for parsing the domain name server records for machine learning, wherein the feature extraction module includes a receiving sub-module for receiving the semi-passive domain name server records generated by the semi-passive domain name server detection module; the feature extraction module includes a feature vector sub-module that processes the semi-passive domain name server records with a domain feature algorithm that computes the overall degree of suspicion of a domain, to generate feature vectors; and the feature extraction module includes a feature training sub-module that uses the feature vectors generated by the feature vector sub-module for machine training to generate or refine a domain classification rule; and a suspicious domain name determination module, which determines whether a domain name is suspicious from the machine learning result of the feature extraction module, wherein the suspicious domain name determination module includes a vector receiving sub-module for receiving the feature vectors generated by the feature extraction module; and the suspicious domain name determination module includes a vector classification sub-module that applies the domain classification rule generated by the feature training sub-module to the feature vectors of the semi-passive domain name server records of the domain list, to determine whether a domain name is suspicious.

The system for detecting suspicious domain names based on a semi-passive domain name server as described above, wherein the domain list is a list of multiple domains that may include White, Botnet, or APT domains.

The system for detecting suspicious domain names based on a semi-passive domain name server as described above, wherein the domain feature algorithm includes a DNS Server IP Abuse algorithm for domain detection, which detects suspected malicious domains that appear normal but whose domain name server IP address is being fraudulently used.

The system for detecting suspicious domain names based on a semi-passive domain name server as described above, wherein the domain feature algorithm includes a Very Large TTL algorithm for domain detection, targeting high TTL values; a domain name with a higher TTL value is more likely to be a benign domain.

The system for detecting suspicious domain names based on a semi-passive domain name server as described above, wherein the domain feature algorithm includes a Meta Attribute Process Time algorithm for domain detection, targeting high execution time; a domain name with a longer execution time is more likely to be a benign domain.

The system for detecting suspicious domain names based on a semi-passive domain name server as described above, wherein the domain feature algorithm includes a Meta Attribute Compression algorithm for domain detection, targeting a high number of returned results; a domain name whose forward resolution returns more IP addresses at the same point in time is more likely to be a benign domain.
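The three numeric features in the claims above can be illustrated with a minimal sketch. The claims state only the direction of each signal (a larger TTL, a longer execution time, and more IP addresses returned at the same point in time all suggest a benign domain); the `Response` fields, the one-day TTL threshold, and the aggregation choices (maximum TTL, mean processing time, largest number of distinct IP addresses at any single timestamp) are hypothetical. DNS Server IP Abuse is left out because its computation is not specified here.

```python
from dataclasses import dataclass

LARGE_TTL = 86_400  # hypothetical "very large" TTL threshold in seconds (one day)


@dataclass
class Response:
    """One resolution result for a domain (hypothetical fields)."""
    ttl: int
    process_ms: float
    ips: tuple
    timestamp: float


def novel_features(responses):
    """Compute illustrative TTL, process-time and compression features for one domain."""
    max_ttl = max(r.ttl for r in responses)
    # Group returned IP addresses by the time point at which they were observed.
    ips_by_time = {}
    for r in responses:
        ips_by_time.setdefault(r.timestamp, set()).update(r.ips)
    return {
        "very_large_ttl": 1 if max_ttl >= LARGE_TTL else 0,
        "max_ttl": max_ttl,
        "mean_process_ms": sum(r.process_ms for r in responses) / len(responses),
        "max_ips_per_time": max(len(s) for s in ips_by_time.values()),
    }
```

Per the claims, larger values of each of these features would push the classifier toward the White class.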

A method for detecting suspicious domain names based on a semi-passive domain name server, the method comprising: inputting a domain list into a semi-passive domain name server detection module; automatically triggering, by an active sensing sub-module included in the semi-passive domain name server detection module, the domain name server mechanism to query the domain names of interest in the domain list; monitoring, by a passive sensing sub-module included in the semi-passive domain name server detection module, domain name server cache packets for the domains in the domain list, and parsing the domain name server packets sent by clients; generating, by the semi-passive domain name server detection module, semi-passive domain name server records from the results of processing the domain list with the active sensing sub-module and the passive sensing sub-module; receiving the semi-passive domain name server records through a receiving sub-module included in a feature extraction module; processing, through a feature vector sub-module included in the feature extraction module, the semi-passive domain name server records with a domain feature algorithm that computes the overall degree of suspicion of a domain, to generate feature vectors; performing, through a feature training sub-module included in the feature extraction module, machine training on the feature vectors generated by the feature vector sub-module to generate or refine a domain classification rule; receiving the feature vectors through a vector receiving sub-module included in a suspicious domain name determination module; and judging, through a vector classification sub-module included in the suspicious domain name determination module, the feature vectors of the semi-passive domain name server records according to the domain classification rule generated by the feature training sub-module, to determine whether a domain name is suspicious.

The method for detecting suspicious domain names based on a semi-passive domain name server as described above, wherein the domain list is a list of multiple domains that may include White, Botnet, or APT domains.

The method for detecting suspicious domain names based on a semi-passive domain name server as described above, wherein the domain feature algorithm includes a DNS Server IP Abuse algorithm for domain detection, which detects suspected malicious domains that appear normal but whose domain name server IP address is being fraudulently used.

The method for detecting suspicious domain names based on a semi-passive domain name server according to claim 7, wherein the domain feature algorithm includes a Very Large TTL algorithm for domain detection, targeting high TTL values; a domain name with a higher TTL value is more likely to be a benign domain.

The method for detecting suspicious domain names based on a semi-passive domain name server as described above, wherein the domain feature algorithm includes a Meta Attribute Process Time algorithm for domain detection, targeting high execution time; a domain name with a longer execution time is more likely to be a benign domain.

The method for detecting suspicious domain names based on a semi-passive domain name server as described above, wherein the domain feature algorithm includes a Meta Attribute Compression algorithm for domain detection, targeting a high number of returned results; a domain name whose forward resolution returns more IP addresses at the same point in time is more likely to be a benign domain.