TWI777766B

TWI777766B - System and method of malicious domain query behavior detection

Info

Publication number: TWI777766B
Application number: TW110133747A
Authority: TW
Inventors: 陳勝裕; 蔡天浩; 陳彥仲; 施君熹
Original assignee: 中華電信股份有限公司 (Chunghwa Telecom Co., Ltd.)
Priority date: 2021-09-10
Filing date: 2021-09-10
Publication date: 2022-09-11
Also published as: TW202311994A

Abstract

A system and a method of malicious domain query behavior detection are provided. The system includes a transceiver, a storage medium, and a processor. The processor is coupled to the transceiver and the storage medium and is configured to execute a plurality of modules, which include: a domain name filtering module, which filters domain name service query records in network traffic according to a preset list to obtain non-existent domain (NXDomain) query records; a feature computing module, which computes a similarity of an Internet Protocol (IP) address to be tested based on the NXDomain query records; and a detection module, which, in response to the similarity being greater than a threshold value, detects through a machine learning model whether the IP address to be tested is a host infected by malicious programs to generate a detection result, and outputs the detection result through the transceiver.

Description

System and method for detecting malicious domain query behavior

The present invention relates to network security technology, and in particular to a system and method for detecting malicious domain query behavior.

With the rise of the Internet, the Domain Name Service (DNS) has become indispensable for Internet access. However, most organizations and users pay little attention to the volume and content of their DNS queries. To keep a reliable communication channel with victim hosts, cybercriminals use one host as a central control point, known as a Command and Control Server (C&C server). This host acts as a relay station that dispatches commands and collects the private information stolen from victim hosts. Because the C&C server plays a central role throughout the attack, cybercriminals use various methods to prolong its lifetime and increase its stealth in order to evade detection.

Domain Generation Algorithm (DGA) techniques remain a primary means by which attackers orchestrate network attacks. DGA malware is often used as a tool in advanced persistent threat (APT) attacks. Besides being difficult to detect, DGA malware lets attackers assign different functions to the DGA-generated domain names to carry out malicious activities.

In view of this, the present invention proposes a system and method for detecting malicious domain query behavior that analyzes DNS queries to detect malicious domains generated by DGAs.

An embodiment of the present invention provides a system for detecting malicious domain query behavior, including: a transceiver that receives network traffic; a storage medium that stores a plurality of modules; and a processor, coupled to the transceiver and the storage medium, configured to execute the plurality of modules. The plurality of modules include: a domain name filtering module that filters domain name service query records in the network traffic according to a preset list to obtain non-existent domain query records; a feature calculation module that calculates the similarity of an Internet Protocol (IP) address to be tested according to the non-existent domain query records; and a detection module that, in response to the similarity being greater than a threshold value, detects through a machine learning model whether the IP address to be tested is a host infected by a malicious program to generate a detection result, and outputs the detection result through the transceiver.

An embodiment of the present invention provides a method for detecting malicious domain query behavior, including: receiving network traffic; filtering domain name service query records in the network traffic according to a preset list to obtain non-existent domain query records; calculating the similarity of an IP address to be tested according to the non-existent domain query records; in response to the similarity being greater than a threshold value, detecting through a machine learning model whether the IP address to be tested is a host infected by a malicious program to generate a detection result; and outputting the detection result.

Based on the above, the system and method for detecting malicious domain query behavior provided by the present invention combine abnormal-behavior analysis with artificial intelligence: they analyze the behavior behind abnormal DNS queries for non-existent domains and use a machine learning model to detect DGA domains within that abnormal behavior. In this way, the malicious relay stations that DGA malware connects to can be identified effectively, which reduces the harm caused by DGA malware, blocks intrusion by advanced persistent threat attacks, and prevents sensitive data from being stolen.

Some embodiments of the present invention are described in detail below with reference to the accompanying drawings. Where the same reference numerals appear in different drawings, they denote the same or similar elements. These embodiments are only part of the present invention and do not disclose all of its possible implementations; rather, they are merely examples of systems and methods within the scope of the claims of the present invention.

FIG. 1 is a block diagram of a system for detecting malicious domain query behavior according to an embodiment of the present invention. Referring to FIG. 1, the system 10 may include a processor 100, a transceiver 200, and a storage medium 300. The processor 100 is coupled to the transceiver 200 and the storage medium 300, and may be configured to execute a plurality of modules stored in the storage medium 300.

The processor 100 is, for example, a central processing unit (CPU), or another programmable general-purpose or special-purpose micro control unit (MCU), microprocessor, digital signal processor (DSP), programmable controller, application-specific integrated circuit (ASIC), graphics processing unit (GPU), image signal processor (ISP), image processing unit (IPU), arithmetic logic unit (ALU), complex programmable logic device (CPLD), field-programmable gate array (FPGA), or other similar element, or a combination of the above. The processor 100 can access and execute the modules and application programs stored in the storage medium 300 to perform the functions of the system 10.

The transceiver 200 may receive network traffic. The transceiver 200 transmits and receives signals in a wireless or wired manner, and may also perform operations such as low-noise amplification, impedance matching, frequency mixing, up- or down-conversion, filtering, and amplification.

The storage medium 300 may store a plurality of modules. The storage medium 300 is, for example, any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, hard disk drive (HDD), solid state drive (SSD), or similar element, or a combination of the above, and stores the modules or application programs executable by the processor 100. The modules may include a domain name filtering module 310, a feature calculation module 320, and a detection module 330.

The domain name filtering module 310 may filter the Domain Name Service (DNS) query records in the network traffic according to a preset list to obtain non-existent domain (NXDomain) query records.

The feature calculation module 320 may calculate the similarity of an Internet Protocol (IP) address to be tested according to the NXDomain query records. The NXDomain query records may contain multiple user IP addresses, and the feature calculation module 320 may select the IP address to be tested from among them.

In response to the similarity being greater than a threshold value, the detection module 330 may detect through a machine learning model whether the IP address to be tested is a host infected by a malicious program to generate a detection result, and output the detection result through the transceiver 200.

FIG. 2 is a flowchart of a method for detecting malicious domain query behavior according to an embodiment of the present invention. Referring to FIG. 2, the method of this embodiment is applicable to the system 10 shown in FIG. 1, and its detailed steps are as follows. In step S210, network traffic is received. In step S220, the DNS query records in the network traffic are filtered according to the preset list to obtain NXDomain query records. In step S230, the similarity of the IP address to be tested is calculated according to the NXDomain query records. In step S240, in response to the similarity being greater than the threshold value, a machine learning model detects whether the IP address to be tested is a host infected by a malicious program, generating a detection result. In step S250, the detection result is output.
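The five steps of FIG. 2 can be strung together in a compact sketch. This is an illustration under stated assumptions, not the patented implementation: records are hypothetical `(user_ip, query_time, domain, rcode)` tuples already parsed from the captured traffic, the preset-list check and the machine learning model are passed in as callables, the "reference IP" is assumed to be whichever other IP is most similar, and the default threshold of 0.85 is an assumed value.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical record layout: (user_ip, query_time, domain, rcode)
Record = Tuple[str, float, str, str]


def detect_pipeline(
    records: Iterable[Record],              # S210: DNS records parsed from received traffic
    is_excluded: Callable[[str], bool],     # preset-list check (whitelist / third-party blacklist)
    classify: Callable[[List[str]], bool],  # stand-in for the ML model (True = DGA-like)
    threshold: float = 0.85,                # assumed value
) -> Dict[str, bool]:
    # S220: keep NXDOMAIN answers whose domain is not excluded by the preset list
    per_ip: Dict[str, Set[str]] = {}
    for ip, _t, domain, rcode in records:
        d = domain.lower().rstrip(".")
        if rcode == "NXDOMAIN" and not is_excluded(d):
            per_ip.setdefault(ip, set()).add(d)

    results: Dict[str, bool] = {}
    ips = sorted(per_ip)
    for ip in ips:
        # S230: highest intersection/union similarity against any other IP
        best = 0.0
        for ref in ips:
            if ref == ip:
                continue
            a, b = per_ip[ip], per_ip[ref]
            best = max(best, len(a & b) / len(a | b))
        # S240: only hosts above the threshold are passed to the model
        if best > threshold:
            results[ip] = classify(sorted(per_ip[ip]))
    return results  # S250: the caller outputs or forwards these results
```

Passing the filter and the model as callables keeps the control flow of S210 to S250 visible without committing to any particular list format or model framework.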

In an embodiment of the present invention, the preset list includes a whitelist. In response to a to-be-confirmed domain query in the DNS query records not being in the whitelist, the domain name filtering module 310 may include that query in the NXDomain query records. In an embodiment of the present invention, the preset list includes a blacklist. In response to a to-be-confirmed domain query in the DNS query records not being in the blacklist, the domain name filtering module 310 may include that query in the NXDomain query records.

For example, the whitelist stores DNS queries generated by a variety of normal activities. The whitelist serves to reduce the system's computational load and its false-positive rate. Domains recorded in the whitelist may also produce NXDomain responses, and including this traffic in the computation would generate a large number of false positives. The domain name filtering module 310 filters the collected DNS query records, removing the DNS queries produced by the normal activities recorded in the whitelist and keeping only the NXDomain query records of interest. In practice, NXDomain queries in network traffic arise mainly from mistyped domain names and from some special-purpose services, so NXDomain traffic accounts for less than one-tenth of the overall traffic.

In addition, NXDomain traffic may contain a large number of queries generated by special services. The most common is blacklist lookup by antivirus software: for example, an antivirus product's blacklist lookup combines the domain to be confirmed with the antivirus vendor's own domain. Another common source is e-mail servers: if they use a third-party blacklist provided by the same company, the network traffic will contain domain queries resembling the abnormal queries generated by DGAs, causing the system to misjudge them. Therefore, the domain name filtering module 310 can filter out all third-party blacklist domain queries. As a result, after filtering by the preset list, less than one-twentieth of the original network traffic needs to be processed. The system 10 thus avoids processing the full volume of network traffic, which saves computing resources, improves operating performance, and also improves overall detection accuracy.
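The filtering rules above can be sketched as follows. This is a hypothetical illustration, not the patent's implementation: it assumes records arrive as `(user_ip, domain, rcode)` tuples and that both the whitelist and the third-party blacklist are stored as domain suffixes, so a listed zone also covers its subdomains. The zone names in the usage example are placeholders.

```python
from typing import Iterable, List, Set, Tuple


def matches_suffix(domain: str, suffixes: Set[str]) -> bool:
    """True if domain equals, or is a subdomain of, any listed suffix."""
    labels = domain.split(".")
    # Check every trailing run of labels: a.b.c -> a.b.c, b.c, c
    return any(".".join(labels[i:]) in suffixes for i in range(len(labels)))


def filter_nxdomain(
    records: Iterable[Tuple[str, str, str]],
    whitelist: Set[str],
    blacklist_zones: Set[str],
) -> List[Tuple[str, str]]:
    """Keep NXDOMAIN records not covered by the whitelist or a third-party blacklist zone."""
    kept: List[Tuple[str, str]] = []
    for ip, domain, rcode in records:
        d = domain.lower().rstrip(".")  # domain names are case-insensitive
        if rcode != "NXDOMAIN":
            continue  # only non-existent-domain answers are of interest
        if matches_suffix(d, whitelist):
            continue  # normal activity recorded in the whitelist
        if matches_suffix(d, blacklist_zones):
            continue  # e.g. antivirus or mail-server blacklist lookups
        kept.append((ip, d))
    return kept
```

For instance, with `whitelist={"corp.example"}` and `blacklist_zones={"dnsbl.example"}`, an NXDOMAIN answer for `4.3.2.1.dnsbl.example` is dropped while one for `xkqpz.com` is kept. Suffix matching is used here because blacklist lookups embed the queried item as a prefix of the vendor's zone.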

In an embodiment of the present invention, a DNS query record may include the user IP address, the query time, the queried domain name, or the domain resolution result. For example, the user IP address can be an IPv4 address written in dotted decimal notation or an IPv6 address written in hexadecimal. The queried domain name is a string made up of multiple labels joined by dots, and its letters are case-insensitive. The domain resolution result is the result returned by the DNS server for the queried domain name.
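One way to hold such a record is a small immutable data class that canonicalizes the fields on creation. The class and function names below are hypothetical; the sketch uses Python's standard `ipaddress` module to validate both IPv4 and IPv6 addresses and lower-cases the domain because DNS names are case-insensitive.

```python
import ipaddress
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DnsQueryRecord:
    user_ip: str          # IPv4 (dotted decimal) or IPv6 (hexadecimal) address
    query_time: datetime
    domain: str           # queried domain name
    resolution: str       # result returned by the DNS server, e.g. "NXDOMAIN"


def make_record(user_ip: str, query_time: datetime, domain: str, resolution: str) -> DnsQueryRecord:
    # Validate and canonicalize the address; raises ValueError if malformed.
    ip = str(ipaddress.ip_address(user_ip))
    # Domain names are case-insensitive: lower-case and drop a trailing root dot.
    name = domain.strip().lower().rstrip(".")
    return DnsQueryRecord(ip, query_time, name, resolution.upper())
```

Canonicalizing at ingestion means later set operations treat `WwW.Example.COM.` and `www.example.com` as the same queried domain.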

In an embodiment of the present invention, the feature calculation module 320 may be configured to calculate the similarity as follows. It obtains, from the DNS query records, a first query domain list corresponding to the IP address to be tested and a second query domain list corresponding to a reference IP address. It then obtains the number of domains in the intersection of the two lists and the number of domains in their union, and divides the intersection count by the union count to obtain the similarity.

For example, the feature calculation module 320 takes all queried domains collected in the DNS query records as input data, collects the domains queried by the first user (the IP address to be tested) to produce the first query domain list, and collects the domains queried by the second user (the reference IP address) to produce the second query domain list. The feature calculation module 320 then computes the proportion of queried domains that the two IP addresses have in common, that is, the domains shared by the first and second query domain lists.
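The steps above (group queried domains per IP, then divide the intersection count by the union count) amount to the Jaccard index. A minimal sketch, assuming the NXDomain records have already been reduced to `(user_ip, domain)` pairs; the function names are hypothetical, and every other IP is treated as a candidate reference IP.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def query_domain_lists(records: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group (user_ip, domain) NXDomain records into one query domain set per IP."""
    lists: Dict[str, Set[str]] = defaultdict(set)
    for ip, domain in records:
        lists[ip].add(domain.lower())
    return dict(lists)


def similarity(a: Set[str], b: Set[str]) -> float:
    """Intersection count divided by union count (Jaccard index)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarities_to_references(lists: Dict[str, Set[str]], target_ip: str) -> Dict[str, float]:
    """Similarity of the IP under test against every other IP used as a reference."""
    target = lists[target_ip]
    return {ip: similarity(target, doms) for ip, doms in lists.items() if ip != target_ip}
```

Sets are used rather than lists so that repeated queries for the same domain do not inflate either count.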

A given DGA malware sample queries the same list of domain names and stops only once it reaches a domain that is live. High similarity between users therefore suggests that they are infected by the same DGA malware, because their queried domain lists will be nearly identical. Since the abnormal query behavior follows a shared domain list, while normal users rarely query an entire such list, the query domain lists of users infected by the same malware are highly similar.

Specifically, the similarity can be calculated by the following formula:

$$S(u_1, u_2) = \frac{|D_{u_1} \cap D_{u_2}|}{|D_{u_1} \cup D_{u_2}|}$$

where $S$ is the similarity, $u$ denotes a user, $D_{u_1}$ is the query domain list of the first IP address (the IP address to be tested), $D_{u_2}$ is the query domain list of the second IP address (the reference IP address), $|D_{u_1} \cap D_{u_2}|$ is the number of domains in the intersection of $D_{u_1}$ and $D_{u_2}$, and $|D_{u_1} \cup D_{u_2}|$ is the number of domains in their union. This formula computes the similarity between a pair of users rather than analyzing the network environment as a whole, so it can accurately compute user-to-user similarity even when applied in different network environments.

For example, when the similarity calculated by the feature calculation module 320 is higher than the threshold value (also called the design level), the detection module 330 can examine the IP address to be tested and its corresponding queried domains. In some embodiments, with the design level set in the range of 0.8 to 0.9, the detection module 330 can effectively detect whether the queried domains of the IP address to be tested were generated by a DGA.
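Applying the design level pairwise can be sketched as below. The function names are hypothetical, the default of 0.85 is an assumed value picked from the 0.8 to 0.9 range mentioned above, and the rule that any IP appearing in at least one above-threshold pair is forwarded to the detector is an assumption about how pairs map to hosts.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def suspicious_pairs(lists: Dict[str, Set[str]], threshold: float = 0.85) -> List[Tuple[str, str, float]]:
    """All IP pairs whose query domain lists have Jaccard similarity above threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    pairs: List[Tuple[str, str, float]] = []
    for a, b in combinations(sorted(lists), 2):
        union = lists[a] | lists[b]
        sim = len(lists[a] & lists[b]) / len(union) if union else 0.0
        if sim > threshold:  # strictly greater, as in the text
            pairs.append((a, b, sim))
    return pairs


def ips_for_detection(pairs: Iterable[Tuple[str, str, float]]) -> Set[str]:
    """IPs that take part in at least one suspicious pair are passed to the detector."""
    return {ip for a, b, _ in pairs for ip in (a, b)}
```

Comparing every pair is quadratic in the number of IPs; this stays manageable only because the preset-list filtering has already shrunk the NXDomain traffic considerably.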

In an embodiment of the present invention, the machine learning model is a Long Short-Term Memory (LSTM) model. In deep learning, the LSTM is one of the most common variants of the Recurrent Neural Network (RNN). It suits sequential data with contextual dependencies, as in speech recognition, language modeling, sentiment analysis, and text prediction, and offers good accuracy and the ability to handle complex features. In one embodiment, the detection module 330 may train the model on a training set of domain lists generated by the various DGA algorithms that can be collected from the Internet. In one embodiment, the detection module 330 trains the LSTM model to determine whether a string conforms to a DGA pattern. For example, the LSTM model can judge whether a domain name contains a string that appears to be generated by a DGA or at random; because this judgment is a semantic analysis of the string, the LSTM model is well suited to it. When the LSTM model determines that the domain list conforms to a DGA pattern, the detection module 330 determines that the IP address to be tested is a host infected by a malicious program, generates a detection result, and outputs it through the transceiver 200.
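An LSTM reads a domain name as a sequence of characters, so each string is usually mapped to integer indices and padded to a fixed length before it reaches an embedding layer. The sketch below shows only that preprocessing step; the alphabet, the padding and unknown indices, and the length of 32 are hypothetical choices, and the LSTM itself (which would be built in a deep learning framework) is not shown.

```python
import string
from typing import Dict, List

# Hypothetical vocabulary: characters allowed in a DNS label.
# Index 0 is reserved for padding and 1 for unknown characters.
ALPHABET = string.ascii_lowercase + string.digits + "-_"
CHAR_TO_INDEX: Dict[str, int] = {c: i + 2 for i, c in enumerate(ALPHABET)}
PAD, UNK = 0, 1


def encode_label(label: str, max_len: int = 32) -> List[int]:
    """Map a domain label to a fixed-length list of character indices for an LSTM."""
    ids = [CHAR_TO_INDEX.get(c, UNK) for c in label.lower()[:max_len]]
    # Left-pad so that the last real character sits at the final time step.
    return [PAD] * (max_len - len(ids)) + ids
```

Left-padding is a common choice when the model's prediction is taken from the last time step, because it keeps the real characters adjacent to the output.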

Notably, many DGA-generated domains follow certain patterns or consist of strings generated from a specific seed, and such strings appear in the second-level domain. Therefore, in some embodiments, the detection module 330 may train the machine learning model using random strings and second-level domains. When the detection module 330 makes a judgment, the machine learning model may take only the extracted second-level domain as input to generate the detection result.
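Extracting the second-level label can be sketched as below. This is a deliberately naive, hypothetical helper: it treats only the last label as the public suffix, so names under multi-label suffixes such as `example.co.uk` yield `co`. A Public Suffix List lookup would be needed to handle those correctly.

```python
def second_level_label(domain: str) -> str:
    """Return the label directly left of the last label, e.g. 'example' for www.example.com.

    Naive: assumes a single-label public suffix, so example.co.uk gives 'co'.
    """
    parts = [p for p in domain.lower().rstrip(".").split(".") if p]
    if len(parts) < 2:
        return parts[0] if parts else ""  # single-label or empty names
    return parts[-2]
```

Restricting the model input to this label strips away common prefixes like `www` and the top-level domain, leaving the part where a DGA's seeded randomness usually shows.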

In an embodiment of the present invention, the detection module 330 may generate multiple sets of random strings, extract multiple second-level domains from historical NXDomain query records, and use the random strings and the second-level domains as training data to train the machine learning model.
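Assembling such a training set might look like the following sketch. The patent does not say how the samples are labelled; this version assumes the random strings form the DGA-like positive class (label 1) and the historical second-level labels, which the text notes come largely from typos and special services, form the negative class (label 0). The length range and seed are also hypothetical; swap in real labels where they are available.

```python
import random
import string
from typing import Iterable, List, Tuple


def random_labels(n: int, min_len: int = 8, max_len: int = 20, seed: int = 0) -> List[str]:
    """Generate n random lowercase alphanumeric strings that mimic DGA-style labels."""
    rng = random.Random(seed)
    chars = string.ascii_lowercase + string.digits
    return ["".join(rng.choice(chars) for _ in range(rng.randint(min_len, max_len)))
            for _ in range(n)]


def build_training_set(historical_slds: Iterable[str], n_random: int,
                       seed: int = 0) -> List[Tuple[str, int]]:
    """Label random strings 1 and historical second-level labels 0 (assumed labelling)."""
    data = [(s, 1) for s in random_labels(n_random, seed=seed)]
    data += [(s.lower(), 0) for s in historical_slds]
    random.Random(seed).shuffle(data)  # avoid class-ordered batches
    return data
```

Seeding both the generator and the shuffle keeps the training set reproducible across runs.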

In an embodiment of the present invention, in response to the similarity being greater than the threshold value, the detection module 330 may obtain, from the NXDomain query records, a list of query domains to be tested corresponding to the IP address to be tested; extract from that list the second-level domains of the respective queried domains; and input these second-level domains into the machine learning model to generate the detection result.
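The inference step can be sketched as follows. `score` stands in for a trained model and is hypothetical, as are the per-label cutoff of 0.5 and the rule that a host is flagged when at least half of its distinct second-level labels score as DGA-like; the patent does not specify how per-domain outputs are combined into one host-level result.

```python
from typing import Callable, Iterable, Set


def detect_host(query_domains: Iterable[str],
                score: Callable[[str], float],
                cutoff: float = 0.5,
                min_fraction: float = 0.5) -> bool:
    """Flag the host when enough of its second-level labels look DGA-generated.

    score: hypothetical trained model mapping a label to P(DGA) in [0, 1].
    """
    slds: Set[str] = set()
    for d in query_domains:
        parts = [p for p in d.lower().rstrip(".").split(".") if p]
        if len(parts) >= 2:
            slds.add(parts[-2])  # naive second-level label (ignores multi-label suffixes)
    if not slds:
        return False
    flagged = sum(1 for s in slds if score(s) >= cutoff)
    return flagged / len(slds) >= min_fraction
```

For a quick check without a real model, a toy scorer such as `lambda s: 1.0 if any(c.isdigit() for c in s) else 0.0` flags `["x7kq2.com", "p9zr.net", "mail.example.org"]`, since two of its three labels contain digits.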

Because the domains cybercriminals generate with DGAs change daily, and the malicious-domain lists collected by antivirus vendors are not necessarily updated in real time, it is difficult to identify malicious domain query behavior accurately from the domain name alone. The embodiments of the present invention therefore combine user-behavior analysis with deep learning algorithms to catch domains generated by malicious DGA programs that antivirus software may miss. DGAs do generate a large number of NXDomain queries, but such queries more often come from domain lists built into devices, which leads to misjudgment. The embodiments further apply artificial intelligence to analyze the composition of domain names, making the resulting alerts more accurate. The embodiments first screen out non-human query behavior from NXDomain traffic and then use a machine learning model trained on DGA domains to detect the domains generated by malicious DGA programs, yielding more accurate malicious domain detection results.

To sum up, the system and method for detecting malicious domain query behavior provided by the present invention achieve the following technical effects: (1) Similarity is computed from user behavior patterns, so abnormal user queries can be distinguished effectively without extensive prior data modeling and training. (2) Combining user-behavior analysis with analysis of queried domain names avoids the misjudgments that arise from analyzing domain names alone, and the users' abnormal behavior provides supporting evidence for the detection results. (3) Exploiting the fact that conventional DNS communication is unencrypted, detecting DNS query behavior avoids the loss of detection rate, or outright inability to detect, caused by packet encryption. (4) Because detection relies on the connection behavior of groups of users, different types of malicious DGA domains can be detected without knowing how or when the domain lists are generated. (5) Extracting only the traffic that needs analysis from large volumes of network traffic improves system performance and increases detection accuracy.

Although the present invention has been disclosed above by way of embodiments, they are not intended to limit it. Anyone with ordinary skill in the art may make changes and modifications without departing from the spirit and scope of the present invention; therefore, the scope of protection of the present invention shall be defined by the appended claims.

10: System
100: Processor
200: Transceiver
300: Storage medium
310: Domain name filtering module
320: Feature calculation module
330: Detection module
S210, S220, S230, S240, S250: Steps

FIG. 1 is a block diagram of a system for detecting malicious domain query behavior according to an embodiment of the present invention.
FIG. 2 is a flowchart of a method for detecting malicious domain query behavior according to an embodiment of the present invention.

Claims

A system for detecting malicious domain query behavior, comprising: a transceiver that receives network traffic; a storage medium that stores a plurality of modules; and a processor, coupled to the transceiver and the storage medium, configured to execute the plurality of modules, wherein the plurality of modules comprise: a domain name filtering module that filters domain name service query records in the network traffic according to a preset list to obtain non-existent domain query records; a feature calculation module that calculates a similarity of an Internet Protocol address to be tested according to the non-existent domain query records; and a detection module that, in response to the similarity being greater than a threshold value, detects through a machine learning model whether the Internet Protocol address to be tested is a host infected by a malicious program to generate a detection result, and outputs the detection result through the transceiver.

The system according to claim 1, wherein the preset list comprises a whitelist, and wherein the domain name filtering module, in response to a to-be-confirmed domain query in the domain name service query records not being in the whitelist, includes the to-be-confirmed domain query in the non-existent domain query records.

The system according to claim 1, wherein the preset list comprises a blacklist, and wherein the domain name filtering module, in response to a to-be-confirmed domain query in the domain name service query records not being in the blacklist, includes the to-be-confirmed domain query in the non-existent domain query records.

The system according to claim 1, wherein the domain name service query records include a user Internet Protocol address, a query time, a queried domain name, and a domain resolution result.

The system according to claim 1, wherein the feature calculation module is configured to: obtain, from the domain name service query records, a first query domain list corresponding to the Internet Protocol address to be tested and a second query domain list corresponding to a reference Internet Protocol address; obtain an intersection count of the first query domain list and the second query domain list; obtain a union count of the first query domain list and the second query domain list; and divide the intersection count by the union count to obtain the similarity.

The system according to claim 1, wherein the machine learning model is a long short-term memory (LSTM) model.

The system according to claim 1, wherein the detection module is configured to: generate multiple sets of random strings; extract multiple second-level domains from historical non-existent domain query records; and use the multiple sets of random strings and the multiple second-level domains as training data to train the machine learning model.

The system according to claim 1, wherein the detection module is configured to: in response to the similarity being greater than the threshold value, obtain, from the non-existent domain query records, a list of query domains to be tested corresponding to the Internet Protocol address to be tested; extract, from the list of query domains to be tested, multiple second-level domains respectively corresponding to multiple queried domains; and input the multiple second-level domains into the machine learning model to generate the detection result.

A method for detecting malicious domain query behavior, comprising: receiving network traffic by a transceiver; filtering, by a processor, domain name service query records in the network traffic according to a preset list to obtain non-existent domain query records; calculating, by the processor, a similarity of an Internet Protocol address to be tested according to the non-existent domain query records; in response to the similarity being greater than a threshold value, detecting, by the processor through a machine learning model, whether the Internet Protocol address to be tested is a host infected by a malicious program to generate a detection result; and outputting the detection result by the transceiver.