TW202017337A - Method and system for backbone network flow anomaly detection

Info

Publication number: TW202017337A
Application number: TW107138229A
Authority: TW
Inventors: 林炫佑; 謝善雄; 高震宇
Original assignee: 財團法人電信技術中心
Priority date: 2018-10-29
Filing date: 2018-10-29
Publication date: 2020-05-01
Also published as: TWI704782B

Abstract

The present invention discloses a method for backbone network flow anomaly detection, which includes the steps of: using the source Internet Protocol (IP) address to filter the network flow in a backbone network; evenly distributing the filtered network flow into a plurality of working nodes of a distributed big data processing system; performing a parallel processing on the plurality of working nodes, and generating a plurality of flow feature data sets on each of the working nodes; using a plurality of abnormal flow identification models and the flow feature data sets to determine whether the network flow is abnormal or not on each of the working nodes; and, when determining that the network flow is abnormal, using a plurality of attack-type identification models and the flow feature data sets to determine the attack type and generate an analysis result.

Description

Backbone network abnormal flow detection method and system

The present invention discloses a method and system for detecting network traffic, and in particular a method and system for detecting abnormal traffic in a backbone network.

With the development of the Internet, network traffic has grown rapidly and the Internet has become an indispensable information carrier. At the same time, network traffic often contains abnormal traffic that deviates from the normal range. Such traffic is mainly caused by malicious network attacks, such as worm propagation, distributed denial-of-service (DDoS) attacks, and botnets, as well as by network configuration errors or occasional line interruptions. Abnormal traffic often causes a sharp decline in the service quality of the entire network and can directly paralyze the victim hosts and networks. Detecting network anomalies in a backbone network environment and providing timely warning information is therefore of great significance for keeping the backbone network running normally.

Meanwhile, as network bandwidth continues to increase, network traffic anomaly detection faces new problems. On the one hand, network transmission rates have increased greatly: an attack that is very obvious in a local area network may not be easy to find in a backbone network, so a highly accurate traffic anomaly detection method is needed. On the other hand, higher bandwidth also speeds up network attacks. A worm outbreak, for example, can infect most of the vulnerable hosts on the Internet in a shorter time, which requires the anomaly detection system to identify abnormal traffic more quickly and efficiently so that blocking can subsequently be carried out in real time.

In addition, backbone network traffic is currently usually inspected by directly analyzing the raw network traffic. Identifying abnormal traffic quickly and efficiently in this way requires a large amount of bandwidth and computing resources, which increases the hardware cost of computation.

Furthermore, some current approaches use a load balance switch to split the backbone network traffic before performing abnormal traffic detection. However, a load balance switch is only general-purpose networking equipment and is limited by its computing power; it cannot perform multiple computing tasks at the same time (that is, parallel processing), and therefore cannot identify abnormal traffic efficiently.

The prior art therefore needs an improved solution that raises the performance of backbone network abnormal traffic detection, one that can meet the bandwidth requirements of the backbone network and identify abnormal traffic more quickly and efficiently.

An object of the present invention is to provide a method and system for detecting backbone network traffic that can meet the bandwidth requirements of the backbone network and that use the concepts of "layering" and "traffic splitting", together with two-stage "offline" and "real-time" processing, to identify abnormal traffic quickly and efficiently.

To achieve this object, the present invention discloses a backbone network abnormal traffic detection method comprising the following steps: filtering network traffic in a backbone network by source Internet Protocol (IP) address; evenly distributing the filtered network traffic to a plurality of working nodes of a distributed big data processing system; performing parallel processing on the plurality of working nodes and generating a plurality of traffic feature data sets on each working node; on each working node, using a plurality of abnormal traffic identification models and the traffic feature data sets to determine whether the network traffic is abnormal; and, when the network traffic is determined to be abnormal, using a plurality of attack type identification models and the traffic feature data sets to determine the attack type and generate an analysis result.

In one embodiment, the backbone network abnormal traffic detection method further comprises sending the analysis result to an analysis database for storage and displaying it in a display interface.

In one embodiment, the step of filtering the network traffic in the backbone network by source IP address further comprises: establishing a whitelist and a blacklist, wherein the blacklist stores a plurality of abnormal source IP addresses and the whitelist stores a plurality of trusted source IP addresses; determining whether the source IP address of a packet in the backbone network is in the whitelist or the blacklist; and discarding the packet when its source IP address is in the whitelist or the blacklist.

In one embodiment, when the network traffic is determined to be abnormal, the analysis result is sent to a mass abnormal traffic analysis module; the network traffic is analyzed to obtain its source IP address; and the source IP address is added to the blacklist.

In one embodiment, the step of evenly distributing the filtered network traffic to the working nodes of the distributed big data processing system further comprises: evenly distributing the filtered network traffic to a plurality of working nodes of an Apache Storm system, and outputting traffic transmission statistics after conversion processing.

In one embodiment, the step of generating the traffic feature data sets on each of the working nodes further comprises: analyzing the traffic transmission statistics with a traffic feature algorithm to generate the traffic feature data sets.

In one embodiment, the traffic feature data sets include at least one basic traffic feature, at least one original traffic feature, and at least one additional traffic feature.

In one embodiment, the step of using the abnormal traffic identification models and the traffic feature data sets to determine whether the network traffic is abnormal further comprises: selecting at least one behavior feature from at least one known intrusion detection data set; performing a benefit analysis on at least one machine learning algorithm and an identification result to produce at least one selected machine learning algorithm; and training the abnormal traffic identification models offline with the behavior feature and the selected machine learning algorithm.

In one embodiment, the step of using the attack type identification models and the traffic feature data sets to determine the attack type further comprises: selecting at least one behavior feature from at least one known intrusion detection data set; performing a benefit analysis on at least one machine learning algorithm and an identification result to produce at least one selected machine learning algorithm; and training the attack type identification models offline with the behavior feature and the selected machine learning algorithm.

Accordingly, the present invention also provides a backbone network abnormal traffic detection system that executes the aforementioned backbone network abnormal traffic detection method.

These and other aspects and embodiments will become apparent to those of ordinary skill in the relevant art upon reference to the following detailed description and the accompanying drawings.

Embodiments will now be described in detail with reference to the accompanying drawings of the present invention. In the accompanying drawings, identical and/or corresponding elements are denoted by the same reference symbols.

Various embodiments are disclosed herein; however, it should be understood that the disclosed embodiments are merely illustrations that may be embodied in various forms. In addition, each example given in connection with the various embodiments is intended to be illustrative and not restrictive. Further, the drawings are not necessarily to scale, and some features are enlarged to show details of particular elements (and any dimensions, materials, and similar details shown in the drawings are intended to be illustrative and not restrictive). Therefore, the specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for teaching those skilled in the relevant art to practice the disclosed embodiments.

In the following detailed description of several example embodiments, reference is made to the accompanying drawings, which form a part hereof and which show, by way of illustration, examples by which the described embodiments may be practiced. Sufficient detail is provided to enable those skilled in the art to practice the described embodiments, and it is to be understood that other embodiments may be used and other changes may be made without departing from their spirit or scope. Moreover, references to "an embodiment" do not necessarily refer to the same or a single embodiment, although they may. The following detailed description is therefore not to be taken in a limiting sense, and the scope of the described embodiments is defined only by the appended claims.

FIG. 1 is a flowchart of a backbone network abnormal traffic detection method according to an embodiment of the present invention, FIG. 2 is a flowchart of offline training of the abnormal traffic identification models according to an embodiment of the present invention, and FIG. 3 is a flowchart of offline training of the attack type identification models according to an embodiment of the present invention. The backbone network abnormal traffic detection method of this embodiment is described below with reference to FIG. 1 through FIG. 3.

Although FIG. 1 shows the steps of the method in a particular order, those of ordinary skill in the art to which the present invention pertains will understand that in other embodiments some steps may be interchanged or performed simultaneously.

In step S102, the network traffic in the backbone network is filtered by source Internet Protocol (IP) address. In this embodiment, filtering the backbone network traffic with a pre-established blacklist (not shown) and whitelist (not shown) is the first-layer filtering mechanism. The blacklist stores the source IP addresses of known abnormal traffic, and the whitelist stores trusted source IP addresses. When a source IP address produces a large amount of abnormal traffic within a short time, that address is stored in the blacklist. The whitelist is populated in advance by the user and usually contains the IP addresses of large network service providers (for example, Google, Facebook, and YouTube). When a packet arrives from an IP address in the blacklist or the whitelist, the packet is discarded directly.
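The first-layer filter of step S102 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names `build_filter` and `filter_packets`, the `src_ip` field, and the use of documentation-range addresses are all assumptions made for the example.

```python
import ipaddress


def build_filter(blacklist, whitelist):
    """Return a predicate that is True when a packet should go on to analysis.

    Entries may be single addresses or CIDR ranges (an assumption; the
    source only says the lists store source IP addresses).
    """
    black = [ipaddress.ip_network(entry, strict=False) for entry in blacklist]
    white = [ipaddress.ip_network(entry, strict=False) for entry in whitelist]

    def should_analyze(source_ip):
        addr = ipaddress.ip_address(source_ip)
        # Packets from either list are dropped before any further processing.
        listed = any(addr in net for net in black) or any(addr in net for net in white)
        return not listed

    return should_analyze


def filter_packets(packets, should_analyze):
    """Keep only the packets whose source IP is in neither list."""
    return [packet for packet in packets if should_analyze(packet["src_ip"])]


if __name__ == "__main__":
    keep = build_filter(blacklist=["203.0.113.7"], whitelist=["198.51.100.0/24"])
    sample = [{"src_ip": "192.0.2.1"}, {"src_ip": "203.0.113.7"}, {"src_ip": "198.51.100.20"}]
    print(filter_packets(sample, keep))
```

Only the packet from neither list survives, so the later stages spend no effort on sources that are already known to be abnormal or already trusted.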

In step S104, the filtered network traffic is evenly distributed to a plurality of working nodes of a distributed big data processing system. In this embodiment, evenly distributing the filtered network traffic to the working nodes is the first-layer traffic splitting mechanism. The distributed big data processing system is an Apache Storm system, and each working node is a server capable of parallel processing; each server may be a physical server or a virtual machine. In this embodiment, the raw network traffic is not sent to the working nodes; instead, converted traffic transmission statistics are sent, which saves a large amount of bandwidth. This can be implemented with a conventional packet parser, for example through the built-in load balance settings of the "PacketX Grism" product of 瑞擎數位股份有限公司; when that product is used, the traffic transmission statistics sent to the working nodes are in Cisco NetFlow V9 format.
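The even split of step S104 can be illustrated with a round-robin assignment of flow records to worker queues. This is only a sketch under assumptions: the source delegates the split to a packet parser's load balance settings and does not specify the policy, and `distribute_evenly` is a hypothetical name.

```python
def distribute_evenly(flow_records, worker_count):
    """Assign flow records to workers in round-robin order.

    Round-robin is an assumed policy; it guarantees that queue lengths
    differ by at most one record.
    """
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    queues = [[] for _ in range(worker_count)]
    for index, record in enumerate(flow_records):
        queues[index % worker_count].append(record)
    return queues


if __name__ == "__main__":
    for worker_id, queue in enumerate(distribute_evenly(list(range(10)), 3)):
        print(worker_id, queue)
```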

In step S106, parallel processing is performed on the plurality of working nodes, and a plurality of traffic feature data sets are generated on each working node. In this embodiment, performing parallel processing on each working node and generating the traffic feature data sets there constitutes the second-layer traffic splitting mechanism and the data preprocessing procedure. The data sent to each working node is in Cisco NetFlow V9 format, which already contains the basic traffic features. A traffic feature algorithm then analyzes the traffic transmission statistics to generate the traffic feature data sets. For example, a conventional network traffic monitor such as Argus provides tools that convert the network traffic into Argus flows, from which the original traffic features are obtained first; the additional traffic features are then generated according to the algorithms defined by UNSW-NB15 (University of New South Wales Network Based 2015). The traffic feature data sets thus include basic traffic features, original traffic features, and additional traffic features. Because UNSW-NB15 is an intrusion detection data set released in 2015 at the Australian Defence Force Academy (ADFA) of the University of New South Wales (UNSW) and is currently the most widely used intrusion detection data set, the detailed implementation is omitted here.
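To make the idea of an "additional traffic feature" concrete, the sketch below derives one count-based feature from a stream of flow records. It is loosely modeled on the recent-connection counters found in UNSW-NB15 but is not that data set's defined algorithm; the feature name `ct_src_proto`, the record fields, and the window size are assumptions for illustration.

```python
from collections import deque


def add_recent_count_feature(flow_records, window=100):
    """Append a hypothetical 'ct_src_proto' feature to each flow record.

    The feature counts how many of the previous `window` records share the
    same source IP and protocol as the current record.
    """
    recent = deque(maxlen=window)  # keys of the most recent records
    enriched = []
    for record in flow_records:
        key = (record["src_ip"], record["protocol"])
        count = sum(1 for seen in recent if seen == key)
        enriched.append({**record, "ct_src_proto": count})
        recent.append(key)
    return enriched


if __name__ == "__main__":
    flows = [
        {"src_ip": "192.0.2.1", "protocol": "udp"},
        {"src_ip": "192.0.2.1", "protocol": "udp"},
        {"src_ip": "192.0.2.9", "protocol": "tcp"},
        {"src_ip": "192.0.2.1", "protocol": "udp"},
    ]
    for row in add_recent_count_feature(flows):
        print(row)
```

A source that suddenly opens many flows with the same protocol gets a rising counter, which is the kind of behavior a downstream model can learn from.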

In step S108, on each working node, a plurality of abnormal traffic identification models and the traffic feature data sets are used to determine whether the network traffic is abnormal. In this embodiment, using the abnormal traffic identification models trained offline together with the traffic feature data sets to determine whether the network traffic is abnormal is the second-layer filtering mechanism. To improve overall analysis performance, the offline-trained abnormal traffic identification models identify whether the traffic exhibits attack behavior features. When attack behavior features appear, the method proceeds to step S110; when the network traffic is determined to be normal, the flow of the backbone network abnormal traffic detection method ends (not shown).

In this embodiment, the abnormal traffic identification models are established in advance in an offline manner. Refer also to FIG. 2, which is a flowchart of offline training of the abnormal traffic identification models according to an embodiment of the present invention.

In step S202, at least one behavior feature is selected from at least one known intrusion detection data set. In the prior art, all features of a data set are used to train the machine learning model, which does not necessarily guarantee the best performance and increases both the computational cost and the identification error rate. In this embodiment, feature selection is therefore performed first, with the aim of increasing the classification speed of machine learning without losing accuracy. The feature selection methods used include the following:

CfsSubsetEval: produces a set of features that are highly correlated with the class but have low correlation with one another.

CorrelationAttributeEval: computes the correlation between the class and a feature, with values ranging from -1 to 1.

InfoGainAttributeEval: computes the information gain based on entropy. The larger the value, the better the feature is for classifying the data.

GainRatioAttributeEval: computes the gain ratio from the information gain and the split information. The larger the value, the more important the feature.

OneRAttributeEval: computes the misclassification rate of a feature using a classifier called OneR. The lower the misclassification rate, the better.

ReliefFAttributeEval: computes feature weights from near hits and near misses. A near hit is the closest instance of the same class, and a near miss is the closest instance of a different class.

SymmetricalUncertAttributeEval: measures the relationship between the class and a feature.

WrapperSubsetEval: given a chosen classification method, selects a candidate set of features and evaluates it with the classifier to confirm whether it is the best combination; if not, the features are selected again.
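The entropy-based measures in the list above (information gain and gain ratio) can be worked through on a toy example. The sketch below uses the textbook definitions for nominal features, not Weka's code, and the function names are chosen only for this illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of nominal values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the entropy remaining after the split."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(feature_values, labels):
    """Information gain divided by the split information of the feature."""
    split_info = entropy(feature_values)
    if split_info == 0:
        return 0.0  # a constant feature cannot split the data
    return information_gain(feature_values, labels) / split_info


if __name__ == "__main__":
    labels = [1, 1, 0, 0]                    # 1 = attack, 0 = normal
    protocol = ["udp", "udp", "tcp", "tcp"]  # separates the classes perfectly
    noise = ["a", "b", "a", "b"]             # carries no class information
    print(information_gain(protocol, labels), gain_ratio(protocol, labels))
    print(information_gain(noise, labels), gain_ratio(noise, labels))
```

A feature that separates the classes perfectly scores highest, while one that is independent of the class scores zero, which is why such features can be dropped before training.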

In step S204, a benefit analysis is performed on at least one machine learning algorithm and its identification result to produce at least one selected machine learning algorithm. In this embodiment, to train the machine, packet data samples that exhibit attack behavior are collected in advance as training data. Features such as the source IP and the protocol used are extracted from the training data to help the system identify the target, and the machine is then told the answer that corresponds to each attack: packets with attack behavior are labeled 1 and ordinary packets are labeled 0, so that the machine learns which packets exhibit attack behavior and which do not. As the amount of training data grows, when a new record is fed into the machine, for example one whose protocol feature is the User Datagram Protocol (UDP), the system can determine whether the packet exhibits attack behavior, or the probability that it does. Because this approach gives the machine the answers during training, in the form of the "labeled" data described above, it is called supervised learning.
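The labeling scheme described above (1 for attack packets, 0 for ordinary packets) can be illustrated with a deliberately simple supervised learner that estimates the attack probability for each protocol from labeled samples. This is a toy sketch, not one of the algorithms named in the source; the function names and the fallback value for unseen protocols are assumptions.

```python
from collections import defaultdict


def train_protocol_model(samples):
    """Estimate P(attack | protocol) from (features, label) pairs.

    label 1 marks a packet with attack behavior, label 0 an ordinary packet.
    """
    counts = defaultdict(lambda: [0, 0])  # protocol -> [normal count, attack count]
    for features, label in samples:
        counts[features["protocol"]][label] += 1
    return {
        protocol: attack / (normal + attack)
        for protocol, (normal, attack) in counts.items()
    }


def predict_attack_probability(model, features, default=0.5):
    """Return the learned probability, or `default` for an unseen protocol."""
    return model.get(features["protocol"], default)


if __name__ == "__main__":
    training = [
        ({"protocol": "udp"}, 1),
        ({"protocol": "udp"}, 1),
        ({"protocol": "udp"}, 0),
        ({"protocol": "tcp"}, 0),
    ]
    model = train_protocol_model(training)
    print(predict_attack_probability(model, {"protocol": "udp"}))
```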

In this embodiment, unsupervised learning may also be used. In this case the training data has no reference answers and no labels need to be entered in advance, so the machine does not know during learning whether its classification results are correct. Training only requires providing input examples to the machine, which automatically finds the underlying rules in those examples.

In this embodiment, the machine learning algorithms used include the following:

Bayesian network learning (BayesNet): Bayesian network learning uses various search algorithms and quality measures. Based on a Bayesian network classifier, it provides the data structures (network structure, conditional probability distributions, and so on) and the tools commonly used by Bayesian network learning algorithms.

Naive Bayes model (NaiveBayes): the naive Bayes model directly assumes that all random variables are conditionally independent, so the joint probability distribution can be computed simply by multiplying the conditional probabilities: P(X|C) = P(X1|C) P(X2|C) ... P(Xd|C), where X = [X1, X2, ..., Xd] is a feature vector and C denotes a particular class. The naive Bayes classifier that results from this assumption is quite practical, and its identification performance is often no worse than that of other, more complex classifiers.

JRip classifier: this is a rule-based classifier that classifies records mainly in an "If...Then" manner. It was proposed by William W. Cohen and uses repeated incremental pruning to produce error reduction.

PART classifier: this is a rule-based classifier specific to Weka (Weka is Java-based software for data mining and machine learning). It builds partial C4.5 decision trees in a divide-and-conquer manner and turns the best leaves into rules.

J48 algorithm: this algorithm is a decision tree using C4.5, whose core is the ID3 algorithm. It improves on ID3 by selecting attributes with the gain ratio, which overcomes the bias toward attributes with many values that arises when attributes are selected by information gain. In Weka, parameters can be set to use a pruned or an unpruned decision tree. If pruning is selected, the default is pessimistic error pruning (PEP), which prunes according to the error rate. The algorithm first determines the empirical error rate of a leaf as (E + 0.5)/N, where E is the number of errors at the leaf, N is the number of instances at the leaf, and 0.5 is an adjustment coefficient. For a subtree with L leaves, the number of errors and the number of instances of the subtree are the sums of the errors and instances of its leaves, so its error rate is (ΣE + 0.5 × L)/ΣN. J48 also uses subtree raising by default, which selects a subtree and raises it to replace a node higher in the tree, so that the root of the replaced subtree is substituted by one of its internal nodes or leaf nodes. A parameter can switch this to subtree replacement, which selects a subtree and replaces it with a single leaf.

Random tree (Random Tree): this algorithm is specific to Weka. In some other libraries an algorithm called "random tree" is actually a random forest, but in Weka it refers to a single tree built from randomly selected attributes.

Random forest (Random Forest): this algorithm draws a subset of the features and a subset of the records from the training data to grow a tree (usually with the CART algorithm). After repeatedly building a number of unpruned random trees, every tree makes a prediction, and the predictions are then put to a vote; the prediction with the most votes is the prediction of the whole forest. In Weka, a number of random trees are built to form a forest that makes the prediction.
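Two of the calculations described in the list above are short enough to show directly: the naive Bayes product P(X|C) = P(X1|C) P(X2|C) ... P(Xd|C) and the majority vote that a random forest takes over its trees. The sketch below is illustrative only; the probability tables, feature names, and class names are invented for the example and are not taken from any data set.

```python
from collections import Counter
from math import prod


def naive_bayes_likelihood(conditional_probs):
    """P(X|C) as the product of the per-feature conditionals P(Xi|C)."""
    return prod(conditional_probs)


def naive_bayes_classify(priors, likelihoods, features):
    """Score each class by P(C) * P(X|C) and return the best class and all scores."""
    scores = {}
    for cls, prior in priors.items():
        per_feature = [likelihoods[cls][name][value] for name, value in features.items()]
        scores[cls] = prior * naive_bayes_likelihood(per_feature)
    return max(scores, key=scores.get), scores


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical probability tables, invented for this example.
    priors = {"attack": 0.5, "normal": 0.5}
    likelihoods = {
        "attack": {"protocol": {"udp": 0.8, "tcp": 0.2}, "service": {"dns": 0.9, "http": 0.1}},
        "normal": {"protocol": {"udp": 0.3, "tcp": 0.7}, "service": {"dns": 0.2, "http": 0.8}},
    }
    print(naive_bayes_classify(priors, likelihoods, {"protocol": "udp", "service": "dns"}))
    print(majority_vote(["attack", "normal", "attack"]))
```

The scores are unnormalized, which is sufficient for picking the most likely class; dividing each by their sum would give posterior probabilities.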

In step S206, the abnormal traffic identification models are trained offline with the behavior feature and the selected machine learning algorithm.

Returning to FIG. 1, in step S108 the abnormal traffic identification models trained by the above method are used to determine whether the network traffic is abnormal. When the network traffic is determined to be abnormal, the method proceeds to step S110.

In step S110, when the network traffic is determined to be abnormal, a plurality of attack type identification models and the traffic feature data sets are used to determine the attack type and generate an analysis result. In this embodiment, using the attack type identification models trained offline together with the traffic feature data sets to determine the attack type and generate the analysis result is the third-layer filtering mechanism. After the attack type is identified, the analysis result can be sent to an analysis database for storage, displayed graphically in a display interface, and accompanied by real-time warning information. The analysis result can also be sent to a mass abnormal traffic analysis module, which analyzes the network traffic to obtain its source IP address and adds that address to the blacklist, so that the blacklist is updated in real time.
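Steps S108 and S110 together form a two-stage decision followed by a blacklist update, which can be sketched as below. The source does not say how the outputs of multiple models are combined, so the "any model flags it" rule for anomaly detection and the majority vote for the attack type are assumptions, as are the function name `analyze_flow`, the record fields, and the stand-in lambda models.

```python
from collections import Counter


def analyze_flow(features, anomaly_models, attack_type_models, blacklist):
    """Run the two-stage check on one flow and update the blacklist.

    Returns None for normal traffic, otherwise an analysis result dict.
    Models are plain callables here; real models would be trained offline.
    """
    # Stage 1 (assumed rule): traffic is abnormal if any anomaly model flags it.
    if not any(model(features) for model in anomaly_models):
        return None
    # Stage 2 (assumed rule): the attack type is the majority vote of the models.
    votes = [model(features) for model in attack_type_models]
    attack_type = Counter(votes).most_common(1)[0][0]
    # Feed the offending source back into the first-layer filter.
    blacklist.add(features["src_ip"])
    return {"src_ip": features["src_ip"], "attack_type": attack_type}


if __name__ == "__main__":
    anomaly_models = [lambda f: f["packets_per_second"] > 1000]
    attack_type_models = [lambda f: "DoS", lambda f: "DoS", lambda f: "Worm"]
    blacklist = set()
    flow = {"src_ip": "203.0.113.7", "packets_per_second": 50000}
    print(analyze_flow(flow, anomaly_models, attack_type_models, blacklist), blacklist)
```

Because the blacklist is shared with the source filter, later packets from the same address are dropped at the first layer without reaching the models again.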

在本實施例中，將離線的方式，預先建立攻擊類型辯識模型，請同時參考第三圖，第三圖係依據本發明一實施例之離線訓練出攻擊類型辯識模型的流程圖。In this embodiment, an attack type identification model is pre-established in an offline manner. Please also refer to the third figure. The third figure is a flowchart of training an attack type identification model according to an embodiment of the present invention.

在步驟S302，對從至少一已知之入侵偵測資料集選擇至少一行為特徵，在本實施例中，使用的特徵選擇方法請參考上述第二圖中的步驟S202的描述。In step S302, at least one behavior feature is selected from at least one known intrusion detection data set. In this embodiment, for the feature selection method, please refer to the description of step S202 in the second figure above.

In step S304, a benefit analysis is performed on at least one machine learning algorithm and a recognition result to produce at least one selected machine learning algorithm. In this embodiment, for the machine learning algorithms used, please refer to the description of step S204 in FIG. 2 above.

In step S306, the attack type identification model is trained offline using the above behavior feature and the above selected machine learning algorithm.
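Steps S302 to S306 can be illustrated with a minimal Python sketch. It assumes that the "benefit analysis" of step S304 amounts to comparing candidate algorithms by their accuracy on held-out labeled samples, which the description does not state. The two toy learners, the single numeric feature, and all function names are hypothetical stand-ins, not the algorithms of the embodiment.

```python
# Toy sketch of steps S302-S306. All names are hypothetical; "benefit analysis"
# is ASSUMED here to mean comparing accuracy on held-out samples.

def train_majority(samples):
    """Toy learner: always predict the most frequent training label."""
    labels = [label for _, label in samples]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def train_threshold(samples):
    """Toy learner: cut halfway between the class means of one numeric feature."""
    attack = [x for x, label in samples if label == 1]
    normal = [x for x, label in samples if label == 0]
    cut = (sum(attack) / len(attack) + sum(normal) / len(normal)) / 2
    return lambda x: 1 if x > cut else 0

def select_and_train(candidates, train_set, validation_set):
    """Train each candidate, score it on held-out data, and keep the best one."""
    best = None
    for name, train in candidates.items():
        model = train(train_set)
        hits = sum(model(x) == label for x, label in validation_set)
        score = hits / len(validation_set)
        if best is None or score > best[0]:
            best = (score, name, model)
    return best  # (score, algorithm name, trained model)

# Each sample is (selected feature value, label); label 1 means "this attack type".
train_set = [(1, 0), (2, 0), (3, 0), (100, 1), (120, 1)]
validation_set = [(2, 0), (110, 1), (4, 0), (90, 1)]
candidates = {"majority": train_majority, "threshold": train_threshold}
score, name, model = select_and_train(candidates, train_set, validation_set)
print(name, score)
```

In this sketch one such model would be trained per attack type, which matches the later description of each model marking traffic as its own attack type or not.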

FIG. 4 is a schematic diagram of the specific structure of a backbone network abnormal traffic detection system according to an embodiment of the present invention. As shown in FIG. 4, the backbone network abnormal traffic detection system 400 includes a source filtering module 410, a data distribution module 420, a distributed big data processing system 430, a mass abnormal traffic analysis module 480, and an analysis database 490. The distributed big data processing system 430 includes n working nodes 432_1~432_n. The working node 432_1 is configured with m traffic feature processing modules 440_1~440_m, m abnormal traffic identification modules 450_1~450_m, and attack type identification modules 460_1~460_m, where n is a natural number of at least 1 and m is a natural number of at least 2.

The source filtering module 410 uses the source Internet Protocol (IP) address to filter the network traffic in the backbone network. In this embodiment, the source filtering module 410 performs the first-layer filtering mechanism. A blacklist (not shown) and a whitelist (not shown) are established in advance in the source filtering module 410. The blacklist stores the source IP addresses of known abnormal traffic, and the whitelist stores trusted source IP addresses. When a source IP address produces a large amount of abnormal traffic within a short time, that source IP address is stored in the blacklist. The whitelist is maintained manually by the system administrator and usually contains the IP addresses of large network service providers (for example, Google, Facebook, and YouTube). When packets from an IP address in the blacklist or the whitelist reach the source filtering module 410, the source filtering module 410 discards them directly.
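The first-layer filtering described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `SourceFilter`, the packet representation (a dictionary with a `src_ip` key), and the example addresses (taken from the documentation address ranges) are hypothetical and do not describe the packet parser used in the embodiment.

```python
from ipaddress import ip_address

class SourceFilter:
    """Hypothetical first-layer filter: drop packets whose source IP is listed."""

    def __init__(self, whitelist=(), blacklist=()):
        # Parsing normalises the textual form of each address.
        self.whitelist = {ip_address(a) for a in whitelist}
        self.blacklist = {ip_address(a) for a in blacklist}

    def add_to_blacklist(self, addr):
        """Called when a source is found to send a lot of abnormal traffic."""
        self.blacklist.add(ip_address(addr))

    def passes(self, src_ip):
        """True if the packet should continue to the later layers."""
        src = ip_address(src_ip)
        return src not in self.whitelist and src not in self.blacklist

    def filter(self, packets):
        """Keep only packets from sources on neither list."""
        return [p for p in packets if self.passes(p["src_ip"])]

source_filter = SourceFilter(whitelist=["192.0.2.1"], blacklist=["198.51.100.7"])
packets = [
    {"src_ip": "192.0.2.1"},     # trusted source: discarded, needs no analysis
    {"src_ip": "198.51.100.7"},  # known abnormal source: discarded
    {"src_ip": "203.0.113.5"},   # unknown source: passed on for analysis
]
print(source_filter.filter(packets))
```

Only traffic from sources on neither list reaches the later layers, which is how this layer reduces the load on the working nodes.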

In this embodiment, the source filtering module 410 can be implemented with a general packet parser, for example the "PacketX Grism" product of 瑞擎數位股份有限公司 (PacketX Technology). Trusted source IP addresses can easily be added to the whitelist using the product's built-in settings. The blacklist is maintained by connecting an application provided with the product to the mass abnormal traffic analysis module 480 of the present invention, so that when a large number of abnormal-behavior packets appear they can be blocked immediately, protecting back-end equipment from attack.

The data distribution module 420 evenly distributes the network traffic filtered by the source filtering module 410 to the n working nodes 432_1~432_n of the distributed big data processing system 430. In this embodiment, the data distribution module 420 performs the first-layer distribution mechanism. What the data distribution module 420 outputs to each working node 432_1~432_n is converted traffic transmission statistics, which saves a large amount of bandwidth.
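As a rough illustration of the first-layer distribution, the following Python sketch spreads flow records over n working nodes in round-robin order. Round-robin is an assumption made for this example; the description only says that the traffic is divided evenly and that a load-balancing feature of a packet parser can be used. The function name `distribute_evenly` is hypothetical.

```python
from itertools import cycle

def distribute_evenly(flow_records, n_nodes):
    """Assign flow records to n_nodes queues so that sizes differ by at most one."""
    if n_nodes < 1:
        raise ValueError("at least one working node is required")
    queues = [[] for _ in range(n_nodes)]
    # cycle() yields 0, 1, ..., n_nodes - 1, 0, 1, ... as the target node index.
    for node_index, record in zip(cycle(range(n_nodes)), flow_records):
        queues[node_index].append(record)
    return queues

# Seven flow records (represented by integers) spread over three working nodes.
print(distribute_evenly(list(range(7)), 3))
```

A real load balancer may instead hash on flow fields so that all records of one flow reach the same node; that trade-off is outside what the description specifies.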

In this embodiment, the data distribution module 420 can likewise be implemented with a general packet parser, for example by using the built-in load-balancing settings of the "PacketX Grism" product of 瑞擎數位股份有限公司 (PacketX Technology). The output format of the data distribution module 420 is Cisco NetFlow V9.

Each working node 432_1~432_n is a server capable of parallel processing, and each server can be a physical server or a virtual machine. Because each working node 432_1~432_n is capable of parallel processing, a plurality of traffic feature processing modules, abnormal traffic identification modules, and attack type identification modules can be configured on each working node 432_1~432_n. This embodiment is described using the example in which the working node 432_1 is configured with m traffic feature processing modules 440_1~440_m, m abnormal traffic identification modules 450_1~450_m, and attack type identification modules 460_1~460_m.

The traffic feature processing modules 440_1~440_m perform parallel processing on the working node 432_1 and each generate a traffic feature data set. In this embodiment, the traffic feature processing modules 440_1~440_m perform the second-layer distribution mechanism and the data preprocessing procedure. The output format of the data distribution module 420 is Cisco NetFlow V9, which already includes the basic traffic features. A traffic feature algorithm is then used to analyze the traffic transmission statistics to generate the traffic feature data set. For example, after the network traffic is converted into Argus flows with the tools provided by a conventional network traffic monitor such as Argus, the original traffic features are obtained first, and additional traffic features are then generated according to the algorithms defined by UNSW-NB15. The traffic feature data set includes the basic traffic features, the original traffic features, and the additional traffic features.

The abnormal traffic identification modules 450_1~450_m use the plurality of offline-trained abnormal traffic identification models and the above traffic feature data sets to determine whether the network traffic is abnormal. In this embodiment, the abnormal traffic identification modules 450_1~450_m perform the second-layer filtering mechanism. For the sake of the overall analysis performance of the system, the abnormal traffic identification modules 450_1~450_m only identify whether the traffic exhibits attack behavior features. Only when such features appear is the traffic sent to the next layer for attack type identification; if the identification result is normal, the analysis results 474_1~474_m are output.

The attack type identification modules 460_1~460_m use the plurality of offline-trained attack type identification models and the traffic feature data sets to determine the attack type and generate the analysis results 472_1~472_m. In this embodiment, the attack type identification modules 460_1~460_m perform the third-layer filtering mechanism: if the traffic is determined to exhibit attack behavior, the attack type identification modules 460_1~460_m identify the attack type and generate the analysis results 472_1~472_m.

In this embodiment, the analysis results 470 of all working nodes 432_1~432_n, including the analysis results 472_1~472_m and the analysis results 474_1~474_m, can be sent to the analysis database 490 for storage and displayed graphically in a display interface (not shown). In this embodiment, real-time warning information is also issued when the analysis results 472_1~472_m are displayed in the display interface. In this embodiment, the analysis results 470 of all working nodes 432_1~432_n can also be sent to the mass abnormal traffic analysis module 480, which analyzes the network traffic to obtain its source IP address and adds that source IP address to the blacklist, so that the blacklist is updated in real time.
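The blacklist feedback loop can be sketched as a sliding-window counter. The description states that a source IP address producing a large amount of abnormal traffic within a short time is added to the blacklist, but gives no numbers, so the threshold and window length below are assumptions, as are the class and method names.

```python
from collections import defaultdict, deque

class MassAbnormalTrafficAnalyzer:
    """Hypothetical sketch: blacklist sources with many abnormal results in a window."""

    def __init__(self, threshold=100, window_seconds=10.0):
        self.threshold = threshold    # assumed value, not taken from the description
        self.window = window_seconds  # assumed value, not taken from the description
        self.events = defaultdict(deque)  # source IP -> timestamps of abnormal results
        self.blacklist = set()

    def report(self, src_ip, timestamp):
        """Record one abnormal result; return True if src_ip is newly blacklisted."""
        recent = self.events[src_ip]
        recent.append(timestamp)
        # Forget abnormal results that fall outside the sliding window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        if len(recent) >= self.threshold and src_ip not in self.blacklist:
            self.blacklist.add(src_ip)
            return True
        return False

analyzer = MassAbnormalTrafficAnalyzer(threshold=3, window_seconds=5.0)
for t in (0.0, 1.0, 2.0):
    newly_listed = analyzer.report("203.0.113.5", t)
print(newly_listed, analyzer.blacklist)
```

In a deployment, a newly blacklisted address would be pushed to the source filtering module so that later packets from that source are discarded at the first layer.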

In this embodiment, the distributed big data processing system 430 is a distributed real-time computing system. The following description therefore uses the Apache Storm system as an embodiment of the distributed big data processing system 430 and explains it in more detail. Please also refer to FIG. 5, which is a schematic diagram of the architecture of the Apache Storm system.

As shown in FIG. 5, the Apache Storm system 500 is a distributed, reliable, and fault-tolerant system that processes large amounts of data as streams, and it is now widely used for real-time data analysis and processing. The Apache Storm system 500 includes three types of nodes: the working nodes 432_1~432_n, a master node 510, and temporary storage nodes 520_1~520_t, where t is a natural number of at least 3.

The master node 510, also called Nimbus, is mainly responsible for managing, coordinating, and monitoring the topologies running in the whole system, including topology deployment, task assignment, and task reassignment when a failure occurs.

The temporary storage nodes 520_1~520_t are also called ZooKeeper. In conventional distributed applications, the various processes need to coordinate with each other and share some information, and the temporary storage nodes 520_1~520_t serve as the communication bridge between the master node 510 and the working nodes 432_1~432_n. The master node 510 and the working nodes 432_1~432_n store all of their data in the temporary storage nodes 520_1~520_t, so abruptly terminating the master node 510 and the working nodes 432_1~432_n does not affect the operation of the whole system.

The working nodes 432_1~432_n are also called Supervisors. Each working node 432_1~432_n has worker processes and is mainly responsible for creating, starting, and stopping those worker processes to execute the assigned tasks.

FIG. 6 is a flowchart of the backbone network abnormal traffic detection method performed with the Apache Storm system of FIG. 5. In this embodiment, to use real-time computation on the Apache Storm system 500, a topology must be created and deployed on the cluster so that data is processed in real time. The topology consists of a root node (Spout) 600 and child nodes (Bolts) 610_1~610_6, and data is passed between the root node 600 and the child nodes 610_1~610_6 in a data structure called a tuple.

The root node 600 mainly receives the network traffic represented by the traffic feature data sets and sends this network traffic to the child node 610_1 in tuple format.

The child node 610_1 runs the plurality of offline-generated abnormal traffic identification models to identify network traffic behavior in real time. If a model considers the network traffic normal, its result is marked 0; if abnormal, it is marked 1. The network traffic represented by the traffic feature data sets and the identification results of the plurality of abnormal traffic identification models are sent to the child node 610_2 in tuple format.

The child node 610_2 combines the identification results of the plurality of abnormal traffic identification models. If any identification result is abnormal, the network traffic represented by the traffic feature data sets is sent to the child node 610_3 in tuple format. If all identification results are normal, the final result is marked 0, and the network traffic represented by the traffic feature data sets, the individual identification results, and the final identification result are sent to the child node 610_4 in tuple format.
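The behavior of the child nodes 610_1 and 610_2 can be sketched with plain Python functions. This is not Apache Storm code: the function names, the dictionary used for the traffic features, the feature names, and the threshold "models" are hypothetical, and a trained model from the offline part would take the place of each lambda.

```python
def bolt_610_1(features, anomaly_models):
    """Run every anomaly model; each result is 0 (normal) or 1 (abnormal)."""
    results = tuple(int(bool(model(features))) for model in anomaly_models)
    return (features, results)

def bolt_610_2(tup):
    """Combine the per-model results and pick the next child node."""
    features, results = tup
    if any(results):
        # At least one model flagged the flow: pass it on for attack type identification.
        return ("610_3", (features,))
    # Every model says normal: final result 0, send everything to the logging node.
    return ("610_4", (features, results, 0))

# Hypothetical stand-ins for two offline-trained abnormal traffic identification models.
anomaly_models = [
    lambda f: f["pkts_per_s"] > 1000,
    lambda f: f["syn_ratio"] > 0.8,
]

normal_flow = {"pkts_per_s": 10, "syn_ratio": 0.1}
suspect_flow = {"pkts_per_s": 5000, "syn_ratio": 0.1}
print(bolt_610_2(bolt_610_1(normal_flow, anomaly_models)))
print(bolt_610_2(bolt_610_1(suspect_flow, anomaly_models)))
```

Flagging a flow when any single model reports 1 follows the rule stated above; it favors catching attacks over avoiding false alarms, and the third layer then decides which attack type, if any, applies.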

The child node 610_3 runs the plurality of offline-generated attack type identification models to identify the type of network attack. If a model considers that the network traffic is not of that model's own attack type, its result is marked 0; if the traffic is of the model's own attack type, it is marked 1. The network traffic represented by the traffic feature data sets and the individual identification results are sent to the child node 610_5 in tuple format.

The child node 610_5 combines the individual identification results. If any identification result is abnormal, the final identification result is marked 1; if every identification result is normal, the final result is marked 0. Finally, the network traffic represented by the traffic feature data sets, the individual identification results, and the final identification result are sent to the child node 610_6 in tuple format.
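The attack type stage formed by the child nodes 610_3 and 610_5 can be sketched in the same style. Again this is plain Python rather than Apache Storm code, and the attack type labels, feature names, and threshold "models" are hypothetical examples; keying the results by attack type name is a convenience of this sketch.

```python
def bolt_610_3(features, attack_models):
    """Run one model per attack type; 1 means the flow matches that model's type."""
    results = {name: int(bool(model(features))) for name, model in attack_models.items()}
    return (features, results)

def bolt_610_5(tup):
    """Final result is 1 if any attack type model matched, otherwise 0."""
    features, results = tup
    final = 1 if any(results.values()) else 0
    return (features, results, final)

# Hypothetical stand-ins for offline-trained attack type identification models.
attack_models = {
    "dos": lambda f: f["pkts_per_s"] > 1000,
    "scan": lambda f: f["distinct_ports"] > 100,
}

flow = {"pkts_per_s": 5000, "distinct_ports": 3}
print(bolt_610_5(bolt_610_3(flow, attack_models)))
```

The tuple returned by `bolt_610_5` corresponds to what is sent to the child node 610_6 for logging: the traffic features, the per-model results, and the final result.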

The child node 610_4 records, as a log, the data sent by the child node 610_2, that is, the abnormal traffic identification results, and generates a log file every second.

The child node 610_6 records, as a log, the data sent by the child node 610_5, that is, the attack type identification results, and generates a log file every second.

FIG. 7 is a schematic diagram of a backbone network abnormal traffic detection system according to another embodiment of the present invention. As shown in the figure, in this embodiment the backbone network abnormal traffic detection system 700 is divided into an offline part 702 and a real-time part 704.

In the offline part 702, a plurality of abnormal traffic identification models 706 and attack type identification models 708 are established in advance. In this embodiment, the abnormal traffic identification models 706 and the attack type identification models 708 are established with the processes of FIG. 2 and FIG. 3, respectively. For the methods used, please refer to the descriptions of FIG. 2 and FIG. 3; they are not repeated here.

In the real-time part 704, the source filtering module 710 uses the source Internet Protocol (IP) address to filter the network traffic in the backbone network. The source filtering module 710 performs the first-layer filtering mechanism: it captures the real network traffic and filters out network traffic known to be harmless, reducing the load of the subsequent intrusion detection.

The data distribution module 720 evenly distributes the network traffic filtered by the source filtering module 710 to the n working nodes 732_1~732_n of the distributed big data processing system 730. The data distribution module 720 performs the first-layer distribution mechanism, dividing the traffic into streams that match the behavior features of the abnormal traffic identification models and the attack type identification models. Distributed abnormal behavior detection is then performed on each of the working nodes 732_1~732_n.

Each of the working nodes 732_1~732_n performs parallel processing, which is the second-layer distribution mechanism. Based on the pre-established abnormal traffic identification models 706 and attack type identification models 708, the working nodes simultaneously and quickly distinguish normal from abnormal network traffic and generate the analysis results 740, so that a warning is issued for abnormal network traffic. In addition, in this embodiment each of the working nodes 732_1~732_n performs the same work as the working node 432_1 of FIG. 4. Please refer to the description of FIG. 4; it is not repeated here.

In summary, the present invention provides a method and system for detecting backbone network traffic that can meet the bandwidth requirements of a backbone network. It uses the concepts of "layering" and "distribution" together with two-stage "offline" and "real-time" processing to identify abnormal traffic quickly and efficiently, which greatly improves the performance of the detection method and system. In addition, the present invention is divided into an offline part and a real-time part. Real traffic is captured first, and the first-layer filtering mechanism filters out network traffic known to be harmless to reduce the load of the subsequent intrusion detection. Distributed abnormal behavior detection is then performed on streams that match the behavior features of the intrusion detection knowledge base. Each abnormal behavior classifier, relying on the intrusion detection knowledge base loaded in advance, simultaneously and quickly distinguishes normal from abnormal network traffic and issues a warning for abnormal network traffic.

S102: step
S104: step
S106: step
S108: step
S110: step
S202: step
S204: step
S206: step
S302: step
S304: step
S306: step
400: backbone network abnormal traffic detection system
410: source filtering module
420: data distribution module
430: distributed big data processing system
432_1 to 432_n: working nodes
440_1 to 440_m: traffic feature processing modules
450_1 to 450_m: abnormal traffic identification modules
460_1 to 460_m: attack type identification modules
470: analysis results
472_1 to 472_m: analysis results
474_1 to 474_m: analysis results
480: mass abnormal traffic analysis module
490: analysis database
500: Apache Storm system
510: master node
520_1 to 520_t: temporary storage nodes
600: root node
610_1 to 610_6: child nodes
700: backbone network abnormal traffic detection system
702: offline part
704: real-time part
706: abnormal traffic identification models
708: attack type identification models
710: source filtering module
720: data distribution module
730: distributed big data processing system
732_1 to 732_n: working nodes
740: analysis results

The present invention can be further understood with reference to the following drawings and description. Non-limiting and non-exhaustive examples are described with reference to the following drawings. The components in the drawings are not necessarily drawn to scale; the emphasis is on illustrating the structure and principles.
FIG. 1 is a flowchart of a backbone network abnormal traffic detection method according to an embodiment of the present invention.
FIG. 2 is a flowchart of offline training of the abnormal traffic identification model according to an embodiment of the present invention.
FIG. 3 is a flowchart of offline training of the attack type identification model according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of the specific structure of a backbone network abnormal traffic detection system according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of the architecture of a distributed real-time computing system according to an embodiment of the present invention.
FIG. 6 is a flowchart of the backbone network abnormal traffic detection method performed with the distributed real-time computing system of FIG. 5.
FIG. 7 is a schematic diagram of a backbone network abnormal traffic detection system according to another embodiment of the present invention.


Claims

A method for detecting abnormal traffic in a backbone network, comprising the following steps: filtering network traffic in a backbone network using a source Internet Protocol (IP) address; evenly distributing the filtered network traffic to a plurality of working nodes of a distributed big data processing system; performing parallel processing on the plurality of working nodes and generating a plurality of traffic feature data sets on each of the working nodes; on each of the working nodes, using a plurality of abnormal traffic identification models and the traffic feature data sets to determine whether the network traffic is abnormal; and, when the network traffic is determined to be abnormal, using a plurality of attack type identification models and the traffic feature data sets to determine an attack type and generate an analysis result.

The method for detecting abnormal traffic in a backbone network of claim 1, further comprising: sending the analysis result to an analysis database for storage and displaying the analysis result in a display interface.

The method for detecting abnormal traffic in a backbone network of claim 1, wherein the step of filtering the network traffic in the backbone network using the source IP address further comprises: establishing a whitelist and a blacklist, wherein a plurality of abnormal source IP addresses are stored in the blacklist and a plurality of trusted source IP addresses are stored in the whitelist; determining whether the source IP address of a packet in the backbone network is in the whitelist or the blacklist; and, when the source IP address of the packet is in the whitelist or the blacklist, discarding the packet.

The method for detecting abnormal traffic in a backbone network of claim 3, further comprising: when the network traffic is determined to be abnormal, sending the analysis result to a mass abnormal traffic analysis module; analyzing the network traffic to obtain the source IP address of the network traffic; and adding the source IP address to the blacklist.

The method for detecting abnormal traffic in a backbone network of claim 1, wherein the step of evenly distributing the filtered network traffic to the working nodes of the distributed big data processing system further comprises: evenly distributing the filtered network traffic to a plurality of working nodes of an Apache Storm system, and outputting traffic transmission statistics after a conversion process.

The method for detecting abnormal traffic in a backbone network of claim 5, wherein the step of generating the traffic feature data sets on each of the working nodes further comprises: analyzing the traffic transmission statistics using a traffic feature algorithm to generate the traffic feature data sets.

The method for detecting abnormal traffic in a backbone network of claim 1, wherein the traffic feature data sets include at least one basic traffic feature, at least one original traffic feature, and at least one additional traffic feature.

The method for detecting abnormal traffic in a backbone network of claim 1, wherein the step of using the abnormal traffic identification models and the traffic feature data sets to determine whether the network traffic is abnormal further comprises: selecting at least one behavior feature from at least one known intrusion detection data set; performing a benefit analysis on at least one machine learning algorithm and a recognition result to produce at least one selected machine learning algorithm; and training the abnormal traffic identification models offline using the behavior feature and the selected machine learning algorithm.

The method for detecting abnormal traffic in a backbone network of claim 1, wherein the step of using the attack type identification models and the traffic feature data sets to determine the attack type further comprises: selecting at least one behavior feature from at least one known intrusion detection data set; performing a benefit analysis on at least one machine learning algorithm and a recognition result to produce at least one selected machine learning algorithm; and training the attack type identification models offline using the behavior feature and the selected machine learning algorithm.

A backbone network abnormal traffic detection system, which executes the method for detecting abnormal traffic in a backbone network of any one of claims 1 to 10.