TW202135507A

TW202135507A - Unsupervised malicious flow detecting system and method capable of classifying normal flow rate or abnormal flow rate by only inspecting the first few bytes of the first few packets in each flow

Info

Publication number: TW202135507A
Application number: TW109107039A
Authority: TW
Inventors: 黃仁竑; 林柏青; 黃建維; 彭敏君
Original assignee: 國立中正大學
Priority date: 2020-03-04
Filing date: 2020-03-04
Publication date: 2021-09-16
Also published as: TWI715457B

Abstract

The invention provides an unsupervised malicious traffic detection system and method. A preprocessing module classifies the received raw packets by the flow to which they belong, then selects the first few bytes of the first few packets of each flow as input to a convolutional neural network model, which performs at least one round of convolution and dimensionality-reducing sampling to filter out the features of the packets. An autoencoder then learns and classifies those features to establish at least one pattern of normal traffic, and uses that pattern to determine whether the traffic of the flow currently being inspected is abnormal. Traffic can therefore be classified as normal or abnormal by inspecting only the first few bytes of the first few packets in each flow, without inspecting the complete flow, so system efficiency is increased and abnormal traffic can be blocked as early as possible.

Description

Unsupervised malicious flow detection system and method

The present invention relates to techniques for detecting malicious network traffic, and in particular to an unsupervised malicious traffic detection system and method.

Intrusion detection systems facing various network threats use roughly two main detection approaches. Signature-based detection compares specific fragments of the traffic against entries in a database of malicious traffic. This approach has a low false positive rate, but it cannot judge previously unknown attack traffic, and because it must extract features, it performs poorly in real-time detection systems. The other approach, anomaly-based detection, can detect unknown types of intrusion but has a higher false positive rate.

Current anomaly detection systems mostly rely on one of four classification methods: port-based identification, deep packet inspection, traffic statistics (statistical), and traffic behavior patterns (behavioral). The first two are rule-based approaches that identify traffic by comparing it against defined rules; they incur a certain computational cost and cannot judge encrypted traffic. The latter two fall within machine learning and classify traffic using extracted features. Although they overcome the shortcomings of rule-based methods, the choice of extracted features strongly affects the results. In addition, there are deep learning architectures that use autoencoders for feature learning and dimensionality reduction, training on different inputs to detect whether IoT devices emit malicious traffic. These take connection features as input, can filter out the important features, and generally achieve high accuracy. However, feature quality greatly affects the results. Moreover, because the input features must be extracted from the original connections, real-time detection performance degrades, and packet correlations can easily be derived from address information of the same type, which affects privacy.

In view of this, the present invention proposes an unsupervised malicious traffic detection system and method that trains the weights of a deep learning convolutional neural network model to achieve automatic feature extraction and selection, thereby effectively solving the above problems. The specific architecture and its implementation are detailed below.

The main object of the present invention is to provide an unsupervised malicious traffic detection system and method that can classify network traffic as normal or abnormal by inspecting only the first few bytes of the first few packets in each flow. Because the complete flow need not be inspected, the amount of traffic inspected is greatly reduced, system performance improves, and abnormal traffic can be blocked early.

Another object of the present invention is to provide an unsupervised malicious traffic detection system and method that uses a convolutional neural network to learn features automatically from raw packets and then uses an autoencoder to establish patterns of normal traffic from those features, so that the system is easy to deploy and tune and can achieve high accuracy.

A further object of the present invention is to provide an unsupervised malicious traffic detection system and method in which the threshold that separates normal from abnormal traffic is based on the distribution of the mean squared error (MSELoss) of normal traffic in the autoencoder, and in which graded alerts can be issued according to how far the mean squared error deviates.

To achieve the above objects, the present invention provides an unsupervised malicious traffic detection system comprising: a preprocessing module that classifies a plurality of received raw packets by the flow to which they belong, takes the first several packets of each flow, and extracts the first several bytes of those packets; a convolutional neural network model, in signal connection with the preprocessing module, that takes those bytes as input, performs at least one round of convolution and dimensionality-reducing sampling, and filters out the features of the packets; and an autoencoder, in signal connection with the convolutional neural network model, that learns and classifies the features of the packets, establishes at least one pattern of normal traffic, and uses that pattern to classify whether the traffic of the flow currently under inspection is abnormal.

According to an embodiment of the present invention, the convolutional neural network model includes a convolutional layer and a pooling layer. The convolutional layer convolves the input bytes to produce a feature map, and the pooling layer samples at least one feature from the feature map in a dimensionality-reducing manner.

According to an embodiment of the present invention, the preprocessing module determines whether raw packets belong to the same flow based on their source IP address, source port, destination IP address, destination port, and transport layer protocol.

Further, the preprocessing module removes erroneous and duplicate traffic and randomizes information such as the source IP address and MAC address of the packets.

According to an embodiment of the present invention, the bytes include a header field of the packets and part of the packet content.

According to an embodiment of the present invention, the autoencoder is an unsupervised binary classifier that classifies the traffic of a flow as normal or abnormal.

According to an embodiment of the present invention, a loss function is obtained by adding a cross-entropy loss (CrossEntropyLoss) of the convolutional neural network model to a mean squared error (MSELoss) of the autoencoder.
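The combined loss can be illustrated with a minimal pure-Python sketch. It assumes an unweighted sum and a single sample; the function names, the example scores, and the feature vectors are hypothetical and not taken from the patent.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def mse(x, x_hat):
    """Mean squared error between a vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def total_loss(logits, target, features, reconstruction):
    # Unweighted sum of the two terms; any weighting would be an extra assumption.
    return cross_entropy(logits, target) + mse(features, reconstruction)

# Hypothetical values for illustration only.
logits = [2.0, 0.5, -1.0]               # class scores from the CNN branch
features = [0.2, 0.8, 0.5, 0.1]         # features fed to the autoencoder
reconstruction = [0.25, 0.7, 0.5, 0.2]  # autoencoder output
print(total_loss(logits, 0, features, reconstruction))
```

In a real training loop both terms would be averaged over a batch and minimized jointly by back-propagation.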

According to an embodiment of the present invention, the autoencoder has a threshold, which is calculated with reference to the distribution of the mean squared error that the autoencoder yields for normal traffic.

According to an embodiment of the present invention, the convolutional neural network model further includes a fully connected layer (Dense Layer) containing a plurality of neurons whose number matches the number of header fields of the packet.

The present invention also provides an unsupervised malicious traffic detection method comprising the following steps: using a preprocessing module to classify a plurality of received raw packets by the flow to which they belong, take the first several packets of each flow, and extract the first several bytes of those packets; feeding those bytes to a convolutional neural network model, which performs at least one round of convolution and dimensionality-reducing sampling and filters out the features of the packets; and using an autoencoder to learn and classify the features of the packets, establish at least one pattern of normal traffic, and use that pattern to classify whether the traffic of the flow currently under inspection is abnormal.

The present invention provides an unsupervised malicious traffic detection system and method. A convolutional neural network first learns each device's traffic features automatically from raw packets, inspecting only a small part of the headers and content of the raw packets. The learned data is output to an unsupervised deep learning model (an autoencoder), which is trained to establish patterns of normal traffic and, on that basis, decides whether the inspected traffic is abnormal. Most current state-of-the-art defense systems against various attacks still rely on predefined features of complete network flows. These features are defined manually, and by the time the flow features have been extracted it is already too late to block the malicious traffic. The present invention instead inspects only the first few bytes of the first few packets of each flow, which greatly reduces the amount of traffic inspected and allows abnormal traffic to be detected quickly and blocked early.

Please refer to Figure 1, which is a block diagram of the unsupervised malicious traffic detection system of the present invention, together with Figure 2, which is a flowchart of the unsupervised malicious traffic detection method of the present invention.

The unsupervised malicious traffic detection system of the present invention includes a preprocessing module 10, a convolutional neural network (CNN) model 12, and an autoencoder 14. The convolutional neural network model 12 is in signal connection with the preprocessing module 10, and the autoencoder 14 is in signal connection with the convolutional neural network model 12. The autoencoder 14 of the present invention is an unsupervised deep learning model.

After raw packets are received, in step S10 the preprocessing module 10 classifies them so that raw packets belonging to the same flow are grouped together, then takes the first several packets of each flow and extracts the first several bytes of those packets. In step S12, those bytes are fed to the convolutional neural network model 12, which performs at least one round of convolution and dimensionality-reducing sampling and then filters out the features of the packets. In step S14, the autoencoder 14 learns and classifies the features of the packets to establish at least one pattern of normal traffic. Finally, in step S16, the pattern of normal traffic is used to determine whether the traffic of the flow currently under inspection is abnormal.

The operation of each component in each step is detailed below.

Preprocessing module 10:

The preprocessing module 10 determines whether raw packets belong to the same flow based on their source IP address, source port, destination IP address, destination port, transport layer protocol, and so on. Once erroneous and duplicate traffic has been removed, the input raw packets have been classified by flow. Next, because only a few hosts are attacked in the malicious traffic used in the experiments, and because real attacks frequently spoof the source IP address, the present invention randomizes the fixed identity information of the malicious traffic (such as source IP address and MAC address) to ensure the credibility of the system. Finally, the packet size and the number of packets taken from each flow are tested to strike a balance between accuracy and timeliness.
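The flow grouping and truncation step can be sketched as follows. This is a minimal illustration, not the patented implementation: the packet dictionary layout, the function names, and the zero padding of short packets are assumptions, and the defaults of two packets and 50 bytes are taken from the experiment reported later in this document. Removal of erroneous or duplicate packets and randomization of addresses are omitted.

```python
def flow_key(pkt):
    """5-tuple that identifies a flow (direction-sensitive in this sketch)."""
    return (pkt["src_ip"], pkt["src_port"], pkt["dst_ip"], pkt["dst_port"], pkt["proto"])

def first_bytes_per_flow(packets, n_packets=2, n_bytes=50):
    """Group packets by flow, keep the first n_packets of each flow and the
    first n_bytes of each kept packet (zero-padded if the packet is shorter)."""
    flows = {}
    for pkt in packets:
        kept = flows.setdefault(flow_key(pkt), [])
        if len(kept) < n_packets:
            raw = pkt["raw"][:n_bytes]
            kept.append(raw + bytes(n_bytes - len(raw)))
    return flows

def make_packet(src_port, raw):
    # Hypothetical packet record; a real system would parse captured frames.
    return {"src_ip": "10.0.0.1", "src_port": src_port,
            "dst_ip": "10.0.0.2", "dst_port": 80, "proto": "TCP", "raw": raw}

packets = [make_packet(1234, bytes(range(60))),
           make_packet(1234, b"\x01\x02"),
           make_packet(1234, b"\xff" * 10),   # third packet of the flow, dropped
           make_packet(5555, b"\xaa" * 3)]
for key, kept in first_bytes_per_flow(packets).items():
    print(key, [len(p) for p in kept])
```

Each flow then contributes a fixed-size input of `n_packets * n_bytes` bytes, which is what lets the model run before the flow has finished.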

Convolutional neural network model 12:

A convolutional neural network is a deep neural network most commonly used to analyze visual images. Its convolution layers take properties of the image such as color, texture, lighting, and size as input features of the neural network. Compared with an ordinary multilayer perceptron, its defining characteristics are local perception and weight sharing: a filter extracts local features of the image, and every region of the image shares that filter. This alleviates the problem that the input loses its local correlation when a conventional neural network flattens the image into a 1×N vector, so convolutional neural networks are widely used in image recognition, where local relationships are strong.

The basic idea of a convolutional neural network is simple and intuitive. A diverse image database serves as the training images; the images are propagated to the network output through millions of neural network parameters (a group of parameters with a specific function is called a model); the error between target and prediction is computed at the output; and the weights of the network are updated continually by back-propagation. This lets convolutional neural networks handle large volumes of data, so they have great application and research value for recognizing highly variable, large-scale, high-dimensional images. The network architecture usually contains one or more convolution layers and pooling (subsampling) layers, with fully-connected layers (the original neural network) attached at the output.

The architecture of the convolutional neural network model 12 is shown in Figure 3. The first layer, convolutional layer 122, convolves each filter with the original image to obtain a feature map whose depth equals the number of filters, according to equation (1):

$$h^{k}_{ij} = W^{k} \cdot x_{ij} + b_{k} \tag{1}$$

where $k$ is the index of the filter; $W^{k}$ is the weight vector of the $k$-th filter; $b_{k}$ is the offset (bias) of the $k$-th filter; $x_{ij}$ is the vector of pixel values of the original image at position $(i,j)$ within a window the size of the $k$-th filter; and $h^{k}_{ij}$ is the new pixel value output by taking the dot product of the $k$-th filter's weight vector with those pixel values.

The second layer, pooling layer 124, works like signal processing: it samples features in a dimensionality-reducing manner. Taking max pooling as an example, as shown in Figure 4, an image of 16 pixels is divided into four blocks and the maximum of the four pixels in each block is taken. For example, the upper-left block contains 1, 1, 5, 6, so its maximum is 6; proceeding likewise, the max-pooled image consists of the four pixels 6, 8, 3, and 4.
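The max pooling example can be reproduced with a short sketch. Only the upper-left block (1, 1, 5, 6) and the pooled result (6, 8, 3, 4) come from the text; the remaining pixel values below are hypothetical, chosen so that the other three blocks yield 8, 3, and 4.

```python
def max_pool_2x2(image):
    """2x2 max pooling with stride 2 over a list-of-lists image."""
    return [
        [max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, len(image[0]), 2)]
        for r in range(0, len(image), 2)
    ]

image = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
print(max_pool_2x2(image))  # [[6, 8], [3, 4]]
```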

The third layer, fully connected layer 126, works like the fully connected layer of an ordinary neural network: after the original image has passed through several convolutional layers 122 and pooling layers 124, the resulting filtered features are combined with the neurons by a dot product between two vectors, according to equation (2):

$$n^{w}_{i} = \sum_{j=1}^{k} \omega^{w}_{ji}\, x_{j} + b^{w} \tag{2}$$

where $n^{w}_{i}$ is the output value of the $i$-th neuron in the $w$-th fully connected layer; $k$ is the number of neurons in the fully connected layer; $\omega^{w}_{ji}$ is the weight, in the $w$-th fully connected layer, of the $i$-th neuron for the $j$-th feature parameter; $b^{w}$ is the offset (bias) of the $w$-th fully connected layer; and $x_{j}$ is the $j$-th input feature vector.

The fourth layer, output layer 128, produces the prediction result after the fully connected layer 126, according to equation (3):

$$o_{c} = \sum_{i=1}^{l} \omega_{ic}\, n_{i} + b \tag{3}$$

where $o_{c}$ is the predicted output for class $c$; $l$ is the number of input neurons; $\omega_{ic}$ is the weight of the $i$-th neuron for class $c$; $b$ is the offset (bias) of the output layer; and $n_{i}$ is the $i$-th input neuron.
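Both the fully connected layer and the output layer are described as a weighted sum of the inputs plus a single layer-wide offset. A minimal sketch of that computation follows; the helper name `dense` and all numeric values are hypothetical, and activation functions are left out because the text does not define them for these layers.

```python
def dense(x, weights, bias):
    """out[i] = sum_j weights[j][i] * x[j] + bias, with one offset per layer."""
    n_out = len(weights[0])
    return [sum(weights[j][i] * x[j] for j in range(len(x))) + bias
            for i in range(n_out)]

x = [1.0, 2.0, 3.0]                                # hypothetical flattened features
w_hidden = [[0.5, -1.0], [0.25, 0.0], [0.0, 1.0]]  # w_hidden[j][i]
n = dense(x, w_hidden, bias=0.5)                   # fully connected layer
w_out = [[1.0, 0.0], [0.0, 2.0]]                   # w_out[i][c]
o = dense(n, w_out, bias=-1.0)                     # output layer
print(n, o)  # [1.5, 2.5] [0.5, 4.0]
```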

As described above, the convolutional neural network model 12 used in the present invention classifies the raw packets of network traffic using the method for classifying two-dimensional image data, but a one-dimensional form may also be used. In that case the one-dimensional convolutional neural network model takes the header fields of the raw packets as input, so the kernel size of the filters in convolutional layer 122 is set to 6, matching the MAC address, which is the widest header field.
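A one-dimensional convolution with kernel size 6 can be sketched in pure Python as below. The stride of 1 and padding of 5 follow the values listed for layer 1 in Table 1; the byte scaling to [0, 1], the averaging kernel, and the sample Ethernet header are assumptions for illustration (a trained model would learn its kernel weights).

```python
def conv1d(signal, kernel, bias=0.0, stride=1, padding=0):
    """1D cross-correlation, which deep learning frameworks call convolution."""
    padded = [0.0] * padding + list(signal) + [0.0] * padding
    k = len(kernel)
    return [
        sum(w * v for w, v in zip(kernel, padded[i:i + k])) + bias
        for i in range(0, len(padded) - k + 1, stride)
    ]

# Hypothetical 14-byte Ethernet header: destination MAC, source MAC, EtherType.
header = [b / 255 for b in bytes.fromhex("001122334455" "66778899aabb" "0800")]
kernel = [1 / 6] * 6          # spans exactly one 6-byte MAC address
out = conv1d(header, kernel, padding=5)
print(len(header), len(out))  # 14 19
```

With a kernel as wide as the widest field, one window can cover a whole MAC address, while narrower fields are covered together with their neighbors.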

Autoencoder 14:

An autoencoder is a neural network trained to reconstruct its input; the vector in its fully connected layer provides dimensionality reduction and noise reduction. The encoder builds one or more fully connected layers that yield a low-dimensional vector capturing the meaning of the input data, and a decoder reconstructs the input data from that low-dimensional vector. After training, the autoencoder obtains in the fully connected layer a low-dimensional vector representing the input data, which helps retain important information for purposes such as data classification, visualization, storage, compression, and noise reduction. It is an unsupervised learning model: it needs only input data, without labels.

The autoencoder 14 of the present invention is an unsupervised binary classifier that classifies the traffic of a flow as normal or abnormal. The autoencoder 14 is attached to the fully connected layer (layer 5) of the one-dimensional convolutional neural network described above to learn the features extracted by the convolutional neural network model 12, and can thus be trained on all patterns of normal traffic.

After the autoencoder 14 has been trained on the patterns of normal traffic, the raw traffic is tested: traffic from a test set balanced between normal and malicious data is classified according to the distribution of the mean squared error (MSELoss) output by the autoencoder. Because the threshold is computed from normal traffic, graded alerts can be issued according to how far the mean squared error deviates. The detailed architecture parameters are listed in Table 1:

| Layer | Type | Filters/neurons | Stride | Padding |
|---|---|---|---|---|
| 1 | 1D-Conv + ReLU + batch normalization | 32 (kernel size = 6) | 1 | 5 |
| 2 | Max pooling | kernel size = 2 | 2 | - |
| 3 | 1D-Conv + ReLU + batch normalization | 64 (kernel size = 6) | 1 | 5 |
| 4 | Max pooling | kernel size = 2 | 2 | - |
| 5 | Fully connected (Dense) + batch normalization | 1024 | - | - |
| 6 | Fully connected (Dense) + batch normalization | 25 | - | - |
| 7 | Fully connected (Dense) | 10 | - | - |
| 8 | Fully connected (Dense) | 512 | - | - |
| 9 | Fully connected (Dense) | 256 | - | - |
| 10 | Fully connected (Dense) | 512 | - | - |
| 11 | Fully connected (Dense) | 1024 | - | - |

Table 1

In Table 1, layers 6 and 7 are the layers designed for computing the cross-entropy loss (CrossEntropyLoss) of the convolutional neural network, and the layer preceding layer 8 is layer 5.
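To see how the convolution and pooling parameters in Table 1 shape the data entering layer 5, the sequence length can be traced with the standard output-length formulas. This is a sketch under stated assumptions: the input is taken to be a single-channel sequence of 100 bytes (two packets of 50 bytes concatenated, a layout the text does not specify), and the helper names are hypothetical.

```python
def conv_out_len(n, kernel, stride, padding):
    """Output length of a 1D convolution (no dilation)."""
    return (n + 2 * padding - kernel) // stride + 1

def pool_out_len(n, kernel, stride):
    """Output length of 1D max pooling (no padding, floor rounding)."""
    return (n - kernel) // stride + 1

def feature_length(input_len):
    """Trace the sequence length through layers 1 to 4 of Table 1."""
    n = conv_out_len(input_len, kernel=6, stride=1, padding=5)  # layer 1, 32 filters
    n = pool_out_len(n, kernel=2, stride=2)                     # layer 2
    n = conv_out_len(n, kernel=6, stride=1, padding=5)          # layer 3, 64 filters
    n = pool_out_len(n, kernel=2, stride=2)                     # layer 4
    return n, n * 64  # length per channel, flattened size entering layer 5

print(feature_length(100))  # (28, 1792)
```

Under these assumptions layer 5 maps 1792 flattened values to 1024 neurons, and layers 8 to 11 (512, 256, 512, 1024) reconstruct that 1024-dimensional vector.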

The t-SNE dimensionality-reduction visualization for the mean squared error is shown in Figures 6A to 6C: Figure 6A shows the distribution of normal traffic, Figure 6B the distribution of malicious traffic, and Figure 6C the joint distribution of normal and malicious traffic. The data subjected to dimensionality reduction is the feature extraction output of the convolutional neural network model.

In particular, the present invention further optimizes the mean squared error (MSELoss, the autoencoder's original loss function) of the autoencoder 14: the cross-entropy loss of the convolutional neural network model 12 plus the mean squared error of the autoencoder 14 serves as the loss function of the overall architecture. In addition, the present invention provides the following optimizations:

1. Optimize the size of each packet and the number of packets taken per flow, to find the input that can be processed in the shortest time with the least data while still achieving adequate accuracy.
2. Apply batch normalization between all convolutional neural network layers. Because the present invention uses a deep learning architecture with many layers, adding batch normalization between layers keeps the parameter distribution relatively stable, speeds up learning, and mitigates vanishing gradients and overfitting.
3. When the convolutional neural network model extracts features, add an extra fully connected layer (Dense Layer) of 25 neurons, a number matching the number of header fields of the packet. Because the header fields are the main input data, this extra 25-neuron layer takes permutations and combinations of the various features into account. Every combination of features may affect the classification result, so the invention avoids missing important feature combinations, which greatly improves the classification result.
4. Design all fully connected layers with greedy layer-wise pre-training. Greedy layer-wise pre-training likewise mitigates vanishing gradients and overfitting in deep architectures and gives a better initialization of the parameters of each layer.
5. When detecting attacks, take the mean squared error distribution produced by the autoencoder's training set (that is, normal traffic) and compare its maximum with the mean of the largest 1% of the values to decide the threshold. If the gap between the maximum and the mean of the largest 1% exceeds three standard deviations of the mean squared error distribution, the mean of the largest 1% is used as the threshold; otherwise, the maximum is used as the detection threshold.
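The threshold rule in item 5 can be sketched as follows. The rule itself is from the text; the use of the population standard deviation, rounding the size of the largest 1% up to at least one value, and the function names are assumptions.

```python
import statistics

def detection_threshold(train_mse):
    """Threshold from the MSE values of normal (training) traffic."""
    values = sorted(train_mse, reverse=True)
    top_n = max(1, (len(values) + 99) // 100)   # size of the largest 1%, rounded up
    top_mean = statistics.fmean(values[:top_n])
    largest = values[0]
    sigma = statistics.pstdev(values)
    # A maximum far above the rest of the largest 1% is treated as an outlier.
    return top_mean if largest - top_mean > 3 * sigma else largest

def is_anomalous(mse, threshold):
    """Flag a flow whose reconstruction error exceeds the threshold."""
    return mse > threshold

# Hypothetical training errors: 198 typical values, one slightly higher, one outlier.
train_mse = [1.0] * 198 + [1.1, 100.0]
threshold = detection_threshold(train_mse)
print(threshold, is_anomalous(60.0, threshold))
```

Note that with 100 or fewer training values the largest 1% is a single value, so in this sketch the rule always falls back to the maximum.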

An experiment was carried out with the system and method of the present invention. The normal traffic data of USTC-TFC2016 was used as the training input; after preprocessing, the input data comprised 10 types of normal traffic. The test data was a test set balanced between the normal traffic of USTC-TFC2016 and the malicious DDoS data of Mirai. The two sets are listed in Tables 2 and 3:

| Training set type | Quantity |
|---|---|
| BitTorrent | 6000 |
| Facetime | 6000 |
| FTP | 6000 |
| Gmail | 6000 |
| MySQL | 6000 |
| Outlook | 6000 |
| Skype | 6000 |
| SMB | 6000 |
| Weibo | 6000 |
| WorldofWarcraft | 6000 |

Table 2

| Test set type | Quantity |
|---|---|
| BitTorrent | 2398 |
| Facetime | 2398 |
| FTP | 2399 |
| Gmail | 2399 |
| MySQL | 2399 |
| Outlook | 2399 |
| Skype | 2399 |
| SMB | 2399 |
| Weibo | 2399 |
| WorldofWarcraft | 2399 |
| ACK Flood | 5997 |
| SYN Flood | 5997 |
| UDP Flood | 5997 |
| HTTP Flood | 5997 |

Table 3

The input data was processed with different packet sizes and numbers of packets per flow, and the results of the respective tests are listed in Table 4:

| Number of packets | 40 bytes | 50 bytes | 60 bytes | 70 bytes | 80 bytes |
|---|---|---|---|---|---|
| 2 | 99.96% | 99.59% | 100.00% | 100.00% | 99.98% |
| 3 | 99.89% | 100.00% | 100.00% | 100.00% | 100.00% |
| 4 | 99.85% | 100.00% | 100.00% | 100.00% | 100.00% |
| 5 | 99.55% | 99.39% | 99.99% | 98.49% | 98.86% |

Table 4 (columns give the packet size in bytes)

As Table 4 shows, once the packet header fields are captured (the headers generally occupy 54 bytes for TCP and 42 bytes for UDP), this unsupervised classification architecture achieves an accuracy of 99.6% or higher, and taking 50 bytes per packet and two packets per flow already achieves complete classification. The experiment thus confirms that the present invention needs to capture only a few packets of a flow to detect malicious flows.

Figure 7 is a histogram of the mean squared error distribution of the test set in the experiment, where the dashed line is the threshold set for classification. The figure clearly shows the difference in mean squared error between normal traffic and Mirai DDoS traffic (test set traffic). It shows the result for the configuration taking 50 bytes per packet and two packets per flow; the horizontal axis is the mean squared error and the vertical axis is the number of samples per bin.

In summary, the unsupervised malicious traffic detection system and method provided by the present invention use a convolutional neural network to learn features automatically from the original packets, and then use an autoencoder to establish patterns of normal traffic based on those features. The system is therefore easy to deploy and adjust, and achieves high accuracy. In addition, the present invention inspects only the first few bytes of the first few packets in each connection. Although only a small number of packets, and only a few bytes within them, are inspected, network traffic can still be classified as normal or abnormal without inspecting the complete connection. This greatly reduces the volume of traffic inspected, improves system performance, and allows abnormal traffic to be blocked early.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the scope of its implementation. Accordingly, all equivalent changes or modifications made in accordance with the features and spirit described in the claims of the present invention shall fall within the scope of the patent claims of the present invention.

10: Preprocessing module
12: Convolutional neural network model
122: Convolutional layer
124: Pooling layer
126: Fully connected layer
128: Output layer
14: Autoencoder

Figure 1 is a block diagram of the unsupervised malicious traffic detection system of the present invention.
Figure 2 is a flowchart of the unsupervised malicious traffic detection method of the present invention.
Figure 3 is a schematic diagram of the architecture of the convolutional neural network model.
Figure 4 is a schematic diagram of max pooling in the convolutional neural network model.
Figure 5 is a schematic diagram of the one-dimensional convolutional neural network model combined with the autoencoder in the unsupervised malicious traffic detection method of the present invention.
Figure 6A is a distribution of normal traffic, Figure 6B is a distribution of malicious traffic, and Figure 6C is the joint distribution of normal and malicious traffic.
Figure 7 is a histogram of the mean squared error distribution over the test set in the experiment conducted with the present invention.

Claims

1. An unsupervised malicious traffic detection system, comprising: a preprocessing module that classifies a plurality of received original packets according to the connection (flow) to which they belong, takes the first plurality of packets in the same connection, and then extracts the first plurality of bytes of those packets; a convolutional neural network model, in signal connection with the preprocessing module, that takes the bytes as input and performs convolution and dimensionality-reduction sampling at least once, thereby filtering out features of the packets; and an autoencoder, in signal connection with the convolutional neural network model, that learns and classifies the features of the packets, establishes at least one normal traffic pattern, and uses the at least one normal traffic pattern to classify whether the traffic of the connection currently under inspection is abnormal.

2. The unsupervised malicious traffic detection system according to claim 1, wherein the convolutional neural network model includes a convolutional layer and a pooling layer, the convolutional layer performs convolution with the bytes as input to obtain a feature map, and the pooling layer performs feature sampling on the feature map at least once in a dimensionality-reducing manner.

3. The unsupervised malicious traffic detection system according to claim 1, wherein the preprocessing module determines whether the original packets belong to the same connection according to the source IP address, source port, destination IP address, destination port, and transport-layer protocol of the original packets.

4. The unsupervised malicious traffic detection system according to claim 3, wherein the preprocessing module deletes erroneous and duplicate traffic, and randomizes information of the packets such as the source IP address and MAC address.

5. The unsupervised malicious traffic detection system according to claim 1, wherein the bytes include the header fields of the packets and part of the packet content.

6. The unsupervised malicious traffic detection system according to claim 1, wherein the autoencoder is an unsupervised binary classifier that classifies the connection traffic as normal or abnormal.

7. The unsupervised malicious traffic detection system according to claim 1, wherein a loss function is obtained by adding a cross-entropy loss (CrossEntropyLoss) of the convolutional neural network model to a mean squared error (MSELoss) of the autoencoder.

8. The unsupervised malicious traffic detection system according to claim 7, wherein the autoencoder has a threshold, and the threshold is calculated with reference to the distribution of the mean squared error that the autoencoder yields for normal traffic.

9. The unsupervised malicious traffic detection system according to claim 1, wherein the convolutional neural network model further includes a fully connected layer (Dense Layer) comprising a plurality of neurons whose number corresponds to the number of header fields of the packet.

10. An unsupervised malicious traffic detection method, comprising the following steps: using a preprocessing module to classify a plurality of received original packets according to the connection (flow) to which they belong, taking the first plurality of packets in the same connection, and then extracting the first plurality of bytes of those packets; inputting the bytes into a convolutional neural network model and performing convolution and dimensionality-reduction sampling at least once, thereby filtering out features of the packets; and using an autoencoder to learn and classify the features of the packets, establish at least one normal traffic pattern, and use the at least one normal traffic pattern to classify whether the traffic of the connection currently under inspection is abnormal.

11. The unsupervised malicious traffic detection method according to claim 10, wherein the convolutional neural network model includes a convolutional layer and a pooling layer, the convolutional layer performs convolution with the bytes as input to obtain a feature map, and the pooling layer performs feature sampling on the feature map at least once in a dimensionality-reducing manner.

12. The unsupervised malicious traffic detection method according to claim 10, wherein the preprocessing module determines whether the original packets belong to the same connection according to the source IP address, source port, destination IP address, destination port, and transport-layer protocol of the original packets.

13. The unsupervised malicious traffic detection method according to claim 12, wherein the preprocessing module deletes erroneous and duplicate traffic, and randomizes information of the packets such as the source IP address and MAC address.

14. The unsupervised malicious traffic detection method according to claim 10, wherein the bytes include the header fields of the packets and part of the packet content.

15. The unsupervised malicious traffic detection method according to claim 10, wherein the autoencoder is an unsupervised binary classifier that classifies the connection traffic as normal or abnormal.

16. The unsupervised malicious traffic detection method according to claim 10, wherein a loss function is obtained by adding a cross-entropy loss (CrossEntropyLoss) of the convolutional neural network model to a mean squared error (MSELoss) of the autoencoder.

17. The unsupervised malicious traffic detection method according to claim 16, wherein the autoencoder has a threshold, and the threshold is calculated with reference to the distribution of the mean squared error that the autoencoder yields for normal traffic.

18. The unsupervised malicious traffic detection method according to claim 10, wherein the convolutional neural network model further includes a fully connected layer (Dense Layer) comprising a plurality of neurons whose number corresponds to the number of header fields of the packet.