TWI419531B

TWI419531B - Web site attack detection system and method

Info

Publication number: TWI419531B
Application number: TW99135681A
Authority: TW
Original assignee: Chunghwa Telecom Co Ltd
Priority date: 2010-10-20
Filing date: 2010-10-20
Publication date: 2013-12-11
Also published as: TW201218716A

Description

Website attack detection system and method

The present invention relates to a website attack detection system and method, and in particular to a system and method that detect attack behavior on a computer network.

In general, website attacks are detected by signature matching. An administrator extracts from a known attack a signature that is sufficient to represent it, and then adds that signature, in the form of a rule, to an intrusion detection system (IDS) or a web application firewall (WAF).

This detection architecture has a major drawback: it cannot effectively detect variant attacks. Signature matching analyzes currently known attacks to find a string that can represent each one, and the IDS/WAF decides whether an attack is taking place by checking whether the filtered packet data contains any of those attack signatures. When an attacker launches a variant attack that resembles a known attack but whose attack code is not identical, the signature-based mechanism can fail because no signature matches in the variant's packets. Moreover, a single attack may have dozens or even hundreds of variants, so it is not feasible to convert every variant into a signature. A signature-based detection mechanism therefore cannot detect variant attacks completely.

The conventional approach thus still has many shortcomings, is not a sound design, and urgently needs improvement. In view of these shortcomings, the inventors sought to improve on it and, after years of dedicated research, successfully developed the present website attack detection method.

The object of the present invention is to provide a website attack detection system and method that tokenizes and scores request parameters to determine whether an HTTP Request is an attack, so that website attacks are detected effectively and accurately.

A website attack detection system that achieves the above object comprises: a token sequence generation module, which, for an HTTP Request, converts each of its request parameters into a token sequence, a token sequence being composed of a plurality of tokens; a token association module, which produces a score for each token sequence according to the strength of association between its tokens; and an attack detection module, which determines whether the HTTP Request is an attack according to the score of each token sequence and a threshold value.

The website attack detection method operates as follows. The token sequence generation module accepts an HTTP Request as input and produces token sequences. The token association module accepts the token sequences output by the token sequence generation module and produces a score for each one. The attack detection module then determines whether the HTTP Request is an attack according to the scores produced by the token association module and the threshold value.

The token sequence generation module defines a plurality of tokens and the relationship between tokens and request parameter strings. Its main purpose is to simplify the request parameters so that the token association module can perform association analysis on them. For example, the strings "eval" and "script" in a request parameter are replaced by the token "SCRIPT_WORD". The token sequence generation module takes an HTTP Request as input, parses it, extracts all of its request parameters, and converts them into a plurality of token sequences.
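The extraction step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the request parameters arrive as a URL query string plus an optional form-encoded body, and the function name `extract_request_parameters` and the example URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def extract_request_parameters(url: str, body: str = "") -> list[tuple[str, str]]:
    """Return every (name, value) pair found in the query string and in an
    optional application/x-www-form-urlencoded body (an assumption of this sketch)."""
    query = urlsplit(url).query
    # parse_qsl percent-decodes names and values and preserves their order.
    params = parse_qsl(query, keep_blank_values=True)
    params += parse_qsl(body, keep_blank_values=True)
    return params


# Hypothetical request carrying the two parameters used in the description.
params = extract_request_parameters(
    "http://example.com/page?id=jacky"
    "&value=%3Cscript%3Ealert(document.cookie)%3C/script%3E"
)
for name, value in params:
    print(name, "=", value)
```

Each pair returned here would then be converted into one token sequence, so the number of sequences equals the number of request parameters.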

The token association module uses statistical or machine learning methods (including hidden Markov models, Markov chains, neural networks, decision trees, support vector machines, and Bayesian networks) to produce a score for each token sequence. The module needs a criterion for computing these scores, so it must be trained first. Training consists of giving the token association module sample token sequences that belong to attacks; once trained on these samples, the module can compute a score for any token sequence it receives afterwards. The sample token sequences are produced by first obtaining HTTP Requests that experts have identified as attacks and then using the token sequence generation module to convert their request parameters into sample token sequences.

The attack detection module makes an overall judgment based on the score of each token sequence and the supplied threshold value to decide whether an HTTP Request is an attack. The main rule is that when the score of any one request parameter belonging to an HTTP Request is higher than the threshold value, the HTTP Request is judged to be an attack.

Figure 1 is an architecture diagram of the website attack detection system, showing the relationships between the modules of the present invention and their operating flow. The website attack detection system comprises a token sequence generation module 100, a token association module 200, and an attack detection module 300.

The token sequence generation module 100 receives an HTTP Request, extracts all of its request parameters, and converts them into token sequences according to a predefined token list. The token list is shown in Figures 5 and 6: Figure 5 is used to delimit each token within a request parameter, the "KEY" token in Figure 6 represents strings that occur with high frequency, and any string that belongs to no other token is converted to the "VAR" token. The token sequence generation module 100 finally outputs one or more token sequences; the number of token sequences equals the number of request parameters in the HTTP Request. For example, if an HTTP Request contains the two request parameters "id=jacky" and "value=<script>alert(document.cookie)</script>", the token sequence generation module converts them into the two token sequences "KEY=VAR" and "KEY=<SCRIPT_WORD>ALERT VAR</SCRIPT_WORD>".
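A rough sketch of this conversion is given below. The real token list lives in Figures 5 and 6, which are not reproduced in this text, so the table `WORD_TOKENS`, the delimiter rule in `PIECE`, and the choice to map every parameter name to "KEY" are all assumptions made for illustration. Unlike the patent's own example output, this sketch keeps the parentheses as separate tokens.

```python
import re

# Hypothetical token table; the patent's actual lists are in Figures 5 and 6.
WORD_TOKENS = {"script": "SCRIPT_WORD", "eval": "SCRIPT_WORD", "alert": "ALERT"}

# Assumed delimiter rule: runs of word characters form one piece,
# every other non-space character is a piece of its own.
PIECE = re.compile(r"(?P<word>[A-Za-z0-9_.]+)|(?P<sym>\S)")


def tokenize_value(value: str) -> list[str]:
    """Convert a parameter value into a list of tokens."""
    tokens = []
    for match in PIECE.finditer(value):
        word = match.group("word")
        if word is not None:
            # Known words get their token; everything else becomes VAR.
            tokens.append(WORD_TOKENS.get(word.lower(), "VAR"))
        else:
            tokens.append(match.group("sym"))
    return tokens


def tokenize_parameter(name: str, value: str) -> list[str]:
    """Token sequence for one request parameter (name assumed to map to KEY)."""
    return ["KEY", "="] + tokenize_value(value)


print(tokenize_parameter("id", "jacky"))
print(tokenize_parameter("value", "<script>alert(document.cookie)</script>"))
```

Keeping the sequence as a list rather than a joined string makes it easier to feed into a sequence model in the next stage.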

The token association module 200 accepts the token sequences output by the token sequence generation module 100, computes a score for each one, and outputs the scores. The scoring method may be any statistical or machine learning method (including hidden Markov models, Markov chains, neural networks, decision trees, support vector machines, and Bayesian networks); this embodiment uses a Hidden Markov Model (HMM) to score the token sequences. The token association module 200 must first go through a training phase to obtain its scoring criterion before it can score token sequences in operation. Figure 2 shows the flow of the token association module 200 in the training phase: after HTTP Requests known to be attacks are converted into token sequences by the token sequence generation module, the token sequences are input to the token association training module 201, which thereby obtains the degree of association between tokens. Figure 3 shows the flow by which the token association module 200 scores token sequences once training is complete: the token association online module 202 uses the association strength information obtained by the token association training module 201 during training to compute the score of each token sequence. The higher the score, the more likely the token sequence is to contain an attack.
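The train-then-score flow can be illustrated with a small sketch. The embodiment uses an HMM; the code below swaps in a first-order Markov chain with additive smoothing, which the description also lists as an admissible method, because it fits in a few lines. The class name `MarkovChainScorer`, the smoothing constant, the length-normalized log-likelihood score, and the sample sequences are all assumptions of this sketch.

```python
import math
from collections import defaultdict

START, END = "<s>", "</s>"  # artificial boundary tokens used only by this sketch


class MarkovChainScorer:
    """First-order Markov chain over tokens, trained on attack sequences only."""

    def __init__(self, smoothing: float = 1.0):
        self.smoothing = smoothing
        self.counts = defaultdict(lambda: defaultdict(int))
        self.vocab = {END}

    def train(self, attack_sequences):
        """Count token-to-token transitions in known attack sequences."""
        for seq in attack_sequences:
            path = [START, *seq, END]
            self.vocab.update(seq)
            for prev, cur in zip(path, path[1:]):
                self.counts[prev][cur] += 1

    def _prob(self, prev, cur):
        row = self.counts.get(prev, {})
        total = sum(row.values())
        # The extra +1 reserves probability mass for tokens never seen in training.
        size = len(self.vocab) + 1
        return (row.get(cur, 0) + self.smoothing) / (total + self.smoothing * size)

    def score(self, seq):
        """Average log-probability per transition; higher means more attack-like."""
        path = [START, *seq, END]
        log_p = sum(math.log(self._prob(p, c)) for p, c in zip(path, path[1:]))
        return log_p / (len(path) - 1)


# Hypothetical training data: token sequences from requests labeled as attacks.
scorer = MarkovChainScorer()
scorer.train([
    ["KEY", "=", "<", "SCRIPT_WORD", ">", "ALERT", "(", "VAR", ")",
     "<", "/", "SCRIPT_WORD", ">"],
    ["KEY", "=", "<", "SCRIPT_WORD", ">", "VAR", "<", "/", "SCRIPT_WORD", ">"],
])

# A variant not present in the training set, and a benign parameter.
variant_score = scorer.score(
    ["KEY", "=", "VAR", "<", "SCRIPT_WORD", ">", "ALERT", "(", "VAR", ")",
     "<", "/", "SCRIPT_WORD", ">"]
)
benign_score = scorer.score(["KEY", "=", "VAR"])
print(variant_score > benign_score)
```

Because the model learns transitions between tokens rather than literal strings, a variant that rearranges or pads the payload still follows mostly familiar transitions, which is the property the description relies on for detecting variant attacks.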

After receiving the score of each token sequence output by the token association module 200, the attack detection module 300 determines whether the HTTP Request is an attack according to a predefined threshold value. Figure 4 shows the decision flow of the attack detection module 300, which comprises the following steps:

Step 301: use the token list to convert the request parameters in the HTTP Request into one or more token sequences, and store the score of each token sequence output by the token association module 200.

Step 302: score each token sequence separately with a machine learning or statistical method. Training must precede scoring, so the method is given multiple token sequences known to be attacks or normal behavior as the training set from which it learns its scoring rules. The score of each token sequence is then compared with the threshold value, one sequence at a time.

Step 303: among the stored token sequences, if the score of any one token sequence is greater than the threshold value, the HTTP Request is judged to be an attack.

Step 304: otherwise, if no token sequence has a score greater than the threshold value, the HTTP Request is judged to be normal behavior that contains no attack.
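The decision rule in steps 303 and 304 reduces to an "any score above the threshold" check. The sketch below shows only that rule; `toy_score` is a hypothetical stand-in for a trained scorer, and the threshold of 0.1 is an arbitrary value chosen for the example.

```python
from typing import Callable, Iterable, Sequence


def is_attack(
    token_sequences: Iterable[Sequence[str]],
    score: Callable[[Sequence[str]], float],
    threshold: float,
) -> bool:
    """True if any token sequence of the request scores above the threshold."""
    return any(score(seq) > threshold for seq in token_sequences)


def toy_score(seq: Sequence[str]) -> float:
    """Hypothetical scorer: the fraction of tokens that are SCRIPT_WORD."""
    return sum(token == "SCRIPT_WORD" for token in seq) / max(len(seq), 1)


benign = ["KEY", "=", "VAR"]
suspicious = ["KEY", "=", "<", "SCRIPT_WORD", ">", "VAR", "<", "/", "SCRIPT_WORD", ">"]

print(is_attack([benign, suspicious], toy_score, threshold=0.1))
print(is_attack([benign], toy_score, threshold=0.1))
```

Judging each parameter on its own means one malicious parameter is enough to flag the request, however many benign parameters accompany it.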

Compared with other conventional techniques, the website attack detection method provided by the present invention has the following advantages:

1. Compared with the attack detection methods in common use today, the present invention detects variant attacks more completely and effectively.

2. The present invention processes and scores each request parameter separately, and the HTTP Request is judged to be an attack as soon as any one request parameter is judged to be an attack. Compared with methods that judge only the content of the HTTP Request as a whole, this avoids the problem of attack code being overlooked when it makes up too small a proportion of the HTTP Request content.

The above detailed description specifically describes one feasible embodiment of the present invention. The embodiment is not intended to limit the patent scope of the invention; any equivalent implementation or modification that does not depart from the spirit of the invention shall fall within the patent scope of this case.

In summary, this case is not only innovative in its technical concept but also provides the effects described above, which conventional methods cannot match. It fully satisfies the statutory requirements of novelty and inventive step for an invention patent. The application is filed in accordance with the law, and the Office is respectfully requested to approve it in order to encourage invention.

100: Token sequence generation module

200: Token association module

201: Token association training module

202: Token association online module

300: Attack detection module

The technical content, purpose, and effects of the present invention can be further understood from the detailed description of the invention and the accompanying drawings, which are as follows:

Figure 1 is a flow and architecture diagram of the website attack detection device;

Figure 2 is a flowchart of the token association module in the training phase;

Figure 3 is a scoring flowchart of the token association module;

Figure 4 is a decision flowchart of the attack detection module;

Figure 5 is token list 1;

Figure 6 is token list 2.

Claims

1. A website attack detection system, comprising: a. a token sequence generation module, which converts the request parameters in an HTTP Request into one or more token sequences according to a token list, wherein the content of the token list is not fixed and may vary with the system environment; b. a token association module, which is responsible for scoring the token sequences; and c. an attack detection module, which is responsible for comprehensively determining whether the HTTP Request is an attack based on external information such as the score of each token sequence and a threshold value.

2. The website attack detection system of claim 1, wherein the token sequence generation module converts the request parameters into token sequences through a user-definable token list.

3. The website attack detection system of claim 1, wherein the token association module must undergo a training phase, in which a plurality of token sequences belonging to attacks are input to the token association module so that it learns the structure of attack data; the attack token sequences are produced by inputting HTTP Requests that are attacks into the token sequence generation module.

4. The website attack detection system of claim 1, wherein the token association module can use a statistical method.

5. The website attack detection system of claim 1, wherein the token association module can use a Hidden Markov Model.

6. The website attack detection system of claim 1, wherein the token association module can use a Markov chain.

7. The website attack detection system of claim 1, wherein the token association module can use a neural network.

8. The website attack detection system of claim 1, wherein the token association module can use a decision tree.

9. The website attack detection system of claim 1, wherein the token association module can use a support vector machine.

10. The website attack detection system of claim 1, wherein the token association module can use a Bayesian network.

11. The website attack detection system of claim 1, wherein the attack detection module compares the score of each token sequence with external information such as the threshold value, and as long as any token sequence is identified as an attack, the HTTP Request is considered an attack.

12. A method for detecting a website attack, comprising the steps of: a. converting the request parameters in an HTTP Request into one or more token sequences using a token list; b. scoring each token sequence separately using a machine learning or statistical method; c. performing training before scoring, in which the method is given multiple token sequences known to be attacks or normal behavior as the training set from which it learns its scoring rules; and d. comparing the score of each token sequence with a threshold value, wherein when the score of any token sequence is higher than the threshold value, the HTTP Request is determined to contain an attack, and otherwise, when the scores of all token sequences are lower than the threshold value, the HTTP Request is determined not to contain an attack.