KR20100058695A

KR20100058695A - Profile-based models and selective use for web application attack detection

Info

Publication number: KR20100058695A
Application number: KR1020080117161A
Authority: KR
Inventors: 박영민
Original assignee: 박영민; 중앙대학교 산학협력단
Priority date: 2008-11-25
Filing date: 2008-11-25
Publication date: 2010-06-04

Abstract

Several profile-based techniques for detecting input manipulation attacks on web applications have been proposed recently, but they have relatively high false positive rates and long detection times. To alleviate these drawbacks, the present invention improves the error rate and detection time by adding new models to the existing models.

Description

Profile-based Models and Selective Use for Web Application Attack Detection

Web application

Web application attacks have recently been increasing and diversifying as more people use the web to share information, carry out financial transactions, and buy and sell products. Methods of attacking web servers are diverse and continue to evolve, and research on preventing them has been conducted steadily.

This research mainly covers two approaches: preventing attacks by scanning the web application itself for vulnerabilities, and detecting abnormal behavior by inspecting the input and output of the web application. Input/output inspection methods for web applications can be broadly divided into signature-based, policy-based, and profile-based methods.

Kruegel's research, one of the profile-based methods, presented positive models for the length of arguments, character distribution, token information, structure inference, presence of arguments, and order of arguments, and judges abnormality by summing the results of these models. This approach widens the scope of detection and makes it harder for an attacker to manipulate input undetected, but its detection speed is slow, which makes real-time processing difficult.

Therefore, the problem to be solved by the present invention is to reduce the false positive rate and the execution time while using a number of positive models, by selectively applying models according to the characteristics of each parameter.

As described above, the method of preventing input manipulation attacks on a web server according to the present invention slightly improves the false positive rate over the related research, as shown in Fig. 1, and significantly improves the false negative rate, as shown in Fig. 2. Fig. 3 compares the execution speeds: with selective model application, the execution speed is improved by 50% over the related research.

To achieve the above technical objective, the present invention uses an anomaly detection method that identifies abnormal behavior based on the characteristics of normal behavior. To do this, it uses positive models that describe the characteristics of normal parameter values. The length model, character distribution model, and token model from Kruegel's research are used in slightly modified form, together with the newly added value range model, character composition model, structural analysis model, and factor configuration model.

Each model calculates the probability that the parameter is normal, and the final decision is made using Equation 1. For each model i, Equation 1 uses the probability that the parameter is normal according to that model and the threshold defined for that model. It also uses a threshold for the mean of the probabilities over all models.
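The decision step can be illustrated with a minimal sketch. Equation 1 itself is not reproduced in this text, so the combination rule below is an assumption: a parameter is treated as normal only if every applied model clears its own threshold and the mean of the model probabilities clears the overall threshold. The function name `is_normal` and the dictionary layout are hypothetical.

```python
def is_normal(probabilities, thresholds, mean_threshold):
    """Assumed reading of the decision rule (not the patent's Equation 1).

    probabilities:  model name -> probability that the parameter is normal
    thresholds:     model name -> per-model threshold
    mean_threshold: threshold on the mean probability over the applied models
    """
    if not probabilities:
        raise ValueError("at least one model must be applied")
    # Per-model check: each applied model must reach its own threshold.
    per_model_ok = all(
        probabilities[name] >= thresholds[name] for name in probabilities
    )
    # Overall check: the mean probability must reach the overall threshold.
    mean_probability = sum(probabilities.values()) / len(probabilities)
    return per_model_ok and mean_probability >= mean_threshold
```

Because only the models present in `probabilities` are evaluated, the same function works when a reduced model set has been selected for a parameter.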

The length model computes the mean and variance of the length of each parameter's values during training. At detection, the Chebyshev inequality is used to quantify how far the length of the value under examination is from the mean. In Equation 2, the probability that a normal parameter value x deviates from the mean more than the value under examination is smaller than p(l), so p(l) may be used as the probability that the parameter is normal.
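A minimal sketch of such a length model follows. It assumes the bound takes the standard Chebyshev form, variance divided by the squared distance from the mean, capped at 1. The class name `LengthModel` and its methods are hypothetical, and the modifications the invention makes to Kruegel's model are not represented.

```python
class LengthModel:
    """Learns mean and variance of value lengths; scores new values."""

    def __init__(self):
        self.mu = 0.0
        self.var = 0.0

    def train(self, values):
        lengths = [len(v) for v in values]
        self.mu = sum(lengths) / len(lengths)
        # Population variance of the observed lengths.
        self.var = sum((n - self.mu) ** 2 for n in lengths) / len(lengths)

    def probability(self, value):
        distance = abs(len(value) - self.mu)
        if distance == 0:
            return 1.0
        # Chebyshev bound: P(|x - mu| >= distance) <= var / distance^2.
        return min(1.0, self.var / distance ** 2)
```

For example, after training on values of length 4, 5, and 6, a value of length 25 receives a probability well below 0.01, while a value of length 5 receives 1.0.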

In the present invention, as in Kruegel's study, the character distribution of a string s is defined as the sorted relative frequencies of the characters of s. In the character distribution model, the ideal character distribution of each parameter is calculated during training. The sorted frequencies, indexed from 0 to 255, are divided into six sets: [0], [1,3], [4,6], [7,11], [12,15], and [16,255]. At detection, a chi-square test measures how far the observed distribution deviates from the ideal character distribution: the value obtained from Equation 3 is looked up in a chi-square table with 5 degrees of freedom to obtain the probability that the parameter is normal.
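The following sketch illustrates the character distribution test under several assumptions: frequencies are sorted in descending order, the test statistic is the usual Pearson chi-square sum over the six sets, and the table lookup is replaced by the closed-form survival function of the chi-square distribution with 5 degrees of freedom. All function names are hypothetical.

```python
import math
from collections import Counter

# The six index ranges over the sorted frequencies (inclusive bounds).
BINS = [(0, 0), (1, 3), (4, 6), (7, 11), (12, 15), (16, 255)]

def binned_distribution(data: bytes):
    """Sorted relative byte frequencies of data, summed per index range."""
    counts = Counter(data)
    freqs = sorted((c / len(data) for c in counts.values()), reverse=True)
    freqs += [0.0] * (256 - len(freqs))
    return [sum(freqs[lo:hi + 1]) for lo, hi in BINS]

def train_ideal_distribution(samples):
    """Average the binned distributions of all non-empty training values."""
    rows = [binned_distribution(s) for s in samples if s]
    return [sum(col) / len(rows) for col in zip(*rows)]

def chi2_sf_df5(x):
    """P(X >= x) for a chi-square variable with 5 degrees of freedom."""
    if x <= 0:
        return 1.0
    tail = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3)
    return math.erfc(math.sqrt(x / 2)) + tail

def probability(ideal, value: bytes):
    if not value:
        return 1.0  # assumption: an empty value is not flagged by this model
    observed = [f * len(value) for f in binned_distribution(value)]
    chi2 = 0.0
    for o, p in zip(observed, ideal):
        expected = p * len(value)
        if expected == 0:
            if o > 0:
                return 0.0  # characters where training never saw any
            continue
        chi2 += (o - expected) ** 2 / expected
    return chi2_sf_df5(chi2)
```

With an ideal distribution learned from short names, a value such as forty repetitions of one character concentrates all mass in the first set and receives a probability close to zero.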

In the token model, training determines whether a given parameter is of an enumerated type, and if so, hash values of its possible values are stored. Whether a parameter is an enumeration is determined by the method used in Kruegel's work. This model applies only to parameters of the enumerated type. At detection, the hash of the parameter value under examination is compared with the stored normal values.
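The lookup part of the token model can be sketched as follows. The sketch assumes the parameter has already been classified as an enumeration, and it uses SHA-256 as the hash function, which the text does not specify. The class name `TokenModel` is hypothetical.

```python
import hashlib

class TokenModel:
    """Stores hashes of the values seen for an enumerated parameter."""

    def __init__(self):
        self.known_hashes = set()

    @staticmethod
    def _digest(value: str) -> str:
        # SHA-256 is an assumption; any stable hash would serve here.
        return hashlib.sha256(value.encode("utf-8")).hexdigest()

    def train(self, values):
        self.known_hashes = {self._digest(v) for v in values}

    def probability(self, value: str) -> float:
        # 1.0 if the value was seen during training, otherwise 0.0.
        return 1.0 if self._digest(value) in self.known_hashes else 0.0
```

A sort-order parameter trained on "asc" and "desc" would then reject any other value, including injected text appended to a legitimate value.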

In the value range model, if the data type of a parameter is numeric, a non-numeric value or a value outside the normal range may be considered abnormal. For each numeric parameter, the mean and variance of its values are calculated during training. At detection, the Chebyshev inequality is used to find the probability p(v) that the parameter value v is normal.
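A brief sketch of the value range check is given here, assuming that values arrive as strings, that a value which cannot be parsed as a finite number is scored 0, and that the Chebyshev bound has the standard form of variance over squared distance from the mean. The function names are hypothetical.

```python
import math

def train_value_range(values):
    """Return (mean, population variance) of the numeric training values."""
    numbers = [float(v) for v in values]
    mu = sum(numbers) / len(numbers)
    var = sum((x - mu) ** 2 for x in numbers) / len(numbers)
    return mu, var

def value_probability(profile, value):
    mu, var = profile
    try:
        v = float(value)
    except ValueError:
        return 0.0  # non-numeric input for a numeric parameter
    if not math.isfinite(v):
        return 0.0  # "nan" and "inf" parse as floats but are not valid values
    distance = abs(v - mu)
    if distance == 0:
        return 1.0
    # Chebyshev bound on deviating from the mean at least this far.
    return min(1.0, var / distance ** 2)
```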

Since one character is one byte, it takes one of 256 values from 0 to 255 and belongs to one of the five character sets shown in Table 2. In the character composition model, the average of the frequencies over all cases is calculated during training to obtain the ideal distribution over the character sets. Each entry of this distribution is a probability value between 0 and 1 that represents, in the ideal case, the probability that one character of the parameter belongs to the corresponding set. At detection, the character composition of the parameter value is obtained and then tested for how much it differs from the ideal case.

In the structural analysis model, the structure of the parameter value is determined during training. If the parameter value has a structure, that structure is determined by its special characters. First, in the tokenization step, each run of consecutive plain text is regarded as one token, and each special character is regarded as one token. Each item of profiling data is tokenized and a state machine is created for it; the resulting state machines are then combined into a single state machine using state merging. At detection, the model verifies that the parameter under examination conforms to this structure: if it does, it is judged normal, and if it does not, it is judged abnormal.
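The tokenization step can be sketched as below. Two assumptions are made: plain text means ASCII letters and digits, and every other character counts as a special character. The state machine construction and state merging are not implemented; as a simplified stand-in, the sketch stores the set of token shapes seen in training, which accepts only structures observed exactly and does not generalize the way a merged state machine would. All names are hypothetical.

```python
import re

# A run of plain text is one token; every other character is its own token.
TEXT_RE = re.compile(r"[A-Za-z0-9]+")
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9]")

def tokenize(value):
    return TOKEN_RE.findall(value)

def shape(value):
    """Replace each plain-text token with TEXT; keep special characters."""
    return tuple(
        "TEXT" if TEXT_RE.fullmatch(tok) else tok for tok in tokenize(value)
    )

def train_structures(values):
    # Stand-in for the merged state machine: the set of observed shapes.
    return {shape(v) for v in values}

def conforms(structures, value):
    return shape(value) in structures
```

Trained on e-mail addresses, for instance, the shape `TEXT @ TEXT . TEXT` is accepted for any address of that form, while a value with an appended quote and SQL fragment produces a shape that was never observed.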

In the factor configuration model, the normal parameter configuration of each URL is saved during training. This is done by hashing the parameter names of the normal configuration in sequence and storing the hash value. At detection, the hash of the parameter configuration under examination is computed, and the request is judged normal if the hash matches one of the stored normal values.
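A minimal sketch of this check follows. It assumes that configurations are stored per URL path, that the parameter names are joined in the order they appear before hashing (so a different order gives a different hash), and that SHA-256 is the hash function. None of these details are stated in the text, and the names are hypothetical.

```python
import hashlib
from urllib.parse import parse_qsl, urlsplit

def configuration_hash(url):
    """Return (path, hash of the ordered parameter names) for a URL."""
    parts = urlsplit(url)
    # keep_blank_values=True so parameters with empty values still count.
    names = [name for name, _ in parse_qsl(parts.query, keep_blank_values=True)]
    digest = hashlib.sha256("&".join(names).encode("utf-8")).hexdigest()
    return parts.path, digest

class FactorConfigurationModel:
    def __init__(self):
        self.normal = {}  # path -> set of hashes of normal configurations

    def train(self, urls):
        for url in urls:
            path, digest = configuration_hash(url)
            self.normal.setdefault(path, set()).add(digest)

    def is_normal(self, url):
        path, digest = configuration_hash(url)
        return digest in self.normal.get(path, set())
```

Only the parameter names enter the hash, so requests with different values but the same configuration are accepted, while an added or unexpected parameter is rejected.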

The present invention proposes selectively applying only the models that are important to each parameter, rather than applying all models to all parameters. After profiling, the characteristics of each parameter are identified, and the set of models to be applied to that parameter is configured in advance.

The model set is determined in four steps. First, models that cannot be applied because of the data type and value characteristics of the parameter are removed. Second, the models that are important for that parameter are identified. Third, models that hinder detection for that parameter are identified. Fourth, models that are redundant for that parameter are identified.

Table 1

Table 2

Equation 1

Equation 2

Equation 3

Claims

A method of preventing input manipulation by inspecting input values in front of a web server, with performance improved by modifying the following existing profile-based positive models:

Length model;

Character distribution model;

Token model;

New models added to reduce false positives:

Value range model;

Character composition model;

Structural analysis model;

Factor configuration model;