SQL injection detection method based on user behavior credibility analysis
Technical Field
The invention relates to Web security detection, and in particular to an SQL injection detection method based on user behavior credibility analysis.
Background
At present, among the many kinds of Web application vulnerabilities, SQL injection vulnerabilities are frequently exploited by attackers and are among the oldest, most widespread and most dangerous Web application vulnerabilities.
SQL injection is an attack in which illegal SQL statement fragments are added or inserted into the request parameters of a Web application. The Web application then uses the inserted fragments when it dynamically constructs SQL statements, so the illegal SQL code changes the syntax and semantics of the original SQL statement, and the Web back-end application finally passes the modified, illegal SQL statement to the database for parsing and execution.
SQL injection attacks can be broadly divided into two categories:
(1) Direct attack: the attack code is inserted directly into the request parameters, which are then placed into the SQL command and executed when the SQL command is dynamically constructed.
(2) Indirect attack: when the Web application needs character strings previously stored in the database to complete certain functions, the stored strings are read from the database and placed into a dynamically spliced SQL statement, so that malicious SQL code contained in them can be executed.
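The direct form of the attack can be illustrated with a minimal sketch. The table `users`, the helper names and the payload are hypothetical and chosen only for illustration; the parameterized variant is included purely as a contrast and is not part of the method described here.

```python
import sqlite3

def build_demo_db():
    # Hypothetical in-memory table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn

def lookup_unsafe(conn, name):
    # Vulnerable: the request parameter is spliced into the SQL text,
    # so a crafted value can change the statement's structure.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]

def lookup_safe(conn, name):
    # Contrast: the parameter is bound as data and cannot alter the syntax.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]

conn = build_demo_db()
payload = "x' OR '1'='1"                 # hypothetical attack string
leaked = lookup_unsafe(conn, payload)    # WHERE clause becomes always true
blocked = lookup_safe(conn, payload)     # no user has this literal name
```

In the unsafe variant the payload turns the condition into `name = 'x' OR '1'='1'`, which matches every row; this is the change of syntax and semantics referred to above.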
Because SQL is a diverse language with a flexible structure and support for several encodings, a Web application that does not validate the untrusted data used to construct SQL code may allow that data to alter the structure and semantics of the original SQL statement. An attacker who finds such a vulnerability can cause user information and sensitive data to be leaked, obtain the account password of the database system, escalate the privileges of an ordinary user, grant authorizations to ordinary accounts, and modify or delete information in the database.
The main conventional methods for detecting SQL injection include the following:
1. Program analysis:
SQL Probe is an SQL injection detection system based on program analysis. It statically inspects the Web application code using data flow tracking, determines the paths taken by user input data, and thereby identifies suspicious injection points in the Web application. It then performs syntactic and lexical analysis at these points to build an abstract representation of the SQL statements, and uses a finite state machine to build and store a state model of the legal query statements in the Web application. While the program is running, each generated SQL statement is converted into an abstract syntax tree and matched against the stored legal model; a failed match indicates an attack. However, the detection accuracy depends mainly on the accuracy of the static analysis and the correctness of the abstract SQL statement model.
2. Black box testing and white box testing:
Black box testing checks whether an application system contains an SQL injection vulnerability without knowledge of the source code, by simulating the characteristics of SQL injection attacks and examining the server's responses while the system is running. White box testing, given the source code of the Web application, analyzes where SQL injection vulnerabilities exist and locates them in the source code. White box testing therefore requires access to the system's source code, whereas black box testing requires neither installing any program on the server nor obtaining the source code or knowing the internal implementation; its detection accuracy, however, depends on the completeness of the test cases.
3. Static analysis and dynamic analysis:
One advantage of static code analysis is that the Web application under test does not need to be run in advance. Instead, the data flow and control flow of the program are scanned, and the program is analyzed for code matching known vulnerability patterns in order to determine whether an SQL injection vulnerability exists.
4. Pattern matching:
Researchers at home and abroad have combined static pattern detection with dynamic feature filtering to defend against SQL injection attacks. The method automatically learns from a sample set, extracting all normal, legal SQL statements to build a knowledge base. While the system is running, each dynamically constructed SQL statement is matched against the patterns in this pre-built pattern base: if the match succeeds, the statement is treated as a legal SQL statement; if it fails, the statement is judged to be illegal.
5. Sequence comparison:
Sequence alignment analyzes the structure of the SQL statement to determine whether an attack is taking place, but the method can detect only a limited range of SQL injection attack types.
6. Proxy firewall:
Proxy firewall detection is a database firewall technology placed between the Web server and the back-end database server to protect the back-end MySQL database against attacks mounted through the Web application. The proxy firewall monitors SQL query requests from the client and analyzes their legality. Detection is carried out in two respects: first, the user input data is analyzed and filtered; second, the SQL syntactic structure is analyzed in detail.
7. Context sensitive string evaluation:
Detection based on context-sensitive string evaluation starts from the root cause of SQL injection attacks: the attack source is defined as user input data, which is divided into string-type and numeric-type input. For string-type input, the method checks whether the string contains unsafe characters or substrings, and any anomaly detected is taken as evidence of a possible SQL injection attack.
In summary, most existing SQL injection detection methods have shortcomings, such as excessively high false positive and false negative rates, an inability to detect attacks dynamically in real time, and incomplete and inefficient detection of some complicated attack forms.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an SQL injection detection method based on user behavior credibility analysis that detects SQL injection attacks in real time, enhances security, and reduces the false positive rate and the false negative rate.
The technical scheme adopted by the invention for solving the technical problems is as follows:
An SQL injection detection method based on user behavior credibility analysis comprises the following steps:
a. collecting user behavior sample data for both normal user access and SQL injection attacks;
b. counting the user behavior sample data to generate a feature set for identifying user behavior, and screening the feature data in the feature set to obtain a screened feature set;
c. training based on the screened feature set to obtain a behavior detection model;
d. acquiring and processing real-time user data as input to the behavior detection model, and using the model to judge whether the user's current behavior is credible.
As a further optimization, in step a, data such as the call logs, user access logs and web logs of the Web application are collected to build the user behavior sample data set covering normal access and SQL injection attacks.
As a further optimization, step b specifically includes:
b1. computing statistics over the sample data of each user behavior, using the client IP as the unique identifier, to generate a feature data set S for identifying user behavior;
b2. selecting features from the feature data set S with the Relief algorithm, filtering out features that are irrelevant, redundant, or have no discriminative power, and obtaining an optimal feature set T as the screened feature set.
As a further optimization, in step b1, the statistics specifically cover the request type, URL length, URL access frequency, parameter type, parameter length, number of parameters, whether a parameter contains an SQL keyword, response time, whether the server raised an exception during processing, and similar information.
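A minimal sketch of how such per-client statistics might be computed is given here. The record fields (`ip`, `url`, `status`, `response_ms`), the keyword list and the feature names are assumptions made for illustration, and only a subset of the statistics listed above is covered; real log formats will differ.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl

# Assumed keyword list, taken from the examples select, union, from, exists.
SQL_KEYWORDS = re.compile(r"\b(select|union|from|exists)\b", re.IGNORECASE)

def extract_features(records):
    """Aggregate hypothetical log records into one feature dict per client IP."""
    by_ip = defaultdict(list)
    for rec in records:
        by_ip[rec["ip"]].append(rec)

    features = {}
    for ip, recs in by_ip.items():
        n = len(recs)
        url_len = param_count = param_len = keyword_hits = errors = 0
        response_ms = 0.0
        for rec in recs:
            # parse_qsl decodes percent-encoding and '+' before inspection.
            params = parse_qsl(urlsplit(rec["url"]).query, keep_blank_values=True)
            url_len += len(rec["url"])
            param_count += len(params)
            param_len += sum(len(value) for _, value in params)
            keyword_hits += any(SQL_KEYWORDS.search(value) for _, value in params)
            errors += rec["status"] >= 500   # treated here as "server abnormal"
            response_ms += rec["response_ms"]
        features[ip] = {
            "request_count": n,
            "mean_url_length": url_len / n,
            "mean_param_count": param_count / n,
            "mean_param_length": param_len / n,
            "sql_keyword_rate": keyword_hits / n,
            "error_rate": errors / n,
            "mean_response_ms": response_ms / n,
        }
    return features

# Hypothetical records: one ordinary client and one probing client.
records = [
    {"ip": "10.0.0.1", "url": "/item?id=42", "status": 200, "response_ms": 20},
    {"ip": "10.0.0.2", "url": "/item?id=1+union+select+name+from+users",
     "status": 500, "response_ms": 180},
    {"ip": "10.0.0.2", "url": "/item?id=1%27", "status": 500, "response_ms": 60},
]
features = extract_features(records)
```

Each per-IP dictionary can then be flattened into a numeric vector, giving one row of the feature data set S.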
As a further optimization, in step c, training based on the screened feature set to obtain the behavior detection model specifically comprises: training a classifier with a K-means model, taking the feature vector matrix of the feature set T as input, partitioning the training data with the K-means algorithm, and finally obtaining K clusters and their cluster centers, which form a detection model capable of distinguishing normal user behavior from SQL injection attack behavior.
As a further optimization, step d specifically includes:
d1. collecting user real-time data;
d2. extracting the characteristics of the collected real-time user data, and processing the real-time user data into characteristic vectors capable of representing real-time user behaviors;
d3. inputting the feature vector of the real-time behavior of the user into the trained detection model;
d4. calculating, through the detection model, the distance from the feature vector of the user's real-time behavior to each cluster of the user behavior model, and assigning the behavior to the nearest cluster, thereby judging whether the current behavior is ordinary user behavior or an SQL injection attack;
d5. if the user behavior is normal, judging that the user behavior is credible; otherwise, the user behavior is judged to be not credible.
The invention has the beneficial effects that:
Samples of ordinary user behavior and of SQL injection attack behavior are collected, and user behavior feature data is computed according to the differences between the two. The Relief algorithm selects the features capable of identifying user behavior, and a classifier is then trained with a K-means model, yielding a behavior detection model that effectively distinguishes the usage behavior of legitimate users from that of attackers. The feature vector of real-time user behavior is input into the trained model, the distance between this feature vector and each cluster of the user behavior model is calculated, and the behavior is assigned to the nearest cluster to judge whether the current behavior is credible.
Drawings
Fig. 1 is a flowchart of the SQL injection detection method based on user behavior credibility analysis according to the present invention.
Detailed Description
Generally, a user's behavior habits remain relatively stable when using a Web application, much like a person's daily habits. A user typically connects from a fixed IP at a fixed location, uses a fixed tool, operates at a fixed frequency in a relatively stable manner, and works with relatively fixed information content. These are the user's behavior habits when using the Web application; once a user's behavior changes, or changes frequently, we can suspect that the user may not be legitimate.
Our research shows that an attacker carrying out an SQL injection attack goes through three stages: searching for an injection point, confirming the injection point, and acquiring database data in order to carry out destructive operations. In these stages the attacker must continuously send requests to the Web application, tamper with the request data, and try to escalate user privileges. The attacker's actions are therefore traceable, and can be roughly summarized into four categories:
(1) When carrying out an SQL injection attack, the attacker tampers with the parameter content in the request URL or in a form, usually by inserting code fragments with SQL syntax and semantics, or certain special strings, into the request parameters and form fields, such as the SQL keywords and functions select, union, from and exists. Because of these strings, the finally dynamically spliced SQL statement may be executed in one of the following ways: it still executes normally; the Web application fails to construct the SQL statement; the database fails to execute the SQL command; or the response to the SQL command is returned with a delay.
(2) When carrying out an SQL injection attack, the attacker usually sends a large number of probe requests to the Web application in a very short time to search for SQL injection vulnerabilities, so the number of network requests sent to the Web application server within a given period is higher than for normal users. Research also shows that the frequency of URL requests received by the Web application from the same IP, port and browsing content of the same user changes when the application is under attack.
(3) When an attacker carries out an SQL injection attack, the URL content of GET requests for normal browsing functions also changes markedly; for example, the length of the request URL and the number and content of its parameters all change. In addition, after the Web application dynamically splices the SQL statement, the syntactic and semantic structure of the SQL command passed to the database for execution also changes.
(4) During the attack, the data content and data locations the attacker accesses in the database also differ markedly from those of normal users. In addition, the attacker attempts illegal operations in order to escalate or change user privileges, so the behavior trace of an attacker performing SQL injection has some distinctive characteristics.
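The burst of probe requests described in category (2) can be measured, for example, as the peak number of requests one client sends within a sliding time window. The following is a minimal sketch; the function name, the 10-second window and the sample timestamps are illustrative assumptions rather than values given in this document.

```python
from collections import deque

def max_requests_in_window(timestamps, window_seconds):
    """Peak number of requests that fall inside any window of the given length."""
    window = deque()
    peak = 0
    for t in sorted(timestamps):
        window.append(t)
        # Drop requests that are no longer inside the window ending at t.
        while t - window[0] >= window_seconds:
            window.popleft()
        peak = max(peak, len(window))
    return peak

# Hypothetical request times in seconds for two clients.
normal_client = [0, 30, 65, 120]
probing_client = [0, 0.5, 1, 1.5, 2, 40]

normal_peak = max_requests_in_window(normal_client, 10)
probing_peak = max_requests_in_window(probing_client, 10)
```

A value computed this way per client IP could serve as one concrete form of the URL access frequency statistic.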
Analyzing how an attacker uses the Web application helps to discover early whether the application is under attack, makes it possible to block the attacker's access as soon as possible, and reduces the losses suffered by users and enterprises.
Based on the above, the present invention provides an SQL injection detection method based on user behavior credibility analysis, which detects SQL injection attacks in real time, enhances security, and reduces the false positive and false negative rates. As shown in Fig. 1, the method specifically includes:
1. Collect samples of SQL injection attacks and samples of normal access:
in this step, data such as a call log of the Web application, a user access log, and a weblog are collected as an original data set.
2. Feature selection:
a) Using the client IP as the unique identifier, statistics are computed for each sample in the sample set, covering the request type, URL length, URL access frequency, parameter type, parameter length, number of parameters, whether a parameter contains an SQL keyword, response time, whether the server raised an exception during processing, and similar information, generating the feature data set S = {s1, s2, s3, …, sn} for identifying user behavior.
b) Features are selected with the Relief algorithm, removing those that are irrelevant, redundant, or have no discriminative power, to obtain the optimal feature set T.
The basic idea of the Relief algorithm is as follows: randomly select a sample R from the training set, then find its nearest neighbor H among the samples of the same class as R, and its nearest neighbor M among the samples of a different class. For each feature, the weight is decreased if R and H differ on that feature and increased if R and M differ on it. This process is repeated m times to obtain the weight of each feature. Features whose weight is below a given threshold are removed and the rest are kept, forming a new feature subset.
Implementation of the Relief algorithm:
Input: a sample set S, the number of sampling iterations m, and a feature weight threshold R;
Output: the selected feature set T;
p is the number of features, m is the number of iterations, and n is the number of samples;
Divide S into $S^{+}$ = {positive examples} and $S^{-}$ = {negative examples};
Initialize the weights $W = (0, 0, \dots, 0)$;
While the number of samples drawn is less than the preset number m, repeat:
(1) randomly select a sample $X \in S$;
(2) randomly select one of the positive examples nearest to X, $Z^{+} \in S^{+}$;
(3) randomly select one of the negative examples nearest to X, $Z^{-} \in S^{-}$;
(4) if X is a positive example, set Near-hit $= Z^{+}$ and Near-miss $= Z^{-}$; otherwise set Near-hit $= Z^{-}$ and Near-miss $= Z^{+}$;
(5) update each feature weight $W_i$, for $i = 1, \dots, p$:
$W_i = W_i - \mathrm{diff}(x_i, \text{near-hit}_i)^2 + \mathrm{diff}(x_i, \text{near-miss}_i)^2$
(6) after sampling ends, sort the final weights $W_i$ by size and remove the features with the lowest weights to obtain the feature set T.
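The Relief procedure above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the reference implementation: features are min-max scaled so that diff is a plain difference on [0, 1], the nearest hit and miss are chosen deterministically rather than at random among ties, each class is assumed to contain at least two samples, and the function names and toy data are hypothetical.

```python
import numpy as np

def relief_weights(X, y, m, seed=None):
    """Return one Relief weight per feature for a two-class sample set."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Assumption: scale each feature to [0, 1] so differences are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span

    n, p = Xs.shape
    w = np.zeros(p)
    for _ in range(m):
        i = rng.integers(n)                      # step (1): random sample X
        dist = ((Xs - Xs[i]) ** 2).sum(axis=1)
        dist[i] = np.inf                         # never pick the sample itself
        same = y == y[i]
        hit = np.where(same, dist, np.inf).argmin()    # nearest same-class sample
        miss = np.where(~same, dist, np.inf).argmin()  # nearest other-class sample
        # step (5): W_i = W_i - diff(x, near-hit)^2 + diff(x, near-miss)^2
        w += -(Xs[i] - Xs[hit]) ** 2 + (Xs[i] - Xs[miss]) ** 2
    return w

def select_features(weights, threshold):
    """Keep the indices of features whose weight reaches the threshold."""
    return [i for i, value in enumerate(weights) if value >= threshold]

# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 0.3], [0.1, 0.9], [0.05, 0.1], [0.9, 0.2], [1.0, 0.8], [0.95, 0.5]]
y = [0, 0, 0, 1, 1, 1]
weights = relief_weights(X, y, m=20, seed=0)
kept = select_features(weights, threshold=0.0)
```

On this toy data the separating feature accumulates a positive weight and the other a negative one, so only feature 0 survives a threshold of zero.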
3. Feature training, generating a behavior detection model:
in the step, a K-means model is used for training a classifier, a feature vector matrix in a feature set T is used as input, a K-means algorithm is used for processing a training set, training data are divided, and finally K divided clusters and clustering centers are obtained to obtain a behavior detection model.
The K-means algorithm implementation process comprises the following steps:
Input: a sample set $D = \{x_1, x_2, \dots, x_m\}$, the number of clusters k, and the maximum number of iterations N;
Output: the cluster partition $C = \{C_1, C_2, \dots, C_k\}$;
Step 1: randomly select k samples from the data set D as the initial k centroid vectors $\{\mu_1, \mu_2, \dots, \mu_k\}$;
Step 2: for $n = 1, 2, \dots, N$:
a) initialize the cluster partition C to $C_t = \varnothing$, $t = 1, 2, \dots, k$;
b) for $i = 1, 2, \dots, m$, compute the distance between sample $x_i$ and each centroid vector $\mu_j$ ($j = 1, 2, \dots, k$): $d_{ij} = \lVert x_i - \mu_j \rVert_2^2$; mark $x_i$ with the class $\lambda_i$ corresponding to the smallest $d_{ij}$, and update $C_{\lambda_i} = C_{\lambda_i} \cup \{x_i\}$;
c) for $j = 1, 2, \dots, k$, recompute the centroid from all sample points in $C_j$: $\mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x$;
d) if none of the k centroid vectors has changed, go to Step 3;
Step 3: output the cluster partition $C = \{C_1, C_2, \dots, C_k\}$.
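The K-means steps above can be written compactly with NumPy. The sketch below is illustrative only: the function name `kmeans`, the handling of empty clusters (the old centroid is kept) and the toy data are assumptions, and squared Euclidean distance is used as the distance measure.

```python
import numpy as np

def kmeans(D, k, N, seed=None):
    """Partition the rows of D into k clusters; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    D = np.asarray(D, dtype=float)

    def assign(centroids):
        # Squared Euclidean distance from every sample to every centroid.
        d = ((D[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        return d.argmin(axis=1)

    # Step 1: pick k distinct samples as the initial centroids.
    centroids = D[rng.choice(len(D), size=k, replace=False)]

    # Step 2: alternate assignment and centroid update, at most N times.
    for _ in range(N):
        labels = assign(centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            members = D[labels == j]
            if len(members):              # assumption: keep old centroid if empty
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break                         # no centroid moved: converged
        centroids = new_centroids

    # Step 3: output the partition that matches the final centroids.
    return assign(centroids), centroids

# Toy data: two well-separated groups of feature vectors.
D = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = kmeans(D, k=2, N=100, seed=0)
```

Which cluster index each group receives depends on the random initialization; only the grouping itself is meaningful.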
4. Behavior detection:
The feature vector of the real-time user behavior is input into the trained model, the distance from this feature vector to each cluster of the user behavior model is calculated, and the behavior is assigned to the nearest cluster, thereby judging whether the current behavior is ordinary user behavior or an SQL injection attack. If it is ordinary user behavior, the user behavior is credible; otherwise, the user behavior is not credible.
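The detection step can be sketched as a nearest-centroid decision. How a cluster is marked as attack or normal is not spelled out here, so the majority rule in `label_clusters` is an assumption, as are the function names and the example centroids.

```python
import numpy as np

def label_clusters(assignments, is_attack, k):
    """Assumed rule: a cluster is an attack cluster if most of its training samples are attacks."""
    assignments = np.asarray(assignments)
    is_attack = np.asarray(is_attack, dtype=bool)
    flags = []
    for j in range(k):
        members = is_attack[assignments == j]
        flags.append(bool(members.size and members.mean() > 0.5))
    return flags

def is_trusted(vector, centroids, attack_flags):
    """Assign the behavior to the nearest cluster; trusted unless that cluster is an attack cluster."""
    centroids = np.asarray(centroids, dtype=float)
    d = ((centroids - np.asarray(vector, dtype=float)) ** 2).sum(axis=1)
    nearest = int(d.argmin())
    return not attack_flags[nearest]

# Hypothetical trained model: two cluster centers and the training assignments.
centroids = [[0.1, 0.0], [0.9, 0.8]]
train_assignments = [0, 0, 1, 1, 1]
train_is_attack = [0, 0, 1, 1, 0]
attack_flags = label_clusters(train_assignments, train_is_attack, k=2)

normal_result = is_trusted([0.2, 0.1], centroids, attack_flags)
attack_result = is_trusted([0.8, 0.9], centroids, attack_flags)
```

The real-time vector must be built from the same selected features, with the same scaling, as the training data for the distances to be comparable.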