CN117014193A

CN117014193A - Unknown Web attack detection method based on behavior baseline

Info

Publication number: CN117014193A
Application number: CN202310845725.XA
Authority: CN
Inventors: 吴新龙; 何枭男; 杨帆; 白静文
Original assignee: Huaxin Consulting Co Ltd
Current assignee: Huaxin Consulting Co Ltd
Priority date: 2023-07-11
Filing date: 2023-07-11
Publication date: 2023-11-07

Abstract

The invention discloses an unknown Web attack detection method based on a behavior baseline, which addresses the shortcomings of existing Web application attack detection methods. The method comprises: collecting normal behavior data of a user; establishing a behavior baseline model from the normal behavior data; performing abnormal behavior detection according to the behavior baseline model to form an unknown attack detector; and detecting abnormal behavior in the user's actual behavior with the unknown attack detector. Because the invention detects Web attacks against a behavior baseline, it can detect unknown attacks more accurately than traditional rule-based or statistics-based methods. By combining several anomaly detection algorithms and techniques, the invention can effectively detect abnormal behavior and intrusions.

Description

Unknown Web attack detection method based on behavior baseline

Technical Field

The invention relates to the technical field of network security, in particular to an unknown Web attack detection method based on a behavior baseline.

Background

Web applications have become an indispensable part of daily life. However, as Web applications continue to grow in number and popularity, Web application security issues have become increasingly prominent. Attackers can bypass existing security defense mechanisms by various means to attack and damage Web applications, causing significant economic and social losses. How to secure Web applications is therefore an urgent problem.

In terms of technical background, Web application security technology has a long history of development. Current Web application attack detection methods fall mainly into three types: rule-based, statistics-based, and machine-learning-based.

The rule-based Web application attack detection method is the traditional approach. Its main idea is to write rules in advance from known attack patterns and features, and to use them to detect and intercept attacks. However, rules are difficult to write and cannot cope with unknown attacks, so this method cannot meet the security protection requirements of Web applications.

The statistics-based Web application attack detection method relies on statistical techniques such as anomaly detection and anomaly change detection. Its advantage is that some new attacks can be detected. Its drawback is a high false positive rate, because some normal operations may be mistaken for attacks.

The machine-learning-based Web application attack detection method is a security defense approach that has become popular in recent years. A machine learning model is trained to learn the behavioral characteristics of normal and abnormal Web application traffic. The model then uses these learned characteristics to judge whether a request is abnormal, so it can handle unknown attacks well. However, this approach requires a large amount of training data and must be adjusted and optimized for each Web application.

Disclosure of Invention

The invention mainly addresses the shortcomings of existing Web application attack detection methods by providing an unknown Web attack detection method based on a behavior baseline.

The invention solves the above technical problem mainly through the following technical solution: an unknown Web attack detection method based on a behavior baseline, comprising the following steps:

step one: collecting normal behavior data of a user;

step two: establishing a behavior baseline model according to the normal behavior data;

step three: performing abnormal behavior detection according to the behavior baseline model to form an unknown attack detector;

step four: detecting abnormal behavior in the user's actual behavior through the unknown attack detector.

The invention detects Web attacks with the behavior baseline method and, compared with traditional rule-based or statistics-based methods, can detect unknown attacks more accurately. Conventional methods can hardly cover all attack behaviors. A behavior baseline, by contrast, reflects the different behavior patterns of normal users and attackers, so attacks can be detected more accurately. The invention combines several anomaly detection algorithms and techniques and can effectively detect abnormal behavior and intrusions. The behavior baseline model serves as a reference model of the Web application that describes its normal behavior pattern. Combined with anomaly detection algorithms, it can be used to monitor unknown network traffic and identify abnormal behavior within that traffic.

As a preferred scheme, the normal behavior data of the user comprises the user's behavior characteristics and the traffic corresponding to that behavior.

As a preferred scheme, the user behavior data in step one is generated by using an automated testing tool to simulate the normal behavior of a user on the website, thereby generating normal traffic.

The automated testing tool simulates normal user behavior on the website, such as visiting pages, clicking links, and submitting forms. The collected network traffic data is then processed by data cleaning and feature extraction. This converts the data into a form that a machine learning algorithm can process and removes unnecessary data interference.

As a preferred scheme, the abnormal behavior detection according to the behavior baseline model in step three specifically comprises:

receiving behavior information, analyzing the characteristics of the behavior, comparing the behavior with the behavior baseline model, and detecting abnormal behavior by combining several anomaly detection algorithms. Because this scheme combines several anomaly detection algorithms and techniques, it can effectively detect abnormal behavior and intrusions.

As a preferred scheme, detecting abnormal behavior by combining several anomaly detection algorithms specifically comprises the following steps:

setting an anomaly score threshold, and expressing the result of each anomaly detection algorithm as a score;

applying several anomaly detection algorithms to the input behavior against the behavior baseline model;

obtaining the score of each anomaly detection algorithm after detection, summing all the scores, and comparing the total with the anomaly score threshold. If the total score is smaller than the threshold, the current behavior is judged normal; otherwise it is judged abnormal. Having several anomaly detection algorithms work together helps identify genuinely abnormal behavior while lowering the false alarm rate and the unnecessary interference that false alarms cause.

As a preferred scheme, the plurality of anomaly detection algorithms includes a collaborative filtering algorithm, an outlier detection algorithm, an anomaly analysis algorithm, and a cluster analysis algorithm. In this scheme, the anomaly detection algorithms are mainly statistics-based and machine-learning-based.

As a preferred scheme, normal and abnormal behavior data of users are collected and prepared as training samples and test samples;

the unknown attack detector is trained with the training samples and then tested with the test samples. Its parameters are adjusted according to the test results until those results meet the set precision requirement, yielding the final unknown attack detector.
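The patent does not say which parameters are adjusted. As a hedged illustration, assume the tunable parameter is the anomaly score threshold of the score-fusion rule. The sketch below scans candidate thresholds on a labeled test set (label 1 = attack, 0 = normal) and returns the lowest threshold whose precision meets the target. `precision_at` and `tune_threshold` are hypothetical names.

```python
def precision_at(scores, labels, threshold):
    """Precision of the rule 'abnormal if score >= threshold'; None if nothing is flagged."""
    flagged = [label for score, label in zip(scores, labels) if score >= threshold]
    return sum(flagged) / len(flagged) if flagged else None


def tune_threshold(scores, labels, target_precision, candidates):
    """Return the lowest candidate threshold whose test precision meets the target.

    The lowest qualifying threshold flags the most behavior, so recall stays as
    high as possible under the precision requirement. Returns None if no
    candidate qualifies, in which case the detector itself needs further training.
    """
    for threshold in sorted(candidates):
        p = precision_at(scores, labels, threshold)
        if p is not None and p >= target_precision:
            return threshold
    return None
```

Choosing the lowest qualifying threshold reflects an assumed preference for recall. A deployment that cares more about alert volume could pick differently.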

In this scheme, the unknown attack detector is trained and its parameters are adjusted according to the test results, finally yielding an unknown attack detector with higher precision. The unknown attack detector can monitor unknown behavior and identify abnormal behavior within it. Different users may differ in behavior habits and characteristics. Constructing the detector therefore requires collecting data from many users and analyzing it statistically, so that the behavior patterns of normal users are accurately reflected. At the same time, care must be taken not to misinterpret an attacker's behavior as normal behavior.

As a preferred scheme, the user's normal behavior data is collected periodically, and the behavior baseline model is updated and adjusted accordingly. This lets the behavior baseline model update and adjust itself automatically as time passes and data changes, so that it remains adaptive.

The advantages of the invention are therefore as follows. By detecting Web attacks with a behavior baseline, it can detect unknown attacks more accurately than traditional rule-based or statistics-based methods. Conventional methods can hardly cover all attack behaviors, whereas a behavior baseline reflects the different behavior patterns of normal users and attackers, so attacks can be detected more accurately. By combining several anomaly detection algorithms and techniques, abnormal behavior and intrusions can be effectively detected.

Drawings

FIG. 1 is a schematic flow chart of the present invention.

Detailed Description

The technical scheme of the invention is further specifically described below through examples and with reference to the accompanying drawings.

Examples:

The unknown Web attack detection method based on a behavior baseline in this embodiment, as shown in FIG. 1, comprises the following steps:

step one: collecting normal behavior data of a user;

The normal behavior data includes the user's behavior characteristics and the traffic corresponding to that behavior. An automated testing tool simulates the user's normal behavior on the website to generate normal traffic.

The collected network traffic data is preprocessed by data cleaning and feature extraction. This converts it into a form that a machine learning algorithm can process and removes unnecessary data interference.
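Below is a minimal sketch of the cleaning and feature-extraction step. It assumes each captured request has already been parsed into a `(method, url, body)` tuple. The static-resource filter, the special-character set, and the six features are illustrative assumptions, not features the patent specifies. `clean` and `extract_features` are hypothetical names.

```python
from urllib.parse import parse_qsl, urlsplit

# Illustrative assumptions, not taken from the patent.
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")
SPECIAL_CHARS = set("'\"<>;()|`")


def clean(requests):
    """Drop requests for static resources, which carry little behavioral signal.

    Each request is a (method, url, body) tuple.
    """
    return [r for r in requests
            if not urlsplit(r[1]).path.lower().endswith(STATIC_SUFFIXES)]


def extract_features(method, url, body=""):
    """Convert one HTTP request into a fixed-length numeric feature vector."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    values = "".join(v for _, v in params) + body
    return [
        float(len(parts.path)),                            # path length
        float(len(params)),                                # number of query parameters
        float(sum(len(v) for _, v in params)),             # total parameter value length
        float(sum(ch in SPECIAL_CHARS for ch in values)),  # characters common in injection payloads
        float(len(body)),                                  # request body length
        1.0 if method.upper() == "POST" else 0.0,          # form submission flag
    ]
```

For example, `extract_features("GET", "/search?q=shoes&page=2")` yields `[7.0, 2.0, 6.0, 0.0, 0.0, 0.0]`. Vectors like this can then be stored in the baseline dataset or passed to the anomaly detection algorithms.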

Step two: establishing a behavior baseline model according to the normal behavior data. The behavior baseline model is a dataset. The user's normal behavior data is collected periodically and the behavior baseline model is updated and adjusted. As a result, the model updates and adjusts itself automatically as time passes and data changes, and so remains adaptive.
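The patent does not specify how the periodic update works. The sketch below assumes one plausible reading: keep only the most recent collection periods of normal feature vectors (a sliding window) and recompute per-feature statistics after each period. `BehaviorBaseline` and `max_periods` are hypothetical.

```python
from collections import deque
from statistics import fmean, pstdev


class BehaviorBaseline:
    """Hypothetical sliding-window baseline over the most recent collection periods."""

    def __init__(self, max_periods=4):
        self.periods = deque(maxlen=max_periods)  # the oldest period is evicted automatically
        self.mean = []
        self.std = []

    def update(self, batch):
        """Add one period's normal feature vectors and recompute per-feature statistics."""
        self.periods.append(list(batch))
        rows = [row for period in self.periods for row in period]
        columns = list(zip(*rows))
        self.mean = [fmean(col) for col in columns]
        self.std = [pstdev(col) for col in columns]
```

Keeping several periods rather than only the latest one smooths out short-term fluctuations. The window size controls how quickly the baseline adapts to genuine changes in user behavior.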

Step three: performing abnormal behavior detection according to the behavior baseline model to form an unknown attack detector, as follows:

receiving behavior information, analyzing the characteristics of the behavior, comparing the behavior with the behavior baseline model, and detecting abnormal behavior by combining several anomaly detection algorithms.

The anomaly detection algorithms are mainly statistics-based and machine-learning-based. They comprise a collaborative filtering algorithm, an outlier detection algorithm, an anomaly analysis algorithm, and a cluster analysis algorithm. The specific anomaly detection process is as follows:

applying each of the four anomaly detection algorithms to the input behavior against the behavior baseline model;

obtaining the score of each anomaly detection algorithm after detection, summing all the scores, and comparing the total with the anomaly score threshold. If the total score is smaller than the threshold, the current behavior is judged normal; otherwise it is judged abnormal.
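The score-fusion rule can be sketched as follows. The detector interface is an assumption: each detector is a function that takes the behavior vector and the baseline dataset and returns a non-negative score. `nearest_baseline_distance` is a toy stand-in for illustration, not one of the four algorithms described below.

```python
import math


def nearest_baseline_distance(behavior, baseline):
    """Toy detector: Euclidean distance from the behavior to the closest baseline vector."""
    return min(math.dist(behavior, row) for row in baseline)


def fused_decision(behavior, baseline, detectors, threshold):
    """Sum all detector scores and compare the total with the anomaly score threshold."""
    total = sum(detector(behavior, baseline) for detector in detectors)
    # Normal only if the total is strictly below the threshold; ties count as abnormal.
    verdict = "normal" if total < threshold else "abnormal"
    return total, verdict
```

In a full pipeline, the list would hold four detectors (collaborative filtering, outlier detection, anomaly analysis, cluster analysis). Each would need to be scaled so its scores are comparable before summation; the patent leaves that scaling open.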

An unknown attack detector is established and trained with a deep neural network, as follows:

collecting normal and abnormal behavior data of users to prepare training samples and test samples;

Step four: detecting abnormal behavior in the user's actual behavior through the unknown attack detector. When abnormal behavior is detected, an alarm is raised to prompt the network administrator to take appropriate action.

The anomaly detection algorithms are described below.

First, the collaborative filtering algorithm was designed for recommendation systems but can also be used for anomaly detection. Its basic idea is to recommend items or identify unusual behavior based on the similarity between users. In anomaly detection, each user can be treated as a data point and each item as a feature. Abnormal behavior is then identified by calculating the similarity between users. Either user-based or item-based collaborative filtering may be used.

In the present invention, collaborative filtering algorithms are used to infer user behavior. Specifically, the algorithm infers the behavior of the current user by comparing the historical behaviors of different users and performs intrusion detection based on these behaviors.

Collaborative filtering can be categorized as user-based or item-based. User-based collaborative filtering infers user behavior from the similarity of behavior between users, while item-based collaborative filtering infers it from the similarity between items. The invention adopts user-based collaborative filtering, implemented as follows:

First, a user behavior matrix is constructed. Each row of the matrix represents a user and each column represents a behavior type, such as visiting a website or sending an email. Each entry is the user's historical count of that behavior type.
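Building the user behavior matrix from a raw event log might look like the sketch below. It assumes events arrive as hypothetical `(user_id, behavior_type)` pairs; `build_behavior_matrix` is a hypothetical name.

```python
from collections import Counter


def build_behavior_matrix(events):
    """Count how often each user performed each behavior type.

    events: iterable of (user_id, behavior_type) pairs.
    Returns (users, behavior_types, matrix) with matrix[r][c] equal to the
    historical count of behavior_types[c] for users[r].
    """
    counts = Counter(events)
    users = sorted({user for user, _ in counts})
    behavior_types = sorted({kind for _, kind in counts})
    matrix = [[counts[(user, kind)] for kind in behavior_types] for user in users]
    return users, behavior_types, matrix
```

Missing `(user, behavior_type)` combinations default to 0 because `Counter` returns 0 for absent keys, so the matrix is always rectangular.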

Then, the similarity between users is calculated, using cosine similarity as the similarity measure. Let u and v be two users, let a_i and b_i be the historical behavior counts of users u and v on behavior type i, and let n be the total number of behavior types. The similarity sim(u, v) of users u and v is then:

$$\mathrm{sim}(u,v) = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$

Finally, the behavior of the current user is inferred from the similarities. For the current user u, the k users with the highest similarity are found. The similarity-weighted average of their historical behavior counts on each behavior type is used as the predicted count of the current user on that type:

$$b_i(u) = \frac{\sum_{v \in N(u)} \mathrm{sim}(u,v)\, a_i(v)}{\sum_{v \in N(u)} \mathrm{sim}(u,v)}$$

where b_i(u) is the predicted behavior count of user u on behavior type i, a_i(v) is the historical behavior count of user v on behavior type i, and N(u) is the set of k users most similar to user u.

Inferring the current user's behavior in this way allows the collaborative filtering algorithm to support intrusion detection.
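The cosine similarity and similarity-weighted prediction described above translate directly into code. This sketch assumes each user is a list of behavior counts in a fixed column order, as in the user behavior matrix. `cosine_similarity` and `predict_counts` are hypothetical names. The patent does not say how the prediction becomes an intrusion signal; comparing the predicted counts with the user's actual counts is one assumed option.

```python
import math


def cosine_similarity(a, b):
    """sim(u, v) = sum(a_i * b_i) / (sqrt(sum(a_i^2)) * sqrt(sum(b_i^2)))."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat an all-zero user as dissimilar to everyone


def predict_counts(target, others, k):
    """Predict b_i(u) as the similarity-weighted average over the k most similar users."""
    scored = sorted(((cosine_similarity(target, row), row) for row in others),
                    key=lambda pair: pair[0], reverse=True)[:k]
    total_weight = sum(sim for sim, _ in scored)
    if total_weight == 0:
        return [0.0] * len(target)
    return [sum(sim * row[i] for sim, row in scored) / total_weight
            for i in range(len(target))]
```

With `target=[2, 2]` and neighbors `[1, 1]`, `[3, 3]`, `[0, 5]`, the first two are (almost exactly) parallel to the target. With `k=2`, the prediction is therefore approximately `[2.0, 2.0]`, and the dissimilar `[0, 5]` user is ignored.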

Next is the outlier detection algorithm, which identifies outlier data points that differ from the majority of data points. There are various outlier detection algorithms, such as statistics-based and clustering-based outlier detection. The more common choice is statistics-based outlier detection, which identifies outliers by measuring how dispersed the data points are. It can be implemented, for example, with a Gaussian-distribution-based method or a box-plot-based method.
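For the box-plot variant, this sketch assumes Tukey's fences, a common choice since the patent gives no parameters. Values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR are flagged. `iqr_outliers` and `whisker` are hypothetical names.

```python
from statistics import quantiles


def iqr_outliers(values, whisker=1.5):
    """Box-plot rule: flag values outside [Q1 - whisker*IQR, Q3 + whisker*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' quartile method
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]
```

Other quartile conventions (for example `method="inclusive"`) move the fences slightly, so the same data can yield a marginally different outlier set.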

In the present invention, the outlier detection algorithm is used to detect outlier data points in network traffic data. The main idea is to calculate the similarity or distance between each data point and the other data points, and to treat data points that are too far from the others as outliers.
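The distance-based idea can be sketched as a k-nearest-neighbor score: each point's mean Euclidean distance to its k closest neighbors. The choice of Euclidean distance, the value of k, and the name `knn_distance_scores` are assumptions for illustration.

```python
import math


def knn_distance_scores(points, k=2):
    """Score each point by its mean Euclidean distance to its k nearest other points.

    Requires at least two points.
    """
    scores = []
    for i, p in enumerate(points):
        distances = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = distances[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores
```

This brute-force version compares every pair of points, which is quadratic in the number of points. It is fine for illustration, but large traffic datasets would need a spatial index.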

In a specific implementation, statistics-based outlier detection algorithms are adopted. Examples include the Z-Score algorithm, based on the mean and standard deviation, and the MAD-Z-Score algorithm, based on the median and the median absolute deviation (MAD).

Taking the Z-Score algorithm as an example, the calculation formula is as follows:

$$Z = \frac{x - \mu}{\sigma}$$

where x is the value of the data point, μ is the mean of all data points, and σ is the standard deviation of all data points. A data point whose Z value exceeds a set threshold can be regarded as an outlier.
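Both statistics-based detectors named above are sketched below. For the MAD-Z-Score, this sketch assumes the widely used modified Z-score, 0.6745 · (x - median) / MAD, which the patent does not spell out. The thresholds are also illustrative. Scores are compared in absolute value so that unusually small values are caught as well.

```python
from statistics import fmean, median, pstdev


def z_scores(values):
    """Z = (x - mu) / sigma using the mean and population standard deviation."""
    mu, sigma = fmean(values), pstdev(values)
    return [(x - mu) / sigma if sigma else 0.0 for x in values]


def mad_z_scores(values):
    """Modified Z-score: 0.6745 * (x - median) / MAD (assumed form of MAD-Z-Score)."""
    med = median(values)
    mad = median(abs(x - med) for x in values)
    return [0.6745 * (x - med) / mad if mad else 0.0 for x in values]


def outliers(values, scores, threshold):
    """Return the values whose score exceeds the threshold in absolute value."""
    return [x for x, z in zip(values, scores) if abs(z) > threshold]
```

On the data `[10, 11, 12, 12, 13, 14, 15, 95]`, the single large value inflates σ so much that its plain Z-score is only about 2.6, and a threshold of 3 misses it. Its median-based score is about 37, so it is flagged. This is the usual reason to prefer the MAD variant when the data may already contain outliers.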

In the invention, the outlier detection algorithm can be used for detecting abnormal IP addresses or network connections in the network traffic data, thereby realizing the purpose of intrusion detection.

Next is the anomaly analysis algorithm, which analyzes anomalies to find their causes and characteristics. It can be implemented with data visualization, such as scatter plots, histograms, and density plots. In a scatter plot, abnormal behavior can be marked with different colors or symbols to show its distribution. In histograms and density plots, the distribution of abnormal behavior can be compared with that of normal behavior to find the characteristics of the abnormal behavior.

In the invention, the anomaly analysis algorithm is mainly used for analyzing the behavior sequence and judging whether the anomaly behavior exists or not. The specific implementation steps are as follows:

Calculating statistical features of the behavior sequence: for each user, the behavior types, timestamps, behavior objects, and other information in the behavior sequence are extracted. Statistical features such as the mean, variance, maximum, and minimum in each dimension are then calculated.

Constructing a model: based on the statistical features of the behavior sequence, a model is built using a machine learning algorithm (such as a decision tree or a support vector machine) or a statistical method (such as a Bayesian model or a Gaussian model).

Detecting abnormal behavior: the constructed model is used to predict a new behavior sequence. If the prediction differs substantially from the actual result, the behavior sequence is judged to contain abnormal behavior.
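The first step, per-dimension statistics of a behavior sequence, can be sketched as follows. The sketch assumes each event has already been encoded as a numeric vector, for example seconds since the previous request and bytes returned (both hypothetical encodings). `sequence_features` is a hypothetical name.

```python
from statistics import fmean, pvariance


def sequence_features(sequence):
    """Flatten [mean, variance, max, min] for every dimension of a behavior sequence.

    sequence: list of equal-length numeric vectors, one per event.
    """
    features = []
    for column in zip(*sequence):
        features += [fmean(column), pvariance(column), max(column), min(column)]
    return features
```

The resulting fixed-length vector is what the model-construction step would consume, whichever model type is chosen.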

The calculation formulas for the Gaussian-model-based anomaly analysis algorithm are as follows.

Assuming that the behavior feature in one dimension follows a Gaussian distribution, its probability density function in that dimension is:

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

where μ is the mean in that dimension and σ is the standard deviation in that dimension. For multidimensional behavior features, a multivariate Gaussian distribution can be used:

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$$

where n is the number of dimensions, μ is the vector of per-dimension means, and Σ is the covariance matrix. For a given behavior sequence, its probability density under the model can be calculated. If the density is smaller than a set threshold, the behavior sequence is judged to contain abnormal behavior.
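A minimal sketch of the Gaussian-model detector follows. To stay dependency-free, it assumes a diagonal covariance matrix (independent dimensions). In that case the multivariate density reduces to the product of the per-dimension densities; a full covariance matrix would need a linear-algebra library. The sketch works with log-densities to avoid numerical underflow, so the threshold is also a log value. The `min_sigma` floor and all function names are assumptions.

```python
import math
from statistics import fmean, pstdev


def fit_gaussian(rows, min_sigma=1e-6):
    """Estimate per-dimension mean and standard deviation from normal behavior vectors."""
    columns = list(zip(*rows))
    mu = [fmean(c) for c in columns]
    sigma = [max(pstdev(c), min_sigma) for c in columns]  # floor avoids division by zero
    return mu, sigma


def log_density(x, mu, sigma):
    """Log of the diagonal-covariance Gaussian density, i.e. the sum of per-dimension log pdfs."""
    return sum(-math.log(math.sqrt(2 * math.pi) * s) - (xi - m) ** 2 / (2 * s * s)
               for xi, m, s in zip(x, mu, sigma))


def is_abnormal(x, mu, sigma, log_threshold):
    """Abnormal if the (log) probability density falls below the threshold."""
    return log_density(x, mu, sigma) < log_threshold
```

Ignoring correlations between features makes the model cheaper and easier to estimate from little data. The cost is that it can miss attacks whose individual features look normal but whose combination does not.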

Finally, the cluster analysis algorithm groups data points into several categories for classification analysis of abnormal behavior. There are various cluster analysis algorithms, such as K-Means and DBSCAN. K-Means is the more commonly used; it groups data points into K clusters and finds the center point of each cluster. In anomaly detection, abnormal behavior can be treated as a special cluster. K-Means clustering then groups the data points into clusters so that the abnormal behavior can be classified and analyzed.

In the present invention, a cluster analysis algorithm is used to divide the behavior data into different groups for better anomaly detection. In particular, the cluster analysis algorithm may divide the behavior data into groups with similar behavior characteristics to determine differences between normal behavior and abnormal behavior.

Common clustering algorithms are hierarchical clustering and K-Means clustering algorithms. In the invention, a K-Means clustering algorithm can be adopted to cluster the behavior data.

The basic idea of the K-Means clustering algorithm is as follows. K centroids are randomly selected, every data point is assigned to the cluster of its nearest centroid, and the centroid of each cluster is recalculated. These steps are repeated until the centroids no longer move or a preset number of iterations is reached. The specific algorithm flow is as follows:

1. K centroids are randomly selected.

2. All data points are assigned to clusters in which the centroid closest to them is located.

3. For each cluster, its centroid position is recalculated.

4. Steps 2 and 3 are repeated until the centroids no longer move or a preset number of iterations is reached.
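The four steps map onto the following pure-Python sketch. Seeding the random generator and keeping an empty cluster's centroid unchanged are implementation choices assumed here, not taken from the patent. `kmeans` and `_assign` are hypothetical names.

```python
import math
import random


def _assign(points, centroids):
    """Index of the nearest centroid for every point (step 2)."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-Means following steps 1-4 above; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # step 1: random initial centroids
    for _ in range(max_iter):                               # step 4: bounded iterations
        labels = _assign(points, centroids)                 # step 2
        new_centroids = []
        for c in range(k):                                  # step 3: recompute centroids
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])          # keep an empty cluster's centroid
        if new_centroids == centroids:                      # step 4: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, _assign(points, centroids)
```

Because the result depends on the initial centroids, running several seeds and keeping the lowest-error clustering is a common mitigation for the sensitivity noted below.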

The K-Means clustering algorithm has the advantages of simplicity, easy understanding and high computational efficiency, and has the disadvantages of requiring the number K of clusters to be determined in advance and being relatively sensitive to the selection of the initial centroid.

In the present invention, a K-Means clustering algorithm may be used to divide the behavioral data into different groups and perform anomaly detection for each group. Specifically, the operation can be performed according to the following steps:

1. All behavior data is represented as vectors, each containing a set of features.

2. An appropriate K value is selected, and all behavior data is divided into K clusters.

3. For each cluster, outlier scores are calculated for its data points. If the outlier score exceeds a set threshold, the cluster is marked as an abnormal cluster.

4. The data points in all abnormal clusters are marked as abnormal data.

The outlier score can be calculated with the Z-Score method, the MAD method, or similar methods. The Z-Score method computes the distance between each data point and its cluster centroid and converts that distance into a normalized score. The MAD method instead bases the score on the median of the point-to-centroid distances and their median absolute deviation.
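A sketch of the per-cluster scoring follows. It takes points with cluster labels (for example, from a K-Means run) and measures each point's distance to its own cluster centroid. It then normalizes that distance within the cluster, either by mean and standard deviation (Z-Score) or by median and median absolute deviation (MAD). The patent does not specify how point scores are aggregated into the cluster-level decision in step 3, so that part is left out. `centroid_distance_scores` is a hypothetical name.

```python
import math
from statistics import fmean, median, pstdev


def centroid_distance_scores(points, labels, method="z"):
    """Outlier score per point from its distance to its own cluster centroid.

    method="z":   (d - mean(d)) / std(d) within the cluster (Z-Score)
    method="mad": (d - median(d)) / MAD(d) within the cluster
    """
    scores = [0.0] * len(points)
    for cluster in set(labels):
        idx = [i for i, label in enumerate(labels) if label == cluster]
        members = [points[i] for i in idx]
        centre = [fmean(dim) for dim in zip(*members)]
        dists = [math.dist(p, centre) for p in members]
        if method == "z":
            loc, spread = fmean(dists), pstdev(dists)
        else:
            loc = median(dists)
            spread = median(abs(d - loc) for d in dists)
        for i, d in zip(idx, dists):
            scores[i] = (d - loc) / spread if spread else 0.0
    return scores
```

Normalizing within each cluster keeps a naturally spread-out cluster from producing more alarms than a tight one.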

The specific embodiments described herein are offered by way of example only to illustrate the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiments or substitutions thereof without departing from the spirit of the invention or exceeding the scope of the invention as defined in the accompanying claims.

Claims

1. An unknown Web attack detection method based on a behavior baseline, characterized by comprising the following steps:

step one: collecting normal behavior data of a user;

2. The unknown Web attack detection method based on the behavior baseline according to claim 1, wherein the normal behavior data of the user comprises behavior characteristics of the user and traffic corresponding to the behavior.

3. The unknown Web attack detection method based on the behavior baseline according to claim 2, wherein the user behavior data in step one is generated by using an automated testing tool to simulate the normal behavior of a user on a website, thereby generating normal traffic.

4. The unknown Web attack detection method based on the behavior baseline according to claim 1, wherein the abnormal behavior detection according to the behavior baseline model in step three specifically comprises:

5. The unknown Web attack detection method based on the behavior baseline according to claim 4, wherein the abnormal behavior detection is performed by combining a plurality of abnormal detection algorithms, and specifically comprises:

6. The method for detecting unknown Web attacks based on behavioral baselines according to claim 4 or 5, wherein the plurality of anomaly detection algorithms includes a collaborative filtering algorithm, an outlier detection algorithm, an anomaly analysis algorithm, and a cluster analysis algorithm.

7. The unknown Web attack detection method based on the behavior baseline according to claim 1, wherein the unknown Web attack detection method is characterized in that:

8. The unknown Web attack detection method based on the behavior baseline according to claim 1, wherein the user's normal behavior data is collected periodically, and the behavior baseline model is updated and adjusted.