CN109936554B - Detection method and device for distributed denial of service

Info

Publication number: CN109936554B
Application number: CN201711450402.1A
Authority: CN
Inventors: 陈君; 黄河
Original assignee: Institute of Acoustics CAS; Beijing Intellix Technologies Co Ltd
Current assignee: Zhengzhou Xinrand Network Technology Co ltd; Institute of Acoustics CAS
Priority date: 2017-12-19
Filing date: 2017-12-27
Publication date: 2021-04-20
Anticipated expiration: 2037-12-27
Also published as: CN109936554A

Abstract

The invention relates to a method and a device for detecting distributed denial of service. The method comprises the following steps: performing iterative projection on training data to determine a first projection space; projecting received test data into the first projection space to determine the projection of the test data; and determining the safety of the test data according to the distance between the projection of the test data and the training data in the first projection space. The method provided by the embodiment of the invention uses fewer system resources and enables a firewall to identify, at a higher rate, the data sources initiating a DDoS attack.

Description

Detection method and device for distributed denial of service

Technical Field

The invention relates to the field of network security, in particular to a method and a device for detecting distributed denial of service.

Background

Distributed denial of service (DDoS) refers to an attacker using distributed technology to control a plurality of computers to launch a denial of service (DoS) attack on one or more victims, so as to prevent the victims from providing services normally or to crash their systems outright.

A conventional DDoS attack exploits a vulnerability of a lower-layer protocol (especially a network-layer protocol), sending a large number of useless packets or forging Transmission Control Protocol (TCP) connections to block the network or consume host resources. DDoS has two attack modes: bandwidth exhaustion and host resource exhaustion. In the bandwidth exhaustion mode, a large number of legitimate HTTP requests are sent to occupy the bandwidth of the target network, so that normal users cannot access the Web. The host resource exhaustion mode aims to exhaust host resources (e.g., Central Processing Unit (CPU), memory, etc.): the attacker uses a small number of HTTP requests to make the server return large files (e.g., images, video files, etc.) or run complex scripts (e.g., cryptographic calculation and authentication). Such an attack can quickly exhaust host resources without a high request rate and is therefore harder to notice. Both attack modes are highly covert, and their surface features are difficult to distinguish from normal user access behavior.

Disclosure of Invention

The invention aims to solve the problem of low efficiency in the prior art caused by retaining too many features when mathematical features are extracted.

To achieve the above object, in one aspect, the present invention provides a method for detecting a distributed denial of service, including: performing iterative projection on training data to determine a first projection space; projecting received test data into the first projection space to determine the projection of the test data; and determining the safety of the test data according to the distance between the projection of the test data and the training data in the first projection space. The method provided by the embodiment of the invention uses fewer system resources and enables a firewall to identify, at a higher rate, the data sources initiating a DDoS attack.

In an alternative implementation, the step of "iteratively projecting the training data to determine the first projection space" may include: projecting the training data by an iterative method of a projection function according to the principle of maximizing the relative entropy, to obtain a new projection space. When the new projection space no longer shifts, the new projection space is the first projection space.

In another alternative implementation, the "iterative method of projection function" may include: fixed-point iteration.

In yet another alternative implementation, the method may include: the number of test data in each group is a fixed value; if the amount of received test data is larger than the fixed group size, a fixed-size group of test data is selected at random.

In yet another alternative implementation, the step of determining the safety of the test data according to the distance between the projection of the test data and the training data in the first projection space may include: determining the safety of the test data using the Euclidean distance as the metric.

In yet another alternative implementation, before the step of "performing an iterative projection operation on the training data to determine the first projection space", the method may further include: screening the training data to determine the connection characteristics of the training data; centralizing the training data and determining the mathematical features of the centralization of the training data.

In yet another alternative implementation, the "connection feature of the training data" may include at least one of: Transmission Control Protocol (TCP) connection features and traffic statistics features. The traffic statistics features include: host-based network traffic statistics and time-based network traffic statistics.

In yet another alternative implementation, the step of "centralizing the training data and determining the mathematical characteristics of the centralization of the training data" may include: amplifying the training data by a sphericizing operation.

In another aspect, the present invention provides a device for detecting a distributed denial of service, where the device may include: a calculation module, configured to perform iterative projection on the training data and determine a first projection space; a projection module, configured to project the received test data into the first projection space and determine the projection of the test data; and a processing module, configured to determine the safety of the test data according to the distance between the projection of the test data and the training data in the first projection space.

In an alternative implementation, the "calculation module" may specifically be configured to: project the training data by an iterative method of a projection function according to the principle of maximizing the relative entropy, to obtain a new projection space. When the new projection space no longer shifts, the new projection space is the first projection space.

In yet another alternative implementation, the "processing module" may be specifically configured to: determine the safety of the test data using the Euclidean distance as the metric.

In yet another alternative implementation, the apparatus may further include: a selection module, configured to screen the training data and determine the connection features of the training data, and to centralize the training data and determine the mathematical features of the centralization of the training data.

In yet another alternative implementation, the "selection module" may be specifically configured to: amplify the training data by a sphericizing operation.

Drawings

Fig. 1 is a flowchart of a method for detecting a distributed denial of service according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of a sphericizing operation according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of another sphericizing operation according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a distributed denial of service detection apparatus according to an embodiment of the present invention.

Detailed Description

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Fig. 1 is a flowchart of a method for detecting distributed denial of service according to an embodiment of the present invention. The method requires attack traffic data of known types as training samples and uses a sphericizing operation. As shown in Fig. 1, the method includes S101-S107:

S101: Input the training data.

Specifically, the inputs to the detection function include one or more of: an attack data set for training, a test data set, and a setter capable of controlling the number of axes of projection and the form of an iterative function.

For example: when training data is input at the function entry, the amount of training data should not be too large; otherwise it occupies too much storage space, slows down processing, and lowers the judgment accuracy. In the method provided in this embodiment, the number of training data for each specific type of attack is 5; in other embodiments, the number of training data allocated to each type of attack may be another number between 5 and 10.

S102: Screen the training data to determine the connection features of the training data.

Specifically, the training data is screened, the connection features of the training data are determined, the data near the mean value is expanded to highlight its statistical features, and the training data is then added to an iteration queue.

The connection features of the training data include at least one of: Transmission Control Protocol (TCP) connection features and traffic statistics features. The traffic statistics features include: host-based network traffic statistics and time-based network traffic statistics, such as the number of REJ packets received within a specified time.

For example: the screened input data includes 15 subclasses in the following four major classes:

The first type: basic TCP connection features, which may include: the connection duration, continuous value, in seconds; the number of bytes of data exchanged between the initiating host and the destination host, discrete value, in bytes.

The second type: TCP connection content features, which may include: the number of accesses to sensitive system directories and files, discrete value; the ratio of failed to successful logins, continuous value; the number of file creation operations, discrete value.

The third type: time-based traffic statistics, which may include: the number of connections in the last 2 seconds that have the same target host as the current connection, continuous value; the number of connections in the last 2 seconds that have the same service as the current connection, continuous value; among the connections in the last 2 seconds that have the same target or service as the current connection, the percentage with SYN or REJ errors, continuous value; among the connections in the last 2 seconds that have the same target as the current connection, the percentage that have the same or a different service, continuous value; among the connections in the last 2 seconds that have the same service as the current connection, the percentage that have the same or a different target, continuous value.

The fourth type: traffic statistics associated with the target host, which may include: among the last 1000 connections, the number that have the same target as the current connection, discrete value; among the last 1000 connections, the percentage that have the same target as the current connection and the same or a different service, continuous value; among the last 1000 connections, the percentage that have the same target as the current connection and the same or a different source port, continuous value; among the connections in the last 1000 that have the same target as the current connection, the percentage with SYN or REJ errors, continuous value; among the connections in the last 1000 that have the same target and the same service as the current connection, the percentage with SYN or REJ errors, continuous value.
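As an illustration only, the 15 screened features can be held in a single fixed-length record before being turned into a numeric vector. The sketch below is a hypothetical layout: the class name `ConnectionFeatures` and every field name are assumptions chosen to mirror the four classes listed above, and none of them come from the embodiment.

```python
from typing import List, NamedTuple


class ConnectionFeatures(NamedTuple):
    """Hypothetical record for the 15 screened features (names are assumptions)."""
    # Class 1: basic TCP connection features
    duration_s: float
    bytes_exchanged: int
    # Class 2: TCP connection content features
    sensitive_access_count: int
    login_fail_ratio: float
    file_create_count: int
    # Class 3: time-based statistics (last 2 seconds)
    t_same_host_count: float
    t_same_service_count: float
    t_error_pct: float
    t_service_mix_pct: float
    t_target_mix_pct: float
    # Class 4: target-host statistics (last 1000 connections)
    h_same_target_count: int
    h_service_mix_pct: float
    h_source_port_mix_pct: float
    h_error_pct: float
    h_same_service_error_pct: float

    def as_vector(self) -> List[float]:
        # Flatten the record into a plain numeric vector for later projection.
        return [float(v) for v in self]
```

A fixed field order keeps training vectors and test vectors aligned dimension by dimension, which the later projection steps depend on.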

S103: Centralize the training data and determine the mathematical features of the centralization of the training data.

Specifically, the training data is centralized and then magnified with an unequal scale to highlight the mathematical features near its center. The unequal-scale magnification may specifically be a sphericizing operation. For example, the data near the value 0 is magnified by the sphericizing operation, so that features of the sparse far-end data are not over-collected and the features of the data crowded near 0 become visible. Because the method provided by the embodiment of the invention studies the data features near 0, the data close to the mean is expanded by data sphericization to expose those features. The sphere radius of the sphericizing operation is a moderate value positively correlated with the data variance; in this embodiment the sphere radius is 1, and in other embodiments the sphere radius may be adjusted according to the variance. Specifically, referring to Fig. 2 and Fig. 3, Fig. 2 is a schematic view before the operation, and Fig. 3 is a schematic view after the sphericizing operation.
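The embodiment does not give a formula for the sphericizing operation, so the following is only one possible reading of S103: subtract the per-feature mean, then apply a radial mapping that stretches points near 0 and squeezes far points inside a sphere of the chosen radius. The `tanh` mapping, the `gain` parameter, and the function names `center` and `sphericize` are all assumptions made for this sketch.

```python
import numpy as np


def center(X):
    """Subtract the per-feature mean; return the centered data and the mean."""
    mean = X.mean(axis=0)
    return X - mean, mean


def sphericize(Xc, radius=1.0, gain=2.0):
    """Assumed radial magnification: rho -> radius * tanh(gain * rho / radius).

    Near the origin the mapping scales distances by about `gain`;
    far from the origin every point is squeezed inside the sphere.
    """
    rho = np.linalg.norm(Xc, axis=1, keepdims=True)
    safe_rho = np.where(rho > 0, rho, 1.0)  # avoid dividing by zero at the origin
    new_rho = radius * np.tanh(gain * rho / radius)
    return Xc * (new_rho / safe_rho)
```

With `radius=1.0`, matching the sphere radius of this embodiment, a point at distance 0.01 from the center moves to roughly 0.02, while a point at distance 5 lands just inside the unit sphere, so the crowded region near 0 gains resolution at the expense of the sparse far end.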

S104: Perform iterative projection on the training data to determine a first projection space.

Specifically, the training data is projected by an iterative method of a projection function according to the principle of maximizing the relative entropy, to obtain a new projection space. The dimensionality of the projection space is reduced by ignoring projection directions whose measure is too low, and it is then judged whether the projection space has stopped shifting. The direction in which the relative entropy is maximized is the direction in which the mutual information of the data is smallest. In the method provided by this embodiment, classification is performed along this direction so that the correlation between the projected data is minimized; in other embodiments, the projection axes may be set to different directions according to different requirements on the degree of correlation.

When the new projection space is not deviated any more, the new projection space is the first projection space; when the new projection space shifts again, operation S104 is repeated until the projection space does not shift any more.

The iterative method of the projection function may include: fixed-point iteration. Specifically, fixed-point iteration is selected for iterating the projection function. To obtain a better fit, the iteration function must, in addition to being nonlinear and convergent, be as similar as possible to the distribution of the original data. The iteration function is therefore a function close to the probability distribution characteristics of the data. In this embodiment, after comparing the results obtained with different functions, g(x) = x·e^(-x) is selected; in other embodiments, a better-performing heavy-tailed function may be selected.

The training data is projected by the iterative method of the projection function according to the principle of maximizing the relative entropy. In each round it is judged whether the first projection space can be further shifted and compressed, and the compression is repeated until the parameters of the first projection space stabilize at suitable values, at which point the space contains enough data features.

In addition, in the selected first projection space, the relative entropy between any two directions is as large as possible, so that the interference degree between every two base signals is minimum, and the dimensionality of the projection space can be reduced as much as possible without losing information.
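The embodiment names fixed-point iteration and the function g(x) = x·e^(-x) but does not spell out the update rule. The sketch below assumes a FastICA-style update, w ← E[x·g(wᵀx)] − E[g′(wᵀx)]·w, with Gram-Schmidt deflation so that each new axis stays orthogonal to the earlier ones. That update rule, the stopping test, and the names `fit_projection`, `n_axes`, `tol`, and `max_iter` are assumptions, not the patented procedure. The input is expected to be centered and bounded (for example after the sphericizing step), since e^(-x) grows quickly for large negative x.

```python
import numpy as np


def g(u):
    # Iteration function named in the embodiment: g(x) = x * e^(-x)
    return u * np.exp(-u)


def g_prime(u):
    # Derivative of g: (1 - x) * e^(-x)
    return (1.0 - u) * np.exp(-u)


def fit_projection(X, n_axes, max_iter=200, tol=1e-6, seed=0):
    """Return an (n_axes, n_features) matrix whose rows are orthonormal axes."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W = np.zeros((n_axes, n_features))
    for k in range(n_axes):
        # Random start, made orthogonal to the axes already found.
        w = rng.standard_normal(n_features)
        w -= W[:k].T @ (W[:k] @ w)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            u = X @ w
            # Assumed FastICA-style fixed-point update.
            w_new = (X * g(u)[:, None]).mean(axis=0) - g_prime(u).mean() * w
            w_new -= W[:k].T @ (W[:k] @ w_new)  # deflation
            norm = np.linalg.norm(w_new)
            if norm < 1e-12:
                break
            w_new /= norm
            shift = 1.0 - abs(w_new @ w)  # 0 when the axis has stopped moving
            w = w_new
            if shift < tol:
                break
        W[k] = w
    return W
```

Here "the projection space no longer shifts" is interpreted as the axis direction changing by less than `tol` between two rounds; `n_axes` plays the role of the setter that controls the number of projection axes.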

S105: Project the received test data into the first projection space and determine the projection of the test data.

Specifically, the number of test data in each group is a fixed value. If the amount of received test data is larger than the fixed group size, a fixed-size group of test data is selected at random, which prevents an attacker from planting forged safe data at a specific position.
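A minimal sketch of S105, assuming the test data is a NumPy array with one row per test vector and the first projection space is given as a matrix `W` of projection axes. Centering the test data with the training mean before projecting is an assumption, and the names `select_group` and `project` are hypothetical.

```python
import numpy as np


def select_group(test_data, group_size, rng):
    """Keep at most `group_size` rows, chosen at random without replacement."""
    if len(test_data) <= group_size:
        return test_data
    idx = rng.choice(len(test_data), size=group_size, replace=False)
    return test_data[idx]


def project(data, mean, W):
    """Center with the training mean (assumed) and project onto the rows of W."""
    return (data - mean) @ W.T
```

Because the rows are drawn at random rather than taken from a fixed offset, a sender cannot know in advance which of its packets will be inspected.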

S106: Determine the safety of the test data according to the distance between the projection of the test data and the training data in the first projection space.

Specifically, whether the test vector is safe is judged from the distance between the projection of the test data and the training data (which may be called base vectors) in the first projection space. If there is more than one group of training data, the test vector must be far enough from the hypercube formed by all the training vectors to be considered safe.

In addition, the safety of the test data can be determined using the Euclidean distance as the metric. In the embodiment provided by the invention, the risk of the data is judged by the Euclidean distance in the projection space rather than by the size of the included angle. This slightly increases the amount of calculation, but avoids misjudging burst traffic that lies in the same direction as safe. A sufficiently close Euclidean distance is one that separates dangerous data as far as possible while reducing the misjudgment rate. In this embodiment the threshold is set to 1.0500: when the distance is smaller than the threshold, the test data is determined to be safe, and when the distance is larger than the threshold, the test data is determined to be dangerous. In other embodiments, the threshold is adjusted according to the specific attack type and network environment.
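The difference between an angle-based test and a Euclidean-distance test can be shown with a small sketch. It only computes the two measures and compares the distance with the threshold 1.0500 of this embodiment; how the comparison result is mapped to "safe" or "dangerous" follows the embodiment text and is not decided by the code. The function names and sample vectors are assumptions.

```python
import numpy as np

THRESHOLD = 1.05  # threshold value used in this embodiment


def min_distance(test_proj, train_proj):
    """Smallest Euclidean distance from one projected test vector
    to any projected training (base) vector."""
    return float(np.linalg.norm(train_proj - test_proj, axis=1).min())


def cosine(a, b):
    """Cosine of the included angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


base = np.array([[1.0, 0.0]])   # one projected training vector
burst = np.array([10.0, 0.0])   # same direction, ten times the magnitude
near = np.array([1.2, 0.0])     # same direction, similar magnitude
```

For these vectors the included angle is zero in both cases, so an angle-only test cannot tell `burst` from `near`, while `min_distance` gives 9.0 for `burst` (above the threshold) and 0.2 for `near` (below it).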

S107: Determine whether there is a new test request.

Specifically, if there is a new test request, the process returns to S105 and repeats until there are no new test requests. If there is no new test request, the test ends.

Fig. 4 is a schematic structural diagram of a distributed denial of service detection apparatus according to an embodiment of the present invention. As shown in Fig. 4, the apparatus may include: a calculation module 401, configured to perform iterative projection on the training data to determine a first projection space; a projection module 402, configured to project the received test data into the first projection space and determine the projection of the test data; and a processing module 403, configured to determine the safety of the test data according to the distance between the projection of the test data and the training data in the first projection space.

The calculation module 401 may specifically be configured to: project the training data by an iterative method of a projection function according to the principle of maximizing the relative entropy, to obtain a new projection space. When the new projection space no longer shifts, the new projection space is the first projection space.

The iterative method of the projection function may include: fixed-point iteration. The number of test data in each group is a fixed value; if the amount of received test data is larger than the fixed group size, a fixed-size group of test data is selected at random.

The processing module 403 may specifically be configured to: determine the safety of the test data using the Euclidean distance as the metric.

The above apparatus may further include: a selecting module 404, configured to screen the training data and determine connection characteristics of the training data; centralizing the training data and determining the mathematical features of the centralization of the training data.

The connection features of the training data may comprise at least one of: Transmission Control Protocol (TCP) connection features and traffic statistics features. The traffic statistics features include: host-based network traffic statistics and time-based network traffic statistics.

The selection module 404 may be specifically configured to: amplify the training data by a sphericizing operation.

The method provided by the embodiment of the invention uses fewer system resources and enables a firewall to identify, at a higher rate, the data sources initiating a DDoS attack. The innovations of the method are: sphericizing the data near the mean point, and obtaining features by vertical projection with iterative convergence. The method uses a certain amount of attack data as a training set. The attack data is first centralized, and then, because the data features are concentrated around the zero value, the data is sphericized so that the features of the low-traffic part are expanded. Next, according to the principle of maximizing the relative entropy, vertical projection is used to obtain more features, and an iteration function with similar characteristics is selected to iterate over the data. Once the iteration is stable, a new low-dimensional space related to the training data is obtained, and whether data is dangerous is determined from the position of the test data in that space. The method requires few iterations, runs fast, achieves a correct judgment rate of more than 90% on the KDD99 attack data set, and is a clear improvement over general PCA dimensionality reduction.

It will be further appreciated by those of ordinary skill in the art that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described above generally in terms of their functions in order to clearly illustrate the interchangeability of hardware and software. Whether these functions are performed in hardware or software depends on the particular application of the solution and design constraints. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A method for detecting a distributed denial of service, comprising the steps of: amplifying training data by adopting a sphericizing operation;

performing iterative projection on the amplified training data to determine a first projection space;

projecting the received test data to the first projection space, and determining the projection of the test data;

and determining the safety of the test data according to the distance between the projection of the test data and the training data in the first projection space.

2. The method of claim 1, wherein iteratively projecting the training data to determine a first projection space comprises:

projecting the training data by adopting an iterative method of a projection function according to the principle of maximizing the relative entropy to obtain a new projection space;

when the new projection space is no longer offset, the new projection space is the first projection space.

3. The method of claim 2, wherein the iterative method of projection functions comprises: fixed-point iteration.

4. The method of claim 1, wherein the number of test data in each group is a fixed value; and if the amount of received test data is larger than the fixed group size, a fixed-size group of test data is selected at random.

5. The method of claim 1, wherein determining the safety of the test data from the distance of the projection of the test data from the training data in the first projection space comprises:

determining the safety of the test data using the Euclidean distance as the metric.

6. The method of claim 1, wherein the connection features of the training data comprise at least one of: a Transmission Control Protocol (TCP) connection feature and a traffic statistics feature;

wherein the traffic statistics features include: host-based network traffic statistics and time-based network traffic statistics.