CN111600894A

CN111600894A - Network attack detection method and device

Info

Publication number: CN111600894A
Application number: CN202010428046.9A
Authority: CN
Inventors: 孙尚勇
Original assignee: New H3C Security Technologies Co Ltd
Current assignee: New H3C Security Technologies Co Ltd
Priority date: 2020-05-20
Filing date: 2020-05-20
Publication date: 2020-08-28
Anticipated expiration: 2040-05-20
Also published as: CN111600894B

Abstract

The present application relates to the field of network security technologies, and in particular, to a method and an apparatus for detecting a network attack. The method comprises the following steps: counting time points at which the number of negative samples is N times of the number of positive samples based on the generation time of each sample in the collected sample data, and taking samples before the time points as target samples; determining an ideal attack time point corresponding to each feature dimension in each dimension feature based on the trend of the feature value of each dimension feature in the target sample changing along with time to obtain a time period corresponding to each dimension feature; respectively extracting a first characteristic value of each dimension characteristic of each IP in each time period and a second characteristic value of each dimension characteristic of all IPs in each time period, and taking the first characteristic value and the second characteristic value corresponding to each IP as a sample characteristic vector corresponding to the IP; and carrying out classification training on the preset classification model by adopting the sample characteristic vectors respectively corresponding to the IPs to obtain the trained classification model.

Description

Network attack detection method and device

Technical Field

The present application relates to the field of network security technologies, and in particular, to a method and an apparatus for detecting a network attack.

Background

With the continuous development of information technology and the growing popularity of computers, Internet security problems have become increasingly prominent. The Challenge Collapsar (CC) attack is one of the most common attacks on Web pages. Its principle is to continuously send requests to pages that consume considerable resources so as to exhaust server resources, which slows down access to the Web application and may even make the server unreachable. The attack is characterized by source IPs that are scattered and genuine, and the data packets from those IPs look like normal requests, so a CC attack cannot be detected from the data packets alone. Defending against it is therefore indispensable in security protection.

At present, CC attacks are generally defended against by means of artificial intelligence (e.g., machine learning). For example, logs of normal access behaviors and of CC attack behaviors are collected, the collected logs are statistically processed to form sample features, a classification algorithm model is trained on the sample features, and the trained model is used to detect access behaviors in the production environment to obtain a detection result. Existing machine-learning-based CC attack detection methods model IP behavior over an imprecisely chosen time period and do not consider the characteristics of each dimension feature individually, so the extracted sample features may be inaccurate, which in turn makes the detection result inaccurate.

Disclosure of Invention

The embodiment of the application provides a network attack detection method and device, which are used for solving the problem of inaccurate detection result in the prior art.

The embodiment of the application provides the following specific technical scheme:

in a first aspect, the present application provides a network attack detection method, where the method includes:

counting time points at which the number of negative samples is N times of the number of positive samples based on the generation time of each sample in the collected sample data, and taking samples before the time points as target samples;

determining an ideal attack time point corresponding to each feature dimension in each dimension feature based on the trend of the feature value of each dimension feature in the target sample changing along with time to obtain a time period corresponding to each dimension feature;

respectively extracting a first characteristic value of each dimension characteristic of each IP in each time period and a second characteristic value of each dimension characteristic of all IPs in each time period, and taking the first characteristic value and the second characteristic value corresponding to each IP as a sample characteristic vector corresponding to the IP;

carrying out classification training on a preset classification model by adopting sample characteristic vectors respectively corresponding to all IPs to obtain a trained classification model;

and detecting the network access behavior by adopting the trained classification model.

Optionally, the dimensional characteristics include the number of visits, the number of visited URLs, and the size of the request data source.

Optionally, the determining, based on a trend of a feature value of each dimension feature in the target sample changing with time, an ideal attack time point corresponding to each feature dimension in each dimension feature, and obtaining a time period corresponding to each dimension feature includes:

when the dimension characteristic is the number of visits, respectively counting the accumulated value of the number of visits in the target sample by taking seconds as a unit based on the generation time of each sample in the target sample, determining a first time point at which the number of visits changes fastest, and taking the first time point as an ideal attack time point corresponding to the number of visits to obtain a first time period T1 corresponding to the number of visits;

when the dimension characteristic is the number of accessed URLs, respectively counting the accumulated value of the number of accessed URLs in the target sample by taking seconds as a unit based on the generation time of each sample in the target sample, determining a second time point with the fastest change speed of the number of accessed URLs, and taking the second time point as an ideal attack time point corresponding to the number of accessed URLs to obtain a second time period T2 corresponding to the number of accessed URLs;

and when the dimension characteristic is the size of the request data source, respectively counting the accumulated value of the size of the request data source in the target sample by taking seconds as a unit based on the generation time of each sample in the target sample, determining a third time point at which the size of the request data source changes at the highest speed, and taking the third time point as an ideal attack time point corresponding to the size of the request data source to obtain a third time period T3 corresponding to the size of the request data source.

Optionally, the step of respectively extracting a first eigenvalue of each dimension characteristic of each IP in each time period and a second eigenvalue of each dimension characteristic of all IPs in each time period, and taking the first eigenvalue and the second eigenvalue corresponding to each IP as a sample eigenvector corresponding to the IP includes:

for each IP included in the target sample, performing the following operations: extracting the number of visits, the number of accessed URLs and the requested data size of the IP in a first time period T1, a second time period T2 and a third time period T3;

extracting the average visit times, the average visited URL number and the average requested data size of all the IPs contained in the target sample in a first time period T1, a second time period T2 and a third time period T3;

and taking the number of visits, the number of accessed URLs and the requested data size of any IP, together with the average number of visits, the average number of accessed URLs and the average requested data size of all the IPs, in the first time period T1, the second time period T2 and the third time period T3 as the sample feature vector corresponding to the IP.

In a second aspect, the present application provides a network attack detection apparatus, including:

the statistical unit is used for counting a time point when the number of negative samples is N times of the number of positive samples based on the generation time of each sample in the collected sample data, and taking a sample before the time point as a target sample;

the determining unit is used for determining an ideal attack time point corresponding to each feature dimension in each dimension feature based on the trend of the feature value of each dimension feature in the target sample changing along with time to obtain a time period corresponding to each dimension feature;

the extraction unit is used for respectively extracting a first characteristic value of each dimension characteristic of each IP in each time period and a second characteristic value of each dimension characteristic of all the IPs in each time period, and taking the first characteristic value and the second characteristic value corresponding to each IP as a sample characteristic vector corresponding to the IP;

the training unit is used for carrying out classification training on the preset classification model by adopting the sample characteristic vectors respectively corresponding to the IPs to obtain a trained classification model;

and the detection unit is used for detecting the network access behavior by adopting the trained classification model.

Optionally, when the ideal attack time point corresponding to each feature dimension in each dimension feature is determined based on a trend of a feature value of each dimension feature in the target sample changing with time, and a time period corresponding to each dimension feature is obtained, the determining unit is specifically configured to:

Optionally, when the first eigenvalue of each dimension characteristic of each IP in each time period and the second eigenvalue of each dimension characteristic of all IPs in each time period are respectively extracted, and the first eigenvalue and the second eigenvalue corresponding to each IP are used as the sample eigenvector corresponding to the IP, the extraction unit is specifically configured to:

In a third aspect, the present application provides another network attack detecting device, including:

a memory for storing program instructions;

and a processor, configured to call the program instructions stored in the memory, and execute any one of the methods according to the first aspect according to the obtained program.

In a fourth aspect, the present application provides a computer storage medium having stored thereon computer-executable instructions for causing a computer to perform the method of any of the first aspects.

The beneficial effect of this application is as follows:

In summary, the network attack detection method provided by the application counts, based on the generation time of each sample in the collected sample data, the time point at which the number of negative samples is N times the number of positive samples, and takes the samples generated before that time point as target samples; determines an ideal attack time point corresponding to each dimension feature based on the trend of the feature value of each dimension feature in the target sample changing along with time, to obtain a time period corresponding to each dimension feature; respectively extracts a first characteristic value of each dimension characteristic of each IP in each time period and a second characteristic value of each dimension characteristic of all IPs in each time period, and takes the first characteristic value and the second characteristic value corresponding to each IP as a sample characteristic vector corresponding to the IP; carries out classification training on a preset classification model by adopting the sample characteristic vectors respectively corresponding to all IPs to obtain a trained classification model; and detects the network access behavior by adopting the trained classification model.

With the network attack detection method provided by the application, target samples are screened from the sample data by finding, in order of log generation time, the point at which the number of negative samples exceeds the number of positive samples by a certain ratio. The time period over which the feature value of each dimension feature is counted is then determined from the way that feature value changes over time. Sample feature vectors corresponding to each IP are extracted according to the time periods corresponding to the dimension features, and the preset classification model is trained with these sample feature vectors. This improves the training effect of the classification model and, in turn, the accuracy of attack detection performed with the trained classification model.

Drawings

Fig. 1 is a schematic flowchart of a network attack detection method according to an embodiment of the present application;

fig. 2 is a schematic structural diagram of a network attack detection apparatus according to an embodiment of the present application;

fig. 3 is a schematic structural diagram of another network attack detection apparatus according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

First, the term "and/or" in the embodiment of the present application merely describes an association relationship between associated objects and means that three kinds of relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter associated objects are in an "or" relationship.

When the present application refers to the ordinal numbers "first", "second", "third" or "fourth", etc., it should be understood that this is done for differentiation only, unless it is clear from the context that the order is actually expressed.

The scheme of the present application will be described in detail by specific examples, but the present application is not limited to the following examples.

Exemplarily, referring to fig. 1, a schematic flow chart of a network attack detection method provided by the present application is shown, and a detailed flow chart of the network attack detection method is as follows:

Step 100: Count, based on the generation time of each sample in the collected sample data, the time point at which the number of negative samples is N times the number of positive samples, and take the samples before that time point as target samples.

In practical applications, a certain amount of time elapses between the start of a CC attack and the point at which the attack behavior can be identified. If the log interval analyzed is much longer than this period, the attack may not be detected in time to issue an early warning; if the log interval is too short, it may be impossible to judge whether the behavior of an IP is a CC attack. In either case the samples are not accurate enough, so the classification model is less accurate and attack detection performed with it is less reliable.

In the embodiment of the application, logs of various types generated by a service system within a certain time period can be collected in advance and preprocessed to screen out the access-type logs, which are then used as sample data. Optionally, the sample data is an access behavior log. For example, logs of normal access to the service system and logs of CC attacks on the service system may be collected through the security protection platform/management platform. It should be noted that, in the embodiment of the present application, the nature of each access behavior in the pre-collected sample data is known, that is, each access behavior is known to be either normal access to the service system (positive sample) or a CC attack on the service system (negative sample).

Each access behavior log in the sample data includes the time of the corresponding access behavior, which can be understood as the generation time of the log corresponding to that access behavior. The accumulated number of positive samples and the accumulated number of negative samples may then be counted in chronological order of log generation time, and the time point at which the accumulated number of negative samples is N times the accumulated number of positive samples may be determined from these two counts, where N may be preset according to the application scenario. The samples whose log generation time is before the determined time point are then taken as target samples.
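The screening described in step 100 can be sketched as follows. This is a minimal illustration only, not the applicant's implementation: the names LogSample and select_target_samples are hypothetical, timestamps are assumed to be integer seconds, the comparison uses "at least N times" because exact equality of the two counts may never occur, and returning all samples when the ratio is never reached is an assumption that the source does not address.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LogSample:
    timestamp: int    # log generation time in seconds (assumed unit)
    ip: str           # client IP that issued the request
    is_attack: bool   # True = negative sample (CC attack), False = positive sample


def select_target_samples(samples: List[LogSample], n: float) -> List[LogSample]:
    """Return the samples generated before the first time point at which the
    accumulated number of negative samples reaches N times the accumulated
    number of positive samples."""
    ordered = sorted(samples, key=lambda s: s.timestamp)
    positives = 0
    negatives = 0
    cutoff: Optional[int] = None
    for s in ordered:
        if s.is_attack:
            negatives += 1
        else:
            positives += 1
        # Require at least one positive sample so the ratio is meaningful.
        if positives > 0 and negatives >= n * positives:
            cutoff = s.timestamp
            break
    if cutoff is None:
        # Assumption: if the ratio is never reached, keep every sample.
        return ordered
    return [s for s in ordered if s.timestamp < cutoff]
```

With N chosen per application scenario, the returned list plays the role of the target sample used in the later steps.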

Step 110: Determine an ideal attack time point corresponding to each dimension feature based on the trend of the feature value of each dimension feature in the target sample changing along with time, to obtain a time period corresponding to each dimension feature.

In the embodiment of the present application, the dimensional features include, but are not limited to, the following dimensional features corresponding to the IPs: the number of visits, the number of URLs visited, and the size of the requested data source, that is, the number of times each IP (the client's IP) accesses the service system, the number of URLs each IP accesses, and the size of the data source each IP requests.

In practical applications, the access frequency of each IP to the service system may be different in different time periods, the number of URLs accessed may also be different, and the size of the data source requested each time may also be different.

Then, as can be seen from the above, in the embodiment of the present application, when determining the ideal attack time point corresponding to each feature dimension in the dimension features based on the trend of the feature value of each dimension feature in the target sample changing with time, and obtaining the time period corresponding to each dimension feature, a preferred implementation manner is:

In the embodiment of the present application, when determining the time point at which each dimension feature changes fastest, a preferred implementation is as follows: for each dimension feature (the number of visits, the number of accessed URLs, and the size of the request data source), draw a curve with time (in seconds) as the vertical coordinate and the accumulated value of the dimension feature as the horizontal coordinate, and then determine the time point on the curve at which the dimension feature changes fastest, which is the time point corresponding to that dimension feature. The time period corresponding to each dimension feature is then determined from that time point and the generation time of the earliest sample in the sample data.
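One possible way to locate the fastest-changing time point without drawing a curve is to compare the per-second increase of the accumulated value, as sketched below. The function names are hypothetical. The sketch assumes a non-empty list of (timestamp in seconds, value) events, where the value is 1 per access for the number of visits, 1 per newly seen URL for the number of accessed URLs, or the requested size for the request data source. It also assumes that the time period is the difference between the fastest-changing time point and the earliest sample time, which is one reading of the text.

```python
from typing import Dict, List, Tuple


def cumulative_per_second(events: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Accumulate event values second by second, from the earliest to the
    latest timestamp, and return (second, accumulated value) pairs."""
    per_second: Dict[int, float] = {}
    for ts, value in events:
        per_second[ts] = per_second.get(ts, 0.0) + value
    start, end = min(per_second), max(per_second)
    series = []
    total = 0.0
    for sec in range(start, end + 1):
        total += per_second.get(sec, 0.0)
        series.append((sec, total))
    return series


def fastest_change_period(events: List[Tuple[int, float]]) -> Tuple[int, int]:
    """Return (time point with the largest one-second increase, period length),
    where the period runs from the earliest sample to that time point."""
    series = cumulative_per_second(events)
    best_time = series[0][0]
    best_delta = series[0][1]  # increase during the first second
    for (_, prev), (sec, cur) in zip(series, series[1:]):
        delta = cur - prev
        if delta > best_delta:
            best_time, best_delta = sec, delta
    # Assumed definition: period = fastest time point - earliest sample time.
    return best_time, best_time - series[0][0]
```

Running fastest_change_period once per dimension feature would yield T1, T2 and T3 under these assumptions.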

Step 120: Respectively extract a first characteristic value of each dimension characteristic of each IP in each time period and a second characteristic value of each dimension characteristic of all IPs in each time period, and take the first characteristic value and the second characteristic value corresponding to each IP as the sample characteristic vector corresponding to the IP.

Further, in the embodiment of the present application, feature values of the dimensional features corresponding to the IPs and feature values of the dimensional features corresponding to all the IPs need to be extracted according to the time periods and the target samples corresponding to the dimensional features, so as to obtain sample feature vectors corresponding to the IPs.

Specifically, in the embodiment of the present application, when a first eigenvalue of each dimension characteristic of each IP in each time period and a second eigenvalue of each dimension characteristic of all IPs in each time period are respectively extracted, and the first eigenvalue and the second eigenvalue corresponding to each IP are used as a sample eigenvector corresponding to the IP, a preferred implementation manner is as follows:

As can be seen from the above, for each IP, a sample feature vector (18-dimensional vector) corresponding to each IP can be respectively formed by extracting feature values of the dimensional features (the number of visits, the number of URLs visited, and the size of requested data) corresponding to the IP in different time periods (T1, T2, and T3) and feature values of the dimensional features of all IPs in different time periods.

For example, the sample feature vector corresponding to IP 1 is: (the number of visits by IP 1 in the past T1 seconds, the number of visits by IP 1 in the past T2 seconds, the number of visits by IP 1 in the past T3 seconds; the average number of visits by all IPs in the past T1 seconds, the average number of visits by all IPs in the past T2 seconds, the average number of visits by all IPs in the past T3 seconds; the number of URLs accessed by IP 1 in the past T1 seconds, the number of URLs accessed by IP 1 in the past T2 seconds, the number of URLs accessed by IP 1 in the past T3 seconds; the average number of URLs accessed by all IPs in the past T1 seconds, the average number of URLs accessed by all IPs in the past T2 seconds, the average number of URLs accessed by all IPs in the past T3 seconds; the data size requested by IP 1 in the past T1 seconds, the data size requested by IP 1 in the past T2 seconds, the data size requested by IP 1 in the past T3 seconds; the average data size requested by all IPs in the past T1 seconds, the average data size requested by all IPs in the past T2 seconds, the average data size requested by all IPs in the past T3 seconds).
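The 18-dimensional vector described above (3 dimension features, 3 time periods, per-IP value plus all-IP average) could be assembled roughly as follows. This is a sketch under stated assumptions: Access, ip_stats and build_feature_vectors are hypothetical names, "the past T seconds" is taken as the closed interval [now - T, now], the number of accessed URLs is counted as distinct URLs, and the average is taken over every IP in the logs, including IPs with no activity in a given window.

```python
from typing import Dict, List, NamedTuple, Tuple


class Access(NamedTuple):
    timestamp: int  # seconds
    ip: str
    url: str
    size: int       # requested data size (unit assumed, e.g. bytes)


def ip_stats(logs: List[Access], ip: str, now: int, window: int) -> Tuple[float, float, float]:
    """(visits, distinct URLs, total requested size) of one IP in the past `window` seconds."""
    recent = [a for a in logs if a.ip == ip and now - window <= a.timestamp <= now]
    return (float(len(recent)),
            float(len({a.url for a in recent})),
            float(sum(a.size for a in recent)))


def build_feature_vectors(logs: List[Access], now: int,
                          t1: int, t2: int, t3: int) -> Dict[str, List[float]]:
    ips = sorted({a.ip for a in logs})
    windows = (t1, t2, t3)
    # stats[ip][w] holds (visits, urls, size) for window index w.
    stats = {ip: [ip_stats(logs, ip, now, w) for w in windows] for ip in ips}
    vectors: Dict[str, List[float]] = {}
    for ip in ips:
        vec: List[float] = []
        for f in range(3):  # 0 = visits, 1 = accessed URLs, 2 = requested size
            # First characteristic values: this IP in T1, T2, T3.
            vec.extend(stats[ip][w][f] for w in range(3))
            # Second characteristic values: average over all IPs in T1, T2, T3.
            vec.extend(sum(stats[o][w][f] for o in ips) / len(ips) for w in range(3))
        vectors[ip] = vec
    return vectors
```

The ordering of the entries follows the IP 1 example: for each dimension feature, three per-IP values followed by three all-IP averages.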

Step 130: Carry out classification training on the preset classification model by adopting the sample characteristic vectors respectively corresponding to the IPs to obtain the trained classification model.

Specifically, the sample feature vectors corresponding to the IPs may be input into a preset classification algorithm model, and the classification algorithm model is trained, where the sample feature vectors corresponding to the IPs are sample feature vectors with labels, that is, it is known which IPs are IPs that initiate CC attacks to the service system, and which IPs are IPs that normally access the service system. In this embodiment, the specific training process of the classification algorithm model is not described herein again.

In the embodiment of the application, the preset classification algorithm model includes, but is not limited to, classification algorithm models such as a multi-layer perceptron, a support vector machine, a decision tree, Bayes, and logistic regression.
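Since the source does not describe the training process, the following is only an illustrative sketch of one of the listed model types, logistic regression, trained by plain batch gradient descent on labeled sample feature vectors. The function names, the learning rate and the number of epochs are assumptions chosen for illustration; in practice an established machine learning library and feature scaling would likely be used instead.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(vectors: Sequence[Sequence[float]],
                              labels: Sequence[int],
                              lr: float = 0.1,
                              epochs: int = 500) -> List[float]:
    """labels: 1 = IP that initiated a CC attack, 0 = normally accessing IP.
    Returns the weights, with the bias stored as the last element."""
    dim = len(vectors[0])
    weights = [0.0] * (dim + 1)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in zip(vectors, labels):
            z = sum(w * v for w, v in zip(weights, x)) + weights[-1]
            err = sigmoid(z) - y
            for i, v in enumerate(x):
                grad[i] += err * v
            grad[-1] += err  # gradient of the bias term
        weights = [w - lr * g / len(vectors) for w, g in zip(weights, grad)]
    return weights


def predict(weights: Sequence[float], x: Sequence[float]) -> int:
    """Return 1 (CC attack) or 0 (normal) for one feature vector."""
    z = sum(w * v for w, v in zip(weights, x)) + weights[-1]
    return 1 if sigmoid(z) >= 0.5 else 0
```

The same labeled vectors could equally be fed to any of the other listed model types; logistic regression is used here only because it is short to write out.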

Step 140: Detect the network access behavior by adopting the trained classification model.

Specifically, newly generated access-type logs are processed, the feature vectors of each IP over the past T1, T2 and T3 seconds are extracted and input into the trained classification model, and as long as the feature vector of an IP in some time period is judged to be a CC attack, that IP is considered to be an IP initiating a CC attack; the IP and its behavior log are recorded, and an early warning is issued.
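The detection rule in step 140 can be sketched as below. It assumes that each IP may have several feature vectors (for example, one per evaluation time) and that a single vector classified as a CC attack is enough to flag the IP, which is one reading of "as long as". The name detect_cc_attackers and the classify callable (standing in for the trained classification model, returning 1 for a CC attack) are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def detect_cc_attackers(
    observations: Dict[str, List[Sequence[float]]],
    classify: Callable[[Sequence[float]], int],
) -> List[str]:
    """Return the IPs for which at least one feature vector is classified
    as a CC attack (label 1) by the trained model."""
    flagged = []
    for ip, vectors in sorted(observations.items()):
        if any(classify(v) == 1 for v in vectors):
            # A real system would also record the behavior log and raise an alert here.
            flagged.append(ip)
    return flagged
```

In use, classify would wrap the prediction function of whichever trained classification model was chosen in step 130.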

Based on the foregoing embodiment, referring to fig. 2, a schematic structural diagram of a network attack detection apparatus provided in the present application is shown, where the apparatus includes:

the statistical unit 20 is configured to count a time point at which the number of negative samples is N times the number of positive samples based on the generation time of each sample in the collected sample data, and take a sample before the time point as a target sample;

a determining unit 21, configured to determine, based on a trend of a feature value of each dimension feature in the target sample changing with time, an ideal attack time point corresponding to each feature dimension in each dimension feature, so as to obtain a time period corresponding to each dimension feature;

the extracting unit 22 is configured to extract a first feature value of each dimension feature of each IP in each time period and a second feature value of each dimension feature of all IPs in each time period, and use the first feature value and the second feature value corresponding to each IP as a sample feature vector corresponding to the IP;

the training unit 23 is configured to perform classification training on a preset classification model by using sample feature vectors corresponding to the IPs, respectively, to obtain a trained classification model;

and the detection unit 24 is configured to detect a network access behavior by using the trained classification model.

Optionally, when the ideal attack time point corresponding to each feature dimension in each dimension feature is determined based on a trend of a feature value of each dimension feature in the target sample changing with time, and a time period corresponding to each dimension feature is obtained, the determining unit 21 is specifically configured to:

Optionally, when the first eigenvalue of each dimension characteristic of each IP in each time period and the second eigenvalue of each dimension characteristic of all IPs in each time period are respectively extracted, and the first eigenvalue and the second eigenvalue corresponding to each IP are used as the sample eigenvector corresponding to the IP, the extraction unit 22 is specifically configured to:

Further, referring to fig. 3, the present application also provides a network attack detecting device, which includes a memory 30 and a processor 31, wherein,

a memory 30 for storing program instructions;

and a processor 31 for calling the program instructions stored in the memory 30 and executing any one of the above method embodiments according to the obtained program.

Still further, the present application provides a computer storage medium having computer-executable instructions stored thereon for causing a computer to perform any of the above-described method embodiments.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.

It will be apparent to those skilled in the art that various changes and modifications may be made in the embodiments of the present application without departing from the spirit and scope of the embodiments of the present application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to encompass such modifications and variations.

Claims

1. A network attack detection method, the method comprising:

2. The method of claim 1, wherein the dimensional characteristics include number of visits, number of URLs visited, and size of requested data source.

3. The method according to claim 1 or 2, wherein the step of determining an ideal attack time point corresponding to each feature dimension in the dimension features based on a trend of feature values of the dimension features in the target sample over time to obtain a time period corresponding to each dimension feature comprises:

4. The method of claim 3, wherein the step of extracting the first eigenvalue of each dimension characteristic of each IP in each time period and the second eigenvalue of each dimension characteristic of all IPs in each time period respectively, and using the first eigenvalue and the second eigenvalue corresponding to each IP as the sample eigenvector corresponding to the IP comprises:

5. A cyber attack detecting apparatus, the apparatus comprising:

6. The apparatus of claim 5, wherein the dimensional characteristics include number of visits, number of URLs visited, and size of requested data source.

7. The apparatus according to claim 5 or 6, wherein, when determining the ideal attack time point corresponding to each feature dimension in the dimension features based on a trend of a feature value of each dimension feature in the target sample changing with time to obtain a time period corresponding to each dimension feature, the determining unit is specifically configured to:

8. The apparatus according to claim 7, wherein when the first eigenvalue of each dimension characteristic of each IP in each time period and the second eigenvalue of each dimension characteristic of all IPs in each time period are extracted respectively, and the first eigenvalue and the second eigenvalue corresponding to each IP are used as the sample eigenvector corresponding to the IP, the extraction unit is specifically configured to:

9. A computing device, wherein the computing device comprises:

a memory for storing program instructions;

a processor for calling the program instructions stored in the memory and executing the method of any one of claims 1 to 4 according to the obtained program.

10. A computer storage medium having computer-executable instructions stored thereon for causing a computer to perform the method of any one of claims 1-4.