CN111600894A - Network attack detection method and device - Google Patents

Network attack detection method and device

Info

Publication number
CN111600894A
Authority
CN
China
Legal status
Granted
Application number
CN202010428046.9A
Other languages
Chinese (zh)
Other versions
CN111600894B (en)
Inventor
孙尚勇
Current Assignee
New H3C Security Technologies Co Ltd
Original Assignee
New H3C Security Technologies Co Ltd
Application filed by New H3C Security Technologies Co Ltd
Priority to CN202010428046.9A
Publication of CN111600894A
Application granted
Publication of CN111600894B
Status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present application relates to the field of network security technologies, and in particular to a network attack detection method and apparatus. The method comprises the following steps: based on the generation time of each sample in the collected sample data, determining the time point at which the number of negative samples is N times the number of positive samples, and taking the samples generated before that time point as target samples; determining, based on the trend over time of the feature value of each dimension feature in the target samples, an ideal attack time point corresponding to each dimension feature, to obtain a time period corresponding to each dimension feature; extracting a first feature value of each dimension feature of each IP in each time period and a second feature value of each dimension feature of all IPs in each time period, and taking the first feature value and the second feature value corresponding to each IP as the sample feature vector corresponding to that IP; and performing classification training on a preset classification model using the sample feature vectors corresponding to the IPs to obtain a trained classification model.

Description

Network attack detection method and device
Technical Field
The present application relates to the field of network security technologies, and in particular, to a method and an apparatus for detecting a network attack.
Background
With the continuous development of information technology and the increasing popularity of computers, Internet security problems have become increasingly prominent. The Challenge Collapsar (CC) attack is one of the most common attacks on Web pages. Its principle is to continuously send requests to pages that consume considerable server resources, exhausting those resources so that the Web application responds slowly or the server cannot be reached at all. A CC attack is characterized by attack source IPs that are dispersed and genuine, and whose data packets represent normal request behavior, so the attack cannot be detected by inspecting the packets themselves. Defending against CC attacks is therefore an indispensable part of security protection.
At present, CC attacks are generally defended against using artificial intelligence (e.g., machine learning). For example, logs of normal access behavior and of CC attack behavior are collected and statistically processed to form sample features, a classification model is trained on these sample features, and the trained model is used to examine access behavior in the production environment to obtain a detection result. However, existing machine-learning-based CC attack detection methods do not choose the time window used to model IP behavior accurately, and do not take the characteristics of each dimension feature into account individually. As a result, the extracted sample features may be inaccurate, which in turn makes the detection result inaccurate.
Disclosure of Invention
The embodiments of the present application provide a network attack detection method and apparatus to solve the problem of inaccurate detection results in the prior art.
The embodiments of the present application provide the following technical solutions:
In a first aspect, the present application provides a network attack detection method, the method including:
based on the generation time of each sample in the collected sample data, determining the time point at which the number of negative samples is N times the number of positive samples, and taking the samples before that time point as target samples;
determining, based on the trend over time of the feature value of each dimension feature in the target samples, an ideal attack time point corresponding to each dimension feature, to obtain a time period corresponding to each dimension feature;
extracting a first feature value of each dimension feature of each IP in each time period and a second feature value of each dimension feature of all IPs in each time period, and taking the first feature value and the second feature value corresponding to each IP as the sample feature vector corresponding to that IP;
performing classification training on a preset classification model using the sample feature vectors corresponding to the IPs to obtain a trained classification model;
and detecting network access behavior using the trained classification model.
Optionally, the dimension features include the number of visits, the number of URLs visited, and the size of the requested data source.
Optionally, determining, based on the trend over time of the feature value of each dimension feature in the target samples, an ideal attack time point corresponding to each dimension feature to obtain a time period corresponding to each dimension feature includes:
when the dimension feature is the number of visits, computing, second by second and based on the generation time of each sample in the target samples, the cumulative number of visits in the target samples, determining a first time point at which the number of visits changes fastest, and taking the first time point as the ideal attack time point for the number of visits to obtain a first time period T1 corresponding to the number of visits;
when the dimension feature is the number of URLs visited, computing, second by second and based on the generation time of each sample in the target samples, the cumulative number of URLs visited in the target samples, determining a second time point at which the number of URLs visited changes fastest, and taking the second time point as the ideal attack time point for the number of URLs visited to obtain a second time period T2 corresponding to the number of URLs visited;
and when the dimension feature is the size of the requested data source, computing, second by second and based on the generation time of each sample in the target samples, the cumulative size of the requested data source in the target samples, determining a third time point at which the size of the requested data source changes fastest, and taking the third time point as the ideal attack time point for the size of the requested data source to obtain a third time period T3 corresponding to the size of the requested data source.
Optionally, extracting a first feature value of each dimension feature of each IP in each time period and a second feature value of each dimension feature of all IPs in each time period, and taking the first feature value and the second feature value corresponding to each IP as the sample feature vector corresponding to that IP, includes:
for each IP included in the target samples, extracting the number of visits, the number of URLs visited, and the requested data size of that IP in the first time period T1, the second time period T2, and the third time period T3;
extracting the average number of visits, the average number of URLs visited, and the average requested data size of all IPs included in the target samples in the first time period T1, the second time period T2, and the third time period T3;
and taking the number of visits, the number of URLs visited, and the requested data size of any IP in T1, T2, and T3, together with the average number of visits, the average number of URLs visited, and the average requested data size in T1, T2, and T3, as the sample feature vector corresponding to that IP.
In a second aspect, the present application provides a network attack detection apparatus, including:
the statistical unit is configured to determine, based on the generation time of each sample in the collected sample data, the time point at which the number of negative samples is N times the number of positive samples, and take the samples before that time point as target samples;
the determining unit is configured to determine, based on the trend over time of the feature value of each dimension feature in the target samples, an ideal attack time point corresponding to each dimension feature, to obtain a time period corresponding to each dimension feature;
the extraction unit is configured to extract a first feature value of each dimension feature of each IP in each time period and a second feature value of each dimension feature of all IPs in each time period, and take the first feature value and the second feature value corresponding to each IP as the sample feature vector corresponding to that IP;
the training unit is configured to perform classification training on the preset classification model using the sample feature vectors corresponding to the IPs to obtain a trained classification model;
and the detection unit is configured to detect network access behavior using the trained classification model.
Optionally, the dimension features include the number of visits, the number of URLs visited, and the size of the requested data source.
Optionally, when determining, based on the trend over time of the feature value of each dimension feature in the target samples, the ideal attack time point corresponding to each dimension feature to obtain the time period corresponding to each dimension feature, the determining unit is specifically configured to:
when the dimension feature is the number of visits, compute, second by second and based on the generation time of each sample in the target samples, the cumulative number of visits in the target samples, determine a first time point at which the number of visits changes fastest, and take the first time point as the ideal attack time point for the number of visits to obtain a first time period T1 corresponding to the number of visits;
when the dimension feature is the number of URLs visited, compute, second by second and based on the generation time of each sample in the target samples, the cumulative number of URLs visited in the target samples, determine a second time point at which the number of URLs visited changes fastest, and take the second time point as the ideal attack time point for the number of URLs visited to obtain a second time period T2 corresponding to the number of URLs visited;
and when the dimension feature is the size of the requested data source, compute, second by second and based on the generation time of each sample in the target samples, the cumulative size of the requested data source in the target samples, determine a third time point at which the size of the requested data source changes fastest, and take the third time point as the ideal attack time point for the size of the requested data source to obtain a third time period T3 corresponding to the size of the requested data source.
Optionally, when extracting a first feature value of each dimension feature of each IP in each time period and a second feature value of each dimension feature of all IPs in each time period, and taking the first feature value and the second feature value corresponding to each IP as the sample feature vector corresponding to that IP, the extraction unit is specifically configured to:
for each IP included in the target samples, extract the number of visits, the number of URLs visited, and the requested data size of that IP in the first time period T1, the second time period T2, and the third time period T3;
extract the average number of visits, the average number of URLs visited, and the average requested data size of all IPs included in the target samples in the first time period T1, the second time period T2, and the third time period T3;
and take the number of visits, the number of URLs visited, and the requested data size of any IP in T1, T2, and T3, together with the average number of visits, the average number of URLs visited, and the average requested data size in T1, T2, and T3, as the sample feature vector corresponding to that IP.
In a third aspect, the present application provides another network attack detecting device, including:
a memory for storing program instructions;
and a processor, configured to call the program instructions stored in the memory and execute, according to the obtained program, the method of any implementation of the first aspect.
In a fourth aspect, the present application provides a computer storage medium storing computer-executable instructions for causing a computer to perform the method of any implementation of the first aspect.
The beneficial effect of this application is as follows:
In summary, the network attack detection method provided by the present application determines, based on the generation time of each sample in the collected sample data, the time point at which the number of negative samples is N times the number of positive samples, and takes the samples generated before that time point as target samples; determines, based on the trend over time of the feature value of each dimension feature in the target samples, an ideal attack time point corresponding to each dimension feature, to obtain a time period corresponding to each dimension feature; extracts a first feature value of each dimension feature of each IP in each time period and a second feature value of each dimension feature of all IPs in each time period, and takes the first feature value and the second feature value corresponding to each IP as the sample feature vector corresponding to that IP; performs classification training on a preset classification model using the sample feature vectors corresponding to the IPs to obtain a trained classification model; and detects network access behavior using the trained classification model.
With the network attack detection method provided by the present application, target samples are selected from the sample data by analyzing, in order of log generation time, when the number of negative samples exceeds the number of positive samples by a given ratio. Next, the time period over which the feature value of each dimension feature is counted is determined from how that feature value varies over time. The sample feature vector of each IP is then extracted using the time periods corresponding to the dimension features, and these vectors are used to train the preset classification model. This improves the training of the classification model and, in turn, the accuracy of attack detection performed with the trained model.
Drawings
Fig. 1 is a schematic flowchart of a network attack detection method according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a network attack detection apparatus according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of another network attack detection apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present application.
First, the term "and/or" in the embodiments of the present application merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the preceding and following objects.
Where the present application uses ordinal numbers such as "first", "second", "third", or "fourth", they serve only to distinguish items, unless the context clearly indicates an order.
The scheme of the present application will be described in detail by specific examples, but the present application is not limited to the following examples.
Exemplarily, referring to Fig. 1, which is a schematic flowchart of the network attack detection method provided by the present application, the detailed flow of the method is as follows:
Step 100: Based on the generation time of each sample in the collected sample data, determine the time point at which the number of negative samples is N times the number of positive samples, and take the samples before that time point as target samples.
In practice, it takes some time from the start of a CC attack until the attack behavior can be identified. If the analyzed log window is much longer than this period, the attack may not be detected in time to raise an early warning; if the window is too short, it cannot be determined whether an IP's behavior is a CC attack. Moreover, a log window that is either too long or too short yields insufficiently accurate samples, which degrades the accuracy of the classification model and therefore the accuracy of attack detection performed with it.
In the embodiments of the present application, logs of various types generated by a service system over a certain period may be collected in advance and preprocessed to filter out the access-type logs, which are used as the sample data. Optionally, the sample data consists of access behavior logs. For example, logs of normal access to the service system and logs of CC attacks on the service system may be collected by a security protection platform or management platform. It should be noted that, in the embodiments of the present application, the nature of each access behavior in the pre-collected sample data is known; that is, each access behavior is known to be either normal access to the service system (a positive sample) or a CC attack on the service system (a negative sample).
Each access behavior log in the sample data includes the time of the corresponding access behavior, which can be regarded as the generation time of that log. The cumulative numbers of positive and negative samples can then be tallied in chronological order of log generation time, and from these cumulative counts the time point at which the cumulative number of negative samples is N times the cumulative number of positive samples can be determined, where N may be preset for different application scenarios. Samples whose log generation time is before the determined time point are then taken as target samples.
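The target-sample selection just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the `Sample` class and both function names are hypothetical, generation times are whole seconds, and because an exact "N times" equality may never occur, the sketch takes the first moment at which cumulative negatives reach at least N times cumulative positives (requiring at least one positive sample so the ratio is defined). Samples generated strictly before that moment are kept; if the ratio is never reached, all samples are kept, which is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    ts: int          # log generation time in seconds (hypothetical field)
    ip: str          # client IP
    is_attack: bool  # True: negative sample (CC attack); False: positive sample (normal)


def find_ratio_time_point(samples: List[Sample], n: float) -> Optional[int]:
    """Return the first generation time at which cumulative negatives >= n * cumulative positives."""
    positives = negatives = 0
    for s in sorted(samples, key=lambda s: s.ts):
        if s.is_attack:
            negatives += 1
        else:
            positives += 1
        # Only test the ratio once at least one positive sample has been seen.
        if positives > 0 and negatives >= n * positives:
            return s.ts
    return None


def select_target_samples(samples: List[Sample], n: float) -> List[Sample]:
    """Keep samples generated strictly before the ratio time point (all samples if never reached)."""
    t = find_ratio_time_point(samples, n)
    if t is None:
        return list(samples)
    return [s for s in samples if s.ts < t]
```

With two normal requests followed by a burst of attack requests and N = 2, the time point is the generation time of the fourth attack request, and everything before it becomes the target sample set.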
Step 110: Based on the trend over time of the feature value of each dimension feature in the target samples, determine an ideal attack time point corresponding to each dimension feature, to obtain a time period corresponding to each dimension feature.
In the embodiments of the present application, the dimension features include, but are not limited to, the following features of each IP: the number of visits, the number of URLs visited, and the size of the requested data source; that is, the number of times each IP (the IP of a client) accesses the service system, the number of URLs each IP accesses, and the size of the data source each IP requests.
In practice, an IP's access frequency to the service system, the number of URLs it accesses, and the size of the data source it requests each time may all differ across time periods.
Accordingly, in the embodiments of the present application, a preferred implementation for determining, based on the trend over time of the feature value of each dimension feature in the target samples, the ideal attack time point corresponding to each dimension feature and thereby the time period corresponding to each dimension feature is as follows:
when the dimension feature is the number of visits, compute, second by second and based on the generation time of each sample in the target samples, the cumulative number of visits in the target samples, determine a first time point at which the number of visits changes fastest, and take the first time point as the ideal attack time point for the number of visits to obtain a first time period T1 corresponding to the number of visits;
when the dimension feature is the number of URLs visited, compute, second by second and based on the generation time of each sample in the target samples, the cumulative number of URLs visited in the target samples, determine a second time point at which the number of URLs visited changes fastest, and take the second time point as the ideal attack time point for the number of URLs visited to obtain a second time period T2 corresponding to the number of URLs visited;
and when the dimension feature is the size of the requested data source, compute, second by second and based on the generation time of each sample in the target samples, the cumulative size of the requested data source in the target samples, determine a third time point at which the size of the requested data source changes fastest, and take the third time point as the ideal attack time point for the size of the requested data source to obtain a third time period T3 corresponding to the size of the requested data source.
In the embodiments of the present application, a preferred way to determine the time point at which each dimension feature changes fastest is as follows: for each dimension feature (the number of visits, the number of URLs visited, and the size of the requested data source), plot a curve with time (in seconds) as the vertical axis and the cumulative value of the dimension feature as the horizontal axis, and find the point on the curve at which the dimension feature changes fastest; this gives the time point for that dimension feature. The time period corresponding to each dimension feature is then determined from that time point and the generation time of the earliest sample in the sample data.
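One way to read this step in code, as a hedged Python sketch: sum each dimension feature's values per second, build the cumulative curve, take the second with the largest increase in the cumulative value (a discrete slope) as the ideal attack time point, and take the time period as the distance from the earliest sample's generation time. The function names, the `(timestamp, value)` input format, the first-maximum tie rule, and the one-second minimum period are assumptions for illustration. For the number of visits each request contributes 1 and for the requested data size it contributes its size; how URLs are counted is left to the caller. The input must be non-empty.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def cumulative_per_second(events: Iterable[Tuple[int, float]]) -> Tuple[int, List[float]]:
    """Sum values per second and return (first second, cumulative value at each second)."""
    per_second = defaultdict(float)
    for ts, value in events:
        per_second[int(ts)] += value
    start, end = min(per_second), max(per_second)
    cumulative, total = [], 0.0
    for t in range(start, end + 1):
        total += per_second.get(t, 0.0)  # seconds without events add nothing
        cumulative.append(total)
    return start, cumulative


def fastest_change_point(events: Iterable[Tuple[int, float]]) -> int:
    """Second at which the cumulative curve rises most steeply (first one on ties)."""
    start, cumulative = cumulative_per_second(events)
    best_t, best_slope = start, cumulative[0]
    for i in range(1, len(cumulative)):
        slope = cumulative[i] - cumulative[i - 1]
        if slope > best_slope:
            best_t, best_slope = start + i, slope
    return best_t


def time_period(events: Iterable[Tuple[int, float]], earliest_ts: int) -> int:
    """Seconds from the earliest sample to the ideal attack time point (at least 1, an assumption)."""
    return max(1, fastest_change_point(events) - earliest_ts)
```

Calling `time_period` once per dimension feature (visits, URLs visited, requested data size) would yield T1, T2, and T3 respectively.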
Step 120: Extract a first feature value of each dimension feature of each IP in each time period and a second feature value of each dimension feature of all IPs in each time period, and take the first feature value and the second feature value corresponding to each IP as the sample feature vector corresponding to that IP.
Further, in the embodiments of the present application, the feature values of the dimension features of each IP and of all IPs are extracted from the target samples according to the time periods corresponding to the dimension features, to obtain the sample feature vector of each IP.
Specifically, in the embodiments of the present application, a preferred implementation for extracting a first feature value of each dimension feature of each IP in each time period and a second feature value of each dimension feature of all IPs in each time period, and taking the first feature value and the second feature value corresponding to each IP as the sample feature vector corresponding to that IP, is as follows:
for each IP included in the target samples, extract the number of visits, the number of URLs visited, and the requested data size of that IP in the first time period T1, the second time period T2, and the third time period T3;
extract the average number of visits, the average number of URLs visited, and the average requested data size of all IPs included in the target samples in the first time period T1, the second time period T2, and the third time period T3;
and take the number of visits, the number of URLs visited, and the requested data size of any IP in T1, T2, and T3, together with the average number of visits, the average number of URLs visited, and the average requested data size in T1, T2, and T3, as the sample feature vector corresponding to that IP.
As can be seen from the above, for each IP, an 18-dimensional sample feature vector can be formed from the feature values of that IP's dimension features (the number of visits, the number of URLs visited, and the requested data size) in the different time periods (T1, T2, and T3), together with the feature values of the same dimension features across all IPs in those time periods (3 features × 3 time periods × 2 = 18).
For example, the sample feature vector corresponding to IP 1 is: (the number of visits of IP 1 in the past T1 seconds, the number of visits of IP 1 in the past T2 seconds, the number of visits of IP 1 in the past T3 seconds; the average number of visits of all IPs in the past T1 seconds, the average number of visits of all IPs in the past T2 seconds, the average number of visits of all IPs in the past T3 seconds; the number of URLs visited by IP 1 in the past T1 seconds, the number of URLs visited by IP 1 in the past T2 seconds, the number of URLs visited by IP 1 in the past T3 seconds; the average number of URLs visited by all IPs in the past T1 seconds, the average number of URLs visited by all IPs in the past T2 seconds, the average number of URLs visited by all IPs in the past T3 seconds; the data size requested by IP 1 in the past T1 seconds, the data size requested by IP 1 in the past T2 seconds, the data size requested by IP 1 in the past T3 seconds; the average data size requested by all IPs in the past T1 seconds, the average data size requested by all IPs in the past T2 seconds, the average data size requested by all IPs in the past T3 seconds).
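The 18-dimensional vector can be assembled as in the following Python sketch. It assumes each access log is a `Request` record (hypothetical) with a generation time in seconds, an IP, a URL, and a requested size; that "the past T seconds" means the window (t_ref - T, t_ref] ending at a chosen reference time `t_ref`; that the number of URLs visited counts distinct URLs; and that averages are taken over every IP appearing in the logs passed in, with IPs that made no requests in a window contributing zero. The element order follows the IP 1 example above. The nested loops favor clarity over speed.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Request:
    ts: int    # generation time in seconds
    ip: str
    url: str
    size: int  # requested data size (assumed to be bytes)


def window_stats(reqs: Sequence[Request], ip: str, t_ref: int, period: int) -> Tuple[int, int, int]:
    """(visits, distinct URLs, total requested size) of one IP in the window (t_ref - period, t_ref]."""
    win = [r for r in reqs if r.ip == ip and t_ref - period < r.ts <= t_ref]
    return len(win), len({r.url for r in win}), sum(r.size for r in win)


def sample_feature_vector(reqs: Sequence[Request], ip: str, t_ref: int,
                          periods: Sequence[int]) -> List[float]:
    """Build the 18-element vector for `ip`; `periods` is (T1, T2, T3)."""
    all_ips = sorted({r.ip for r in reqs})
    vector: List[float] = []
    for k in range(3):  # 0: visits, 1: URLs visited, 2: requested data size
        own = [float(window_stats(reqs, ip, t_ref, t)[k]) for t in periods]
        avg = [sum(window_stats(reqs, other, t_ref, t)[k] for other in all_ips) / len(all_ips)
               for t in periods]
        vector.extend(own + avg)  # the IP's values in T1..T3, then all-IP averages in T1..T3
    return vector
```

For training, `t_ref` could be the end of the target-sample range; during detection it would be the current time.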
Step 130: Perform classification training on the preset classification model using the sample feature vectors corresponding to the respective IPs, to obtain a trained classification model.
Specifically, the sample feature vectors corresponding to the IPs may be input into a preset classification algorithm model to train it. These sample feature vectors are labeled; that is, it is known which IPs launched CC attacks against the service system and which IPs accessed it normally. The specific training process of the classification algorithm model is not described in detail in this embodiment.
In the embodiments of the present application, the preset classification model may be, without limitation, a multi-layer perceptron, a support vector machine, a decision tree, a Bayesian classifier, logistic regression, or another classification algorithm model.
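Logistic regression is one of the model types listed above. The following is a minimal pure-Python sketch of training it on labeled sample feature vectors (label 1 = CC attack, 0 = normal access), assuming batch gradient descent on the log-loss and z-score standardization of each dimension so that byte counts do not dominate visit counts. The learning rate, epoch count and function names are illustrative assumptions; the patent does not specify a training procedure.

```python
import math
from typing import List, Sequence, Tuple

# (weights, bias, per-feature means, per-feature standard deviations)
Model = Tuple[List[float], float, List[float], List[float]]


def _sigmoid(z: float) -> float:
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X: Sequence[Sequence[float]], y: Sequence[int],
                              lr: float = 0.1, epochs: int = 500) -> Model:
    """Batch gradient descent on the log-loss over z-score standardized features."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    # a constant column has std 0; fall back to 1.0 to avoid division by zero
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0 for j in range(d)]
    Z = [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(Z, y):
            err = _sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - label
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b, means, stds


def predict_proba(model: Model, x: Sequence[float]) -> float:
    """Probability that feature vector x belongs to the CC-attack class (label 1)."""
    w, b, means, stds = model
    return _sigmoid(sum(wj * (xj - m) / s for wj, xj, m, s in zip(w, x, means, stds)) + b)
```

In practice a library implementation of any of the listed model types would normally be used instead; the sketch only shows the shape of the supervised training step.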
Step 140: Detect network access behavior using the trained classification model.
Specifically, newly generated access logs are processed: the feature vectors of all IPs over the past T1, T2 and T3 seconds are extracted and input into the trained classification model. If the feature vector of an IP in any one time period is classified as a CC attack, that IP is regarded as having launched a CC attack; the IP and its behavior log are recorded, and an early warning is issued.
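The decision rule above ("flag the IP as soon as its feature vector in any period is classified as an attack") can be sketched as follows. This assumes detection runs in repeated passes over newly generated logs, each pass yielding a mapping from IP to feature vector; `is_attack` stands for any trained classifier wrapped as a boolean predicate, and `on_alert` is a hypothetical hook for recording the IP and its behavior log and raising the early warning.

```python
from typing import Callable, Dict, List, Sequence, Set

FeatureVector = List[float]


def detect_cc_attackers(
    rounds: Sequence[Dict[str, FeatureVector]],
    is_attack: Callable[[FeatureVector], bool],
    on_alert: Callable[[str, int], None] = lambda ip, round_idx: None,
) -> Set[str]:
    """Flag an IP the first time any of its feature vectors is classified as a CC attack.

    `rounds` holds, for each detection pass over newly generated logs, a mapping
    from IP to the feature vector extracted for the past T1/T2/T3 seconds.
    """
    flagged: Set[str] = set()
    for round_idx, vectors in enumerate(rounds):
        for ip, vec in vectors.items():
            if ip not in flagged and is_attack(vec):
                flagged.add(ip)
                on_alert(ip, round_idx)  # e.g. record the IP and its behavior log, raise a warning
    return flagged
```

Skipping IPs that are already flagged keeps a single attacker from producing a new warning on every subsequent pass.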
Based on the foregoing embodiment, fig. 2 is a schematic structural diagram of a network attack detection apparatus provided in the present application. The apparatus includes:
a statistical unit 20, configured to determine, based on the generation time of each sample in the collected sample data, the time point at which the number of negative samples reaches N times the number of positive samples, and to take the samples generated before that time point as target samples;
a determining unit 21, configured to determine, based on the trend of the feature value of each dimension feature in the target samples over time, an ideal attack time point for each dimension feature, so as to obtain the time period corresponding to each dimension feature;
an extraction unit 22, configured to extract a first feature value of each dimension feature of each IP in each time period and a second feature value of each dimension feature of all IPs in each time period, and to use the first feature values and second feature values corresponding to each IP as the sample feature vector of that IP;
a training unit 23, configured to perform classification training on a preset classification model using the sample feature vectors corresponding to the respective IPs, to obtain a trained classification model;
and a detection unit 24, configured to detect network access behavior using the trained classification model.
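The target-sample selection performed by the statistical unit 20 can be sketched under one reading of the text: sort samples by generation time, keep running counts of negative and positive samples, stop at the first time point where the negative count reaches N times the positive count, and keep the samples generated strictly before it. The `Sample` fields, the `is_negative` label, and the fallback of returning all samples when the ratio is never reached are assumptions, since the patent does not define them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    ip: str
    timestamp: float   # generation time of the log entry, in seconds
    is_negative: bool  # assumed label field: True for a negative sample


def select_target_samples(samples: List[Sample], n: float) -> List[Sample]:
    """Return the samples generated before the first time point at which the
    cumulative number of negative samples reaches n times the cumulative number
    of positive samples. If that never happens, all samples are returned."""
    ordered = sorted(samples, key=lambda s: s.timestamp)
    negatives = positives = 0
    cutoff: Optional[float] = None
    for s in ordered:
        if s.is_negative:
            negatives += 1
        else:
            positives += 1
        # require at least one positive sample so the ratio is meaningful
        if positives > 0 and negatives >= n * positives:
            cutoff = s.timestamp
            break
    if cutoff is None:
        return ordered
    return [s for s in ordered if s.timestamp < cutoff]
```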
Optionally, the dimension features include the number of visits, the number of visited URLs, and the requested data size.
Optionally, when determining, based on the trend of the feature value of each dimension feature in the target samples over time, the ideal attack time point for each dimension feature so as to obtain the time period corresponding to each dimension feature, the determining unit 21 is specifically configured to:
when the dimension feature is the number of visits, compute the cumulative number of visits in the target samples second by second, based on the generation time of each sample, determine a first time point at which the number of visits changes fastest, and take the first time point as the ideal attack time point for the number of visits, to obtain a first time period T1 corresponding to the number of visits;
when the dimension feature is the number of visited URLs, compute the cumulative number of visited URLs in the target samples second by second, based on the generation time of each sample, determine a second time point at which the number of visited URLs changes fastest, and take the second time point as the ideal attack time point for the number of visited URLs, to obtain a second time period T2 corresponding to the number of visited URLs;
and when the dimension feature is the requested data size, compute the cumulative requested data size in the target samples second by second, based on the generation time of each sample, determine a third time point at which the requested data size changes fastest, and take the third time point as the ideal attack time point for the requested data size, to obtain a third time period T3 corresponding to the requested data size.
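The per-second accumulation and "fastest change" search above can be sketched as follows. The patent does not say how the time period is derived from the time point, so this sketch assumes the period is the number of whole seconds from the first sample up to and including the second of fastest growth. Each event is a `(timestamp, value)` pair: value 1 per visit for T1, 1 for each newly seen URL for T2, and the requested byte count for T3. The function name is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def window_from_fastest_change(events: Sequence[Tuple[float, float]]) -> int:
    """Return a window length in seconds for one dimension feature.

    Assumption: the window spans from the first event up to and including the
    one-second bucket in which the cumulative value grows fastest.
    """
    if not events:
        raise ValueError("no events")
    start = min(t for t, _ in events)
    per_second: Dict[int, float] = {}
    for t, v in events:
        k = int(math.floor(t - start))  # one-second bucket index
        per_second[k] = per_second.get(k, 0.0) + v
    horizon = max(per_second) + 1
    cumulative: List[float] = []
    total = 0.0
    for k in range(horizon):
        total += per_second.get(k, 0.0)
        cumulative.append(total)
    # change of the cumulative curve within each one-second step
    rates = [cumulative[0]] + [cumulative[k] - cumulative[k - 1] for k in range(1, horizon)]
    fastest = max(range(horizon), key=lambda k: rates[k])  # earliest maximum wins ties
    return fastest + 1
```

For example, one visit in each of seconds 0, 1 and 3 and three visits in second 2 give a cumulative curve of 1, 2, 5, 6; the steepest step is second 2, so the sketch returns a 3-second window.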
Optionally, when extracting the first feature value of each dimension feature of each IP in each time period and the second feature value of each dimension feature of all IPs in each time period, and using the first and second feature values corresponding to each IP as the sample feature vector of that IP, the extraction unit 22 is specifically configured to:
for each IP included in the target samples, perform the following operations: extract the number of visits, the number of visited URLs and the requested data size of the IP in a first time period T1, a second time period T2 and a third time period T3;
extract the average number of visits, the average number of visited URLs and the average requested data size of all IPs contained in the target samples in the first time period T1, the second time period T2 and the third time period T3;
and take the number of visits, number of visited URLs and requested data size of the IP in T1, T2 and T3, together with the corresponding averages over all IPs in T1, T2 and T3, as the sample feature vector of that IP.
Further, referring to fig. 3, the present application also provides a network attack detection device, which includes a memory 30 and a processor 31, wherein:
a memory 30 for storing program instructions;
and a processor 31 for calling the program instructions stored in the memory 30 and executing any one of the above method embodiments according to the obtained program.
Still further, the present application provides a computer storage medium having computer-executable instructions stored thereon for causing a computer to perform any of the above-described method embodiments.
In summary, the network attack detection method provided by the present application determines, based on the generation time of each sample in the collected sample data, the time point at which the number of negative samples reaches N times the number of positive samples, and takes the samples generated before that time point as target samples; determines, based on the trend of the feature value of each dimension feature in the target samples over time, an ideal attack time point for each dimension feature, to obtain the time period corresponding to each dimension feature; extracts a first feature value of each dimension feature of each IP in each time period and a second feature value of each dimension feature of all IPs in each time period, and uses the first and second feature values corresponding to each IP as the sample feature vector of that IP; performs classification training on a preset classification model using the sample feature vectors of all IPs to obtain a trained classification model; and detects network access behavior using the trained classification model.
With the network attack detection method provided by the present application, target samples are screened from the sample data by analyzing, in order of log generation time, when the number of negative samples exceeds the number of positive samples by a given ratio. The time period over which each dimension feature is counted is then determined from how that feature's value changes over time. Sample feature vectors are extracted for each IP using the time period of each dimension feature and are used to train the preset classification model. This improves the training of the classification model and, in turn, the accuracy of attack detection performed with the trained model.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the embodiments of the present application without departing from the spirit and scope of the embodiments of the present application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to encompass such modifications and variations.

Claims (10)

1. A network attack detection method, the method comprising:
determining, based on the generation time of each sample in collected sample data, a time point at which the number of negative samples is N times the number of positive samples, and taking the samples generated before the time point as target samples;
determining, based on a trend of a feature value of each dimension feature in the target samples over time, an ideal attack time point corresponding to each dimension feature, to obtain a time period corresponding to each dimension feature;
extracting a first feature value of each dimension feature of each IP in each time period and a second feature value of each dimension feature of all IPs in each time period, and taking the first feature value and the second feature value corresponding to each IP as a sample feature vector corresponding to the IP;
performing classification training on a preset classification model using the sample feature vectors respectively corresponding to all IPs, to obtain a trained classification model;
and detecting network access behavior using the trained classification model.
2. The method of claim 1, wherein the dimension features include a number of visits, a number of visited URLs, and a requested data size.
3. The method according to claim 1 or 2, wherein the step of determining, based on a trend of the feature value of each dimension feature in the target samples over time, an ideal attack time point corresponding to each dimension feature to obtain a time period corresponding to each dimension feature comprises:
when the dimension feature is the number of visits, computing the cumulative number of visits in the target samples second by second based on the generation time of each sample in the target samples, determining a first time point at which the number of visits changes fastest, and taking the first time point as the ideal attack time point corresponding to the number of visits, to obtain a first time period T1 corresponding to the number of visits;
when the dimension feature is the number of visited URLs, computing the cumulative number of visited URLs in the target samples second by second based on the generation time of each sample in the target samples, determining a second time point at which the number of visited URLs changes fastest, and taking the second time point as the ideal attack time point corresponding to the number of visited URLs, to obtain a second time period T2 corresponding to the number of visited URLs;
and when the dimension feature is the requested data size, computing the cumulative requested data size in the target samples second by second based on the generation time of each sample in the target samples, determining a third time point at which the requested data size changes fastest, and taking the third time point as the ideal attack time point corresponding to the requested data size, to obtain a third time period T3 corresponding to the requested data size.
4. The method of claim 3, wherein the step of extracting a first feature value of each dimension feature of each IP in each time period and a second feature value of each dimension feature of all IPs in each time period, and taking the first feature value and the second feature value corresponding to each IP as the sample feature vector corresponding to the IP, comprises:
for each IP included in the target samples, performing the following operations: extracting the number of visits, the number of visited URLs and the requested data size of the IP in a first time period T1, a second time period T2 and a third time period T3;
extracting the average number of visits, the average number of visited URLs and the average requested data size of all IPs contained in the target samples in the first time period T1, the second time period T2 and the third time period T3;
and taking the number of visits, the number of visited URLs and the requested data size of any IP in the first time period T1, the second time period T2 and the third time period T3, together with the average number of visits, the average number of visited URLs and the average requested data size in the same time periods, as the sample feature vector corresponding to the IP.
5. A network attack detection apparatus, the apparatus comprising:
a statistical unit, configured to determine, based on the generation time of each sample in collected sample data, a time point at which the number of negative samples is N times the number of positive samples, and to take the samples generated before the time point as target samples;
a determining unit, configured to determine, based on a trend of a feature value of each dimension feature in the target samples over time, an ideal attack time point corresponding to each dimension feature, to obtain a time period corresponding to each dimension feature;
an extraction unit, configured to extract a first feature value of each dimension feature of each IP in each time period and a second feature value of each dimension feature of all IPs in each time period, and to take the first feature value and the second feature value corresponding to each IP as a sample feature vector corresponding to the IP;
a training unit, configured to perform classification training on a preset classification model using the sample feature vectors respectively corresponding to the IPs, to obtain a trained classification model;
and a detection unit, configured to detect network access behavior using the trained classification model.
6. The apparatus of claim 5, wherein the dimension features include a number of visits, a number of visited URLs, and a requested data size.
7. The apparatus according to claim 5 or 6, wherein, when determining, based on a trend of the feature value of each dimension feature in the target samples over time, the ideal attack time point corresponding to each dimension feature to obtain a time period corresponding to each dimension feature, the determining unit is specifically configured to:
when the dimension feature is the number of visits, compute the cumulative number of visits in the target samples second by second based on the generation time of each sample in the target samples, determine a first time point at which the number of visits changes fastest, and take the first time point as the ideal attack time point corresponding to the number of visits, to obtain a first time period T1 corresponding to the number of visits;
when the dimension feature is the number of visited URLs, compute the cumulative number of visited URLs in the target samples second by second based on the generation time of each sample in the target samples, determine a second time point at which the number of visited URLs changes fastest, and take the second time point as the ideal attack time point corresponding to the number of visited URLs, to obtain a second time period T2 corresponding to the number of visited URLs;
and when the dimension feature is the requested data size, compute the cumulative requested data size in the target samples second by second based on the generation time of each sample in the target samples, determine a third time point at which the requested data size changes fastest, and take the third time point as the ideal attack time point corresponding to the requested data size, to obtain a third time period T3 corresponding to the requested data size.
8. The apparatus according to claim 7, wherein, when extracting a first feature value of each dimension feature of each IP in each time period and a second feature value of each dimension feature of all IPs in each time period, and taking the first feature value and the second feature value corresponding to each IP as the sample feature vector corresponding to the IP, the extraction unit is specifically configured to:
for each IP included in the target samples, perform the following operations: extract the number of visits, the number of visited URLs and the requested data size of the IP in a first time period T1, a second time period T2 and a third time period T3;
extract the average number of visits, the average number of visited URLs and the average requested data size of all IPs contained in the target samples in the first time period T1, the second time period T2 and the third time period T3;
and take the number of visits, the number of visited URLs and the requested data size of any IP in the first time period T1, the second time period T2 and the third time period T3, together with the average number of visits, the average number of visited URLs and the average requested data size in the same time periods, as the sample feature vector corresponding to the IP.
9. A computing device, wherein the computing device comprises:
a memory for storing program instructions;
a processor for calling the program instructions stored in the memory and executing the method of any one of claims 1 to 4 according to the obtained program.
10. A computer storage medium having computer-executable instructions stored thereon for causing a computer to perform the method of any one of claims 1-4.
CN202010428046.9A 2020-05-20 2020-05-20 Network attack detection method and device Active CN111600894B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010428046.9A CN111600894B (en) 2020-05-20 2020-05-20 Network attack detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010428046.9A CN111600894B (en) 2020-05-20 2020-05-20 Network attack detection method and device

Publications (2)

Publication Number Publication Date
CN111600894A 2020-08-28
CN111600894B 2023-05-16

Family

ID=72189778

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010428046.9A Active CN111600894B (en) 2020-05-20 2020-05-20 Network attack detection method and device

Country Status (1)

Country Link
CN (1) CN111600894B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant