CN107294993B - WEB abnormal traffic monitoring method based on ensemble learning - Google Patents

Info

Publication number
CN107294993B
Authority
CN
China
Prior art keywords
url
model
monitoring method
traffic monitoring
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710543858.6A
Other languages
Chinese (zh)
Other versions
CN107294993A (en)
Inventor
李智星
沈柯
于洪
张冠群
代南瑶
胡聪
胡峰
王进
雷大江
欧阳卫华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN201710543858.6A
Publication of CN107294993A
Application granted
Publication of CN107294993B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection

Abstract

The invention claims a WEB abnormal traffic monitoring method based on ensemble learning, which comprises five processes: data preprocessing, feature engineering construction, data set reconstruction, model establishment and fusion, and model testing. Data preprocessing extracts the effective information from the URL data. Feature engineering construction extracts and constructs URL features using statistical methods such as information entropy and mutual information. After the feature engineering is complete, the data set is adjusted for each access property, and the adjusted data sets are input into four machine learning algorithms, including XGBoost and LightGBM, for supervised learning. After the learners are constructed, they are integrated using a Bagging framework. A data set is then reselected from the original data set for classification prediction, labels are assigned by majority voting, and the accuracy of the model is checked. When the model is in use, a URL is input into the model, the five sub-models each give their own label probability, and the label with the highest probability is given as the final label.

Description

WEB abnormal traffic monitoring method based on ensemble learning
Technical Field
The invention belongs to the technical field of machine learning, and particularly relates to various statistical algorithms and machine learning algorithms.
Background
1. Network security problems in the information era
Today, the scale of computer networks and the number of Internet users have reached unprecedented levels, and with them the problem of network security has become increasingly prominent. As the most important means of resisting network attacks, abnormal traffic monitoring urgently needs to be developed and upgraded. After more than twenty years of development, research on traffic monitoring has evolved into many branches; in practical applications, however, the results are not satisfactory, and the difficulties are mainly concentrated in the following aspects:
1) monitoring illegal behavior patterns in real time with fixed rules leads to an excessively high false alarm rate;
2) when feature matching is applied, the feature library must be updated manually, and unknown attack patterns cannot be detected;
3) a huge number of rules greatly degrades the detection performance of the system, and the rule base becomes difficult to maintain;
4) when an abnormal traffic detection system with a blocking function misclassifies normal communication behavior, normal communication is blocked;
5) when the data storage capacity of the monitoring system becomes a bottleneck, the monitoring system is vulnerable to denial of service, and communication is blocked.
In view of these problems with abnormal traffic detection systems, current research is mainly focused on three directions: feature matching, rule reasoning and machine learning.
2. Machine learning
In recent years, machine learning methods have been increasingly applied to the algorithm design of abnormal traffic detection. Without much manual intervention, they solve the problems of updating the feature library in feature matching and of manually maintaining the rule base, and greatly improve the degree of automation; they adapt well to different input data, break through the high false alarm rate impasse of rule reasoning, and can achieve higher accuracy in the face of unknown attacks.
However, a single machine learning method does not solve the problem perfectly. Statistical methods assume that all events are generated by a statistical model, and ignore the risk that the distribution model set in advance in a parametric method may not match the real data, so that the result deviates greatly from what is expected. In addition, most systems built on statistical models work offline and cannot meet the requirement of real-time monitoring, so very high performance is required to achieve high accuracy; moreover, it is very difficult to determine the threshold for a statistical method, and a threshold that is too high or too low can increase the false alarm rate.
Although machine learning algorithms can seamlessly combine prior and posterior knowledge and overcome the defect that the framework is not intuitive enough, problems such as noisy data, an incorrect sampling method and too many modeling variables in simple classification and clustering algorithms can cause overfitting, so that a good monitoring effect cannot be achieved. The accuracy of the model depends on certain assumptions, which are reflected in the behavior patterns of the target system and the network; violating these assumptions causes a large drop in accuracy.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art. It provides a WEB abnormal traffic monitoring method based on ensemble learning that effectively improves the accuracy of the original machine learning methods for abnormal traffic monitoring. The technical scheme of the invention is as follows:
a WEB abnormal traffic monitoring method based on ensemble learning comprises the following steps:
1) data preprocessing: acquiring uniform resource locator (URL) records, cutting and separating them, and extracting the effective information;
2) feature engineering construction: using statistical methods to extract the features of the URLs of common command attacks, database attacks, cross-site scripting attacks, local file inclusion attacks and normal network access, respectively;
3) data set reconstruction: for each of the five access properties, organizing the total data set according to the corresponding features, and adjusting the labels to this access property and other access properties;
4) model establishment: for the data set corresponding to each of the five access properties, applying four machine learning algorithms, XGBoost (extreme gradient boosting), LightGBM (light gradient boosting machine), RF (random forest) and LR (logistic regression), to supervised learning on the data, and integrating the learners with a Bagging framework to obtain a recognition model for each of the five access properties;
5) model testing: testing with the part of the data set reserved in advance in step 4) and checking the accuracy of the model.
Further, extracting the effective URL information in step 1) includes the following steps: for an unprocessed URL, first removing the invalid data after "#"; cutting the remaining segment at "?"; dividing the file path segment by "/" and a second delimiter (missing from the source text); dividing the query part by "&" and "&"; and putting the parameters and values obtained by the division into a processing function for regular-expression matching.
Further, the processing function replaces numbers with dates and times, replaces the career with "$0", changes a character string composed of lower-case letters with a length of less than 10 to "s", changes a character string that begins with "0x" and has a length greater than 2 to "0x1234", and reduces multiple spaces to one space; the processed fragments are the URL information fragments required by the model.
Further, constructing the feature engineering in step 2) specifically includes: length of the URL parameter value, calculating the length abnormal value P using the Chebyshev inequality in statistics together with the mean and variance of the length; character distribution, calculating the abnormal value α of the character distribution using the chi-square test in statistics; enumerated types, determining whether the input of an attribute value falls outside its enumerated type; keyword extraction, searching for the common features of URLs with the same access property: after all URL data are scanned, the frequency of every string of physically adjacent characters is recorded, strings whose frequency is too low are screened out, and mutual information is calculated for the remaining strings.
Further, the length abnormal value of the URL parameter value may be calculated by using the chebyshev inequality in statistics and the mean and variance of the length, and the calculation formula includes:
Figure BDA0001342552800000031
where X represents the length of the URL parameter value, μ is the length mean, σ² is the length variance, and k represents the number of standard deviations;
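The formula itself survives in this text only as an image reference. For orientation, the textbook form of the Chebyshev inequality written in the symbols just defined is given below; whether the patent figure uses exactly this form is an assumption.

```latex
P\left(\lvert X - \mu \rvert \ge k\sigma\right) \le \frac{1}{k^{2}}, \qquad k > 0
```

Read this way, a parameter length lying k standard deviations from the mean occurs with probability at most 1/k², so a small bound marks an unusual length.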
further, calculating the abnormal value α of the character distribution using the chi-square test in statistics includes: for the set of character strings {s_1, s_2, …, s_n}, CD(s)_i denotes the i-th probability value in CD(s), and ICD_i denotes the i-th probability value in the ICD; then
Figure BDA0001342552800000041
Where i is 1,2, …, n, i.e. the ith probability value in the ICD is the mean of the ith probability values of all samples in the sample set;
Figure BDA0001342552800000042
further, the enumerated type feature determines whether the input of an attribute value falls outside its enumerated type. Functions f and g are defined: f is a linearly increasing function and g(x) represents the sample function; when the training samples are input in sequence, g is increased by 1 if a new sample value is encountered, and otherwise g is decreased by 1,
f(x)=x
Figure BDA0001342552800000043
the correlation coefficient ρ of the functions f and g obtained when learning of all samples is completed can be defined by the following formula:
Figure BDA0001342552800000044
where Var (f) and Var (g) are the variances of functions f and g, respectively, and Covar (f, g) is the covariance of functions f and g.
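The formulas for g and ρ appear here only as image references. The description of g matches the recurrence below, and the correlation coefficient of two functions is conventionally defined as shown; both are offered as a reconstruction from the surrounding definitions rather than a copy of the patent figures, and the starting value g(0) = 0 is an assumption.

```latex
g(x) =
\begin{cases}
0, & x = 0 \\
g(x-1) + 1, & \text{if the } x\text{-th sample value is new} \\
g(x-1) - 1, & \text{if the } x\text{-th sample value has been seen before}
\end{cases}
\qquad
\rho = \frac{\mathrm{Covar}(f, g)}{\sqrt{\mathrm{Var}(f)\,\mathrm{Var}(g)}}
```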
Further, in keyword extraction, the mutual information indicates how tightly the parts of a character string are bound together, and the calculation formula is as follows:
Figure BDA0001342552800000045
wherein P(s_1 s_2 s_3) represents the probability that the character string s_1 s_2 s_3 occurs, and P(s_1 s_2) and P(s_2 s_3) are defined similarly.
Further, the method also comprises a step of calculating the richness of the left and right neighboring characters of the character string, which can be obtained using the information entropy
Figure BDA0001342552800000046
where p(i) represents the probability that neighboring character i of the string occurs.
Further, Bagging is an ensemble learning framework that sub-samples the training set to form the sub-training set required by each base model and combines the prediction results of all base models to produce the final prediction. On the basis of the learners, a data set is reselected from the original data set for classification prediction, the label is finalized by majority voting, and the accuracy of the model is checked at the same time.
The invention has the following advantages and beneficial effects:
the invention uses statistical methods to segment URLs and extract features, ensuring the completeness and reliability of feature extraction. At the same time, several machine learning algorithms, including the highly accurate XGBoost (extreme gradient boosting) and RF (random forest), are integrated, ensuring high accuracy of the models in traffic anomaly monitoring. During monitoring, each visited URL is input into the five models for prediction to determine whether it is a known anomaly, and unknown anomalies can also be identified.
Drawings
FIG. 1 is a flowchart of the overall method according to a preferred embodiment of the present invention;
FIG. 2 is a diagram illustrating an example of URL segmentation and extraction in the method;
FIG. 3 is a diagram of the Bagging framework integration process in the method;
FIG. 4 is a flowchart of abnormal traffic monitoring with the model.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
the invention provides a model for abnormal traffic monitoring. FIG. 1 shows a flowchart of the entire model. The data set is first preprocessed: each URL is split on delimiter symbols such as "&" and the effective information in the URL is extracted, which improves processing efficiency. FIG. 2 is an example of URL cutting. Features are then extracted from the processed data using statistical methods such as mutual information and information entropy. After the feature engineering is complete, a separate data set is constructed for each access property from its features, and the labels are replaced so that they fall into two classes: the current access property, and others. At the same time, part of the data is set aside as a test set. Machine learning is then performed on each of the five reconstructed data sets. Four machine learning algorithms, eXtreme Gradient Boosting, Light Gradient Boosting Machine, Random Forest and Logistic Regression, are used for supervised learning on the data sets, and the learners are integrated through a Bagging framework to obtain mutually independent recognition models for the different access properties. FIG. 3 shows the Bagging framework integration process. Finally, the reserved test sets are fed into the recognition models for testing, and the accuracy of the models is checked.
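At monitoring time, the abstract describes each of the five sub-models giving its own label probability, with the most probable label taken as the final one. A minimal Python sketch of that selection step follows; the label names, the `final_label` helper and the probability values are illustrative assumptions, not part of the patent.

```python
# Hypothetical label names for the five access properties named in the text.
LABELS = ["command_attack", "database_attack", "xss", "local_file_inclusion", "normal"]


def final_label(probabilities):
    """Return the label whose sub-model reported the highest probability.

    `probabilities` maps each label to the probability given by the
    sub-model trained for that access property.
    """
    if not probabilities:
        raise ValueError("at least one sub-model probability is required")
    return max(probabilities, key=probabilities.get)


# Made-up probabilities for a single URL, one value per sub-model.
scores = dict(zip(LABELS, [0.05, 0.81, 0.10, 0.02, 0.30]))
print(final_label(scores))
```

How an unknown anomaly is flagged is not specified in the text, so the sketch does not attempt it.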
The important processes of the whole improved abnormal traffic monitoring model are: URL information extraction, feature engineering construction, training of the multi-algorithm learners, and Bagging framework integration.
First, URL information extraction
To improve the processing efficiency of the model, extracting the effective information from the URL is important. For an unprocessed URL:
1) first, the invalid data after "#" is removed;
2) the remaining segment is cut at "?";
3) the file path segment is divided by "/" and a second delimiter (missing from the source text);
4) the query part is divided by "&" and "&";
The parameters and values obtained by the division in 3) and 4) are each put into a processing function for regular-expression matching. The processing function replaces numbers with dates and times, replaces the career with "$0", changes a character string of fewer than 10 lower-case letters to "s", changes a character string that begins with "0x" and is longer than 2 characters to "0x1234", and reduces multiple spaces to one space. The processed segments are the URL information segments required by the model.
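The splitting and replacement steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: the second delimiter is assumed to be "=" for both the path and the query, the hexadecimal-looking prefix is taken to be "0x", and the rules about numbers, dates and "$0" are left out because the text does not state them clearly. The function names `split_url` and `normalize` are hypothetical.

```python
import re


def split_url(url):
    """Split a raw URL into path tokens and query tokens."""
    url = url.split("#", 1)[0]            # 1) drop everything after "#"
    path, _, query = url.partition("?")   # 2) cut the remainder at "?"
    # 3) path split on "/" and, as an assumption, "="
    path_tokens = [t for t in re.split(r"[/=]", path) if t]
    # 4) query split on "&" and, as an assumption, "="
    query_tokens = [t for t in re.split(r"[&=]", query) if t]
    return path_tokens, query_tokens


def normalize(token):
    """Apply the unambiguous replacement rules to one token."""
    token = re.sub(r" {2,}", " ", token)            # several spaces -> one space
    if token.startswith("0x") and len(token) > 2:   # "0x..." longer than 2 chars
        return "0x1234"
    if re.fullmatch(r"[a-z]{1,9}", token):          # fewer than 10 lower-case letters
        return "s"
    return token


path_tokens, query_tokens = split_url("/shop/item.php?id=0x1F&name=bob  smith#top")
print([normalize(t) for t in path_tokens])
print([normalize(t) for t in query_tokens])
```

Collapsing tokens to placeholders such as "s" keeps the shape of a request while discarding values that vary from one visit to the next.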
Second, construction of feature engineering
The construction of the feature engineering strongly affects the effectiveness and accuracy of the model.
1) Length of the URL parameter value: the length abnormal value P can be calculated using the Chebyshev inequality in statistics together with the mean and variance of the length,
Figure BDA0001342552800000061
where μ is the length mean, σ² is the length variance, and k represents the number of standard deviations;
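A minimal Python sketch of the length feature is shown below. It assumes the standard Chebyshev bound P(|X − μ| ≥ kσ) ≤ 1/k², with k taken as the observed deviation divided by σ; the function name `length_anomaly` and the sample lengths are illustrative.

```python
from statistics import fmean, pvariance


def length_anomaly(train_lengths, length):
    """Chebyshev upper bound on seeing a length at least this far from the mean.

    Returns a value in [0, 1]; small values mark unusual lengths.
    """
    mu = fmean(train_lengths)
    var = pvariance(train_lengths, mu)
    deviation = abs(length - mu)
    if deviation == 0:
        return 1.0  # a length equal to the mean is never unusual
    # With k = deviation / sigma, the bound 1 / k**2 equals var / deviation**2.
    return min(1.0, var / deviation ** 2)


lengths = [4, 5, 6, 5, 5, 4, 6, 5]  # made-up training lengths of one parameter
print(length_anomaly(lengths, 5), length_anomaly(lengths, 25))
```

The bound makes no assumption about the shape of the length distribution, which is why it is a cautious choice for parameters whose values are not normally distributed.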
2) character distribution: the abnormal value α of the character distribution is calculated using the chi-square test in statistics. For the set of character strings {s_1, s_2, …, s_n}, CD(s)_i denotes the i-th probability value in CD(s), and ICD_i denotes the i-th probability value in the ICD; then
Figure BDA0001342552800000071
Where i is 1,2, …, n. That is, the ith probability value in the ICD is the mean of the ith probability values of all samples in the sample set;
Figure BDA0001342552800000072
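The character distribution feature can be sketched in Python as follows. Several details are assumptions because the formula images are not reproduced: CD(s) is taken to be the relative character frequencies of s sorted in descending order, the distribution is reduced to a fixed number of slots (`slots=6` is arbitrary), and the function returns the Pearson chi-square statistic rather than a probability read from a chi-square table.

```python
from collections import Counter


def char_distribution(s, slots=6):
    """Relative character frequencies of s, sorted in descending order.

    The result always has `slots` values: a long tail is folded into the
    last slot and a short distribution is padded with zeros.
    """
    counts = sorted(Counter(s).values(), reverse=True)
    freqs = [c / len(s) for c in counts]
    head, tail = freqs[:slots - 1], freqs[slots - 1:]
    cd = head + [sum(tail)]
    return cd + [0.0] * (slots - len(cd))


def ideal_distribution(samples, slots=6):
    """ICD: slot-wise mean of the character distributions of all samples."""
    cds = [char_distribution(s, slots) for s in samples]
    return [sum(column) / len(cds) for column in zip(*cds)]


def chi_square_statistic(s, icd):
    """Pearson chi-square statistic of s against the ICD (larger = more unusual)."""
    n = len(s)
    if n == 0:
        return 0.0
    cd = char_distribution(s, len(icd))
    # Observed and expected counts per slot are n * CD(s)_i and n * ICD_i.
    return sum((n * o - n * e) ** 2 / (n * e) for o, e in zip(cd, icd) if e > 0)


icd = ideal_distribution(["aab", "abb", "bba"])
print(chi_square_statistic("aab", icd), chi_square_statistic("aaa", icd))
```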
3) enumerated types: it is very common for the legal input of an attribute value to belong to an enumerated type; for example, the legal parameters of the "gender" attribute are "{male, female}", and any input that is neither of the two is abnormal. Functions f and g are defined, where f is a linearly increasing function; when the training samples are input in sequence, g is increased by 1 if a new sample value is encountered, and otherwise g is decreased by 1.
f(x)=x
Figure BDA0001342552800000073
The correlation coefficient ρ of the functions f and g obtained when learning of all samples is completed can be defined by the following formula:
Figure BDA0001342552800000074
where Var(f) and Var(g) are the variances of functions f and g, respectively, and Covar(f, g) is the covariance of functions f and g;
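The construction of f, g and ρ can be sketched in Python as below. The sketch assumes that g starts from 0 before the first sample and that ρ is the usual correlation coefficient Covar(f, g) / sqrt(Var(f) · Var(g)); the function name `enumeration_correlation` is hypothetical.

```python
from statistics import fmean


def enumeration_correlation(values):
    """Correlation coefficient rho between f(x) = x and the counter g(x)."""
    if not values:
        return 0.0
    f, g, seen, current = [], [], set(), 0
    for x, value in enumerate(values, start=1):
        current += 1 if value not in seen else -1   # +1 for a new value, else -1
        seen.add(value)
        f.append(x)
        g.append(current)
    mean_f, mean_g = fmean(f), fmean(g)
    covar = fmean([(a - mean_f) * (b - mean_g) for a, b in zip(f, g)])
    var_f = fmean([(a - mean_f) ** 2 for a in f])
    var_g = fmean([(b - mean_g) ** 2 for b in g])
    if var_f == 0 or var_g == 0:
        return 0.0
    return covar / (var_f * var_g) ** 0.5


print(enumeration_correlation(["male", "female", "male", "female", "male", "female"]))
print(enumeration_correlation(["a1", "b2", "c3", "d4"]))
```

When the same few values keep recurring, g falls while f rises and ρ comes out negative, which points to an enumerated attribute; when every value is new, g tracks f and ρ approaches 1.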
4) keyword extraction: to find the common features of URLs with the same access property, it is important to extract keywords from URLs of the same access type. After all URL data are scanned, the frequency of every string of physically adjacent characters is recorded. Strings whose frequency is too low are screened out, and mutual information is then calculated for the remaining strings. The mutual information indicates how tightly the parts of a character string are bound together, and the calculation formula is as follows:
Figure BDA0001342552800000075
wherein P(s_1 s_2 s_3) represents the probability that the character string s_1 s_2 s_3 occurs, and P(s_1 s_2) and P(s_2 s_3) are defined similarly.
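Because the formula image is not reproduced, the exact expression is unknown. The Python sketch below shows one common cohesion measure that can be built from the probabilities named in the text: the smaller of P(s1s2s3) / (P(s1) · P(s2s3)) and P(s1s2s3) / (P(s1s2) · P(s3)). That choice, the treatment of s1, s2 and s3 as single characters, and both function names are assumptions; the low-frequency screening step is omitted for brevity.

```python
from collections import Counter


def ngram_probabilities(urls, max_n=3):
    """Relative frequency of every substring of 1..max_n adjacent characters."""
    counts = Counter()
    for url in urls:
        for n in range(1, max_n + 1):
            for i in range(len(url) - n + 1):
                counts[url[i:i + n]] += 1
    totals = Counter()
    for gram, count in counts.items():
        totals[len(gram)] += count          # total occurrences per substring length
    return {gram: count / totals[len(gram)] for gram, count in counts.items()}


def cohesion(trigram, p):
    """Smaller of the two split ratios for a 3-character string (assumed form)."""
    s1, s12, s23, s3 = trigram[0], trigram[:2], trigram[1:], trigram[2]
    return min(p[trigram] / (p[s1] * p[s23]), p[trigram] / (p[s12] * p[s3]))


probabilities = ngram_probabilities(["abcabc"])
print(cohesion("abc", probabilities))
```

A ratio well above 1 means the three characters occur together more often than their parts would by chance, which is the sense in which the string is tightly bound.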
In addition, the richness of the left and right neighboring characters of the character string also needs to be calculated: the richer the left and right neighbors, the more flexibly the string is used in the data set, and the more likely it is to be a keyword of this kind of URL. The richness of the left and right neighboring characters can be obtained using the information entropy
Figure BDA0001342552800000076
where p(i) represents the probability that neighboring character i of the string occurs.
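The neighbor richness can be sketched in Python with the Shannon entropy H = −Σ p(i) log p(i). The base-2 logarithm, the function name `neighbor_entropy` and the sample URLs are assumptions made for illustration.

```python
import math
from collections import Counter


def neighbor_entropy(urls, keyword, side="right"):
    """Shannon entropy (in bits) of the characters adjacent to `keyword`."""
    neighbors = Counter()
    for url in urls:
        start = url.find(keyword)
        while start != -1:
            # Position of the character just after or just before the keyword.
            idx = start + len(keyword) if side == "right" else start - 1
            if 0 <= idx < len(url):
                neighbors[url[idx]] += 1
            start = url.find(keyword, start + 1)
    total = sum(neighbors.values())
    return -sum(c / total * math.log2(c / total) for c in neighbors.values())


urls = ["id=1", "id=2", "id&x", "id&y"]
print(neighbor_entropy(urls, "id"), neighbor_entropy(urls, "id", side="left"))
```

A candidate keyword would be scored on both sides; a string that is always followed by the same character has zero entropy on that side and is more likely a fragment of a longer token.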
Third, training of the multi-algorithm learners
Before training, the data needs a small adjustment. The URL features for each access property are extended across the entire data set, forming five different data sets. At the same time, the original labels are changed: only the label of the access property in question is kept, and the labels of URL data with any other access property are replaced with "other".
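The relabelling described above amounts to building one two-class data set per access property. A minimal Python sketch follows; the record layout (a feature value paired with a label), the label names and the helper `one_vs_rest_datasets` are illustrative assumptions, and the per-property feature selection is not shown.

```python
def one_vs_rest_datasets(records, properties):
    """Build one relabelled copy of the data set per access property.

    Each record is a (features, label) pair; in the copy built for a given
    property, every other label is replaced with "other".
    """
    datasets = {}
    for prop in properties:
        datasets[prop] = [
            (features, label if label == prop else "other")
            for features, label in records
        ]
    return datasets


records = [("u1", "xss"), ("u2", "normal"), ("u3", "database_attack")]
for prop, data in one_vs_rest_datasets(records, ["xss", "normal"]).items():
    print(prop, data)
```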
XGBoost, LightGBM, RF and LR were selected because, in testing, they proved to be the machine learning algorithms with the highest accuracy and the best fit to the problem.
1) XGboost: the XGboost is an optimized algorithm based on lifting algorithms such as AdaBoost and GBDT, can be used for linear classification, and can be regarded as a linear regression algorithm with regularization of L1 and L2; compared with the traditional GBDT, the method has the advantages that regularization functions are added, so that the overfitting is prevented, in the aspect of a distributed algorithm, the XGboost sorts the features of each dimension in one machine and stores the features in a Block structure. So multiple feature calculations can be distributed across different machines and the final results are aggregated. Thus, the XGboost has the capability of distributed computing; because the characteristic value is only used for sorting, the abnormal characteristic value has less influence on the learning of the XGboost model; each calculation is only to select the features with the largest gradient reduction, so that the feature correlation selection problem is solved;
2) LightGBM: the LightGBM is a framework for realizing the GBDT algorithm, supports high-efficiency parallel training, has higher speed, lower memory consumption, better accuracy and better distributed support, and can quickly process mass data.
3) Random Forest is particularly suitable for multi-classification problems, has high training and predicting speed and good performance on a data set; the fault tolerance capability to the training data is strong; very high dimensional data can be processed without feature selection, namely: thousands of variables which are not deleted can be processed, and good effect is achieved when a large number of characteristics extracted by keys are processed; an internal unbiased estimate of the generalized error can be generated during the classification process; the interaction among the characteristics and the importance degree of the characteristics can be detected in the training process; overfitting cannot occur;
4) logistic Regression the idea of Logistic Regression is to divide a data set into two parts by using a hyperplane, wherein the two parts are respectively positioned at two sides of the hyperplane and belong to two different categories, and the data labeled on the URL data set with each access property is just matched when the data set is processed. FIG. 4 is a schematic diagram of two classification principles of Logistic Regression. In addition, the method has the advantages of small calculation amount in classification, high speed, extremely low storage resource and convenience in observing the probability scores of the samples.
Fourth, Bagging framework integration
Bagging is an ensemble learning framework that sub-samples the training set to form the sub-training set required by each base model and combines the prediction results of all base models to produce the final prediction. On the basis of the learners, a data set is reselected from the original data set for classification prediction, the label is finalized by majority voting, and the model accuracy is checked at the same time. Because the expectation of the ensemble model under this framework is close to that of the base models, the bias of the ensemble is close to the bias of the base models, while the variance of the ensemble decreases as the number of base models grows; the ability to prevent overfitting is therefore improved, and the model accuracy rises markedly. Table 1 compares the experimental accuracy of each machine learning algorithm with that after Bagging integration;
TABLE 1 model accuracy comparison Table
Figure BDA0001342552800000091
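The two Bagging ingredients described in this section, sub-sampling the training set for each base model and combining predictions by majority vote, can be sketched in plain Python. The toy nearest-mean learner `fit_centroid` stands in for the XGBoost, LightGBM, RF and LR learners of the patent, and every name, the bootstrap sampling scheme and the data below are illustrative assumptions.

```python
import random
from collections import Counter


def fit_centroid(sample):
    """Toy base learner: classify a number by the nearest class mean."""
    sums, counts = Counter(), Counter()
    for x, label in sample:
        sums[label] += x
        counts[label] += 1
    means = {label: sums[label] / counts[label] for label in counts}
    return lambda x: min(means, key=lambda label: abs(x - means[label]))


def bagging_fit(train, n_models=5, seed=0):
    """Train each base model on a bootstrap sub-sample of the training set."""
    rng = random.Random(seed)
    return [fit_centroid(rng.choices(train, k=len(train))) for _ in range(n_models)]


def bagging_predict(models, x):
    """Combine the base models' predictions by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


train = [(0.1, "normal"), (0.2, "normal"), (0.3, "normal"),
         (0.8, "attack"), (0.9, "attack"), (1.0, "attack")]
models = bagging_fit(train)
print(bagging_predict(models, 0.05), bagging_predict(models, 0.95))
```

Because each base model sees a different sub-sample, their individual errors differ, and the vote averages them out; this is the variance reduction the paragraph above refers to.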
The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.

Claims (10)

1. A WEB abnormal traffic monitoring method based on ensemble learning is characterized by comprising the following steps:
1) data preprocessing: acquiring a uniform resource locator URL record, cutting and separating the uniform resource locator URL record, and extracting effective information;
2) feature engineering construction: using statistical methods to extract the features of the uniform resource locator URLs of common command attacks, database attacks, cross-site scripting attacks, local file inclusion attacks and normal network access, respectively;
3) data set reconstruction: for each of the five access properties, organizing the total data set according to the corresponding features, and adjusting the labels to this access property and other access properties;
4) model establishment: for the data set corresponding to each of the five access properties, applying four machine learning algorithms, XGBoost (extreme gradient boosting), LightGBM (light gradient boosting machine), RF (random forest) and LR (logistic regression), to supervised learning on the data, and integrating the learners with a Bagging framework to obtain a recognition model for each of the five access properties;
5) model testing: testing with the part of the data set reserved in advance in step 4) and checking the accuracy of the model.
2. The WEB abnormal traffic monitoring method based on ensemble learning according to claim 1, wherein extracting the effective URL information in step 1) includes the following steps: for an unprocessed URL, first removing the invalid data after "#"; cutting the remaining segment at "?"; dividing the file path segment by "/" and a second delimiter (missing from the source text); dividing the query part by "&" and "&"; and putting the parameters and values obtained by the division into a processing function for regular-expression matching.
3. The WEB abnormal traffic monitoring method based on ensemble learning according to claim 2, wherein the processing function replaces numbers with dates and times, replaces the career with "$0", changes a character string composed of lower-case letters with a length of less than 10 to "s", changes a character string that begins with "0x" and has a length greater than 2 to "0x1234", and reduces multiple spaces to one space, and the processed fragments are the URL information fragments required by the model.
4. The WEB abnormal traffic monitoring method based on ensemble learning according to claim 2, wherein constructing the feature engineering in step 2) specifically comprises: length of the URL parameter value, calculating the length abnormal value P using the Chebyshev inequality in statistics together with the mean and variance of the length; character distribution, calculating the abnormal value α of the character distribution using the chi-square test in statistics; enumerated types, determining whether the input of an attribute value falls outside its enumerated type; keyword extraction, searching for the common features of URLs with the same access property: after all URL data are scanned, the frequency of every string of physically adjacent characters is recorded, strings whose frequency is too low are screened out, and mutual information is calculated for the remaining strings.
5. The WEB abnormal traffic monitoring method based on ensemble learning according to claim 4, wherein the length abnormal value of the URL parameter value is calculated using the Chebyshev inequality in statistics together with the mean and variance of the length, and the calculation formula comprises:
Figure FDA0002674113300000021
wherein X is the length of the URL parameter value, μ is the length mean, σ² is the length variance, and k represents the number of standard deviations;
6. The WEB abnormal traffic monitoring method based on ensemble learning according to claim 4, wherein calculating the abnormal value α of the character distribution using the chi-square test in statistics specifically includes: for the set of character strings {s_1, s_2, …, s_n}, CD(s)_i denotes the i-th probability value in CD(s), and ICD_i denotes the i-th probability value in the ICD; then
Figure FDA0002674113300000022
Where i is 1,2, …, n, i.e. the ith probability value in the ICD is the mean of the ith probability values of all samples in the sample set;
Figure FDA0002674113300000023
7. The WEB abnormal traffic monitoring method based on ensemble learning according to claim 4, wherein the enumerated type feature determines whether the input of an attribute value falls outside its enumerated type; functions f and g are defined, where f is a linearly increasing function and g(x) represents the sample function; when the training samples are input in sequence, g is increased by 1 if a new sample value is encountered, and otherwise g is decreased by 1;
f(x)=x
Figure FDA0002674113300000031
the correlation coefficient ρ of the functions f and g obtained when learning of all samples is completed can be defined by the following formula:
Figure FDA0002674113300000032
where Var (f) and Var (g) are the variances of functions f and g, respectively, and Covar (f, g) is the covariance of functions f and g.
8. The WEB abnormal traffic monitoring method based on ensemble learning according to claim 4, wherein, in keyword extraction, the mutual information indicates how tightly the parts of a character string are bound together, and the calculation formula is as follows:
Figure FDA0002674113300000033
wherein P(s_1 s_2 s_3) represents the probability that the character string s_1 s_2 s_3 occurs, P(s_1 s_2) represents the probability that the character string s_1 s_2 occurs, P(s_1) represents the probability that the character string s_1 occurs, P(s_3) represents the probability that the character string s_3 occurs, and P(s_2 s_3) represents the probability that the character string s_2 s_3 occurs.
9. The WEB abnormal traffic monitoring method based on ensemble learning according to claim 4, further comprising a step of calculating the richness of the left and right neighboring characters of the character string, wherein the richness of the left and right neighboring characters can be obtained using the information entropy
Figure FDA0002674113300000034
Where p (i) represents the probability of the occurrence of the neighbourhood i of the string.
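As a non-limiting sketch, the information entropy of the neighbours of a character string might be computed as follows. The function name `neighbour_entropy`, the estimation of p(i) from raw neighbour counts, and the base-2 logarithm are assumptions of this example.

```python
import math
from collections import Counter
from typing import Iterable


def neighbour_entropy(neighbours: Iterable[str]) -> float:
    """Entropy of the characters observed next to a string on one side.

    neighbours lists one adjacent character per occurrence of the string;
    p(i) is estimated as the relative frequency of neighbour i.
    """
    counts = Counter(neighbours)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # assumption: no observed neighbours means zero richness
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

The function would be called once with the left-hand neighbours and once with the right-hand neighbours of the string; a higher value indicates a richer, more varied set of neighbours on that side.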
10. The WEB abnormal traffic monitoring method based on ensemble learning according to any one of claims 1 to 9, wherein the Bagging is an ensemble learning framework that sub-samples the training set to form the sub-training set required by each base model and integrates the prediction results of all the base models to generate a final prediction result; for each base learner, a data set is reselected from the original data set and used for classification prediction, the final label is determined by majority voting, and the model accuracy is checked at the same time.
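By way of a non-limiting illustration, the Bagging framework described above might be sketched as follows. The nearest-centroid base learner, sampling with replacement, the default of 11 base models, and all function names are assumptions chosen to keep the example self-contained; they are not the base models of the claimed method.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]
Model = Callable[[Sequence[float]], int]


def train_centroid_model(data: List[Sample]) -> Model:
    """Hypothetical base learner: predict the label of the nearest class centroid."""
    sums: Dict[int, List[float]] = {}
    counts: Counter = Counter()
    for features, label in data:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] += 1
    centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

    def predict(features: Sequence[float]) -> int:
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(features, centroids[lab])),
        )

    return predict


def bagging_fit(data: List[Sample], n_models: int = 11, seed: int = 0) -> List[Model]:
    """Train each base model on a sub-training set re-drawn from the original data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        subset = [rng.choice(data) for _ in data]  # assumed: sampling with replacement
        models.append(train_centroid_model(subset))
    return models


def bagging_predict(models: List[Model], features: Sequence[float]) -> int:
    """Determine the final label by majority vote over all base models."""
    votes = Counter(model(features) for model in models)
    return votes.most_common(1)[0][0]


def accuracy(models: List[Model], data: List[Sample]) -> float:
    """Fraction of samples whose voted label matches the true label."""
    return sum(bagging_predict(models, x) == y for x, y in data) / len(data)
```

In practice the accuracy check would be run on samples held out from training; an odd number of base models is used here so that a two-class vote cannot tie.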
CN201710543858.6A 2017-07-05 2017-07-05 WEB abnormal traffic monitoring method based on ensemble learning Active CN107294993B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710543858.6A CN107294993B (en) 2017-07-05 2017-07-05 WEB abnormal traffic monitoring method based on ensemble learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710543858.6A CN107294993B (en) 2017-07-05 2017-07-05 WEB abnormal traffic monitoring method based on ensemble learning

Publications (2)

Publication Number Publication Date
CN107294993A CN107294993A (en) 2017-10-24
CN107294993B true CN107294993B (en) 2021-02-09

Family

ID=60100438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710543858.6A Active CN107294993B (en) 2017-07-05 2017-07-05 WEB abnormal traffic monitoring method based on ensemble learning

Country Status (1)

Country Link
CN (1) CN107294993B (en)

Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038155A (en) * 2017-12-02 2018-05-15 宝牧科技(天津)有限公司 A kind of detection method of network URL exceptions
CN107944986B (en) * 2017-12-28 2022-02-15 广东工业大学 Method, system and equipment for recommending O2O commodities
CN110086749A (en) * 2018-01-25 2019-08-02 阿里巴巴集团控股有限公司 Data processing method and device
CN108491717A (en) * 2018-03-28 2018-09-04 四川长虹电器股份有限公司 A kind of xss systems of defense and its implementation based on machine learning
CN108764568B (en) * 2018-05-28 2020-10-23 哈尔滨工业大学 Data prediction model tuning method and device based on LSTM network
CN109167753A (en) * 2018-07-23 2019-01-08 中国科学院计算机网络信息中心 A kind of detection method and device of network intrusions flow
CN109408591B (en) * 2018-10-12 2021-11-09 北京聚云位智信息科技有限公司 Decision-making distributed database system supporting SQL (structured query language) driven AI (Artificial Intelligence) and feature engineering
CN109325193B (en) * 2018-10-16 2021-02-26 杭州安恒信息技术股份有限公司 WAF normal flow modeling method and device based on machine learning
CN111444931A (en) * 2019-01-17 2020-07-24 北京京东尚科信息技术有限公司 Method and device for detecting abnormal access data
CN111582879A (en) * 2019-01-30 2020-08-25 浙江远图互联科技股份有限公司 Anti-fraud medical insurance identification method based on genetic algorithm
CN111600919B (en) * 2019-02-21 2023-04-07 北京金睛云华科技有限公司 Method and device for constructing intelligent network application protection system model
CN109951484B (en) * 2019-03-20 2021-01-26 四川长虹电器股份有限公司 Test method and system for attacking machine learning product
CN110046757B (en) * 2019-04-08 2022-11-29 中国人民解放军第四军医大学 Outpatient clinic volume prediction system and prediction method based on LightGBM algorithm
CN110175635B (en) * 2019-05-07 2022-08-30 南京邮电大学 OTT application program user classification method based on Bagging algorithm
CN110263539A (en) * 2019-05-15 2019-09-20 湖南警察学院 A kind of Android malicious application detection method and system based on concurrent integration study
CN110363223A (en) * 2019-06-20 2019-10-22 华南理工大学 Industrial flow data processing method, detection method, system, device and medium
CN110443274A (en) * 2019-06-28 2019-11-12 平安科技(深圳)有限公司 Method for detecting abnormality, device, computer equipment and storage medium
CN110415462B (en) * 2019-07-31 2021-09-03 中国工商银行股份有限公司 ATM equipment banknote adding optimization method and device
CN110598774B (en) * 2019-09-03 2023-04-07 中电长城网际安全技术研究院(北京)有限公司 Encrypted flow detection method and device, computer readable storage medium and electronic equipment
CN111104466B (en) * 2019-12-25 2023-07-28 中国长峰机电技术研究设计院 Method for quickly classifying massive database tables
CN111371794B (en) * 2020-03-09 2022-01-18 北京金睛云华科技有限公司 Shadow domain detection model, detection model establishing method, detection method and system
CN111767275B (en) * 2020-06-28 2024-04-19 北京林克富华技术开发有限公司 Data processing method and device and data processing system
CN113361597B (en) * 2021-06-04 2023-07-21 北京天融信网络安全技术有限公司 Training method and device for URL detection model, electronic equipment and storage medium
CN113469730A (en) * 2021-06-08 2021-10-01 北京化工大学 Customer repurchase prediction method and device based on RF-LightGBM fusion model under non-contract scene
CN114915563A (en) * 2021-12-07 2022-08-16 天翼数字生活科技有限公司 Network flow prediction method and system
CN114169440A (en) * 2021-12-08 2022-03-11 北京百度网讯科技有限公司 Model training method, data processing method, device, electronic device and medium
CN113936765A (en) * 2021-12-17 2022-01-14 北京因数健康科技有限公司 Method and device for generating periodic behavior report, storage medium and electronic equipment
CN114513341B (en) * 2022-01-21 2023-09-12 上海斗象信息科技有限公司 Malicious traffic detection method, malicious traffic detection device, terminal and computer readable storage medium
CN116127236B (en) * 2023-04-19 2023-07-21 远江盛邦(北京)网络安全科技股份有限公司 Webpage web component identification method and device based on parallel structure

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104735074A (en) * 2015-03-31 2015-06-24 江苏通付盾信息科技有限公司 Malicious URL detection method and implement system thereof
CN105024989A (en) * 2014-11-26 2015-11-04 哈尔滨安天科技股份有限公司 Malicious URL heuristic detection method and system based on abnormal port
CN106131071A (en) * 2016-08-26 2016-11-16 北京奇虎科技有限公司 A kind of Web method for detecting abnormality and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9244931B2 (en) * 2011-10-11 2016-01-26 Microsoft Technology Licensing, Llc Time-aware ranking adapted to a search engine application
US9070046B2 (en) * 2012-10-17 2015-06-30 Microsoft Technology Licensing, Llc Learning-based image webpage index selection

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105024989A (en) * 2014-11-26 2015-11-04 哈尔滨安天科技股份有限公司 Malicious URL heuristic detection method and system based on abnormal port
CN104735074A (en) * 2015-03-31 2015-06-24 江苏通付盾信息科技有限公司 Malicious URL detection method and implement system thereof
CN106131071A (en) * 2016-08-26 2016-11-16 北京奇虎科技有限公司 A kind of Web method for detecting abnormality and device

Also Published As

Publication number Publication date
CN107294993A (en) 2017-10-24

Similar Documents

Publication Publication Date Title
CN107294993B (en) WEB abnormal traffic monitoring method based on ensemble learning
CN112765603B (en) Abnormity tracing method combining system log and origin graph
CN109547423B (en) WEB malicious request deep detection system and method based on machine learning
CN110826648B (en) Method for realizing fault detection by utilizing time sequence clustering algorithm
CN111612041A (en) Abnormal user identification method and device, storage medium and electronic equipment
CN111143838B (en) Database user abnormal behavior detection method
CN113609770B (en) Rolling bearing RUL prediction method based on piecewise linear fitting HI and LSTM
CN114912500A (en) Unsupervised log anomaly detection method based on pre-training model
CN116318830A (en) Log intrusion detection system based on generation of countermeasure network
CN106846170B (en) Generator set trip monitoring method and monitoring device thereof
Nuiaa et al. Evolving Dynamic Fuzzy Clustering (EDFC) to Enhance DRDoS_DNS Attacks Detection Mechnism.
Xu et al. TLS-WGAN-GP: A generative adversarial network model for data-driven fault root cause location
CN112039907A (en) Automatic testing method and system based on Internet of things terminal evaluation platform
Singh et al. User behaviour based insider threat detection in critical infrastructures
Jiang et al. DOS: Diverse outlier sampling for out-of-distribution detection
CN113657443B (en) On-line Internet of things equipment identification method based on SOINN network
CN115860243A (en) Fault prediction method and system based on industrial Internet of things data
CN115842645A (en) UMAP-RF-based network attack traffic detection method and device and readable storage medium
Jayaramulu et al. DLOT-Net: A Deep Learning Tool For Outlier Identification
CN115545125B (en) Software defect association rule network pruning method and system
CN109614489B (en) Bug report severity recognition method based on transfer learning and feature extraction
Liangguang et al. Enhanced Temporal Graph Embedding for Identifying Fraudulent Transactions on Transaction Networks
Gupta et al. A detailed Study of different Clustering Algorithms in Data Mining
CN114697108A (en) System log anomaly detection method based on ensemble learning
Wu Review of Anomaly Detection Based on Log Analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant