CN107294993A - WEB abnormal traffic monitoring method based on ensemble learning - Google Patents

WEB abnormal traffic monitoring method based on ensemble learning

Info

Publication number
CN107294993A
Authority
CN
China
Prior art keywords
url
data
character string
length
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710543858.6A
Other languages
Chinese (zh)
Other versions
CN107294993B (en
Inventor
李智星
沈柯
于洪
张冠群
代南瑶
胡聪
胡峰
王进
雷大江
欧阳卫华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN201710543858.6A priority Critical patent/CN107294993B/en
Publication of CN107294993A publication Critical patent/CN107294993A/en
Application granted granted Critical
Publication of CN107294993B publication Critical patent/CN107294993B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention claims a WEB abnormal traffic monitoring method based on ensemble learning, comprising five processes: data preprocessing, feature engineering, data set reconstruction, model building and fusion, and model testing. Data preprocessing extracts the effective information from URL data. Feature engineering extracts and constructs URL features using statistical methods such as information entropy and mutual information. Once feature engineering is complete, the data set is adjusted for each access type and fed into four machine learning algorithms, including XGBoost and LightGBM, for supervised learning. After the learners are built, they are integrated with the Bagging framework. Data sets are re-sampled from the original data set for classification prediction, labels are decided by majority vote, and the model accuracy is checked. When the model is in use, a URL is input into the model, each of its five sub-models outputs its own label probability, and the label with the highest probability is given as the final label.

Description

WEB abnormal traffic monitoring method based on ensemble learning
Technical field
The invention belongs to the technical field of machine learning and relates in particular to a number of statistical algorithms and machine learning algorithms. The method adopts a new feature extraction approach and combines statistics with machine learning algorithms in a novel way to monitor abnormal WEB traffic.
Background
1. Network security problems in the information age
In today's information explosion, both the scale of computer networks and the number of Internet users have reached unprecedented levels, and network security problems have become increasingly prominent as a result. As the main means of resisting network attacks, abnormal traffic monitoring urgently needs further research, development and upgrading. After more than 20 years of development, research on traffic monitoring has evolved into several branches, but in practice the results are still unsatisfactory. The main difficulties are as follows:
1) monitoring violation patterns in real time with fixed rules leads to an excessively high false alarm rate;
2) with feature matching, the feature library must be updated manually and unknown attack patterns cannot be detected;
3) the huge number of rules severely degrades detection performance and makes the rule base hard to maintain;
4) when an abnormal traffic detection system with a blocking function misclassifies normal communication, that communication is blocked;
5) when the data storage capacity of the monitoring system becomes a bottleneck, the system is vulnerable to denial-of-service attacks and communication will be blocked.
Because abnormal traffic detection systems have the problems above, current research on them concentrates on three directions: feature matching, rule-based reasoning and machine learning.
2. Machine learning
In recent years, machine learning methods have been applied more and more to the design of abnormal traffic detection algorithms. They need little manual intervention, which removes the manual maintenance burden of updating the feature library and rule base in feature matching and greatly increases the degree of automation. They adapt well to different input data, which breaks the deadlock of the high false alarm rate of rule-based reasoning, and they can achieve higher accuracy on unknown attacks.
However, a single machine learning method cannot solve the problem perfectly. Statistical methods assume that all events are generated by a statistical model; they ignore the risk that the distribution preset in a parametric method may not match the real data, which can produce results that deviate greatly from expectations. In addition, systems built on statistical models mostly work offline and cannot meet real-time monitoring requirements, so achieving high accuracy demands very efficient performance. Choosing the threshold is also extremely difficult for statistical methods: a threshold that is too high or too low increases the rate of missed detections.
Machine learning algorithms combine prior and posterior knowledge seamlessly and so overcome the shortcoming of an unintuitive framework. However, simple classification and clustering algorithms can overfit because of noisy data, sampling errors or too many modeling variables, and therefore fail to achieve good monitoring results. Moreover, the accuracy of a model relies on certain assumptions about the behavior patterns of the target system and the network; when actual behavior runs counter to these assumptions, accuracy drops significantly.
Summary of the invention
The present invention aims to solve the above problems of the prior art. It proposes a WEB abnormal traffic monitoring method based on ensemble learning that effectively improves the accuracy with which existing machine learning methods monitor abnormal traffic. The technical solution of the present invention is as follows:
A WEB abnormal traffic monitoring method based on ensemble learning comprises the following steps:
1) Data preprocessing: obtain uniform resource locator (URL) records, cut and separate them, and extract the effective information;
2) Feature engineering: use statistical methods to extract features separately from the URLs of command injection attacks, database attacks, cross-site scripting attacks, local file inclusion attacks and normal network access;
3) Data set reconstruction: for each of the five access types, arrange the whole data set according to that type's features and adjust the labels to "this access type" and "other";
4) Model building: for the data set corresponding to each of the five access types, perform supervised learning with four machine learning algorithms, XGBoost (extreme gradient boosting), LightGBM (light gradient boosting machine), RF (random forest) and LR (logistic regression), and integrate the learners with the Bagging framework to obtain a separate recognition model for each of the five access types;
5) Model testing: test on the portion of the data set reserved in advance in step 4) and check the model accuracy.
Further, the extraction of effective URL information in step 1) includes the following steps for an unprocessed URL: first remove the invalid data after "#"; cut the remaining fragment at "?"; separate out the file path fragment and divide it at "/" and "="; divide the query part at "&" and "="; pass the resulting parameters and values to a processing function for regular-expression matching.
Further, the processing function normalizes numbers, dates and times, replaces garbled characters with "$0", changes strings of fewer than 10 lowercase letters to "s", changes strings longer than 2 characters that begin with "0x" to "0x1234", and condenses multiple spaces into a single space. The fragments that remain after processing are the URL information fragments the model needs.
Further, the feature engineering of step 2) specifically includes: the length of URL parameter values, where the length anomaly value P is computed from the mean and variance of the length using the Chebyshev inequality from statistics; the character distribution, where the anomaly value α of the character distribution is computed using the chi-square test from statistics; the enumeration type, where it is determined whether the input of an attribute value falls within an enumerated type; and keyword extraction, which looks for the common features of URLs of the same access type: after all URL data are scanned, the frequency of every string of physically adjacent characters is recorded, strings whose frequency is too low are screened out, and mutual information is computed for the remaining strings.
Further, the length anomaly value P of a URL parameter value can be computed from the mean and variance of the length using the Chebyshev inequality from statistics. The calculation formula is:
$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^{2}}$$
where X is the length of the URL parameter value, μ is the mean length, σ² is the variance of the length, and k is the number of standard deviations.
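As an illustration only, the Chebyshev bound on parameter length can be sketched in Python. The function name `length_anomaly_bound` and the sample lengths are hypothetical and do not come from the patent. The sketch takes the observed deviation |length − μ| as kσ, so the bound 1/k² becomes σ²/(length − μ)², capped at 1.

```python
from statistics import mean, pvariance


def length_anomaly_bound(train_lengths, length):
    """Chebyshev upper bound on P(|X - mu| >= |length - mu|).

    A small value means the observed length is unusually far from the
    lengths seen during training. Name and signature are illustrative.
    """
    mu = mean(train_lengths)
    var = pvariance(train_lengths, mu)   # population variance sigma^2
    deviation = abs(length - mu)         # plays the role of k * sigma
    if deviation == 0:
        return 1.0                       # the bound is vacuous at the mean
    if var == 0:
        return 0.0                       # any deviation from a constant is anomalous
    return min(1.0, var / deviation ** 2)


# Hypothetical training lengths for one URL parameter.
lengths = [8, 10, 12]
typical = length_anomaly_bound(lengths, 10)   # at the mean: bound of 1.0
suspect = length_anomaly_bound(lengths, 30)   # far from the mean: small bound
```

A threshold on the returned bound would still have to be chosen; the patent does not state one.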
Further, computing the anomaly value α of the character distribution with the chi-square test from statistics specifically includes: for the strings {s1, s2, …, sn}, CD(s)_i denotes the i-th probability value in CD(s) and ICD_i denotes the i-th probability value in ICD, then
$$ICD_i = \frac{1}{n}\sum_{k=1}^{n} CD(s_k)_i$$
where i = 1, 2, …, n; that is, the i-th probability value in ICD is the mean of the i-th probability values of all sample distributions in the sample set;
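The patent does not spell out how CD(s) is built or how the chi-square statistic is turned into α, so the following Python sketch rests on stated assumptions: CD(s) is taken to be the relative character frequencies of s sorted in descending order and folded into a fixed number of bins (`BINS = 6` is an arbitrary choice), and the function returns the raw chi-square statistic rather than a probability. All names are hypothetical.

```python
from collections import Counter

BINS = 6  # assumed number of frequency bins, not specified in the patent


def char_distribution(s, bins=BINS):
    """CD(s): descending relative character frequencies, folded into `bins` values."""
    if not s:
        return [0.0] * bins
    counts = sorted(Counter(s).values(), reverse=True)
    probs = [c / len(s) for c in counts[:bins]]
    if len(counts) > bins:
        probs[-1] += sum(counts[bins:]) / len(s)   # fold the tail into the last bin
    return probs + [0.0] * (bins - len(probs))


def idealized_distribution(samples, bins=BINS):
    """ICD: the i-th value is the mean of the i-th values of all sample CDs."""
    dists = [char_distribution(s, bins) for s in samples]
    return [sum(d[i] for d in dists) / len(dists) for i in range(bins)]


def chi_square_statistic(s, icd, bins=BINS):
    """Pearson chi-square statistic of CD(s) against ICD, scaled to len(s)."""
    observed = [p * len(s) for p in char_distribution(s, bins)]
    expected = [p * len(s) for p in icd]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


icd = idealized_distribution(["abcd", "abcd"])   # hypothetical training values
same_shape = chi_square_statistic("wxyz", icd)   # same frequency profile as training
skewed = chi_square_statistic("aaaa", icd)       # one character dominates
```

Because the frequencies are sorted, the statistic compares the shape of the distribution and not which characters occur; a larger value means a larger departure from the training profile.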
Further, for the enumeration type, whether the input of an attribute value falls within an enumerated type is determined as follows. Two functions f and g are defined: f is a linearly increasing function and g(x) is the sample function. As the training samples are input in order, g increases by 1 whenever a new sample value is encountered and decreases by 1 otherwise,
f(x) = x
After all samples have been learned, the correlation coefficient ρ of the resulting functions f and g is defined by the following formula:
$$\rho = \frac{\mathrm{Covar}(f, g)}{\sqrt{\mathrm{Var}(f)\,\mathrm{Var}(g)}}$$
where Var(f) and Var(g) are the variances of f and g respectively, and Covar(f, g) is the covariance of f and g.
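A minimal Python sketch of the f/g construction follows, assuming g starts at 0 before the first sample and that population variance and covariance are used; the function name and the sample values are hypothetical. The patent text does not state a decision rule for ρ, so the interpretation in the comments is only the intuition that follows from the construction.

```python
from statistics import mean


def enumeration_correlation(values):
    """Correlation of f(x) = x with the new-value counter g over the training order."""
    if len(values) < 2:
        return 0.0
    seen, f, g, level = set(), [], [], 0
    for x, value in enumerate(values, start=1):
        level += 1 if value not in seen else -1   # +1 for a new value, -1 for a repeat
        seen.add(value)
        f.append(x)
        g.append(level)
    mf, mg = mean(f), mean(g)
    covar = sum((a - mf) * (b - mg) for a, b in zip(f, g)) / len(f)
    var_f = sum((a - mf) ** 2 for a in f) / len(f)
    var_g = sum((b - mg) ** 2 for b in g) / len(g)
    if var_f == 0 or var_g == 0:
        return 0.0
    return covar / (var_f * var_g) ** 0.5


# Values that keep repeating drive g downward, so rho is negative;
# values that are always new make g grow with f, so rho is close to 1.
rho_enum = enumeration_correlation(["male", "female"] * 6)
rho_free = enumeration_correlation(["a", "b", "c", "d"])
```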
Further, in keyword extraction, mutual information reflects how tightly the characters inside a string are bound together. It is computed from the probabilities P(s1s2s3), P(s1s2) and P(s2s3), where P(s1s2s3) denotes the probability that the string s1s2s3 occurs, and P(s1s2) and P(s2s3) are defined analogously.
Further, the method also includes the step of computing the richness of the left and right adjacent characters of a string, which can be obtained with information entropy:
$$H = -\sum_{i} P(i)\,\log P(i)$$
where P(i) denotes the probability that the adjacent character i of the string occurs.
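The neighbor-richness measure can be sketched in Python as follows. The helper names, the base-2 logarithm and the sample URLs are assumptions made for illustration; the patent only states that information entropy over the adjacent characters is used.

```python
from collections import Counter
from math import log2


def adjacent_chars(corpus, token):
    """Collect the characters immediately left and right of every occurrence of token."""
    left, right = [], []
    for text in corpus:
        start = text.find(token)
        while start != -1:
            end = start + len(token)
            if start > 0:
                left.append(text[start - 1])
            if end < len(text):
                right.append(text[end])
            start = text.find(token, start + 1)
    return left, right


def neighbor_entropy(neighbors):
    """H = -sum P(i) log2 P(i) over the observed adjacent characters."""
    counts = Counter(neighbors)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Hypothetical URL fragments containing the candidate keyword "select".
urls = ["a=select+1", "b=select*2", "c=select+3"]
left, right = adjacent_chars(urls, "select")
left_richness = neighbor_entropy(left)     # always "=", so no variety
right_richness = neighbor_entropy(right)   # "+" twice and "*" once
```

A candidate string with high entropy on both sides appears in many different contexts, which is what the text describes as being more likely a keyword.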
Further, Bagging is an ensemble learning framework that draws sub-samples from the training set to form the sub-training set each base model needs, and combines the predictions of all base models to produce the final prediction. On the basis of the learners, data sets are re-sampled from the original data set for classification prediction, labels are decided by majority vote, and the model accuracy is checked at the same time.
The advantages and beneficial effects of the present invention are as follows:
The present invention uses statistical methods to slice URLs and extract features, which ensures that feature extraction is complete and reliable. It also integrates several machine learning algorithms, including the highly accurate XGBoost (extreme gradient boosting) and RF (random forest), which ensures high accuracy when the model monitors traffic anomalies. During monitoring, each visited URL is input into the five models for prediction to identify whether it is a known anomaly, and unknown anomalies can also be identified.
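The abstract states that each of the five sub-models reports its own label probability and that the most probable label is taken as the final label. A minimal Python sketch of that selection step is given below; the access-type names, the function `final_label` and the stand-in scoring functions are hypothetical, and real sub-models would be the trained recognition models.

```python
def final_label(url, models):
    """Return the access type whose sub-model assigns the highest probability.

    `models` maps an access-type name to a callable that takes a URL and
    returns the probability that the URL belongs to that access type.
    """
    probabilities = {name: model(url) for name, model in models.items()}
    best = max(probabilities, key=probabilities.get)
    return best, probabilities[best]


# Stand-in scorers for illustration; they are not trained models.
models = {
    "command_injection": lambda url: 0.05,
    "database_attack": lambda url: 0.90 if "'" in url else 0.02,
    "cross_site_scripting": lambda url: 0.80 if "<" in url else 0.03,
    "local_file_inclusion": lambda url: 0.04,
    "normal": lambda url: 0.60,
}
label, probability = final_label("/item?id=1' or '1'='1", models)
```

How an unknown anomaly is recognized when no sub-model is confident is not specified in the text, so the sketch does not attempt it.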
Brief description of the drawings
Fig. 1 is the overall flow chart of the method according to a preferred embodiment of the present invention;
Fig. 2 is an example of URL cutting and extraction in the method;
Fig. 3 shows the Bagging framework integration process of the method;
Fig. 4 is the flow chart of abnormal traffic monitoring under the model.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and in detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical solution by which the present invention solves the above technical problems is as follows:
The present invention proposes a model for abnormal traffic monitoring. Fig. 1 shows the flow chart of the whole model. The data set is preprocessed by splitting on symbols such as "?" and "=" to extract the effective information in each URL, which improves processing efficiency. Fig. 2 shows an example of URL cutting. Features are then extracted from the processed data with statistical methods such as mutual information and information entropy. Once feature engineering is finished, data sets with different features are constructed for the different access types, and the labels are changed to two classes: the current access type and other. At the same time, part of the data is extracted as a test set. Machine learning is carried out separately on each of the five reconstructed data sets. Four machine learning algorithms, eXtreme Gradient Boosting, Light Gradient Boosting Machine, Random Forest and Logistic Regression, are used for supervised learning on the data sets, and the learners are integrated with the Bagging framework to obtain mutually independent recognition models for the different access types. Fig. 3 shows the Bagging framework integration process. The reserved test set is fed into each recognition model for testing to check model accuracy.
The main processes of the whole improved abnormal traffic monitoring model are: URL information extraction, feature engineering, training of the multi-algorithm learners, and Bagging framework integration.
1. URL information extraction
Extracting the effective information from URLs is essential for improving the processing efficiency of the model. For an unprocessed URL:
1) first remove the invalid data after "#";
2) cut the remaining fragment at "?";
3) separate out the file path fragment and divide it at "/" and "=";
4) divide the query part at "&" and "=";
The parameters and values obtained in 3) and 4) are passed to a processing function for regular-expression matching. The processing function normalizes numbers, dates and times, replaces garbled characters with "$0", changes strings of fewer than 10 lowercase letters to "s", changes strings longer than 2 characters that begin with "0x" to "0x1234", and condenses multiple spaces into a single space. The fragments that remain after processing are the URL information fragments the model needs.
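The cutting and normalization steps above can be sketched in Python under several assumptions that are not in the patent: the function names are hypothetical, the source's "Ox" is read as the hexadecimal prefix "0x", runs of digits are replaced with the placeholder "0" because the text does not say what numbers become, and "garbled characters" are taken to mean anything outside printable ASCII.

```python
import re

NUM_PLACEHOLDER = "0"  # assumed; the patent does not give the replacement for numbers


def normalize_token(token):
    """Apply the normalization rules to one parameter name or value."""
    token = re.sub(r" {2,}", " ", token)            # condense multiple spaces
    if re.fullmatch(r"0x.+", token):                # starts with 0x, longer than 2
        return "0x1234"
    if re.fullmatch(r"[a-z]{1,9}", token):          # fewer than 10 lowercase letters
        return "s"
    token = re.sub(r"\d+", NUM_PLACEHOLDER, token)  # numbers (assumed placeholder)
    token = re.sub(r"[^\x20-\x7e]+", "$0", token)   # garbled (non-printable-ASCII) runs
    return token


def split_url(url):
    """Cut a URL into normalized information fragments."""
    url = url.split("#", 1)[0]                      # 1) drop everything after "#"
    path, _, query = url.partition("?")             # 2) cut at "?"
    path_tokens = [t for t in re.split(r"[/=]", path) if t]     # 3) path by "/" and "="
    query_tokens = [t for t in re.split(r"[&=]", query) if t]   # 4) query by "&" and "="
    return [normalize_token(t) for t in path_tokens + query_tokens]


fragments = split_url("/shop/item.php?id=42&name=bob&tok=0xdeadbeef#top")
```

The order of the rules matters in this sketch: the "0x" check runs before digit replacement so that hexadecimal values are not partially rewritten first.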
2. Feature engineering
It is well known that feature engineering strongly affects the validity and accuracy of a model.
1) Length of URL parameter values: the length anomaly value P can be computed from the mean and variance of the length using the Chebyshev inequality from statistics,
$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^{2}}$$
where X is the length of the URL parameter value, μ is the mean length, σ² is the variance of the length, and k is the number of standard deviations;
2) Character distribution: the anomaly value α of the character distribution is computed with the chi-square test from statistics. For the strings {s1, s2, …, sn}, CD(s)_i denotes the i-th probability value in CD(s) and ICD_i denotes the i-th probability value in ICD, then
$$ICD_i = \frac{1}{n}\sum_{k=1}^{n} CD(s_k)_i$$
where i = 1, 2, …, n. That is, the i-th probability value in ICD is the mean of the i-th probability values of all sample distributions in the sample set;
3) Enumeration type: it is very common for the legal input of an attribute value to belong to an enumerated type. For example, the legal values of the "gender" attribute are "{male, female}", and any input outside these two cases should be treated as abnormal. Two functions f and g are defined: f is a linearly increasing function, and as the training samples are input in order, g increases by 1 whenever a new sample value is encountered and decreases by 1 otherwise.
f(x) = x
After all samples have been learned, the correlation coefficient ρ of the resulting functions f and g is defined by the following formula:
$$\rho = \frac{\mathrm{Covar}(f, g)}{\sqrt{\mathrm{Var}(f)\,\mathrm{Var}(g)}}$$
where Var(f) and Var(g) are the variances of f and g respectively, and Covar(f, g) is the covariance of f and g;
4) Keyword extraction: to find the common features of URLs of the same access type, extracting the keywords of URLs of that type is particularly important. After all URL data are scanned, the frequency of every string of physically adjacent characters is recorded. Strings whose frequency is too low are screened out, and mutual information is computed for the remaining strings. Mutual information reflects how tightly the characters inside a string are bound together. It is computed from the probabilities P(s1s2s3), P(s1s2) and P(s2s3), where P(s1s2s3) denotes the probability that the string s1s2s3 occurs, and P(s1s2) and P(s2s3) are defined analogously.
In addition, the richness of the left and right adjacent characters of each string must be computed. The richer the adjacent characters, the more flexibly the string is used in the data set and the more likely it is to be a keyword of this kind of URL. The richness of the left and right adjacent characters can be obtained with information entropy:
$$H = -\sum_{i} P(i)\,\log P(i)$$
where P(i) denotes the probability that the adjacent character i of the string occurs.
3. Training of the multi-algorithm learners
Before training, the data need a small change. The URL features of each access type are expanded to the whole data set, forming five different data sets. At the same time the original labels are changed: only the label of the access type in question is kept, and the labels of URL data of the remaining access types are all replaced with "other".
As for the choice of algorithms, tests showed that XGBoost, LightGBM, RF and LR are the machine learning algorithms with the highest accuracy and the best fit to the problem.
1) XGBoost: XGBoost is an algorithm optimized on the basis of boosting algorithms such as AdaBoost and GBDT. It can be used for linear classification and can be regarded as a linear regression algorithm with L1 and L2 regularization. It has one more regularization term than traditional GBDT, which greatly improves resistance to overfitting. In terms of distribution, XGBoost sorts each feature dimension on one machine and stores it in a Block structure, so the computation for different features can be distributed across machines and the results aggregated at the end, which gives XGBoost distributed computing capability. Because feature values are ultimately used only for sorting, the feature values themselves have little effect on XGBoost model learning. Each step simply selects the feature with the largest gradient reduction, so the feature correlation selection problem is also solved;
2) LightGBM: LightGBM is a framework that implements the GBDT algorithm. It supports efficient parallel training and offers faster speed, lower memory consumption, better accuracy and distributed support, so it can process massive data quickly.
3) Random Forest: Random Forest is particularly suitable for multi-class problems. Training and prediction are fast and it performs well on the data set. It tolerates faults in the training data well. It can handle very high-dimensional data without feature selection, that is, it can handle thousands of variables without deleting any, which is useful for processing the large number of features produced by keyword extraction. It can generate an internal unbiased estimate of the generalization error during classification. It can detect interactions between features and the importance of features during training. It is resistant to overfitting;
4) Logistic Regression: the idea of logistic regression is to divide the data set into two parts with a hyperplane, the two parts lying on either side of the hyperplane and belonging to two different classes. This fits exactly the data in which the URL data set of each access type has been relabeled. Fig. 4 is a schematic diagram of the binary classification principle of Logistic Regression. In addition, its computational cost during classification is very small, it is fast, it needs very little storage, and it makes the sample probability scores easy to observe.
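The relabeling described above amounts to building one binary ("this access type" versus "other") data set per access type. A minimal Python sketch follows; the function name, the access-type names and the toy feature extractors are hypothetical stand-ins for the feature engineering of the previous section.

```python
def rebuild_datasets(records, extractors):
    """Build one relabeled data set per access type.

    `records` is a list of (url, label) pairs; `extractors` maps each access
    type to a function that turns a URL into that type's feature vector.
    """
    datasets = {}
    for access_type, extract in extractors.items():
        datasets[access_type] = [
            (extract(url), access_type if label == access_type else "other")
            for url, label in records
        ]
    return datasets


# Toy records and extractors for illustration only.
records = [
    ("/a?id=1' or 1=1", "database_attack"),
    ("/a?q=<script>", "cross_site_scripting"),
    ("/index", "normal"),
]
extractors = {
    "database_attack": lambda url: [url.count("'")],
    "cross_site_scripting": lambda url: [url.count("<")],
}
datasets = rebuild_datasets(records, extractors)
```

Each resulting data set keeps every record, so the five recognition models are trained on the same URLs but with different features and labels.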
4. Bagging framework integration
Bagging is an ensemble learning framework that draws sub-samples from the training set to form the sub-training set each base model needs, and combines the predictions of all base models to produce the final prediction. On the basis of the learners, data sets are re-sampled from the original data set for classification prediction, labels are decided by majority vote, and the model accuracy is checked at the same time. The expectation of the overall model under this framework is approximately equal to the expectation of the base models, which means the bias of the overall model is close to the bias of the base models, while the variance of the overall model decreases as the number of base models increases. This strengthens resistance to overfitting, and model accuracy can be improved significantly. Table 1 compares the experimental accuracy of each machine learning algorithm with the accuracy after Bagging integration;
Table 1 Model accuracy comparison
The above embodiments should be understood as merely illustrating the present invention and not as limiting its scope of protection. After reading the content recorded in the present invention, a person skilled in the art can make various changes or modifications to it, and these equivalent changes and modifications likewise fall within the scope of the claims of the present invention.
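The sub-sampling and majority-vote mechanics can be sketched in plain Python. This is an illustration under assumptions, not the patented implementation: the function names are hypothetical, sampling with replacement (bootstrap) is assumed for the sub-sampling, and `train_majority` is a deliberately trivial stand-in for the four base algorithms (XGBoost, LightGBM, RF, LR).

```python
import random
from collections import Counter


def bootstrap(dataset, rng):
    """Draw a sub-training set of the same size, sampling with replacement."""
    return [rng.choice(dataset) for _ in dataset]


def bagging_fit(dataset, trainers, seed=0):
    """Train each base learner on its own bootstrap sample.

    Each trainer takes a list of (features, label) pairs and returns a
    callable model that maps features to a label.
    """
    rng = random.Random(seed)
    return [train(bootstrap(dataset, rng)) for train in trainers]


def majority_vote(models, features):
    """Decide the label by majority vote over the base models."""
    votes = Counter(model(features) for model in models)
    return votes.most_common(1)[0][0]


def train_majority(sample):
    """Trivial stand-in learner: always predicts the most common training label."""
    label = Counter(lbl for _, lbl in sample).most_common(1)[0][0]
    return lambda features: label


# Hypothetical relabeled data set for one access type.
data = [([3], "database_attack")] * 7 + [([0], "other")] * 3
models = bagging_fit(data, [train_majority] * 5)
prediction = majority_vote(models, [2])
```

In practice each entry of `trainers` would wrap one of the four algorithms, and the vote would be checked against the reserved test set to measure accuracy.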

Claims (10)

1. A WEB abnormal traffic monitoring method based on ensemble learning, characterized by comprising the following steps:
1) Data preprocessing: obtain uniform resource locator (URL) records, cut and separate them, and extract the effective information;
2) Feature engineering: use statistical methods to extract features separately from the URLs of command injection attacks, database attacks, cross-site scripting attacks, local file inclusion attacks and normal network access;
3) Data set reconstruction: for each of the five access types, arrange the whole data set according to that type's features and adjust the labels to "this access type" and "other";
4) Model building: for the data set corresponding to each of the five access types, perform supervised learning with four machine learning algorithms, XGBoost (extreme gradient boosting), LightGBM (light gradient boosting machine), RF (random forest) and LR (logistic regression), and integrate the learners with the Bagging framework to obtain a separate recognition model for each of the five access types;
5) Model testing: test on the portion of the data set reserved in advance in step 4) and check the model accuracy.
2. The WEB abnormal traffic monitoring method based on ensemble learning according to claim 1, characterized in that the extraction of effective URL information in step 1) includes the following steps for an unprocessed URL: first remove the invalid data after "#"; cut the remaining fragment at "?"; separate out the file path fragment and divide it at "/" and "="; divide the query part at "&" and "="; pass the resulting parameters and values to a processing function for regular-expression matching.
3. The WEB abnormal traffic monitoring method based on ensemble learning according to claim 2, characterized in that the processing function normalizes numbers, dates and times, replaces garbled characters with "$0", changes strings of fewer than 10 lowercase letters to "s", changes strings longer than 2 characters that begin with "0x" to "0x1234", and condenses multiple spaces into a single space, the fragments remaining after processing being the URL information fragments the model needs.
4. The WEB abnormal traffic monitoring method based on ensemble learning according to claim 2, characterized in that the feature engineering of step 2) specifically includes: the length of URL parameter values, where the length anomaly value P is computed from the mean and variance of the length using the Chebyshev inequality from statistics; the character distribution, where the anomaly value α of the character distribution is computed using the chi-square test from statistics; the enumeration type, where it is determined whether the input of an attribute value falls within an enumerated type; and keyword extraction, which looks for the common features of URLs of the same access type: after all URL data are scanned, the frequency of every string of physically adjacent characters is recorded, strings whose frequency is too low are screened out, and mutual information is computed for the remaining strings.
5. the exception of network traffic real-time monitoring system according to claim 4 based on big data, it is characterised in that described The length exceptional value of URL parameter value, can be counted using the Chebyshev inequality and the average of length in statistics with variance The exceptional value P of length is calculated, calculation formula includes:
Wherein X is the length of URL parameter value, and μ is length average, σ2For length variance, k represents standard deviation number.
6. the exception of network traffic real-time monitoring system according to claim 4 based on big data, it is characterised in that described Character distribution is specifically included using the exceptional value α of the Chi-square Test calculating character distribution in statistics:For character string { s1, s2,…,sn},CD(s)iRepresent i-th of probable value in CD (s), ICDiI-th of probable value in ICD is represented, thenI-th of probable value in wherein i=1,2 ..., n, i.e. ICD is all sample distributions in sample set The average of i-th of probable value;
7. the exception of network traffic real-time monitoring system according to claim 4 based on big data, it is characterised in that described Enumeration type, the input of computation attribute value belongs to which kind of abnormal situation of enumeration type, the defined function f and g, and function f is Linear increasing function, g (x) represents sample function, and when sequentially inputting training sample, if running into new samples, then g plus 1, otherwise g Subtract 1,
F (x)=x
The function f and g that are obtained after all samples all learn to terminate correlation coefficient ρ can be defined by following formula:
Wherein Var (f) and Var (g) are function f and g variance respectively, and Covar (f, g) is function f and g covariance.
8. the exception of network traffic real-time monitoring system according to claim 4 based on big data, it is characterised in that described Keyword abstraction mutual information embodies whether character string internal combustion mode is close, and its calculation formula is as follows:
Wherein, P (s1s2s3) represent character string s1s2s3The probability of appearance, P (s1s2) represent character string s1s2The probability of appearance, P (s2s3) represent character string s2s3The probability of appearance.
9. the exception of network traffic real-time monitoring system according to claim 4 based on big data, it is characterised in that also wrap The step of adjacent word in left and right for including the adjacent word of calculating character string enriches degree, the abundant degree of the adjacent word in its left and right can be obtained with use information entropy Wherein P (i) represents the probability that the adjacent word i of the character string occurs.
10. The big-data-based real-time network traffic anomaly monitoring system according to any one of claims 1-9, characterized in that Bagging is an ensemble learning framework in which sub-sampling from the training set forms the sub-training set required by each base model, and the predictions of all base models are integrated to produce the final prediction; on the basis of the base learners, data sets are re-drawn from the original data set for classification prediction, the label is decided by majority voting, and the accuracy of the model is checked at the same time.
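The Bagging procedure of claim 10 can be sketched as follows. The claim does not fix the base learner, so the one-feature threshold classifier `fit_stump` used here is purely hypothetical, as are the function names, the number of models, and the toy data (a single numeric feature such as URL length). Bootstrap sampling with replacement is assumed for the sub-sampling step.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Hypothetical base learner: threshold halfway between the two class means."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    if not pos or not neg:
        # Degenerate bootstrap sample: always predict the only class present.
        majority = 1 if len(pos) >= len(neg) else 0
        return lambda x: majority
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    threshold = (mean_pos + mean_neg) / 2
    sign = 1 if mean_pos > mean_neg else -1
    return lambda x: 1 if sign * (x - threshold) > 0 else 0


def bagging_fit(X, y, n_models=11, seed=0):
    """Train each base model on a bootstrap sample drawn from the training set."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Decide the label by majority vote over all base models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


def accuracy(models, X, y):
    """Fraction of samples whose voted label matches the true label."""
    return sum(bagging_predict(models, x) == t for x, t in zip(X, y)) / len(X)


if __name__ == "__main__":
    X = [10, 12, 11, 13, 90, 95, 100, 92]  # toy feature values
    y = [0, 0, 0, 0, 1, 1, 1, 1]           # 0 = normal, 1 = abnormal
    models = bagging_fit(X, y)
    print(bagging_predict(models, 11), bagging_predict(models, 95), accuracy(models, X, y))
```

Using an odd number of base models avoids tied votes in the two-class case.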
CN201710543858.6A 2017-07-05 2017-07-05 WEB abnormal traffic monitoring method based on ensemble learning Active CN107294993B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710543858.6A CN107294993B (en) 2017-07-05 2017-07-05 WEB abnormal traffic monitoring method based on ensemble learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710543858.6A CN107294993B (en) 2017-07-05 2017-07-05 WEB abnormal traffic monitoring method based on ensemble learning

Publications (2)

Publication Number Publication Date
CN107294993A (en) 2017-10-24
CN107294993B (en) 2021-02-09

Family

ID=60100438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710543858.6A Active CN107294993B (en) 2017-07-05 2017-07-05 WEB abnormal traffic monitoring method based on ensemble learning

Country Status (1)

Country Link
CN (1) CN107294993B (en)

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944986A (en) * 2017-12-28 2018-04-20 广东工业大学 A kind of O2O Method of Commodity Recommendation, system and equipment
CN108038155A (en) * 2017-12-02 2018-05-15 宝牧科技(天津)有限公司 A kind of detection method of network URL exceptions
CN108491717A (en) * 2018-03-28 2018-09-04 四川长虹电器股份有限公司 A kind of xss systems of defense and its implementation based on machine learning
CN108764568A (en) * 2018-05-28 2018-11-06 哈尔滨工业大学 A kind of data prediction model tuning method and device based on LSTM networks
CN109167753A (en) * 2018-07-23 2019-01-08 中国科学院计算机网络信息中心 A kind of detection method and device of network intrusions flow
CN109325193A (en) * 2018-10-16 2019-02-12 杭州安恒信息技术股份有限公司 WAF normal discharge modeling method and device based on machine learning
CN109408591A (en) * 2018-10-12 2019-03-01 北京聚云位智信息科技有限公司 Support the AI of SQL driving and the decision type distributed data base system of Feature Engineering
CN109951484A (en) * 2019-03-20 2019-06-28 四川长虹电器股份有限公司 The test method and system attacked for machine learning product
CN110046757A (en) * 2019-04-08 2019-07-23 中国人民解放军第四军医大学 Number of Outpatients forecasting system and prediction technique based on LightGBM algorithm
CN110086749A (en) * 2018-01-25 2019-08-02 阿里巴巴集团控股有限公司 Data processing method and device
CN110175635A (en) * 2019-05-07 2019-08-27 南京邮电大学 OTT application user classification method based on Bagging algorithm
CN110263539A (en) * 2019-05-15 2019-09-20 湖南警察学院 A kind of Android malicious application detection method and system based on concurrent integration study
CN110363223A (en) * 2019-06-20 2019-10-22 华南理工大学 Industrial flow data processing method, detection method, system, device and medium
CN110415462A (en) * 2019-07-31 2019-11-05 中国工商银行股份有限公司 Atm device adds paper money optimization method and device
CN110443274A (en) * 2019-06-28 2019-11-12 平安科技(深圳)有限公司 Method for detecting abnormality, device, computer equipment and storage medium
CN110598774A (en) * 2019-09-03 2019-12-20 中电长城网际安全技术研究院(北京)有限公司 Encrypted flow detection method and device, computer readable storage medium and electronic equipment
CN111104466A (en) * 2019-12-25 2020-05-05 航天科工网络信息发展有限公司 Method for rapidly classifying massive database tables
CN111371794A (en) * 2020-03-09 2020-07-03 北京金睛云华科技有限公司 Shadow domain detection model, detection model establishing method, detection method and system
CN111444931A (en) * 2019-01-17 2020-07-24 北京京东尚科信息技术有限公司 Method and device for detecting abnormal access data
CN111582879A (en) * 2019-01-30 2020-08-25 浙江远图互联科技股份有限公司 Anti-fraud medical insurance identification method based on genetic algorithm
CN111600919A (en) * 2019-02-21 2020-08-28 北京金睛云华科技有限公司 Web detection method and device based on artificial intelligence
CN111767275A (en) * 2020-06-28 2020-10-13 北京林克富华技术开发有限公司 Data processing method and device and data processing system
CN113361597A (en) * 2021-06-04 2021-09-07 北京天融信网络安全技术有限公司 URL detection model training method and device, electronic equipment and storage medium
CN113469730A (en) * 2021-06-08 2021-10-01 北京化工大学 Customer repurchase prediction method and device based on RF-LightGBM fusion model under non-contract scene
CN113936765A (en) * 2021-12-17 2022-01-14 北京因数健康科技有限公司 Method and device for generating periodic behavior report, storage medium and electronic equipment
CN114169440A (en) * 2021-12-08 2022-03-11 北京百度网讯科技有限公司 Model training method, data processing method, device, electronic device and medium
CN114513341A (en) * 2022-01-21 2022-05-17 上海斗象信息科技有限公司 Malicious traffic detection method, device, terminal and computer readable storage medium
CN114915563A (en) * 2021-12-07 2022-08-16 天翼数字生活科技有限公司 Network flow prediction method and system
CN116127236A (en) * 2023-04-19 2023-05-16 远江盛邦(北京)网络安全科技股份有限公司 Webpage web component identification method and device based on parallel structure

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130091128A1 (en) * 2011-10-11 2013-04-11 Microsoft Corporation Time-Aware Ranking Adapted to a Search Engine Application
US20140105488A1 (en) * 2012-10-17 2014-04-17 Microsoft Corporation Learning-based image page index selection
CN104735074A (en) * 2015-03-31 2015-06-24 江苏通付盾信息科技有限公司 Malicious URL detection method and implement system thereof
CN105024989A (en) * 2014-11-26 2015-11-04 哈尔滨安天科技股份有限公司 Malicious URL heuristic detection method and system based on abnormal port
CN106131071A (en) * 2016-08-26 2016-11-16 北京奇虎科技有限公司 A kind of Web method for detecting abnormality and device

Cited By (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038155A (en) * 2017-12-02 2018-05-15 宝牧科技(天津)有限公司 A kind of detection method of network URL exceptions
CN107944986B (en) * 2017-12-28 2022-02-15 广东工业大学 Method, system and equipment for recommending O2O commodities
CN107944986A (en) * 2017-12-28 2018-04-20 广东工业大学 A kind of O2O Method of Commodity Recommendation, system and equipment
CN110086749A (en) * 2018-01-25 2019-08-02 阿里巴巴集团控股有限公司 Data processing method and device
CN108491717A (en) * 2018-03-28 2018-09-04 四川长虹电器股份有限公司 A kind of xss systems of defense and its implementation based on machine learning
CN108764568A (en) * 2018-05-28 2018-11-06 哈尔滨工业大学 A kind of data prediction model tuning method and device based on LSTM networks
CN108764568B (en) * 2018-05-28 2020-10-23 哈尔滨工业大学 Data prediction model tuning method and device based on LSTM network
CN109167753A (en) * 2018-07-23 2019-01-08 中国科学院计算机网络信息中心 A kind of detection method and device of network intrusions flow
CN109408591A (en) * 2018-10-12 2019-03-01 北京聚云位智信息科技有限公司 Support the AI of SQL driving and the decision type distributed data base system of Feature Engineering
CN109408591B (en) * 2018-10-12 2021-11-09 北京聚云位智信息科技有限公司 Decision-making distributed database system supporting SQL (structured query language) driven AI (Artificial Intelligence) and feature engineering
CN109325193A (en) * 2018-10-16 2019-02-12 杭州安恒信息技术股份有限公司 WAF normal discharge modeling method and device based on machine learning
CN111444931A (en) * 2019-01-17 2020-07-24 北京京东尚科信息技术有限公司 Method and device for detecting abnormal access data
CN111582879A (en) * 2019-01-30 2020-08-25 浙江远图互联科技股份有限公司 Anti-fraud medical insurance identification method based on genetic algorithm
CN111600919B (en) * 2019-02-21 2023-04-07 北京金睛云华科技有限公司 Method and device for constructing intelligent network application protection system model
CN111600919A (en) * 2019-02-21 2020-08-28 北京金睛云华科技有限公司 Web detection method and device based on artificial intelligence
CN109951484A (en) * 2019-03-20 2019-06-28 四川长虹电器股份有限公司 The test method and system attacked for machine learning product
CN110046757A (en) * 2019-04-08 2019-07-23 中国人民解放军第四军医大学 Number of Outpatients forecasting system and prediction technique based on LightGBM algorithm
CN110175635A (en) * 2019-05-07 2019-08-27 南京邮电大学 OTT application user classification method based on Bagging algorithm
CN110175635B (en) * 2019-05-07 2022-08-30 南京邮电大学 OTT application program user classification method based on Bagging algorithm
CN110263539A (en) * 2019-05-15 2019-09-20 湖南警察学院 A kind of Android malicious application detection method and system based on concurrent integration study
CN110363223A (en) * 2019-06-20 2019-10-22 华南理工大学 Industrial flow data processing method, detection method, system, device and medium
CN110443274A (en) * 2019-06-28 2019-11-12 平安科技(深圳)有限公司 Method for detecting abnormality, device, computer equipment and storage medium
CN110443274B (en) * 2019-06-28 2024-05-07 平安科技(深圳)有限公司 Abnormality detection method, abnormality detection device, computer device, and storage medium
CN110415462A (en) * 2019-07-31 2019-11-05 中国工商银行股份有限公司 Atm device adds paper money optimization method and device
CN110598774A (en) * 2019-09-03 2019-12-20 中电长城网际安全技术研究院(北京)有限公司 Encrypted flow detection method and device, computer readable storage medium and electronic equipment
CN111104466A (en) * 2019-12-25 2020-05-05 航天科工网络信息发展有限公司 Method for rapidly classifying massive database tables
CN111104466B (en) * 2019-12-25 2023-07-28 中国长峰机电技术研究设计院 Method for quickly classifying massive database tables
CN111371794B (en) * 2020-03-09 2022-01-18 北京金睛云华科技有限公司 Shadow domain detection model, detection model establishing method, detection method and system
CN111371794A (en) * 2020-03-09 2020-07-03 北京金睛云华科技有限公司 Shadow domain detection model, detection model establishing method, detection method and system
CN111767275A (en) * 2020-06-28 2020-10-13 北京林克富华技术开发有限公司 Data processing method and device and data processing system
CN111767275B (en) * 2020-06-28 2024-04-19 北京林克富华技术开发有限公司 Data processing method and device and data processing system
CN113361597A (en) * 2021-06-04 2021-09-07 北京天融信网络安全技术有限公司 URL detection model training method and device, electronic equipment and storage medium
CN113361597B (en) * 2021-06-04 2023-07-21 北京天融信网络安全技术有限公司 Training method and device for URL detection model, electronic equipment and storage medium
CN113469730A (en) * 2021-06-08 2021-10-01 北京化工大学 Customer repurchase prediction method and device based on RF-LightGBM fusion model under non-contract scene
CN114915563A (en) * 2021-12-07 2022-08-16 天翼数字生活科技有限公司 Network flow prediction method and system
CN114169440A (en) * 2021-12-08 2022-03-11 北京百度网讯科技有限公司 Model training method, data processing method, device, electronic device and medium
CN113936765A (en) * 2021-12-17 2022-01-14 北京因数健康科技有限公司 Method and device for generating periodic behavior report, storage medium and electronic equipment
CN114513341B (en) * 2022-01-21 2023-09-12 上海斗象信息科技有限公司 Malicious traffic detection method, malicious traffic detection device, terminal and computer readable storage medium
CN114513341A (en) * 2022-01-21 2022-05-17 上海斗象信息科技有限公司 Malicious traffic detection method, device, terminal and computer readable storage medium
CN116127236A (en) * 2023-04-19 2023-05-16 远江盛邦(北京)网络安全科技股份有限公司 Webpage web component identification method and device based on parallel structure

Also Published As

Publication number Publication date
CN107294993B (en) 2021-02-09

Similar Documents

Publication Publication Date Title
CN107294993A (en) A kind of WEB abnormal flow monitoring methods based on integrated study
CN107766883A (en) A kind of optimization random forest classification method and system based on weighted decision tree
CN111881983B (en) Data processing method and device based on classification model, electronic equipment and medium
CN106469181B (en) User behavior pattern analysis method and device
CN108717408A (en) A kind of sensitive word method for real-time monitoring, electronic equipment, storage medium and system
CN107203467A (en) The reference test method and device of supervised learning algorithm under a kind of distributed environment
CN111612041A (en) Abnormal user identification method and device, storage medium and electronic equipment
KR102105319B1 (en) Esg based enterprise assessment device and operating method thereof
CN104636449A (en) Distributed type big data system risk recognition method based on LSA-GCC
CN106681305A (en) Online fault diagnosing method for Fast RVM (relevance vector machine) sewage treatment
Gao et al. A process fault diagnosis method using multi‐time scale dynamic feature extraction based on convolutional neural network
CN113609770B (en) Rolling bearing RUL prediction method based on piecewise linear fitting HI and LSTM
Golestani et al. Real-time prediction of employee engagement using social media and text mining
Maakoul et al. Towards evaluating the COVID’19 related fake news problem: case of morocco
CN107368516A (en) A kind of log audit method and device based on hierarchical clustering
CN106156179A (en) A kind of information retrieval method and device
CN113837266A (en) Software defect prediction method based on feature extraction and Stacking ensemble learning
Ismaili et al. A supervised methodology to measure the variables contribution to a clustering
Jha et al. Criminal behaviour analysis and segmentation using k-means clustering
CN117272204A (en) Abnormal data detection method, device, storage medium and electronic equipment
Carvalho et al. Using political party affiliation data to measure civil servants' risk of corruption
CN116865994A (en) Network data security prediction method based on big data
CN104200222B (en) Object identifying method in a kind of picture based on factor graph model
Yu et al. An automatic recognition method of journal impact factor manipulation
CN115204475A (en) Drug rehabilitation place security incident risk assessment method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant