CN107294993A - WEB abnormal traffic monitoring method based on ensemble learning - Google Patents
WEB abnormal traffic monitoring method based on ensemble learning
- Publication number
- CN107294993A
- Authority
- CN
- China
- Prior art keywords
- url
- data
- character string
- length
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention claims a WEB abnormal traffic monitoring method based on ensemble learning, comprising five processes: data preprocessing, feature engineering, data set reconstruction, model building and fusion, and model testing. Data preprocessing extracts the effective information from URL data. Feature engineering extracts and constructs URL features using statistical methods such as information entropy and mutual information. Once feature engineering is complete, the data set is adjusted for each access type and fed to four machine learning algorithms, including XGBoost and LightGBM, for supervised learning. After the learners are built, they are integrated with the Bagging framework: data sets are re-sampled from the original data set for classification prediction, the label is decided by majority vote, and model accuracy is checked. When the model is in use, a URL is input to the model, each of its five sub-models outputs its own label probability, and the label with the highest probability is given as the final label.
Description
Technical field
The invention belongs to the field of machine learning and specifically involves several statistical algorithms and machine learning algorithms. The method adopts a new feature extraction approach and combines statistics with machine learning algorithms in a novel way to monitor WEB abnormal traffic.
Background
1. Network security problems in the information age
In today's information explosion, both the scale of computer networks and the number of Internet users have reached unprecedented levels, and network security problems have become increasingly prominent as a result. As the main means of resisting network attacks, abnormal traffic monitoring urgently needs further research, development, and upgrading. After more than 20 years of development, research on traffic monitoring has evolved into multiple branches, but its results in practice remain unsatisfactory. The difficulties are concentrated in the following aspects:
1) Monitoring violation behavior patterns in real time with fixed rules leads to a high false-alarm rate;
2) With feature matching, the feature library must be updated manually and unknown attack patterns cannot be detected;
3) The huge number of rules severely degrades detection performance and makes the rule base hard to maintain;
4) When a detection system with a blocking function misjudges normal communication, that normal communication is blocked;
5) When the data storage capacity of the monitoring system is a bottleneck, the system is vulnerable to denial-of-service attacks and communication will be blocked.
Given these problems with abnormal traffic detection systems, current research on such systems focuses on three directions: feature matching, rule-based reasoning, and machine learning.
2. Machine learning
In recent years, machine learning methods have increasingly been applied to the design of abnormal traffic detection algorithms. They require little manual intervention, which removes the labor cost of updating the feature library and rule base in feature matching and greatly raises the degree of automation. They also adapt well to different input data, which breaks the deadlock of high false-alarm rates in rule-based reasoning and yields higher accuracy against unknown attacks.
However, a single machine learning approach cannot solve the problem perfectly. Statistical methods assume that all events are generated by a statistical model; this ignores the risk that the distribution model fixed in advance in a parametric method may not match the real data, which can produce large deviations from the expected result. In addition, systems built on statistical models mostly work offline and cannot meet real-time monitoring requirements, because reaching high accuracy in real time demands very high performance. Determining thresholds is also extremely difficult for statistical methods: a threshold that is too high or too low raises the miss rate.
Machine learning algorithms seamlessly combine prior and posterior knowledge and can overcome the drawback that the framework is not intuitive. However, simple classification and clustering algorithms can overfit because of noisy data, sampling errors, or too many modeling variables, and so fail to achieve good monitoring results. Moreover, model accuracy depends on certain assumptions about the behavior patterns of the target system and the network; when these assumptions are violated, accuracy drops sharply.
Summary of the invention
The present invention aims to solve the above problems of the prior art. It proposes a WEB abnormal traffic monitoring method based on ensemble learning that effectively improves the accuracy of existing machine learning methods in abnormal traffic monitoring. The technical solution is as follows:
A WEB abnormal traffic monitoring method based on ensemble learning, comprising the following steps:
1) Data preprocessing: obtain uniform resource locator (URL) records, cut and separate the URL records, and extract the effective information;
2) Feature engineering: use statistical methods to extract features separately from the URLs of common command attacks, database attacks, cross-site scripting attacks, local file inclusion attacks, and normal network access;
3) Data set reconstruction: for each of the five access types, organize the full data set according to that type's features and adjust the labels to "this access type" and "other";
4) Model building: for the data set corresponding to each of the five access types, carry out supervised learning with four machine learning algorithms, XGBoost (extreme gradient boosting), LightGBM (light gradient boosting machine), RF (random forest), and LR (logistic regression), and integrate the learners with the Bagging framework to obtain a separate recognition model for each of the five access types;
5) Model testing: test on the portion of the data set reserved in advance in step 4) to check model accuracy.
Further, the extraction of effective URL information in step 1) includes the following steps. For an unprocessed URL: first remove the invalid data after "#"; cut the remaining fragment at "?"; separate out the file path fragment and split it on "/" and "="; split the query part on "&" and "="; then pass each parameter and value obtained from the splitting to a processing function for regular-expression matching.
Further, the processing function normalizes numbers and date/time values, replaces garbled characters with "$0", changes a character string of fewer than 10 lowercase letters to "s", changes a character string longer than 2 characters that starts with "0x" to "0x1234", and collapses multiple spaces into one space. The fragments left after processing are the URL information fragments that the model needs.
Further, the feature engineering in step 2) specifically includes: length of the URL parameter value, where the anomaly value P of the length is calculated using Chebyshev's inequality from statistics together with the mean and variance of the length; character distribution, where the anomaly value α of the character distribution is calculated using the chi-square test from statistics; enumeration type, where it is determined whether an input attribute value falls outside the enumerated set and is therefore an enumeration anomaly; and keyword extraction, which finds features shared by URLs of the same access type: after all URL data have been scanned, the frequency of every physically adjacent character string is recorded, strings whose frequency is too low are screened out, and mutual information is calculated for the remaining strings.
Further, for the length anomaly value of the URL parameter value, the anomaly value P of the length can be calculated using Chebyshev's inequality from statistics together with the mean and variance of the length. The calculation formula includes:
$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$$
where X denotes the length of the URL parameter value, μ is the mean length, σ² is the variance of the length, and k denotes the number of standard deviations;
Further, calculating the anomaly value α of the character distribution using the chi-square test from statistics specifically includes: for character strings {s1, s2, …, sn}, CD(s)_i denotes the i-th probability value in CD(s) and ICD_i denotes the i-th probability value in ICD; then
$$ICD_i = \frac{1}{n}\sum_{j=1}^{n} CD(s_j)_i$$
where i = 1, 2, …, n; that is, the i-th probability value in ICD is the mean of the i-th probability values of all sample distributions in the sample set;
Further, for the enumeration type, it is determined whether an input attribute value is an enumeration anomaly. Functions f and g are defined: f is a linearly increasing function and g(x) is the sample function. As training samples are input in sequence, g is increased by 1 when a new sample value is encountered and decreased by 1 otherwise:
$$f(x) = x, \qquad g(x) = \begin{cases} g(x-1) + 1 & \text{if the } x\text{-th sample is new} \\ g(x-1) - 1 & \text{otherwise} \end{cases}$$
After all samples have been learned, the correlation coefficient ρ of the resulting functions f and g is defined by:
$$\rho = \frac{\mathrm{Covar}(f, g)}{\sqrt{\mathrm{Var}(f)\,\mathrm{Var}(g)}}$$
where Var(f) and Var(g) are the variances of f and g respectively, and Covar(f, g) is the covariance of f and g.
Further, in keyword extraction the mutual information reflects how tightly the parts of a character string are bound together. It is calculated from the following quantities: P(s1s2s3) is the probability that the character string s1s2s3 occurs, and P(s1s2) and P(s2s3) are defined analogously.
Further, the method also includes a step of calculating how rich the left and right neighboring characters of a character string are. This richness can be obtained with information entropy:
$$H = -\sum_{i} P(i)\,\log P(i)$$
where P(i) denotes the probability that neighboring character i of the character string occurs.
Further, Bagging is an ensemble learning framework that draws sub-samples from the training set to form the sub-training set required by each base model, and combines the predictions of all base models to produce the final prediction. On top of the learners, data sets are re-sampled from the original data set for classification prediction, the label is decided by majority vote, and model accuracy is checked.
Advantages and beneficial effects of the invention:
The invention uses statistical methods to slice URLs and extract features, which ensures the completeness and reliability of feature extraction. It also integrates several machine learning algorithms, including the highly accurate XGBoost (extreme gradient boosting) and RF (random forest), to ensure high accuracy in traffic anomaly monitoring. During monitoring, the visited URL is input into the five models for prediction to identify whether it is a known anomaly, while unknown anomalies can also be identified.
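The monitoring step described above, in which a URL is scored by five per-type models and the most probable label is taken as the result, might look like the following Python sketch. The model callables, the label names, and the fixed probabilities are hypothetical stand-ins for the trained recognition models; the source does not specify their interface.

```python
from typing import Callable, Dict, Tuple

# Hypothetical interface: each per-type model maps URL features to the
# probability that the URL belongs to that access type.
Model = Callable[[object], float]


def final_label(features: object, models: Dict[str, Model]) -> Tuple[str, Dict[str, float]]:
    """Score the URL with every sub-model and return the most probable label."""
    scores = {name: model(features) for name, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Stand-in models with fixed outputs, used only to show the selection rule.
demo_models = {
    "command": lambda f: 0.05,
    "database": lambda f: 0.10,
    "xss": lambda f: 0.90,
    "local_file": lambda f: 0.02,
    "normal": lambda f: 0.20,
}
label, scores = final_label(None, demo_models)
```

How unknown anomalies are flagged is not detailed in the text, so the sketch covers only the highest-probability rule.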
Brief description of the drawings
Fig. 1 is the overall flow chart of the method according to a preferred embodiment of the invention;
Fig. 2 is an example of URL cutting and extraction in the method;
Fig. 3 is a diagram of the Bagging framework integration process of the method;
Fig. 4 is the flow chart of abnormal traffic monitoring under the model.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and in detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the invention.
The technical solution by which the invention solves the above technical problems is as follows:
The invention proposes a model for abnormal traffic monitoring. Fig. 1 shows the flow chart of the whole model. The data set is preprocessed by splitting on symbols such as "?" and "=" to extract the effective information in each URL and improve processing efficiency. Fig. 2 is a URL cutting example. Features are then extracted from the processed data using statistical methods such as mutual information and information entropy. After feature engineering is complete, a data set with different features is built for each access type, and the labels are changed to two classes: the current access type and other. At the same time, part of the data is set aside as the test set. Machine learning is carried out separately on the five reconstructed data sets. Four machine learning algorithms, eXtreme Gradient Boosting, Light Gradient Boosting Machine, Random Forest, and Logistic Regression, are used for supervised learning on the data sets, and the learners are integrated through the Bagging framework to obtain independent recognition models for the different access types. Fig. 3 shows the Bagging integration process. The reserved test set is fed into each recognition model to check model accuracy.
The key processes of the improved abnormal traffic monitoring model are: URL information extraction, construction of the feature engineering, training of the multi-algorithm learners, and integration with the Bagging framework.
1. URL information extraction
Extracting the effective information from a URL is essential to the processing efficiency of the model. For an unprocessed URL:
1) First remove the invalid data after "#";
2) Cut the remaining fragment at "?";
3) Separate out the file path fragment and split it on "/" and "=";
4) Split the query part on "&" and "=";
The parameters and values obtained from the splitting in 3) and 4) are each passed to a processing function for regular-expression matching. The processing function normalizes numbers and date/time values, replaces garbled characters with "$0", changes a character string of fewer than 10 lowercase letters to "s", changes a character string longer than 2 characters that starts with "0x" to "0x1234", and collapses multiple spaces into one space. The fragments left after processing are the URL information fragments that the model needs.
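The four cutting steps and the processing function can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names `normalize` and `extract` are hypothetical, and two details the text leaves open are filled with labeled assumptions (runs of digits become "0", and "garbled" is taken to mean non-printable or non-ASCII characters).

```python
import re
from typing import List


def normalize(token: str) -> str:
    """Collapse one path or query token into a coarse placeholder."""
    token = re.sub(r"\s+", " ", token)            # multiple spaces -> one space
    if re.fullmatch(r"0x[0-9a-fA-F]+", token):    # "0x..." string longer than 2 chars
        return "0x1234"
    if re.fullmatch(r"[a-z]{1,9}", token):        # fewer than 10 lowercase letters
        return "s"
    # Assumption: digit runs become "0"; the source does not name the token.
    token = re.sub(r"\d+", "0", token)
    # Assumption: "garbled" means non-printable or non-ASCII characters.
    token = re.sub(r"[^\x20-\x7e]+", "$0", token)
    return token


def extract(url: str) -> List[str]:
    """Apply steps 1) to 4) and normalize every resulting fragment."""
    url = url.split("#", 1)[0]                    # 1) drop everything after "#"
    path, _, query = url.partition("?")           # 2) cut at "?"
    pieces = re.split(r"[/=]", path)              # 3) path: split on "/" and "="
    if query:
        pieces += re.split(r"[&=]", query)        # 4) query: split on "&" and "="
    return [normalize(p) for p in pieces if p]


fragments = extract("/index.php?id=123&name=bob#top")
```

Under these assumptions, the example URL reduces to the fragments `index.php`, `s`, `0`, `s`, `s`, so URLs that differ only in concrete parameter values map to the same pattern.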
2. Construction of the feature engineering
The construction of the feature engineering strongly affects the validity and accuracy of the model.
1) Length of the URL parameter value: the anomaly value P of the length can be calculated using Chebyshev's inequality from statistics together with the mean and variance of the length,
$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$$
where X is the length of the URL parameter value, μ is the mean length, σ² is the variance of the length, and k denotes the number of standard deviations;
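One way to turn Chebyshev's inequality into a length score is sketched below in Python. It assumes that k is taken as the observed deviation measured in standard deviations, so the bound becomes σ²/(x − μ)², capped at 1; this choice and the function name `length_anomaly_bound` are illustrative assumptions rather than details stated in the text.

```python
from statistics import fmean, pvariance
from typing import Sequence


def length_anomaly_bound(train_lengths: Sequence[float], observed: float) -> float:
    """Chebyshev upper bound on seeing a deviation at least this large.

    Small values mean the observed length is unusual for this parameter.
    """
    mu = fmean(train_lengths)
    var = pvariance(train_lengths, mu)
    deviation = abs(observed - mu)
    if deviation == 0:
        return 1.0            # exactly the mean: nothing unusual
    if var == 0:
        return 0.0            # training lengths were constant, any change is unusual
    # 1 / k^2 with k = deviation / sigma, capped at 1 because it bounds a probability
    return min(1.0, var / deviation ** 2)


bound_far = length_anomaly_bound([4, 6], 15)    # mean 5, variance 1, deviation 10
bound_near = length_anomaly_bound([4, 6], 5.5)  # deviation below one sigma
```

A parameter value whose length lies far from the training mean gets a bound near 0, while values within about one standard deviation get the uninformative bound 1.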
2) Character distribution: the anomaly value α of the character distribution is calculated using the chi-square test from statistics. For character strings {s1, s2, …, sn}, CD(s)_i denotes the i-th probability value in CD(s) and ICD_i denotes the i-th probability value in ICD; then
$$ICD_i = \frac{1}{n}\sum_{j=1}^{n} CD(s_j)_i$$
where i = 1, 2, …, n. That is, the i-th probability value in ICD is the mean of the i-th probability values of all sample distributions in the sample set;
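The averaging that defines ICD, followed by a chi-square comparison, can be sketched in Python as below. Several details are assumptions made only for illustration: CD(s) is taken to be the relative character frequencies sorted in descending order, the distribution is truncated to six bins with the last bin collecting the remainder, and the sketch stops at the chi-square statistic instead of converting it to the anomaly value α.

```python
from collections import Counter
from typing import List, Sequence

BINS = 6  # assumption: number of frequency ranks kept per distribution


def char_distribution(s: str, bins: int = BINS) -> List[float]:
    """CD(s): relative character frequencies, sorted from most to least frequent."""
    counts = sorted(Counter(s).values(), reverse=True)
    counts = counts[:bins - 1] + [sum(counts[bins - 1:])]  # last bin holds the rest
    counts += [0] * (bins - len(counts))
    total = len(s) or 1
    return [c / total for c in counts]


def ideal_distribution(samples: Sequence[str], bins: int = BINS) -> List[float]:
    """ICD: the i-th value is the mean of the i-th values of all sample CDs."""
    cds = [char_distribution(s, bins) for s in samples]
    return [sum(cd[i] for cd in cds) / len(cds) for i in range(bins)]


def chi_square(s: str, icd: Sequence[float]) -> float:
    """Chi-square statistic of the observed counts against the ICD expectation."""
    observed = [p * len(s) for p in char_distribution(s, len(icd))]
    expected = [p * len(s) for p in icd]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


icd = ideal_distribution(["aab", "ccd"])
stat_same = chi_square("xxy", icd)   # same shape as the training strings
stat_diff = chi_square("zzz", icd)   # a single repeated character
```

A larger statistic indicates a character distribution further from what was seen in training; mapping it to a probability would need a chi-square table or a statistics library.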
3) Enumeration type: it is very common for the legal inputs of an attribute to belong to an enumerated type. For example, the legal values of a "gender" attribute are {male, female}, and any input outside these two cases should be treated as abnormal. Functions f and g are defined: f is a linearly increasing function, and as training samples are input in sequence, g is increased by 1 when a new sample value is encountered and decreased by 1 otherwise.
$$f(x) = x$$
After all samples have been learned, the correlation coefficient ρ of the resulting functions f and g is defined by:
$$\rho = \frac{\mathrm{Covar}(f, g)}{\sqrt{\mathrm{Var}(f)\,\mathrm{Var}(g)}}$$
where Var(f) and Var(g) are the variances of f and g respectively, and Covar(f, g) is the covariance of f and g;
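The f/g construction and the correlation coefficient ρ can be sketched in Python as follows. The function name `enumeration_correlation` is hypothetical, and reading a negative ρ as "this attribute looks enumerated" is an interpretive assumption: g keeps rising when values are mostly new and falls when the same few values repeat.

```python
from statistics import fmean
from typing import Hashable, Sequence


def enumeration_correlation(values: Sequence[Hashable]) -> float:
    """Correlation between f(x) = x and the new-value counter g(x)."""
    seen, g, level = set(), [], 0
    for v in values:
        level += 1 if v not in seen else -1   # +1 for a new value, -1 for a repeat
        seen.add(v)
        g.append(level)
    f = list(range(1, len(values) + 1))       # f(x) = x
    mf, mg = fmean(f), fmean(g)
    covar = sum((a - mf) * (b - mg) for a, b in zip(f, g))
    var_f = sum((a - mf) ** 2 for a in f)
    var_g = sum((b - mg) ** 2 for b in g)
    if var_f == 0 or var_g == 0:
        return 0.0                            # correlation undefined, report neutral
    return covar / (var_f * var_g) ** 0.5


rho_free = enumeration_correlation(["a", "b", "c", "d"])
rho_enum = enumeration_correlation(["male", "female", "male", "female", "male", "female"])
```

Here the common 1/n factors of the covariance and variances cancel, so sums are used directly.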
4) Keyword extraction: to find the features shared by URLs of the same access type, extracting the keywords of URLs of each access type is particularly important. After all URL data have been scanned, the frequency of every physically adjacent character string is recorded. Strings whose frequency is too low are screened out, and mutual information is calculated for the remaining strings. Mutual information reflects how tightly the parts of a character string are bound together. It is calculated from the following quantities: P(s1s2s3) is the probability that the character string s1s2s3 occurs, and P(s1s2) and P(s2s3) are defined analogously.
In addition, the richness of the left and right neighboring characters of a character string must be calculated. The richer the left and right neighbors, the more flexibly the string is used in the data set and the more likely it is to be a keyword of this type of URL. This richness can be obtained with information entropy:
$$H = -\sum_{i} P(i)\,\log P(i)$$
where P(i) denotes the probability that neighboring character i of the character string occurs.
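The neighbor-richness measure can be sketched in Python as below: for a candidate string, the characters immediately to its left and right are counted across the corpus and the information entropy of each side is computed. The base-2 logarithm, single-character neighbors, and the function names are assumptions for illustration; the mutual information step is not sketched because its formula is not given in the text.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def entropy(counter: Counter) -> float:
    """Information entropy of a frequency table, in bits (base 2 is an assumption)."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counter.values())


def neighbor_entropy(corpus: Iterable[str], candidate: str) -> Tuple[float, float]:
    """Entropy of the characters directly left and right of every occurrence."""
    left, right = Counter(), Counter()
    for text in corpus:
        start = text.find(candidate)
        while start != -1:
            end = start + len(candidate)
            if start > 0:
                left[text[start - 1]] += 1
            if end < len(text):
                right[text[end]] += 1
            start = text.find(candidate, start + 1)
    return entropy(left), entropy(right)


left_h, right_h = neighbor_entropy(["a=select&", "b=select;", "a=select&"], "select")
```

In this toy corpus "select" always follows "=", so the left entropy is 0, while the right neighbors vary and give a positive entropy.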
3. Training of the multi-algorithm learners
Before training, the data need a small adjustment. The URL features for each access type are expanded to the whole data set, forming five different data sets. The original labels are changed at the same time: only the label of that access type is retained, and the labels of URL data of the remaining access types are all replaced with "other".
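The relabeling described above is a one-vs-rest split, which can be sketched in Python as follows. The record layout (URL, label) and the label names are hypothetical, and the sketch covers only the label change, not the per-type feature expansion.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (url, access-type label); hypothetical layout


def one_vs_rest(records: List[Record], target: str, other: str = "other") -> List[Record]:
    """Keep the target label and replace every other label with `other`."""
    return [(url, label if label == target else other) for url, label in records]


def rebuild_datasets(records: List[Record]) -> Dict[str, List[Record]]:
    """Build one relabeled copy of the data set per access type."""
    labels = sorted({label for _, label in records})
    return {label: one_vs_rest(records, label) for label in labels}


records = [("/a?q=<script>", "xss"), ("/index", "normal"), ("/b?id=1 or 1=1", "database")]
datasets = rebuild_datasets(records)
```

With five access types in the input, `rebuild_datasets` returns five binary data sets, one for each recognition model.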
Regarding the choice of algorithms, tests showed XGBoost, LightGBM, RF, and LR to be the machine learning algorithms with the higher accuracy and the best fit to the problem.
1) XGBoost: XGBoost is an algorithm optimized on the basis of boosting algorithms such as AdaBoost and GBDT. It can also be used for linear classification, in which case it can be regarded as a linear regression algorithm with L1 and L2 regularization. Compared with traditional GBDT it adds a regularization term, which helps considerably in preventing overfitting. For distributed computation, XGBoost sorts the values of each feature dimension on one machine and stores them in a Block structure, so the computations for multiple features can be distributed across different machines and the results merged at the end; this gives XGBoost the ability to compute in a distributed manner. Because feature values are ultimately used only for sorting, the magnitude of feature values has little influence on XGBoost model learning. Because each step simply selects the feature with the largest gradient reduction, the problem of selecting among correlated features is also addressed;
2) LightGBM: LightGBM is a framework that implements the GBDT algorithm. It supports efficient parallel training and offers faster training speed, lower memory consumption, better accuracy, and distributed support, so it can process massive data quickly.
3) Random Forest: Random Forest is particularly suited to multi-class problems. Training and prediction are fast and it performs well on data sets; it tolerates errors in the training data; it can handle very high-dimensional data without feature selection, that is, it can handle thousands of variables without deleting any, which is very useful for the large number of features produced by keyword extraction; it generates an internal unbiased estimate of the generalization error during classification; it can detect interactions between features and the importance of features during training; and it is not prone to overfitting;
4) Logistic Regression: the idea of logistic regression is to divide the data set into two parts with a hyperplane, with the two parts lying on opposite sides of the hyperplane and belonging to two different classes. This fits the relabeled URL data set of each access type well. Fig. 4 is a schematic diagram of the binary classification principle of Logistic Regression. In addition, its computational cost at classification time is very small, it is fast, it needs very little storage, and it makes the probability score of each sample easy to inspect.
4. Integration with the Bagging framework
Bagging is an ensemble learning framework that draws sub-samples from the training set to form the sub-training set required by each base model, and combines the predictions of all base models to produce the final prediction. On top of the learners, data sets are re-sampled from the original data set for classification prediction, the label is decided by majority vote, and model accuracy is checked. Because the expectation of the overall model under this framework is approximately equal to that of the base models, the bias of the overall model is close to the bias of the base models, while the variance of the overall model decreases as the number of base models increases. This strengthens resistance to overfitting and can significantly improve model accuracy. Table 1 compares the experimental accuracy of each machine learning algorithm with the accuracy after Bagging integration;
Table 1: Model accuracy comparison
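The Bagging procedure described in this section, bootstrap sub-sampling followed by a majority vote, can be sketched in plain Python as below. The base learner `fit_nearest_mean` is a deliberately trivial, hypothetical stand-in for XGBoost, LightGBM, RF, or LR, and the single numeric feature is an assumption made to keep the example self-contained.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, str]          # (feature value, label); hypothetical layout
Predictor = Callable[[float], str]


def fit_nearest_mean(sample: Sequence[Sample]) -> Predictor:
    """Toy base learner: predict the label whose mean feature value is closest."""
    by_label = {}
    for value, label in sample:
        by_label.setdefault(label, []).append(value)
    means = {label: sum(v) / len(v) for label, v in by_label.items()}
    return lambda x: min(means, key=lambda label: abs(x - means[label]))


def bagging_fit(train: Sequence[Sample], fit: Callable[[Sequence[Sample]], Predictor],
                n_models: int = 5, seed: int = 0) -> List[Predictor]:
    """Train each base model on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    return [fit([rng.choice(train) for _ in train]) for _ in range(n_models)]


def bagging_predict(models: Sequence[Predictor], x: float) -> str:
    """Decide the label by majority vote over all base models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


train = [(1, "other"), (2, "other"), (3, "other"), (20, "xss"), (21, "xss"), (22, "xss")]
models = bagging_fit(train, fit_nearest_mean)
prediction = bagging_predict(models, 21.5)
```

Because every base model sees a different resample, their individual errors tend to differ, and the vote averages them out; this is the variance reduction that the paragraph above attributes to the framework.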
The above embodiments should be understood as merely illustrating the invention and not as limiting its scope of protection. After reading the content recorded in the invention, a skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications likewise fall within the scope of the claims of the invention.
Claims (10)
1. A WEB abnormal traffic monitoring method based on ensemble learning, characterized by comprising the following steps:
1) Data preprocessing: obtain uniform resource locator (URL) records, cut and separate the URL records, and extract the effective information;
2) Feature engineering: use statistical methods to extract features separately from the URLs of common command attacks, database attacks, cross-site scripting attacks, local file inclusion attacks, and normal network access;
3) Data set reconstruction: for each of the five access types, organize the full data set according to that type's features and adjust the labels to "this access type" and "other";
4) Model building: for the data set corresponding to each of the five access types, carry out supervised learning with four machine learning algorithms, XGBoost (extreme gradient boosting), LightGBM (light gradient boosting machine), RF (random forest), and LR (logistic regression), and integrate the learners with the Bagging framework to obtain a separate recognition model for each of the five access types;
5) Model testing: test on the portion of the data set reserved in advance in step 4) to check model accuracy.
2. The WEB abnormal traffic monitoring method based on ensemble learning according to claim 1, characterized in that the extraction of effective URL information in step 1) includes the following steps. For an unprocessed URL: first remove the invalid data after "#"; cut the remaining fragment at "?"; separate out the file path fragment and split it on "/" and "="; split the query part on "&" and "="; then pass each parameter and value obtained from the splitting to a processing function for regular-expression matching.
3. The WEB abnormal traffic monitoring method based on ensemble learning according to claim 2, characterized in that the processing function normalizes numbers and date/time values, replaces garbled characters with "$0", changes a character string of fewer than 10 lowercase letters to "s", changes a character string longer than 2 characters that starts with "0x" to "0x1234", and collapses multiple spaces into one space; the fragments left after processing are the URL information fragments that the model needs.
4. The WEB abnormal traffic monitoring method based on ensemble learning according to claim 2, characterized in that the feature engineering in step 2) specifically includes: length of the URL parameter value, where the anomaly value P of the length is calculated using Chebyshev's inequality from statistics together with the mean and variance of the length; character distribution, where the anomaly value α of the character distribution is calculated using the chi-square test from statistics; enumeration type, where it is determined whether an input attribute value is an enumeration anomaly; and keyword extraction, which finds features shared by URLs of the same access type: after all URL data have been scanned, the frequency of every physically adjacent character string is recorded, strings whose frequency is too low are screened out, and mutual information is calculated for the remaining strings.
5. The real-time network traffic anomaly monitoring system based on big data according to claim 4, characterized in that, for the length anomaly value of the URL parameter value, the anomaly value P of the length can be calculated using Chebyshev's inequality from statistics together with the mean and variance of the length, and the calculation formula includes:
$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$$
where X is the length of the URL parameter value, μ is the mean length, σ² is the variance of the length, and k denotes the number of standard deviations.
6. The real-time network traffic anomaly monitoring system based on big data according to claim 4, characterized in that calculating the anomaly value α of the character distribution using the chi-square test from statistics specifically includes: for character strings {s1, s2, …, sn}, CD(s)_i denotes the i-th probability value in CD(s) and ICD_i denotes the i-th probability value in ICD; then
$$ICD_i = \frac{1}{n}\sum_{j=1}^{n} CD(s_j)_i$$
where i = 1, 2, …, n; that is, the i-th probability value in ICD is the mean of the i-th probability values of all sample distributions in the sample set;
7. The real-time network traffic anomaly monitoring system based on big data according to claim 4, characterized in that, for the enumeration type, it is determined whether an input attribute value is an enumeration anomaly; functions f and g are defined, where f is a linearly increasing function and g(x) is the sample function; as training samples are input in sequence, g is increased by 1 when a new sample value is encountered and decreased by 1 otherwise,
$$f(x) = x$$
After all samples have been learned, the correlation coefficient ρ of the resulting functions f and g is defined by:
$$\rho = \frac{\mathrm{Covar}(f, g)}{\sqrt{\mathrm{Var}(f)\,\mathrm{Var}(g)}}$$
where Var(f) and Var(g) are the variances of f and g respectively, and Covar(f, g) is the covariance of f and g.
8. The real-time network traffic anomaly monitoring system based on big data according to claim 4, characterized in that, in keyword extraction, the mutual information reflects how tightly the parts of a character string are bound together, and is calculated from the following quantities: P(s1s2s3) is the probability that the character string s1s2s3 occurs, P(s1s2) is the probability that the character string s1s2 occurs, and P(s2s3) is the probability that the character string s2s3 occurs.
9. The real-time network traffic anomaly monitoring system based on big data according to claim 4, characterized by further including a step of calculating how rich the left and right neighboring characters of a character string are, where this richness can be obtained with information entropy:
$$H = -\sum_{i} P(i)\,\log P(i)$$
where P(i) denotes the probability that neighboring character i of the character string occurs.
10. The real-time network traffic anomaly monitoring system based on big data according to any one of claims 1-9, characterized in that Bagging is an ensemble learning framework that draws sub-samples from the training set to form the sub-training set required by each base model and combines the predictions of all base models to produce the final prediction; on top of the learners, data sets are re-sampled from the original data set for classification prediction, the label is decided by majority vote, and model accuracy is checked.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710543858.6A CN107294993B (en) | 2017-07-05 | 2017-07-05 | WEB abnormal traffic monitoring method based on ensemble learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107294993A (en) | 2017-10-24 |
CN107294993B (en) | 2021-02-09 |
Family
ID=60100438
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710543858.6A (Active; CN107294993B (en)) | 2017-07-05 | 2017-07-05 | WEB abnormal traffic monitoring method based on ensemble learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107294993B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130091128A1 (en) * | 2011-10-11 | 2013-04-11 | Microsoft Corporation | Time-Aware Ranking Adapted to a Search Engine Application |
US20140105488A1 (en) * | 2012-10-17 | 2014-04-17 | Microsoft Corporation | Learning-based image page index selection |
CN105024989A (en) * | 2014-11-26 | 2015-11-04 | 哈尔滨安天科技股份有限公司 | Malicious URL heuristic detection method and system based on abnormal port |
CN104735074A (en) * | 2015-03-31 | 2015-06-24 | 江苏通付盾信息科技有限公司 | Malicious URL detection method and implement system thereof |
CN106131071A (en) * | 2016-08-26 | 2016-11-16 | 北京奇虎科技有限公司 | A kind of Web method for detecting abnormality and device |
Cited By (40)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108038155A (en) * | 2017-12-02 | 2018-05-15 | 宝牧科技(天津)有限公司 | A kind of detection method of network URL exceptions |
CN107944986B (en) * | 2017-12-28 | 2022-02-15 | 广东工业大学 | Method, system and equipment for recommending O2O commodities |
CN107944986A (en) * | 2017-12-28 | 2018-04-20 | 广东工业大学 | A kind of O2O Method of Commodity Recommendation, system and equipment |
CN110086749A (en) * | 2018-01-25 | 2019-08-02 | 阿里巴巴集团控股有限公司 | Data processing method and device |
CN108491717A (en) * | 2018-03-28 | 2018-09-04 | 四川长虹电器股份有限公司 | A kind of xss systems of defense and its implementation based on machine learning |
CN108764568A (en) * | 2018-05-28 | 2018-11-06 | 哈尔滨工业大学 | A kind of data prediction model tuning method and device based on LSTM networks |
CN108764568B (en) * | 2018-05-28 | 2020-10-23 | 哈尔滨工业大学 | Data prediction model tuning method and device based on LSTM network |
CN109167753A (en) * | 2018-07-23 | 2019-01-08 | 中国科学院计算机网络信息中心 | A kind of detection method and device of network intrusions flow |
CN109408591A (en) * | 2018-10-12 | 2019-03-01 | 北京聚云位智信息科技有限公司 | Support the AI of SQL driving and the decision type distributed data base system of Feature Engineering |
CN109408591B (en) * | 2018-10-12 | 2021-11-09 | 北京聚云位智信息科技有限公司 | Decision-making distributed database system supporting SQL (structured query language) driven AI (Artificial Intelligence) and feature engineering |
CN109325193A (en) * | 2018-10-16 | 2019-02-12 | 杭州安恒信息技术股份有限公司 | WAF normal discharge modeling method and device based on machine learning |
CN111444931A (en) * | 2019-01-17 | 2020-07-24 | 北京京东尚科信息技术有限公司 | Method and device for detecting abnormal access data |
CN111582879A (en) * | 2019-01-30 | 2020-08-25 | 浙江远图互联科技股份有限公司 | Anti-fraud medical insurance identification method based on genetic algorithm |
CN111600919B (en) * | 2019-02-21 | 2023-04-07 | 北京金睛云华科技有限公司 | Method and device for constructing intelligent network application protection system model |
CN111600919A (en) * | 2019-02-21 | 2020-08-28 | 北京金睛云华科技有限公司 | Web detection method and device based on artificial intelligence |
CN109951484A (en) * | 2019-03-20 | 2019-06-28 | 四川长虹电器股份有限公司 | The test method and system attacked for machine learning product |
CN110046757A (en) * | 2019-04-08 | 2019-07-23 | 中国人民解放军第四军医大学 | Number of Outpatients forecasting system and prediction technique based on LightGBM algorithm |
CN110175635A (en) * | 2019-05-07 | 2019-08-27 | 南京邮电大学 | OTT application user classification method based on Bagging algorithm |
CN110175635B (en) * | 2019-05-07 | 2022-08-30 | 南京邮电大学 | OTT application program user classification method based on Bagging algorithm |
CN110263539A (en) * | 2019-05-15 | 2019-09-20 | 湖南警察学院 | A kind of Android malicious application detection method and system based on concurrent integration study |
CN110363223A (en) * | 2019-06-20 | 2019-10-22 | 华南理工大学 | Industrial flow data processing method, detection method, system, device and medium |
CN110443274A (en) * | 2019-06-28 | 2019-11-12 | 平安科技(深圳)有限公司 | Method for detecting abnormality, device, computer equipment and storage medium |
CN110443274B (en) * | 2019-06-28 | 2024-05-07 | 平安科技(深圳)有限公司 | Abnormality detection method, abnormality detection device, computer device, and storage medium |
CN110415462A (en) * | 2019-07-31 | 2019-11-05 | 中国工商银行股份有限公司 | Atm device adds paper money optimization method and device |
CN110598774A (en) * | 2019-09-03 | 2019-12-20 | 中电长城网际安全技术研究院(北京)有限公司 | Encrypted flow detection method and device, computer readable storage medium and electronic equipment |
CN111104466A (en) * | 2019-12-25 | 2020-05-05 | 航天科工网络信息发展有限公司 | Method for rapidly classifying massive database tables |
CN111104466B (en) * | 2019-12-25 | 2023-07-28 | 中国长峰机电技术研究设计院 | Method for quickly classifying massive database tables |
CN111371794B (en) * | 2020-03-09 | 2022-01-18 | 北京金睛云华科技有限公司 | Shadow domain detection model, detection model establishing method, detection method and system |
CN111371794A (en) * | 2020-03-09 | 2020-07-03 | 北京金睛云华科技有限公司 | Shadow domain detection model, detection model establishing method, detection method and system |
CN111767275A (en) * | 2020-06-28 | 2020-10-13 | 北京林克富华技术开发有限公司 | Data processing method and device and data processing system |
CN111767275B (en) * | 2020-06-28 | 2024-04-19 | 北京林克富华技术开发有限公司 | Data processing method and device and data processing system |
CN113361597A (en) * | 2021-06-04 | 2021-09-07 | 北京天融信网络安全技术有限公司 | URL detection model training method and device, electronic equipment and storage medium |
CN113361597B (en) * | 2021-06-04 | 2023-07-21 | 北京天融信网络安全技术有限公司 | Training method and device for URL detection model, electronic equipment and storage medium |
CN113469730A (en) * | 2021-06-08 | 2021-10-01 | 北京化工大学 | Customer repurchase prediction method and device based on RF-LightGBM fusion model under non-contract scene |
CN114915563A (en) * | 2021-12-07 | 2022-08-16 | 天翼数字生活科技有限公司 | Network flow prediction method and system |
CN114169440A (en) * | 2021-12-08 | 2022-03-11 | 北京百度网讯科技有限公司 | Model training method, data processing method, device, electronic device and medium |
CN113936765A (en) * | 2021-12-17 | 2022-01-14 | 北京因数健康科技有限公司 | Method and device for generating periodic behavior report, storage medium and electronic equipment |
CN114513341B (en) * | 2022-01-21 | 2023-09-12 | 上海斗象信息科技有限公司 | Malicious traffic detection method, malicious traffic detection device, terminal and computer readable storage medium |
CN114513341A (en) * | 2022-01-21 | 2022-05-17 | 上海斗象信息科技有限公司 | Malicious traffic detection method, device, terminal and computer readable storage medium |
CN116127236A (en) * | 2023-04-19 | 2023-05-16 | 远江盛邦(北京)网络安全科技股份有限公司 | Webpage web component identification method and device based on parallel structure |
Also Published As
Publication number | Publication date |
---|---|
CN107294993B (en) | 2021-02-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |