CN111600919B

CN111600919B - Method and device for constructing intelligent network application protection system model

Info

Publication number: CN111600919B
Application number: CN201910128774.5A
Authority: CN
Inventors: 曲武
Original assignee: Beijing Jinjingyunhua Technology Co ltd
Current assignee: Beijing Jinjingyunhua Technology Co ltd
Priority date: 2019-02-21
Filing date: 2019-02-21
Publication date: 2023-04-07
Anticipated expiration: 2039-02-21
Also published as: CN111600919A

Abstract

The application discloses a method and a device for constructing an intelligent network application protection system model. The method comprises the following steps: step A, acquiring training sample data; step B, respectively acquiring different types of feature vectors corresponding to the training sample data by using at least two feature extraction algorithms; step C, respectively training base models by using different supervised learning algorithms and the different types of feature vector files of the training sample data, and performing cross validation; and step D, constructing an intelligent network application protection system model from the base models by using an ensemble learning technique.

Description

Method and device for constructing intelligent network application protection system model

Technical Field

The present application relates to the field of information processing, and in particular, to a method and an apparatus for constructing an intelligent network application protection system model.

Background

The openness, ease of use, ease of development, and popularity of Web applications have made Web application security issues increasingly prominent. With the rapid development of the internet, Web applications have diversified and grown rapidly, and the forms of attack against them have diversified as well. Among these, SQL (Structured Query Language) injection attacks and XSS (cross-site scripting) attacks are the most prevalent and the most harmful, and are currently the main channels of information leakage. These two types of attack are highly harmful, come in many types, mutate quickly and are well concealed, so detecting them is one of the important tasks of Web application attack detection.

Throughout the history of Web application attack detection, a rule-dependent blacklist detection mechanism has generally been adopted: whether in a Web application firewall or in an IDS (Intrusion Detection System), messages are matched against regular expressions built into the detection engine. This method can generally resist most attacks, but the following problems commonly arise in practical deployments: the rule base is difficult to maintain and upgrade, and false alarms are not easy to correct in time; the rule strictness is difficult to set, since rules that are too strict produce false alarms and rules that are too loose miss attacks; and the feature library becomes bloated, which seriously affects detection performance and normal service.

There have been many research results on Web attack detection, falling mainly into offline analysis and real-time analysis. For offline analysis, Adeva et al. and Almgaren et al. propose discovering attack behavior by analyzing Web logs, and He Pengcheng et al. propose adding network parameter indicators on top of log analysis in order to discover attacks. Offline analysis has a major defect: it cannot detect and block attacks in real time, so it can hardly meet the real-time protection requirements of a Web site. For real-time analysis, in addition to the traditional feature-rule detection method, machine learning methods have also been widely studied. Zhang et al. propose extracting features manually and training an SVM classifier to detect Web attacks; Vishnu et al. predict XSS attacks using three machine learning algorithms, namely naive Bayes, SVM and the J48 decision tree; Rathore et al. propose an XSS detection tool, XSSClassifier, that detects XSS attacks on SNS sites using 10 different machine learning algorithms; and Fang Yong et al. propose word-segmentation vectors for SQL injection and train an SQL injection detection model using the deep learning algorithm LSTM. These methods share several problems: the feature vector extraction algorithm is chosen singly, one or a few supervised learning algorithms are applied in a simple manner, and in-depth research on Web attack detection using ensemble learning and reinforcement learning is lacking.

In the field of Web attack detection, the learning mode can be deep learning, ensemble learning or reinforcement learning, but research on improving Web attack detection capability on this basis has not yet produced a mature application system. In particular, how to improve the detection accuracy and detection efficiency of Web attack detection while reducing false alarms is a new challenge.

In view of this, the prior art is in need of improvement and advancement.

Disclosure of Invention

In order to solve the above technical problem, the present application provides a method and a device for constructing an intelligent WAF (Web Application Firewall, that is, network application protection system) model, which can improve the detection accuracy and detection efficiency of the Web attack detection model.

In order to achieve the purpose of the present application, the present application provides a method for constructing an intelligent WAF model, comprising:

step A, acquiring training sample data;

step B, respectively acquiring different types of feature vectors corresponding to the training sample data by using at least two feature extraction algorithms;

step C, respectively training the base model by using different supervised learning algorithms and different types of feature vectors of training sample data, and performing cross validation;

and step D, constructing an intelligent WAF model from the base models by using an ensemble learning technique.

In one exemplary embodiment, the step a includes:

preprocessing a Web legal load and a Web attack load to obtain training sample data; the Web legal load and the Web attack load are obtained according to any one or more of the following modes:

acquiring a Web legal load and a Web attack load from network traffic processed by WAF equipment;

acquiring a Web legal load from a prerecorded legal website by utilizing a Web crawler technology;

acquiring a Web attack load from a pre-recorded open source community by utilizing a Web crawler technology;

and acquiring the Web attack load detected in the process of executing the penetration test operation.

In one exemplary embodiment, the step B includes:

carrying out data cleaning operation on the training sample data;

extracting features from multiple angles respectively by utilizing multiple feature extraction algorithms for the cleaned training sample data, and constructing feature vectors of different categories;

respectively carrying out quantization processing on the feature vectors of different categories;

and respectively storing the quantized and coded feature vectors of different categories into different feature files.

In one exemplary embodiment, the step C includes:

performing the following steps C1 and C2 on the feature vectors of each category of the training sample data respectively:

step C1, respectively training different base models for the feature vectors of the category by adopting at least two supervised learning algorithms, and performing cross validation on at least two base models generated by training to obtain a cross validation result of the base models;

step C2, selecting the Top-k best supervised learning algorithms according to the cross validation results of the base models by using a pre-stored base model selection strategy, wherein k is a natural number;

and after steps C1 and C2 have been completed for the feature vectors of all categories of the training sample data, recording the category of feature vector used in training each base model, together with the Top-k supervised learning algorithms and cross validation results obtained when training on the feature vectors of each category.
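As a non-limiting illustration of steps C1 and C2, the following minimal Python sketch cross-validates two toy training routines on one category of feature vectors and keeps the Top-k by mean accuracy. The names `k_fold_indices`, `cross_validate`, `select_top_k`, `train_threshold` and `train_majority`, the toy data, and the use of mean accuracy as the selection strategy are all assumptions made for illustration; the application does not specify the base model selection strategy.

```python
from statistics import mean


def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) pairs for a simple k-fold split."""
    folds = [list(range(i, n_samples, n_folds)) for i in range(n_folds)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, test_idx


def cross_validate(train_fn, X, y, n_folds=3):
    """Return the mean accuracy over the folds for the model built by train_fn."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), n_folds):
        predict = train_fn([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return mean(scores)


def select_top_k(cv_results, k):
    """Assumed selection strategy: keep the k algorithms with the best CV score."""
    ranked = sorted(cv_results.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Two toy "supervised learning algorithms" (hypothetical stand-ins).
def train_threshold(X, y):
    cut = mean(x[0] for x in X)          # split point learned from training data
    return lambda x: int(x[0] > cut)     # 1 = attack, 0 = legal


def train_majority(X, y):
    label = int(sum(y) * 2 >= len(y))    # always predict the majority label
    return lambda x: label


X = [[0.1], [0.9], [0.2], [0.8], [0.3], [0.7]]   # one feature-vector category
y = [0, 1, 0, 1, 0, 1]
cv_results = {
    "threshold": cross_validate(train_threshold, X, y),
    "majority": cross_validate(train_majority, X, y),
}
top_algorithms = select_top_k(cv_results, k=1)
```

In a fuller implementation the same loop would be repeated for every feature-vector category, and the category, the selected algorithms and their cross validation results would be recorded for use in step D.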

In one exemplary embodiment, the step D includes:

step D1, training the generated base model according to the feature vector type of each base model, the obtained Top-k supervised learning algorithms and the cross validation result to obtain a plurality of base classifiers;

step D2, combining the obtained base classifiers according to the pre-obtained demand information to obtain a high-level classifier;

step D3, training the high-level classifier according to an integration strategy, and performing cross validation on the trained high-level classifier to obtain a cross validation result of the high-level classifier;

and step D4, generating an intelligent WAF model according to the cross validation result of the high-level classifier.
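As a non-limiting illustration of combining base classifiers into a high-level classifier (step D2), the sketch below uses weighted voting, with assumed cross validation accuracies as the weights. Weighted voting is a swapped-in, generic ensemble strategy chosen for brevity; it is not the composite pyramid integration strategy of the application. The three keyword-based base classifiers and the weight values are hypothetical.

```python
def weighted_vote(base_classifiers, weights, sample):
    """Return 1 (attack) when the weighted attack votes exceed half the total weight."""
    attack_score = sum(
        weight
        for classifier, weight in zip(base_classifiers, weights)
        if classifier(sample) == 1
    )
    return int(attack_score * 2 > sum(weights))


# Hypothetical base classifiers, each looking at a different feature.
def has_sql_keyword(payload):
    return int("union select" in payload.lower())


def has_sql_comment(payload):
    return int("--" in payload)


def has_quote(payload):
    return int("'" in payload)


base_classifiers = [has_sql_keyword, has_sql_comment, has_quote]
cv_accuracy = [0.9, 0.8, 0.6]   # assumed cross validation results used as weights


def high_level_classifier(payload):
    return weighted_vote(base_classifiers, cv_accuracy, payload)
```

With these weights, a lone apostrophe (as in a surname) is not enough to flag a request, whereas a payload that triggers several base classifiers is flagged; this shows how combining classifiers can reduce false alarms.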

In one exemplary embodiment, the step D2 includes:

combining the obtained base classifiers by using a preset integration strategy based on a composite pyramid model to obtain a high-level classifier, wherein the different supervised learning algorithms used by the high-level classifier all share the same processing modules, the shared modules comprising at least one of a training sample set acquisition module, a feature extraction module and a detection result judgment module.

In an exemplary embodiment, after the step D, the method further includes:

detecting network traffic by using the intelligent WAF model.

In an exemplary embodiment, after the step D, the method further comprises:

step E, constructing Web attack killing-free samples, that is, attack samples capable of evading detection by the intelligent WAF model, to obtain a Web attack killing-free sample training data set;

step F, continuously constructing new training sample data by using the Web attack killing-free sample training data set and the training sample data of step A, and performing steps B, C and D based on the new training sample data.

In one exemplary embodiment, the step E includes:

acquiring data which is detected and confirmed as a Web attack load as black sample data;

generating a Web attack killing-free sample capable of resisting the intelligent WAF model based on the acquired black sample data by utilizing a reinforcement learning technology;

and obtaining a Web attack killing-free sample training data set.

In an exemplary embodiment, the generating, by using a reinforcement learning technique, web attack killing-free sample data capable of resisting the intelligent WAF model based on the obtained black samples includes:

generating a Web attack load disguised as a Web legal load based on the black sample data by utilizing a reinforcement learning technology;

after the intelligent WAF model has applied its preset search-and-kill (detection) processing to the Web attack loads, determining, by using the reinforcement learning technique, a reward value for the processing result of each Web attack load;

recording a reward value corresponding to the Web attack load;

and determining a Web attack killing-free sample from the Web attack load according to the reward value.

An apparatus for constructing an intelligent WAF model, comprising a processor and a memory, wherein the memory stores a computer program, and the processor is configured to call the computer program in the memory to implement any of the above methods.

According to the technical scheme provided by the embodiments of the application, training sample data is obtained; different types of feature vectors corresponding to the training sample data are respectively acquired by using at least two feature extraction algorithms; base models are respectively trained by using different supervised learning algorithms and the different types of feature vectors of the training sample data, and cross validation is performed; and the intelligent WAF model is constructed from the base models by using an ensemble learning technique. In this way, the detection accuracy and detection efficiency for Web attack behavior can be improved.

Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

The accompanying drawings are included to provide a further understanding of the claimed subject matter and are incorporated in and constitute a part of this specification, illustrate embodiments of the subject matter and together with the description serve to explain the principles of the subject matter and not to limit the subject matter.

Fig. 1 is a flowchart of a method for constructing an intelligent WAF model according to an embodiment of the present disclosure;

Fig. 2 is a flowchart of a method for constructing an intelligent WAF model according to another embodiment of the present disclosure;

Fig. 3 is a data flow diagram illustrating the method of constructing the intelligent WAF model shown in Fig. 2;

Fig. 4 is a schematic diagram of a composite pyramid model provided in an embodiment of the present application;

Fig. 5 is a flowchart illustrating a reinforcement learning process of an intelligent WAF model according to an embodiment of the present disclosure;

Fig. 6 is a schematic diagram of a reinforcement learning A3C algorithm of the intelligent WAF model according to an embodiment of the present disclosure;

Fig. 7 is a flowchart of a method for detection by a hybrid learning-based intelligent WAF model according to yet another embodiment of the present application;

Fig. 8 is a schematic structural diagram of an apparatus for constructing an intelligent WAF model according to an embodiment of the present disclosure;

Fig. 9 is a schematic deployment diagram of a construction apparatus of an intelligent WAF model according to an embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the present application more apparent, embodiments of the present application will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.

The steps illustrated in the flow charts of the figures may be performed by a computer system, for example as a set of computer-executable instructions. Also, while a logical order is shown in the flow charts, in some cases the steps shown or described may be performed in an order different from the one given here.

Fig. 1 is a flowchart of a method for constructing an intelligent WAF model according to an embodiment of the present disclosure. The method shown in fig. 1 comprises:

Step A, acquiring training sample data;

In one exemplary embodiment, step A may proceed as follows. A traditional WAF device can be used to process network traffic and obtain large-scale Web legal loads and Web attack loads; Web crawler technology can be used to obtain large-scale Web legal loads from legitimate websites and large-scale Web attack loads from open-source communities; and the Web attack loads used during penetration testing can be obtained from various penetration testing tools.

In an exemplary embodiment, the step a may further include: and preprocessing the acquired Web legal load and the Web attack load, and constructing a training sample data set.

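Step B, respectively acquiring different types of feature vectors corresponding to the training sample data by using at least two feature extraction algorithms;

The feature extraction strategy can include at least one of a syntax tree algorithm, an N-Gram algorithm, a word embedding vector algorithm and an abnormal feature extraction algorithm.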

Through various feature extraction strategies, the feature information of the web load can be effectively obtained, and the intelligent WAF model can be conveniently and accurately established.
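As a non-limiting illustration of one of the listed strategies, the sketch below extracts character-level N-Gram counts from a Web load and projects them onto a fixed vocabulary to obtain a numerical feature vector. The function names `char_ngrams` and `to_vector`, the choice of character-level (rather than token-level) N-Grams, and the sample vocabulary are assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(payload, n=3):
    """Count the overlapping character n-grams of a lower-cased payload."""
    text = payload.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def to_vector(counts, vocabulary):
    """Project n-gram counts onto a fixed vocabulary to get a feature vector."""
    return [counts.get(gram, 0) for gram in vocabulary]


counts = char_ngrams("' OR 1=1", n=2)
vector = to_vector(counts, ["or", "1=", "zz"])
```

In practice the vocabulary would be built from the training sample data, and the resulting vectors would be stored in the feature file for the N-Gram category.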

In one exemplary embodiment, the step B includes:

carrying out data cleaning operation on the training sample data;

In the exemplary embodiment, after the training sample data set is acquired, data cleaning is performed, so that the data quality can be improved.

In the present exemplary embodiment, the feature vectors after quantization coding are stored as different feature files according to feature extraction algorithms of different categories.

Step C, respectively training the base models by using different supervised learning algorithms and different types of feature vectors of the training sample data, and performing cross validation;

The base model can be a preset model trained on the feature vectors of the training sample data with a supervised learning algorithm; each supervised learning algorithm corresponds to a different base model.

In one exemplary embodiment, the step C includes:

Steps C1 and C2 are carried out on the feature vectors of all categories of the training sample data.

Then, the category of feature vector used in training each base model is recorded, together with the Top-k supervised learning algorithms and cross validation results obtained when training on the feature vectors of each category.

In the exemplary embodiment, data training is performed on the feature vectors of multiple categories of the same training sample data, so that a supervised learning strategy of the feature vectors conforming to the training sample data can be comprehensively found, and the detection efficiency and accuracy are improved.

In one exemplary embodiment, the step D includes:

In the present exemplary embodiment, the base classifier may refer to a classifier used by the base model; a higher level classifier may refer to a classifier that is a combination of base classifiers.

In the exemplary embodiment, the integration strategy can be selected according to the actual design requirement and the test requirement, and the selection of the base classifier can adapt to different requirements, establish the required intelligent WAF model in a targeted manner, improve the generalization capability of the intelligent WAF model, and further improve the detection precision and the detection efficiency.

In one exemplary embodiment, the step D2 includes:

In an exemplary embodiment, after the step D, the method further includes:

detecting network traffic by using the intelligent WAF model.

In the exemplary embodiment, the real-time traffic of the network can be detected, so that the network security is improved and ensured.

In an exemplary embodiment, after the step D, the method further includes:

This embodiment amounts to iteratively training the intelligent WAF model. It can greatly improve the capability of the intelligent WAF to detect Web attack killing-free samples, can continuously improve the capability to counter such samples through the reinforcement learning technique, and helps improve the network security of a computer system.

In the exemplary embodiment, by obtaining killing-free sample data, hackers can be effectively prevented from disguising Web attack loads as legal data in order to bypass the search-and-kill operation, so the detection accuracy of the Web protection model and the data security are both improved.

In one exemplary embodiment, the step E includes:

generating a Web attack killing-free sample capable of resisting the intelligent WAF model based on the obtained black sample data by utilizing a reinforcement learning technology;

and obtaining a Web attack killing-free sample training data set.

In the exemplary embodiment, the detected Web attack data is used to obtain the deformed attack data, so as to improve the detection capability of the Web detection model and realize the automatic upgrade function.

In an exemplary embodiment, the generating, by using reinforcement learning technology, web attack killing-free sample data capable of resisting the intelligent WAF model based on the obtained black samples includes:

recording a reward value corresponding to the Web attack load;

In the present exemplary embodiment, the following can be defined for Web intrusion detection countermeasures: a reinforcement learning environment (Environment) and its parameters; a reinforcement learning state (State); an action (Action); a reinforcement learning reward (Reward); and a reinforcement learning agent (Agent). A reinforcement learning model suitable for Web intrusion detection countermeasures is then trained, and its effect is verified.
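As a non-limiting illustration of how the environment, state, action and reward might fit together, the sketch below defines a toy environment in which the state is the current payload, each action is a payload mutation, and the reward is 1.0 when a stand-in detector no longer flags the payload. The class `EvasionEnv`, the mutation functions, the detector `toy_waf` and the reward values are all hypothetical; no agent or training algorithm (such as A3C) is implemented here.

```python
import urllib.parse


# Hypothetical mutation actions that an agent could choose from.
def url_encode(payload):
    return urllib.parse.quote(payload, safe="")


def swap_case(payload):
    return payload.swapcase()


def inline_comment(payload):
    return payload.replace(" ", "/**/")


ACTIONS = [url_encode, swap_case, inline_comment]


def toy_waf(payload):
    """Stand-in detector: flags payloads that contain 'union select'."""
    return "union select" in payload.lower()


class EvasionEnv:
    """Toy environment: state = current payload, action = index into ACTIONS."""

    def __init__(self, black_sample, detector, max_steps=3):
        self.black_sample = black_sample
        self.detector = detector
        self.max_steps = max_steps
        self.state = black_sample
        self.steps = 0

    def reset(self):
        self.state = self.black_sample
        self.steps = 0
        return self.state

    def step(self, action_index):
        """Apply one mutation and return (state, reward, done)."""
        self.state = ACTIONS[action_index](self.state)
        self.steps += 1
        evaded = not self.detector(self.state)
        reward = 1.0 if evaded else 0.0
        done = evaded or self.steps >= self.max_steps
        return self.state, reward, done


env = EvasionEnv("1 union select 1", toy_waf)
env.reset()
state, reward, done = env.step(1)   # case swapping alone does not evade toy_waf
```

A payload that ends an episode with a positive reward would be a candidate killing-free sample; an agent would learn which sequences of actions tend to yield such rewards.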

This exemplary embodiment uses the reinforcement learning model for Web intrusion detection countermeasures to generate a killing-free sample data set; continuously constructs a new training sample data set from the killing-free sample data set and the training sample set obtained in step A; performs data cleaning on the new training sample set and obtains different types of feature vectors of the samples by using multiple feature extraction algorithms; trains base models with the different types of feature vectors and multiple supervised learning algorithms, and performs cross validation; and builds an intelligent WAF fusion model by using an ensemble learning technique. The intelligent WAF detection model is thereby retrained automatically, the intelligent WAF model is constructed continuously, and the capability of the intelligent WAF to detect Web attack killing-free samples is continuously improved.

In the exemplary embodiment, the reinforcement learning method can continuously generate the Web attack killing-free sample, and further improve the capability of the intelligent WAF to detect the Web attack killing-free sample through continuous construction of the intelligent WAF model.

In an exemplary embodiment, the training sample data set used by the supervised learning algorithms can be acquired by means such as labeling by traditional WAF devices, Web crawlers and penetration testing tools; data cleaning is then performed on the training sample set, and different types of feature vectors of the samples are obtained by using multiple feature extraction algorithms; base models are trained with the different types of feature vectors and multiple supervised learning algorithms, and cross validation is performed; and an intelligent WAF fusion model is constructed by using an ensemble learning technique, which improves the detection accuracy.

An exemplary embodiment provides an integration strategy based on a composite pyramid model, which improves detection efficiency. In step E, Web attack killing-free samples capable of bypassing the intelligent WAF are generated automatically and continuously from the existing black samples by using a reinforcement learning technique, and a Web attack killing-free sample training data set is constructed. A new intelligent WAF detection model is then trained iteratively with the generated Web attack killing-free sample training data set, continuously improving the capability of the intelligent WAF to detect Web attack killing-free samples.

According to the method provided by the application, training sample data is obtained; different types of feature vectors corresponding to the training sample data are respectively acquired by using at least two feature extraction algorithms; base models are respectively trained by using different supervised learning algorithms and the different types of feature vectors of the training sample data, and cross validation is performed; and the intelligent WAF model is constructed from the base models by using an ensemble learning technique. In this way, the detection accuracy and detection efficiency for Web attack behavior can be improved.

The methods provided in this application are further described below:

Fig. 2 is a schematic flow chart of a method for constructing an intelligent WAF model according to another embodiment of the present application. As shown in Fig. 2, the method for constructing an intelligent WAF model according to this embodiment includes the following processing steps:

Step S101: sample acquisition. A training sample set for the supervised learning algorithms is acquired by means such as labeling by traditional WAF (Web Application Firewall, that is, network application protection system) devices, Web crawlers and penetration testing tools;

Fig. 3 is a schematic data flow diagram of the method for constructing an intelligent WAF model according to this embodiment.

In an exemplary embodiment, step S101 may be processed in the following manner, including:

Step S1011: several WAF devices are used, which may come from different security vendors and use differentiated Web attack detection techniques. These devices can be configured with rules in a low-false-alarm mode, false alarms can be further reduced by voting on the detection results among the devices, and the detection results can be simply filtered to expand the training data set. By collecting training data at a site with relatively heavy Web traffic, large-scale Web legal loads and Web attack loads can be acquired;
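As a non-limiting illustration of voting among devices, the sketch below labels a load only when a sufficient share of the WAF devices agree and discards it otherwise. The function names `vote_label` and `build_training_set`, the label strings, and the `min_agreement` parameter are assumptions made for illustration; the application does not specify the voting rule.

```python
from collections import Counter


def vote_label(verdicts, min_agreement=1.0):
    """Return the majority verdict, or None when too few devices agree on it."""
    label, count = Counter(verdicts).most_common(1)[0]
    return label if count / len(verdicts) >= min_agreement else None


def build_training_set(records, min_agreement=1.0):
    """Keep only the payloads whose label passes the agreement threshold."""
    dataset = []
    for payload, verdicts in records:
        label = vote_label(verdicts, min_agreement)
        if label is not None:
            dataset.append((payload, label))
    return dataset


# Each record: (payload, verdicts from three hypothetical WAF devices).
records = [
    ("id=1", ["legal", "legal", "legal"]),
    ("id=1' or '1'='1", ["attack", "attack", "attack"]),
    ("q=select+a+plan", ["attack", "legal", "legal"]),
]
strict_set = build_training_set(records)                      # unanimous only
relaxed_set = build_training_set(records, min_agreement=0.6)  # simple majority
```

Requiring unanimity gives cleaner labels at the cost of fewer samples; lowering the threshold expands the training data set but admits more label noise.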

Step S1012: for authoritative legitimate websites, such as the Alexa Top 50, the probability of SQL injection and XSS vulnerabilities is low, so large-scale Web legal load samples can be obtained from them by using Web crawler technology. In open source communities or security sites, such as GitHub, a large number of users upload and share SQL injection samples and XSS samples, so large-scale Web attack load samples can be obtained from these sample sharing sites by using Web crawler technology;

Step S1013: in the process of performing a penetration test on a Web server, the penetration testing tool automatically uses different Web attack loads to run exploit tests against the Web server and determines, by analyzing the returned packets, whether each exploit succeeded, that is, whether the corresponding vulnerability exists in the target Web server. Each tool in an existing Web penetration testing tool set is used to run exploit tests against a target Web server, the Web attack traffic is analyzed by a bypass traffic analysis engine, and Web attack load samples are extracted;

Step S1014: the raw training sample data sets provided by steps S1011, S1012 and S1013, including Web legal loads and Web attack loads, are obtained. The training sample data set is then preprocessed, including filtering out repeated samples, performing code conversion (converting encodings such as Base64 and hexadecimal into ASCII), deleting abnormal samples, removing unique attributes (attributes that cannot describe the distribution of the samples), and the like, to construct the training data set.

In a Web attack, an attacker can change the character features of an injected statement through encoding, so that a blacklist cannot match them during detection, while the encoded statement still executes normally once parsed by the server. The most commonly used encoding techniques are URL encoding, UTF-8 encoding, Base64 encoding, hexadecimal encoding, char encoding and so on, and these encodings may be used in combination.
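As a non-limiting illustration of the code conversion and duplicate filtering described above, the sketch below repeatedly undoes URL, hexadecimal (`0x...`) and Base64 encoding until the load stops changing, then removes repeated samples. The helper names, the `max_rounds` limit and the heuristics for recognizing hexadecimal and Base64 text (printable-ASCII output only, minimum length of 8 for Base64) are assumptions made for illustration and would need tuning on real traffic.

```python
import base64
import binascii
import re
import urllib.parse


def _printable(data):
    """True when every byte is a printable ASCII character."""
    return all(32 <= byte < 127 for byte in data)


def _hex_to_ascii(match):
    """Replace a 0x... literal by its ASCII text when that text is printable."""
    raw = bytes.fromhex(match.group(1))
    return raw.decode("ascii") if _printable(raw) else match.group(0)


def _try_base64(text):
    """Decode text as Base64 only when it looks like Base64 and yields ASCII."""
    looks_like_b64 = re.fullmatch(r"[A-Za-z0-9+/]+={0,2}", text)
    if len(text) < 8 or len(text) % 4 or not looks_like_b64:
        return text
    try:
        raw = base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError):
        return text
    return raw.decode("ascii") if _printable(raw) else text


def decode_layers(payload, max_rounds=5):
    """Undo combined encodings round by round until the payload is stable."""
    for _ in range(max_rounds):
        decoded = urllib.parse.unquote(payload)
        decoded = re.sub(r"0x((?:[0-9a-fA-F]{2})+)", _hex_to_ascii, decoded)
        decoded = _try_base64(decoded)
        if decoded == payload:
            break
        payload = decoded
    return payload


def preprocess(payloads):
    """Decode every payload, drop empty results and filter repeated samples."""
    seen, cleaned = set(), []
    for payload in payloads:
        decoded = decode_layers(payload)
        if decoded and decoded not in seen:
            seen.add(decoded)
            cleaned.append(decoded)
    return cleaned


samples = preprocess(["%27%20OR%201%3D1", "' OR 1=1", "id=0x61646D696E"])
```

Decoding before deduplication matters: differently encoded variants of the same statement collapse into one sample, which keeps the training data set from being dominated by trivial re-encodings.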

Step S102: data cleaning and feature extraction. Data cleaning is performed on the training sample set, and different types of feature vectors of the samples are obtained by using multiple feature extraction algorithms, including a syntax tree algorithm, an N-Gram algorithm, a word embedding vector algorithm, an abnormal feature extraction algorithm and the like;

in an exemplary embodiment, step S102 may specifically include the following processing:

Step S1021: data cleaning is performed on the training data set, including operations such as missing value processing, attribute encoding, data standardization and regularization, feature selection and principal component analysis. The data cleaning process is closely tied to the feature extraction algorithm, and the process and means of data cleaning differ greatly between different types of feature extraction algorithm;

in an exemplary embodiment, step S1021 may specifically include the following processing:

Step S10211, missing value processing, which may mainly use three methods:

directly using features that contain missing values;

deleting features that contain missing values, which is effective when an attribute contains a large number of missing values and only a very small number of valid values;

completing missing values, where completion methods include mean interpolation, same-class mean interpolation, modeling prediction, high-dimensional mapping, multiple imputation, maximum likelihood estimation, compressed sensing, matrix completion and the like;
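As a non-limiting illustration of the second and third methods, the sketch below first deletes features whose share of missing values exceeds a threshold and then fills the remaining gaps by mean interpolation. Representing a missing value as `None`, the function names and the `max_missing_ratio` default are assumptions made for illustration.

```python
from statistics import mean


def drop_sparse_features(rows, max_missing_ratio=0.5):
    """Delete the columns whose share of missing (None) values is too high."""
    n_rows = len(rows)
    keep = [
        j
        for j in range(len(rows[0]))
        if sum(row[j] is None for row in rows) / n_rows <= max_missing_ratio
    ]
    return [[row[j] for j in keep] for row in rows]


def impute_mean(rows):
    """Fill each missing value with the mean of the valid values in its column.

    Assumes every column has at least one valid value.
    """
    columns = list(zip(*rows))
    means = [mean(v for v in column if v is not None) for column in columns]
    return [
        [means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


rows = [
    [1.0, None, 4.0],
    [3.0, None, None],
    [5.0, 2.0, 8.0],
]
cleaned = impute_mean(drop_sparse_features(rows))
```

Here the middle feature is deleted because two of its three values are missing, and the single gap left in the last feature is filled with the mean of its valid values.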

Step S10212, feature encoding. The input features of a machine learning model must be numerical feature vectors, so the various special feature values are quantized and encoded accordingly. The technique is described by taking feature binarization, one-hot encoding and word embedding vector encoding as examples:

(1) Feature binarization converts a numerical attribute into a Boolean attribute, with a threshold set as the split point that divides attribute values into 0 and 1;

(2) One-hot encoding uses an N-bit state register to encode N possible values, each state being represented by an independent register bit, only one of which is active at any time. The advantages of one-hot encoding are that it can handle non-numeric attributes, that it expands the features to a certain extent, and that the encoded attributes are sparse, with a large number of zero components;

(3) Word embedding vector encoding. The one-hot representation is very intuitive but has two disadvantages. First, the length of each dimension of the matrix equals the length of the dictionary, which wastes space and hinders computation. Second, the one-hot matrix amounts to simply numbering each word, so the relationships between words are completely hidden. Word embedding vector encoding solves both problems: the embedding matrix assigns each word a fixed-length vector representation whose length can be set freely and is in practice far shorter than the dictionary, and the angle between two word vectors can be used as a measure of how close the words are.
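As a non-limiting illustration of the three encodings, the sketch below binarizes a numerical attribute, one-hot encodes a categorical attribute, and compares word embedding vectors by the cosine of the angle between them. The function names, the category list `METHODS` and the hand-written two-dimensional embedding values are assumptions made for illustration; real embeddings would be learned from the training sample data.

```python
import math


def binarize(value, threshold):
    """Feature binarization: 1 when the value exceeds the threshold, else 0."""
    return 1 if value > threshold else 0


def one_hot(value, categories):
    """One-hot encoding: exactly one position is active for a known category."""
    return [1 if value == category else 0 for category in categories]


def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors (closer to 1 = more similar)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


METHODS = ["GET", "POST", "PUT"]

# Hypothetical 2-dimensional embeddings, far shorter than any real dictionary.
embedding = {
    "select": [0.9, 0.1],
    "union": [0.8, 0.2],
    "hello": [0.1, 0.9],
}

long_payload = binarize(120, threshold=100)
method_vector = one_hot("POST", METHODS)
sql_closeness = cosine_similarity(embedding["select"], embedding["union"])
other_closeness = cosine_similarity(embedding["select"], embedding["hello"])
```

The one-hot vector grows with the number of categories and treats every pair of categories as equally unrelated, whereas the embedding vectors stay short and place `select` closer to `union` than to `hello`.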

Step S10213, data normalization and regularization of the special case vector:

where data normalization may refer to scaling the attributes of a sample to some specified range. This is because some algorithms require the sample to have zero mean and unit variance, so it is necessary to eliminate the influence when different attributes of the sample have different magnitudes, and the difference in magnitude will cause the attribute with larger magnitude to occupy dominant position, and further cause the iterative convergence speed to slow down. Moreover, the sample distance dependent algorithm is very sensitive to the magnitude of the data;

where data regularization may refer to scaling a certain norm of a sample (e.g., L1 norm) to 1.
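A minimal sketch of both operations follows. It assumes population variance for the zero-mean, unit-variance scaling and uses the L1 norm mentioned above; the function names are illustrative.

```python
import math

def standardize(column):
    """Scale one attribute (a column of values) to zero mean and unit variance."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    if std == 0:
        return [0.0] * n  # a constant attribute carries no information
    return [(x - mean) / std for x in column]

def l1_normalize(sample):
    """Scale one sample (a row of values) so that its L1 norm equals 1."""
    norm = sum(abs(x) for x in sample)
    return [x / norm for x in sample] if norm else list(sample)

print(standardize([1.0, 2.0, 3.0, 4.0]))
print(l1_normalize([1.0, -1.0, 2.0]))
```

Note that standardization works per attribute across samples, while norm scaling works per sample across attributes.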

Step S10214, feature selection: the process of selecting a relevant feature subset from a given feature set, which mitigates the curse of dimensionality and lowers the difficulty of the learning task.

Step S10215, dimension reduction: a large number of features leads to heavy computation and long training times. Moreover, when the number of features is greater than the number of samples, each sample becomes unique and the sample points are sparsely scattered in the high-dimensional space, so the dimension of the feature matrix must be reduced to avoid overfitting. Common dimension reduction methods include Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), and other algorithms.

Step S1022, respectively acquiring different types of feature vectors of the sample by using various feature extraction algorithms including a syntax tree algorithm, an abnormal feature extraction algorithm, a word embedding vector algorithm, an N-Gram algorithm and the like;

in an exemplary embodiment, step S1022 may specifically include the following processing:

Step S10221, an AST (Abstract Syntax Tree) algorithm, whose generation process is as follows:

(1) To acquire the features of an SQL statement, the statement is first parsed, including lexical analysis and syntactic analysis, to obtain its syntax tree;

(2) The SQL syntax tree is then traversed and pruned into a standard syntax tree, which yields the main structure of the SQL statement. Pruning mainly replaces user-input parts of the SQL statement, such as numbers and character strings, with feature characters, and deletes useless nodes;

(3) Next, features are extracted from the SQL standard syntax tree. During extraction, numbers, characters, and the like are generalized in order to reduce the vector space;

(4) Finally, the SQL statement is segmented into words according to the set-of-words model. After segmentation, the frequency of each word is counted, all or part of the words are selected as hash table keys as needed, and the hash table entries are numbered in sequence, thereby obtaining the hash table encoding of the character string.

The above process is formalized as follows. Pattern features are extracted from the standard syntax tree, where $f(k)$ is the feature vector of the $k$-th SQL statement, $f(k) = \{f_{k_1}, f_{k_2}, \ldots, f_{k_L}\}$ $(1 \le k \le K)$. Here $K$ is the total number of attack training samples, $L$ is the number of extracted features, and $f_{k_l}$ is the $l$-th feature of the $k$-th SQL statement, with $1 \le l \le L$. To reduce time and space complexity, the feature vector is generalized according to the word segmentation principle into $s(k) = \{s_{k_1}, s_{k_2}, \ldots, s_{k_L}\}$, where $s_{k_l}$ is the generalization symbol corresponding to the $l$-th component of the feature vector of the $k$-th SQL statement. A hash is then computed on $s(k)$, i.e., $H(k) = \mathrm{HASH}(s(k))$, where $H(k)$ is the serialized encoding of the $k$-th SQL statement.
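The generalization and hashing steps can be sketched as follows. This is a simplified, token-level stand-in and not a real SQL parser: it uses a regular expression instead of lexical and syntactic analysis. The placeholder symbols `STR` and `NUM` and the choice of SHA-256 for HASH are assumptions, since the application does not name a specific hash function.

```python
import hashlib
import re

# Quoted strings, numbers, identifiers/keywords, then any single symbol.
_TOKEN = re.compile(r"""'[^']*'|"[^"]*"|\d+(?:\.\d+)?|\w+|[^\w\s]""")

def generalize(sql):
    """Return s(k): the token sequence with user-input literals generalized."""
    symbols = []
    for tok in _TOKEN.findall(sql):
        if len(tok) >= 2 and tok[0] in "'\"" and tok[-1] == tok[0]:
            symbols.append("STR")        # string literal -> placeholder
        elif tok[0].isdigit():
            symbols.append("NUM")        # numeric literal -> placeholder
        else:
            symbols.append(tok.upper())  # keywords/identifiers, case-folded
    return symbols

def serialize(sql):
    """Return H(k) = HASH(s(k)) as a hex digest (SHA-256 is an assumption)."""
    s_k = " ".join(generalize(sql))
    return hashlib.sha256(s_k.encode("utf-8")).hexdigest()

print(generalize("SELECT * FROM users WHERE id = 42"))
print(serialize("select * from users where id = 7"))
```

Statements that differ only in literal values or keyword case generalize to the same symbol sequence and therefore receive the same encoding.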

Step S10222, an abnormal feature extraction algorithm, which mainly computes statistics over the samples, extracts a feature set that discriminates strongly between black and white samples, and maps the feature set into a space vector. The feature set comprises whether typical Web attack load keywords are present, the percentage of numeric characters in the sample, the percentage of uppercase characters, the percentage of truncation characters, the percentage of special characters, and the like.

Feature sets can be basically divided into two main categories:

(1) Basic feature extraction: keywords in the sample are extracted according to their importance and number. This patent is concerned with Web attack loads, whose keywords are common SQL keywords; for example, the query keywords include union, select, order by, group by, and the like;

(2) Deformation feature extraction: if a common Web attack load keyword is present in the sample, the feature value is 1; otherwise it is 0. However, the access text is usually a single character string whose content varies greatly with website design, so it must first be segmented into words. Segmentation uses 3 delimiter characters: the space, "/", and "&".

Based on the basic feature and deformation feature extraction results for the Web attack load, the following statistics are computed:

(1) The proportion of uppercase characters in the character string, which mainly targets deformation attacks that change the case of some characters in the query statement to avoid detection;

(2) The proportion of space characters in the character string, which mainly targets attacks that exploit space characters;

(3) The proportion of special characters in the character string, mainly closing and truncation characters; common ones include "{}", "[]", "=", "?", "#", "/", etc. This mainly targets deformation attacks that use inline comment sequences and truncation characters;

(4) The proportion of numeric characters in the character string, which mainly targets dynamic query deformation attacks;

(5) The proportion of special prefix characters in the character string, such as "$#", "\x", "\u", "%", and the like.
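The keyword flag and the five proportions above can be computed with a short sketch such as the following. The keyword, special-character, and prefix lists are taken from the examples in the text and are not exhaustive; the feature names and the exact prefix spellings are assumptions.

```python
import re

KEYWORDS = ("union", "select", "order by", "group by")  # examples from the text
CLOSING_CHARS = set("{}[]=?#/")                         # closing/truncation characters
PREFIXES = ("$#", "\\x", "\\u", "%")                    # assumed prefix spellings

def tokenize(text):
    """Segment an access string on the 3 delimiters: space, '/', and '&'."""
    return [t for t in re.split(r"[ /&]+", text) if t]

def extract_features(payload):
    """Map a Web load string to a keyword flag plus character proportions."""
    n = len(payload) or 1  # avoid division by zero for empty input
    lowered = payload.lower()
    return {
        "has_keyword": 1 if any(k in lowered for k in KEYWORDS) else 0,
        "upper_ratio": sum(c.isupper() for c in payload) / n,
        "space_ratio": payload.count(" ") / n,
        "special_ratio": sum(c in CLOSING_CHARS for c in payload) / n,
        "digit_ratio": sum(c.isdigit() for c in payload) / n,
        "prefix_ratio": sum(payload.count(p) for p in PREFIXES) / n,
    }

print(tokenize("a=1&b=2/c d"))
print(extract_features("id=1 UNION SELECT 1"))
```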

In step S10223, a word embedding vector algorithm converts the text into words, and then each word is converted into a vector representation of fixed length, thereby facilitating learning.

The main carrier of the Web load is the accessed character string. The character string is treated as text, and a word embedding model can be selected to build a semantic model, so that a machine can understand HTML constructs such as <script> and alert(). The words that occur most frequently in the white samples form the vocabulary, all other words are marked as "UKN", the model is built with the word2vec classes, and 128 is used as the dimension of the word space.

Word2Vec is a tool released by Google in 2013 for converting natural language into feature vectors that a computer can process. Word2Vec mainly includes CBOW (Continuous Bag-Of-Words) and Skip-Gram. In CBOW training, the input is the word vectors of the context words of a given word, and the output is the word vector of that word; Skip-Gram is the reverse: the input is the word vector of a given word, and the output is the word vectors of its context words. The segmented data is used as training data for the text vectors, and training yields a word vector model through which words can be converted into vectors that a computer can process. For example, the word select is converted as follows: [5.52, -2.44, -0.998, -1.69, 1.88, 2.89, 0.905, -1.36, -1.84, 0.59, -3.93, 1.415, -0.035, -7.43, -0.683, -4.07].
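The data preparation behind these two models can be sketched without any training library. The sketch below builds a frequency-based vocabulary with the "UKN" marker and generates the (input, output) training pairs for Skip-Gram and CBOW; it does not train the embedding itself. The function names, the sample tokens, and the window size are illustrative assumptions.

```python
from collections import Counter

def build_vocab(token_lists, max_size):
    """Keep the most frequent words; index 0 is reserved for 'UKN'."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"UKN": 0}
    for word, _ in counts.most_common(max_size):
        vocab[word] = len(vocab)
    return vocab

def encode(tokens, vocab):
    """Replace each token by its vocabulary index, or by the 'UKN' index."""
    return [vocab.get(t, vocab["UKN"]) for t in tokens]

def skip_gram_pairs(ids, window):
    """Skip-Gram: the input is the center word, the output is one context word."""
    pairs = []
    for i, center in enumerate(ids):
        for j in range(max(0, i - window), min(len(ids), i + window + 1)):
            if j != i:
                pairs.append((center, ids[j]))
    return pairs

def cbow_pairs(ids, window):
    """CBOW: the input is the list of context words, the output is the center word."""
    pairs = []
    for i, center in enumerate(ids):
        lo, hi = max(0, i - window), min(len(ids), i + window + 1)
        pairs.append(([ids[j] for j in range(lo, hi) if j != i], center))
    return pairs

# Hypothetical tokenized white samples.
samples = [["select", "id", "from", "t"], ["select", "name"], ["select", "from"]]
vocab = build_vocab(samples, max_size=2)
ids = encode(["select", "id", "from"], vocab)
print(vocab, ids)
print(skip_gram_pairs(ids, window=1))
print(cbow_pairs(ids, window=1))
```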

In step S1023, the feature vectors are acquired and quantized. The one-hot encoding and word embedding vectors mentioned above are both quantization methods. Generally, the choice of quantization algorithm is closely tied to the feature extraction algorithm;

Step S1024, in order to train different base models, which are subsequently combined through ensemble learning and a voting mechanism to complete the construction of the intelligent WAF, the quantized and encoded feature vectors produced by the different types of feature extraction algorithms are stored in separate feature files, according to the input data required by the different models.

Step S103: training and selecting base models. Using various supervised learning algorithms such as SVM, random forest, LSTM, and CNN, base models are trained on the different types of feature vectors with the different learning algorithms, and cross validation is performed;

different base models have different generalization ability for different classes of feature vectors. The purpose of the fusion model is to combine the advantages of different base models to achieve the purpose of complementation. In short, if there are independent and complementary features between the base models, then a better model can be obtained by correctly fusing multiple base models.

In an exemplary embodiment, step S103 specifically includes the following processing:

step S1031, selecting a feature vector file and obtaining a feature vector set, respectively training different base models by using a plurality of different supervised learning algorithms, and respectively performing cross validation;

step S1032, based on the cross validation results, selecting the Top-k best supervised learning algorithms through a base model selection strategy, where k is a natural number;

to achieve good fusion, the selection strategy of the base model is described as follows:

(1) Accuracy: most base models should be at least more accurate than random guessing, so that they help rather than hurt the final output. The accuracy of the base models is therefore important for the fusion model;

(2) Diversity: the base models need to differ from one another, because homogeneous (highly correlated) base models cannot complement each other. If the base models are highly homogeneous, fusion is meaningless: it merely repeats the same result and needlessly increases the computation cost;

(3) Trade-off: when accuracy and diversity are improved together, beyond a certain point improving accuracy reduces diversity, and vice versa. Accuracy and diversity therefore need to be traded off according to the service conditions to achieve the best effect for the intelligent WAF fusion model.

Step S1033, changing the feature vector category and executing steps S1031 and S1032 in a loop, to obtain the Top-k best supervised learning algorithms for each category of feature vectors, where k is a natural number, so as to construct different base models.

Step S1034, outputting the feature vector categories, the Top-k best supervised learning algorithms for each category (k being a natural number), their cross validation results, and related information.
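Steps S1031 to S1034 can be sketched as a selection over cross validation scores. The sketch assumes that the per-fold accuracies have already been computed; the category names, algorithm names, scores, and the 0.5 random-guess baseline are hypothetical illustration values, and ranking by mean accuracy alone is an assumption that does not capture the diversity criterion.

```python
def select_top_k(cv_scores, k, baseline=0.5):
    """For each feature category, keep the k algorithms with the best mean
    cross-validation accuracy, discarding any that do not beat the baseline.

    cv_scores: {feature_category: {algorithm_name: [fold accuracies]}}
    Returns:   {feature_category: [(algorithm_name, mean_accuracy), ...]}
    """
    selection = {}
    for category, by_algorithm in cv_scores.items():
        means = {a: sum(s) / len(s) for a, s in by_algorithm.items()}
        eligible = [(m, a) for a, m in means.items() if m > baseline]
        eligible.sort(key=lambda pair: (-pair[0], pair[1]))  # best first
        selection[category] = [(a, m) for m, a in eligible[:k]]
    return selection

# Hypothetical scores, for illustration only.
scores = {
    "syntax_tree": {"svm": [0.90, 0.92], "random_forest": [0.95, 0.93], "knn": [0.45, 0.47]},
    "word_embedding": {"lstm": [0.97, 0.96], "cnn": [0.94, 0.95], "svm": [0.80, 0.82]},
}
for category, chosen in select_top_k(scores, k=2).items():
    print(category, [name for name, _ in chosen])
```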

Step S104: an intelligent WAF fusion model is constructed by utilizing an ensemble learning technology, and the detection precision is improved. Meanwhile, an integration strategy based on the composite pyramid model is provided, and the detection efficiency is improved;

in an exemplary embodiment, step S104 specifically includes the following processing:

step S1041, obtaining the output of step S103, including the feature vector categories, the Top-k best supervised learning algorithms (k being a natural number), their cross validation results, and related information. A plurality of base classifiers are then trained according to this information;

step S1042, formulating an integration strategy (also called a fusion strategy) according to the service requirements of the intelligent WAF, combining the base classifiers according to the integration strategy, and testing the cross validation results and generalization ability. Example integration strategies are described below:

The training data set is denoted $X \in \mathbb{R}^{n \times d}$: there are n Web load samples, and each Web load sample has d-dimensional features. Based on this data set and the selected set of base models, example strategies are constructed as follows:

(1) Selecting different types of classifiers, such as logistic regression, k-nearest neighbor, SVM, random forest, LSTM, and CNN, and training each on the whole set X. The diversity of the models comes from the different ways the base models represent the data: logistic regression, k-nearest neighbor, and the support vector machine, for example, make different assumptions about the data and have different representation and extraction capabilities;

(2) Selecting the same type of classifier and training it on the whole set X with different parameters, such as different k values for k-nearest neighbor. The diversity of the models comes from the difference between hyper-parameter settings of the same model. From a certain point of view, these can also be understood as "classifiers of different types". However, the difference due to parameters is much smaller than the difference between models. For example, k = 1 may not differ much from k = 3, because the same model makes the same assumptions about the data;

(3) Selecting the same classifier and training it on different training sets. For example, X is divided into m subsets of Web load samples (which may overlap), and a base model is trained on each of the m subsets. The diversity in this case comes from the variability of the data. If n is small (a small sample) and m subsets are drawn by sampling, the subsets are highly repetitive and diversity is compromised. Conversely, if each subset is very small, the model cannot learn from the data, so its performance is poor and its accuracy is low;

(4) Training the same classifier on different categories of feature vectors. If the training data set comprises m different categories of feature vectors, m base models can be constructed and trained. The diversity comes from the difference in the data: the models have different generalization errors in different feature spaces, and fusing them yields a complementary effect;

(5) Obtaining the final result by voting among a plurality of integrated classifiers. Base classifiers are trained on the different categories of feature vectors, different high-level integrated classifiers are then trained according to the integration strategy, and finally a voting strategy is selected to fuse the high-level integrated classifiers.

For clarity of illustration, the present embodiment explains the principle through 5 model fusion strategies. In practice they can be used in combination, for example: strategy (1) with strategy (3), strategy (1) with strategy (4), strategy (3) with strategy (4), and the like. There are many ways to combine models and no single correct method; multiple experiments are usually required, with the goal of keeping a balance between "accuracy" and "diversity". It is to be understood that the present application is not limited to the above 5 model fusion strategies, and other model fusion strategies may be used while maintaining the spirit of the present application.
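The voting step of strategy (5) can be sketched as a weighted majority vote. The three rule-based classifiers below are hypothetical stand-ins for trained base or integrated classifiers, and the label names, weights, and alphabetical tie-break are assumptions.

```python
def vote(classifiers, sample, weights=None):
    """Weighted majority vote; ties are broken by alphabetical label order."""
    weights = weights or [1.0] * len(classifiers)
    tally = {}
    for classify, weight in zip(classifiers, weights):
        label = classify(sample)
        tally[label] = tally.get(label, 0.0) + weight
    return max(sorted(tally), key=tally.get)

# Hypothetical stand-ins for trained classifiers.
def keyword_rule(s):
    return "attack" if "union select" in s.lower() else "legal"

def script_rule(s):
    return "attack" if "<script" in s.lower() else "legal"

def quote_rule(s):
    return "attack" if "'" in s else "legal"

models = [keyword_rule, script_rule, quote_rule]
print(vote(models, "id=1' UNION SELECT 1"))
print(vote(models, "page=3"))
```

Passing `weights` lets classifiers with better cross validation results count for more, which is one possible way to reflect the accuracy criterion in the vote.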

Step S1043, in the field of Web intrusion detection, large volumes of traffic must be analyzed in real time, which places high demands on the efficiency of model fusion. To satisfy the requirements of both detection efficiency and precision, an integration strategy based on a composite pyramid model is provided. As shown in fig. 4, the composite pyramid model comprises 4 layers from bottom to top: a data layer, a feature extraction layer, a model fusion layer, and a decision layer. The data layer acquires the training data set. The feature extraction layer performs feature engineering, processing the training data set with various feature extraction algorithms to obtain the feature vector set. The model fusion layer comprises a plurality of supervised learning models (or ensemble learning models) with higher processing efficiency, and performs ensemble learning and detection using a fusion strategy. The decision layer comprises one or more supervised learning models (or ensemble learning models) with lower processing efficiency, and likewise performs ensemble learning and detection using a fusion strategy. During detection, the large number of legal Web load samples are processed and filtered out by the efficient model fusion layer, and only the small number of suspected Web load samples are sent to the relatively less efficient decision layer for final judgment. Through this design of the composite pyramid model, model fusion meets the requirements of detection efficiency and precision to a certain extent;
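The detection path through the two upper layers can be sketched as a two-stage cascade. The scoring function, the decision model, and the pass threshold below are hypothetical; the application does not specify how the model fusion layer decides that a sample is suspected.

```python
def cascade_detect(sample, fast_score, slow_model, pass_threshold=0.2):
    """Two-stage detection: the fast layer filters clearly legal samples and
    only suspected samples reach the slower decision layer.

    Returns (label, layer_that_decided).
    """
    if fast_score(sample) < pass_threshold:
        return "legal", "model fusion layer"
    return slow_model(sample), "decision layer"

# Hypothetical stand-ins: a cheap suspicion score and a costlier final model.
def fast_score(s):
    suspicious = sum(s.count(c) for c in "'<>=#")
    return suspicious / max(len(s), 1)

def slow_model(s):
    lowered = s.lower()
    return "attack" if "union select" in lowered or "<script" in lowered else "legal"

for load in ["page three", "a='' union select 1", "x=<=>="]:
    print(load, cascade_detect(load, fast_score, slow_model))
```

The efficiency gain comes from `slow_model` being invoked only for the small share of samples whose fast score reaches the threshold.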

Step S1044, because different integration strategies differ greatly in effect, selecting the optimal fusion model to generate the intelligent WAF detection model, according to the cross validation results of the high-level classifiers and the generalization ability of the models on the user's live network;

Step S105: when intruding into a website protected by a WAF, a hacker commonly faces the problem of bypassing the WAF, and will usually apply common bypass methods based on experience, adjusting them continuously according to what the WAF blocks. By means of reinforcement learning, the bypass strategy of a hacker is simulated and ways of bypassing the existing WAF device are found automatically. Starting from existing Web attack load samples, Web attack killing-free samples (samples that evade detection) capable of bypassing the intelligent WAF are then generated automatically and continuously, and a Web attack killing-free sample training data set is constructed. Through continuous construction of the intelligent WAF, a new intelligent WAF detection model is iteratively trained with the killing-free sample training data set, so that the detection capability of the intelligent WAF is continuously improved.

The reinforcement learning process of the intelligent WAF model is shown in FIG. 5;

in an exemplary embodiment, step S105 specifically includes the following processing:

step S1051, initializing a reinforcement learning environment suitable for Web intrusion detection countermeasure, including the Environment, Agent, Action, State, and Reward. Reinforcement learning algorithms include single-agent and multi-agent reinforcement learning algorithms. For clarity of illustration, the present embodiment explains the principle through a multi-agent reinforcement learning algorithm. It should be understood that the present application is not limited to a multi-agent reinforcement learning algorithm, and other reinforcement learning algorithms may be used while maintaining the spirit of the present application;

step S1052, the reinforcement learning Environment applicable to Web intrusion detection countermeasure, and its related parameters.

The Environment receives the series of Actions executed by the Agent, evaluates their quality, and converts them into a quantifiable Reward that is fed back to the Agent, without telling the Agent how it should learn the Actions. The Agent can learn only from its past experience. At the same time, the Environment also provides State information to the Agent. Here the Environment refers to the intelligent WAF environment. To implement the intelligent WAF environment class quickly, the OpenAI Gym framework can be used, defining a training sample loading function, an action conversion table, an action execution function, an environment reset function, and the like;

step S1053, the reinforcement learning State applicable to Web intrusion detection countermeasure refers to the information about the environment in which the Agent is located, and includes all the information the Agent uses to select an Action; it is a function of the History. The State is a 2-dimensional vector [killed, killing-free], that is, [detected, evaded], whose two values can be coded as "01" and "10". From the current state, the method provided by the application can determine whether the current sample is killing-free, i.e., whether it has successfully bypassed intelligent WAF detection.

Step S1054, the reinforcement learning Action applicable to Web intrusion detection countermeasure. In the reinforcement learning field, Actions may be continuous or discrete. Here the Action refers to an entry in the action list of the intelligent WAF bypass component, such as encoding (hexadecimal and decimal encoding), comments, carriage returns, TABs, case confusion, and the like. The action list comprises 69 types of bypass operations and one null operation; the size of the action space depends on the number of bypass operations and can be expanded according to actual conditions. It is to be appreciated that the present application is not limited to the 69 bypass operations: other operations that bypass the intelligent WAF can be added, and bypass operations can be used in combination, while maintaining the spirit of the present application.

Step S1055, the reinforcement learning Reward applicable to Web intrusion detection countermeasure is a quantifiable scalar feedback signal that the Environment provides to the Agent, used to evaluate the quality of the Action performed by the Agent at a given Time Step. Reinforcement learning is based on the hypothesis of maximizing cumulative reward: the goal of the Agent across a series of Action choices is to maximize the cumulative future reward. An example of the Reward described in this patent is as follows:

bypassing the intelligent WAF succeeds: positive reward (10);

bypassing the intelligent WAF fails: reward (0);

it should be appreciated that the present application is not limited to this example reward pattern and, thus, other reward functions may be defined as well, while maintaining the spirit of the present application.
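The Environment, State, Action, and Reward described in steps S1052 to S1055 can be sketched as a small class with a Gym-style `reset`/`step` interface. The sketch does not import OpenAI Gym. The detector is a hypothetical stand-in for the intelligent WAF, only four of the action types named in the text are included, and the assignment of (1, 0) to "detected" and (0, 1) to "killing-free" is an assumption, since the text does not say which code denotes which state.

```python
import random

def toy_detector(payload):
    """Hypothetical stand-in for the intelligent WAF: flags one literal pattern."""
    return "union select" in payload.lower()

class WafBypassEnv:
    ACTIONS = ("noop", "case_confusion", "space_to_comment", "space_to_tab")
    DETECTED, EVADED = (1, 0), (0, 1)  # assumed coding of the 2-dimensional State

    def __init__(self, samples, detector, max_steps=10, seed=0):
        self.samples = list(samples)
        self.detector = detector
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.payload = ""
        self.steps = 0

    def _state(self):
        return self.DETECTED if self.detector(self.payload) else self.EVADED

    def reset(self):
        """Load a fresh attack sample and return the initial State."""
        self.payload = self.rng.choice(self.samples)
        self.steps = 0
        return self._state()

    def step(self, action):
        """Apply one bypass operation; return (state, reward, done, info)."""
        name = self.ACTIONS[action]
        if name == "case_confusion":
            self.payload = "".join(
                c.upper() if self.rng.random() < 0.5 else c.lower()
                for c in self.payload
            )
        elif name == "space_to_comment":
            self.payload = self.payload.replace(" ", "/**/")
        elif name == "space_to_tab":
            self.payload = self.payload.replace(" ", "\t")
        self.steps += 1
        state = self._state()
        reward = 10 if state == self.EVADED else 0  # example Reward from the text
        done = state == self.EVADED or self.steps >= self.max_steps
        return state, reward, done, {"payload": self.payload}

env = WafBypassEnv(["1 union select 2"], toy_detector)
print(env.reset())
print(env.step(WafBypassEnv.ACTIONS.index("space_to_comment")))
```

Payloads that end an episode in the killing-free state are the candidates that would be collected into the killing-free sample training data set.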

Step S1056, the reinforcement learning Agent applicable to Web intrusion detection countermeasure. The Agent is the core of the whole reinforcement learning system. It can sense the State of the environment and, based on the Reward signal provided by the environment, learns to select appropriate Actions so as to maximize the long-term Reward value. In short, the Agent uses the rewards provided by the environment as feedback to learn a mapping from environment states to actions, and its action selection principle is to maximize the cumulative future reward. The selected action affects not only the reward at the current moment but also the rewards at the next moment and further into the future. The basic rule of the Agent during learning is therefore: if an action brings a positive reward from the environment, the action is strengthened; otherwise it is gradually weakened. The Agent interacts with the intelligent WAF detection model: the feature vector of the Web load sample is extracted by the feature extraction module and passed to the intelligent WAF detection model, and the detection result of the model is obtained;

In this embodiment, the A3C algorithm is selected and packaged as the Agent, and the key attributes defined include state_size (state space size), action_size (action space size), and the like. The A3C algorithm fully implements the Actor-Critic framework and introduces an asynchronous training strategy to break data correlation, which greatly accelerates training while improving performance. As shown in fig. 6, the reinforcement learning A3C algorithm of the intelligent WAF model of this embodiment is implemented in a distributed learning manner. A neural network receives the current state as the input signal of its input nodes; the input signal propagates through the nodes of the hidden layers to the output nodes, which are associated with the list of detection-bypassing actions. The weights w between the nodes determine to which output node the input signal is transmitted. Each node has a threshold, and when the weighted sum of its inputs exceeds the threshold, the signal is transmitted to the next node. The weights between the nodes therefore need to be optimized so that the optimal bypass action can be selected for the current state. In the implementation of the A3C algorithm, an Agent is also called a "work thread". A "parameter server" shares the learning experience of a plurality of "work threads", and the "parameter server" and the "work threads" have the same neural network structure. During learning, each "work thread" stores the experience it obtains in memory; once the number of experiences exceeds a certain threshold, it calculates its own gradient parameter Δw = grad, updates the weight w, and pushes the weight w to the "parameter server". The "parameter server" then updates the weight w on the other "work threads" with the new weight w. This process is repeated until the learning termination condition is satisfied.

The specific steps of the learning process are described as follows:

(1) The "parameter server" initializes the weight w to random values;

(2) The "parameter server" synchronizes the weight w to the plurality of "work threads";

(3) Using the neural network, the "work thread" selects an appropriate bypass Action according to the current input State;

(4) As a result of the bypass Action, the State changes to the next State, and the "work thread" obtains the Reward corresponding to the Action;

(5) The "work thread" stores the acquired learning experience in memory, comprising the current State, the Action, the Reward, and the next State;

(6) The "work thread" repeats steps (3), (4), and (5);

(7) For a given "work thread", when the number of experiences in memory exceeds a certain threshold, the network gradient parameter Δw = grad is calculated from the experiences in memory, each comprising the current State, the Action, the Reward, and the next State;

(8) The "work thread" pushes the network gradient parameter (Δw = grad) to the "parameter server";

(9) The "parameter server" updates the network weight w using the network gradient parameter (Δw = grad) from the "work thread";

(10) Return to step (2) and repeat.

Through the above steps, the A3C algorithm learns asynchronously with a plurality of Agents ("work threads") that share their learning experience, so learning progresses faster.
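The synchronization pattern of steps (1) to (10) can be sketched with Python threads. This is not an A3C implementation: the actor-critic network and loss are replaced by a toy quadratic objective 0.5 * (w - target)^2 with a single scalar weight, so that only the pull, accumulate, push, and update cycle between the "parameter server" and the "work threads" is shown. The class and function names, learning rate, and batch size are assumptions.

```python
import random
import threading

class ParameterServer:
    """Holds the shared weight w and applies gradients pushed by workers."""

    def __init__(self, lr, seed=0):
        self._w = random.Random(seed).uniform(-1.0, 1.0)  # step (1): random init
        self._lr = lr
        self._lock = threading.Lock()

    def pull(self):
        with self._lock:
            return self._w            # step (2): synchronize w to a worker

    def push(self, grad):
        with self._lock:
            self._w -= self._lr * grad  # step (9): update w with the gradient

def work_thread(server, target, rounds, batch_size):
    for _ in range(rounds):             # step (10): loop
        w = server.pull()
        memory = []
        while len(memory) < batch_size:  # steps (3)-(6): collect experience
            memory.append(w - target)    # toy per-experience gradient
        grad = sum(memory) / len(memory)  # step (7): gradient from memory
        server.push(grad)                 # step (8): push to the server

def train(num_workers=4, rounds=200, target=3.0):
    server = ParameterServer(lr=0.1)
    threads = [
        threading.Thread(target=work_thread, args=(server, target, rounds, 8))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server.pull()

print(train())
```

Because each worker may compute its gradient from a slightly stale copy of w, the updates are asynchronous in the same sense as in the steps above; the small learning rate keeps the toy objective stable under that staleness.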

Step S1057, training and testing. The number of training steps is set, training is started, and 10-fold cross validation is performed. The logs and the trained model are persisted. Because training takes a long time, checkpoints need to be set so that learning can resume after an abnormal interruption;

step S106: continuously constructing an intelligent WAF by using the generated Web attack killing-free sample training data set, and iteratively training a new intelligent WAF detection model so as to continuously improve the capability of the intelligent WAF for detecting the Web attack killing-free sample;

in an exemplary embodiment, step S106 specifically includes the following processing:

step S1061, generating a killing-free sample data set by using a Web intrusion detection counterattack reinforcement learning model;

step S1062, continuously constructing a new training sample data set based on the killing-free sample data set and the training sample set obtained in the step S101;

and step S1063, automatically retraining the intelligent WAF detection model based on the step S1061 and the step S1062, realizing continuous construction of the intelligent WAF model, and further continuously improving the capability of the intelligent WAF for detecting the Web attack killing-free sample.

Yet another embodiment of the present application provides a method for detection by a hybrid learning-based intelligent WAF model. Fig. 7 is a schematic flowchart of a method for performing detection by using an intelligent WAF model based on hybrid learning according to this embodiment. As shown in fig. 7, the detection flow includes the following processing steps:

step S201, analyzing network traffic in real time with a traffic analysis device and extracting layer-7 (application layer) session metadata;

step S202, acquiring the layer-7 session metadata and selecting the HTTP session metadata from it, wherein the HTTP session metadata comprises Web load information such as the URL and POST data;

step S203, acquiring a feature vector of the Web load by using the feature extraction algorithm provided in the step S102;

step S204, loading the intelligent WAF detection model, predicting on the feature vector of the Web load, and judging whether the Web load is a Web attack load or a Web legal load. If it is a Web attack load, its type is further determined, such as XSS attack or SQL injection attack;

step S205, outputting a detection result, which includes the five-tuple information (source IP, target IP, source port, target port, and protocol), the attack type, the attack payload, and the attack return payload. Rules can further be applied to the attack return payload to judge whether the attack succeeded;
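Steps S202 to S205 can be sketched as follows. The session dictionary layout, the field names, and the classifier are hypothetical; in particular the classifier is a rule-based stand-in for the intelligent WAF detection model, and the feature extraction of step S203 is folded into it.

```python
from dataclasses import dataclass
from urllib.parse import parse_qsl, urlsplit

@dataclass
class Detection:
    """Detection result of step S205 (field names are assumptions)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    attack_type: str
    attack_load: str
    return_load: str = ""

def web_loads(session):
    """Step S202: collect decoded URL query values and POST body values."""
    query = urlsplit(session["url"]).query
    loads = [v for _, v in parse_qsl(query, keep_blank_values=True)]
    loads += [v for _, v in parse_qsl(session.get("post", ""), keep_blank_values=True)]
    return loads

def detect(session, classify):
    """Steps S203-S205: classify each Web load and report the attack loads."""
    results = []
    for load in web_loads(session):
        label = classify(load)
        if label != "legal":
            results.append(Detection(
                session["src_ip"], session["dst_ip"],
                session["src_port"], session["dst_port"], session["protocol"],
                attack_type=label, attack_load=load,
                return_load=session.get("response", ""),
            ))
    return results

# Hypothetical classifier standing in for the intelligent WAF detection model.
def toy_classify(load):
    lowered = load.lower()
    if "union select" in lowered:
        return "SQL injection"
    if "<script" in lowered:
        return "XSS"
    return "legal"

session = {
    "src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
    "src_port": 51000, "dst_port": 80, "protocol": "TCP",
    "url": "http://example.test/item?id=1%20union%20select%202&page=3",
    "post": "comment=%3Cscript%3Ealert(1)%3C/script%3E",
}
for result in detect(session, toy_classify):
    print(result)
```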

In summary, with the technical solution of the embodiment of the present application, a black and white sample set for the supervised learning algorithms can be obtained by means of traditional WAF device labeling, web crawlers, penetration testing tools, and the like, and different types of feature vectors of the samples are then obtained with multiple feature extraction algorithms. Base models are trained with various supervised learning algorithms on the different types of feature vectors, and the intelligent WAF model is constructed with ensemble learning. Finally, using reinforcement learning, Web attack killing-free samples capable of bypassing the intelligent WAF are generated automatically and continuously from the existing black samples, a Web attack killing-free sample training data set is constructed, and, through continuous construction of the intelligent WAF, a new intelligent WAF detection model is iteratively trained with the killing-free sample training data set. The detection precision and detection efficiency of the intelligent WAF are thereby greatly improved, and its capability to detect Web attack killing-free samples is enhanced. Meanwhile, the method uses hybrid learning to solve the different problems that arise in constructing the intelligent WAF detection model: supervised learning identifies Web legal traffic, attack traffic, and the specific types of attack traffic, including SQL injection attacks and XSS attacks; ensemble learning improves the detection precision and detection efficiency of the intelligent WAF; and reinforcement learning handles the continuous and complex states in the sample killing-free process.

The application provides an artificial intelligence based Web detection device, which comprises a processor and a memory, wherein the memory stores a computer program, and the processor is configured to call the computer program in the memory to implement the method of the embodiment shown in Fig. 1 or the method of any exemplary embodiment based on the method of the embodiment shown in Fig. 1.

According to an embodiment of the present application, an intelligent WAF device based on hybrid learning is provided. Fig. 8 is a schematic structural diagram of an apparatus for constructing an intelligent WAF model according to an embodiment of the present application. The apparatus comprises six modules: a sample acquisition module, a feature extraction module, a base model training module, an intelligent WAF model training module, an intelligent WAF model detection module, and a reinforcement learning module, which are described as follows:

a sample acquisition module 301, which acquires the training sample set used by the base model training module by means of a traditional WAF device labeling component, a web crawler component, a penetration testing tool component, and the like;

a feature extraction module 302, which includes a data cleaning component and a feature extraction component, and is used for cleaning the training sample set and obtaining different types of feature vector data of the samples by using multiple feature extraction programs;

a base model training module 303, which includes a base model training component and a base model selection component, trains base models separately with the different types of feature vectors and multiple supervised learning programs, and selects base models according to a selection strategy;

an intelligent WAF model training module 304, which uses an ensemble learning component to integrate the base models obtained by the base model training module 303 according to an integration strategy, thereby constructing an intelligent WAF fusion model and improving detection precision and detection efficiency;

an intelligent WAF model detection module 305, which detects a Web payload by using the intelligent WAF fusion model and determines whether it is a Web attack payload or a legitimate Web payload; if it is a Web attack payload, the module further determines the attack type, such as XSS attack or SQL injection attack;

a reinforcement learning module 306, which initializes the reinforcement learning setting, including the environment (Environment), agent (Agent), action (Action), state (Status), and reward (Reward). The module also establishes a reinforcement learning model, sets the network structure parameters and training parameters of the model, and trains a reinforcement learning classifier. The module then automatically and continuously generates, from the existing black samples, Web attack killing-free samples capable of bypassing the intelligent WAF, and constructs a Web attack killing-free sample training data set. New intelligent WAF detection models are iteratively trained with this data set, thereby continuously improving the capability of the intelligent WAF to detect Web attack killing-free samples.
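As an illustration of what the feature extraction component of module 302 might compute, the sketch below derives several character-ratio features of the kind recited in the claims (uppercase, space, digit, and prefix-character proportions). The function name, the dictionary keys, and the default prefix set are assumptions; the closing/truncation-character ratio is omitted because its character set is not given in this section.

```python
from typing import Dict, Sequence

# Assumed prefix set: backslash-x, backslash-u, and percent-encoding markers.
DEFAULT_PREFIXES = ("\\x", "\\u", "%")


def char_ratio_features(
    payload: str, prefixes: Sequence[str] = DEFAULT_PREFIXES
) -> Dict[str, float]:
    """Return the proportion of selected character classes in the payload."""
    n = len(payload)
    if n == 0:
        return {"upper": 0.0, "space": 0.0, "digit": 0.0, "prefix": 0.0}
    prefix_hits = sum(payload.count(p) for p in prefixes)
    return {
        "upper": sum(c.isupper() for c in payload) / n,   # case-mixing evasion
        "space": sum(c.isspace() for c in payload) / n,   # whitespace padding
        "digit": sum(c.isdigit() for c in payload) / n,   # numeric-heavy queries
        "prefix": prefix_hits / n,                        # encoded characters
    }
```

For example, `char_ratio_features("SeLeCt 1")` yields an uppercase ratio of 3/8, since three of the eight characters are capital letters.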

According to an embodiment of the present application, an apparatus for constructing an intelligent WAF model is also provided. Fig. 9 is a schematic deployment diagram of the apparatus for constructing an intelligent WAF model according to an embodiment of the present application. As shown in Fig. 9, the apparatus includes a traffic analysis server, an intelligent WAF server, a supervised learning server, a sample server, and a reinforcement learning server. Each server may be implemented in software or hardware. Their functions are described as follows:

the reinforcement learning server 401, which uses a reinforcement learning technique to automatically and continuously generate, from the black samples provided by the sample server, Web attack killing-free samples capable of bypassing the intelligent WAF, and constructs and outputs a Web attack killing-free sample training data set;

the sample server 402, which is responsible for acquiring, processing, and storing Web payload samples; it obtains the training sample set used by the supervised learning server by means of traditional WAF device labeling, web crawlers, penetration testing tools, reinforcement learning, and the like, and stores it as feature vector files after processing and feature extraction;

the supervised learning server 403, which trains and selects base models by using the feature vector files obtained by the sample server; based on the selected base models, it constructs an intelligent WAF fusion model by using an ensemble learning technique and outputs an intelligent WAF detection model file;

the intelligent WAF server 404, which obtains the data output by the traffic analysis server, runs prediction on the feature vector of each Web payload, and determines whether the Web payload is a Web attack payload or a legitimate Web payload; if it is a Web attack payload, the attack type, such as XSS attack or SQL injection attack, is further determined. Finally, it outputs a detection result, which includes the five-tuple information (source IP, destination IP, source port, destination port, and protocol), the Web attack payload, the attack type, the Web attack return payload, and the like;

the traffic analysis server 405, which analyzes network traffic in real time, extracts network layer-7 session metadata, selects the HTTP session metadata containing Web payload information such as the URL and POST data, and extracts Web payload feature vectors by using the feature extraction method. Finally, it outputs the data to be detected, which includes the five-tuple information (source IP, destination IP, source port, destination port, and protocol), the Web payload feature vector, the Web request return payload, and the like.
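The payload selection performed by the traffic analysis server can be sketched with the Python standard library. The example assumes that the Web payloads of interest are the parameter values of the URL query string and of a form-encoded POST body; the function name is hypothetical, and other body encodings (JSON, multipart) are not handled.

```python
from typing import List
from urllib.parse import parse_qsl, urlsplit


def extract_web_payloads(url: str, post_body: str = "") -> List[str]:
    """Collect decoded parameter values from the URL query and POST body."""
    query = urlsplit(url).query
    pairs = parse_qsl(query, keep_blank_values=True)
    # Assumption: the POST body is application/x-www-form-urlencoded.
    pairs += parse_qsl(post_body, keep_blank_values=True)
    return [value for _, value in pairs]
```

Note that `parse_qsl` percent-decodes the values. A feature such as the proportion of `%` prefix characters would have to be computed on the raw, undecoded string instead.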

With the intelligent WAF device provided by the embodiment of the present application, when the selected training set covers enough Web attack scenarios, the continuous supervised learning and ensemble learning processes can discover hidden association patterns among Web attack payloads, and the intelligent WAF device can use these patterns to detect Web attacks quickly and accurately. Finally, a reinforcement learning technique is used to automatically and continuously generate, from the existing black samples, Web attack killing-free (evasion) samples capable of bypassing the intelligent WAF and to construct a Web attack killing-free sample training data set; as the intelligent WAF is continuously rebuilt, new intelligent WAF detection models are iteratively trained with this data set, which enhances the capability of the intelligent WAF to detect Web attack killing-free samples.
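The selection and fusion of base models can be sketched as follows. This example ranks candidate base models by their cross-validation score, keeps the top k, and combines them by simple majority vote. Majority voting is a stand-in chosen for brevity, not the integration strategy of the present application, and the type alias and function names are assumptions.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# A base model maps a feature vector to a label such as "attack" or "legal".
Model = Callable[[Sequence[float]], str]


def select_top_k(candidates: List[Tuple[str, float, Model]], k: int) -> List[Model]:
    """Keep the k candidates with the highest cross-validation score.

    Each candidate is (name, cv_score, model).
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [model for _, _, model in ranked[:k]]


def majority_vote(models: List[Model], features: Sequence[float]) -> str:
    """Return the label predicted by the most base models."""
    votes = Counter(model(features) for model in models)
    return votes.most_common(1)[0][0]
```

Using an odd k avoids ties in the two-class case; a weighted vote or a stacked meta-classifier would be natural refinements.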

In summary, the present application relates to an intelligent WAF construction method and apparatus based on hybrid learning. Techniques such as traditional WAF device labeling, web crawlers, and penetration testing tools are used to obtain the black-and-white (malicious and benign) sample set for the supervised learning algorithms, and multiple feature extraction algorithms are then used to obtain different types of feature vectors of the samples. Base models are trained separately with the different types of feature vectors and multiple supervised learning algorithms, and the intelligent WAF model is constructed from them by using an ensemble learning technique. Finally, a reinforcement learning technique is used to automatically and continuously generate, from the existing black samples, Web attack killing-free (evasion) samples capable of bypassing the intelligent WAF and to construct a Web attack killing-free sample training data set, with which new intelligent WAF detection models are iteratively trained as the intelligent WAF is continuously rebuilt. The detection precision and detection efficiency of the intelligent WAF are thereby improved, and its capability to detect Web attack killing-free samples is enhanced. Meanwhile, the method uses hybrid learning to address different problems in constructing the intelligent WAF detection model: supervised learning identifies legitimate Web traffic and attack traffic as well as the specific type of attack traffic, including SQL injection attacks and XSS attacks; ensemble learning improves the detection precision and detection efficiency of the intelligent WAF; and reinforcement learning handles the continuous and complex states that arise while making samples killing-free.

It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, and the functional modules/units in the systems and devices disclosed above, may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media, as is known to those skilled in the art.

Claims

1. A method for constructing an intelligent network application protection system (WAF) model, characterized by comprising the following steps:

step A, acquiring training sample data;

step D, constructing an intelligent WAF model from the base models by using an ensemble learning technique;

wherein the different category feature vectors include at least one of:

the proportion of uppercase characters in the character string, used for identifying attempts to evade detection by changing the case of some characters in the query statement;

the proportion of space characters in the character string, used for detecting null character attacks;

the proportion of closed truncated characters in the character string, used for detecting inline comment sequences and truncated-character variant attacks;

the proportion of numeric characters in the character string, used for detecting dynamic query variant attacks;

the proportion of preset prefix characters in the character string, wherein the prefix characters comprise at least one of $ #, \\x, \u and %;

the step C comprises the following steps:

after step C1 and step C2 have been completed for the feature vectors of all categories of the training sample data, recording the category of feature vector used in each base model training, together with the Top-k supervised learning algorithms and the cross-validation results obtained when training on each category of feature vector;

the step D comprises the following steps:

2. The method of claim 1, wherein step A comprises:

3. The method of claim 1, wherein step B comprises:

carrying out data cleaning operation on the training sample data;

4. The method according to claim 1, wherein said step D2 comprises:

and combining the obtained base classifiers by adopting a preset integration strategy of a composite pyramid model to obtain a high-level classifier, wherein the different supervised learning algorithms used by the high-level classifier are processed by the same modules, the modules comprising at least one of a training sample set acquisition module, a feature extraction module and a detection result judgment module.

5. The method of claim 1, wherein after step D, the method further comprises:

and detecting network traffic by using the intelligent WAF model.

6. The method according to any one of claims 1 to 5, wherein after step D, the method further comprises:

7. The method of claim 6, wherein step E comprises:

and obtaining a Web attack killing-free sample training data set.

8. The method according to claim 7, wherein the generating Web attack killing-free sample data capable of resisting the intelligent WAF model based on the obtained black samples by utilizing reinforcement learning technology comprises:

recording a reward value corresponding to the Web attack load;

9. An apparatus for constructing an intelligent web application protection system (WAF) model, comprising a processor and a memory, wherein the memory stores a computer program, and the processor is configured to call the computer program in the memory to implement the method according to any one of claims 1 to 8.