CN111526136A

CN111526136A - Malicious attack detection method, system, device and medium based on cloud WAF

Info

Publication number: CN111526136A
Application number: CN202010294600.9A
Authority: CN
Inventors: 刘庭辉
Original assignee: Ucloud Technology Co ltd
Current assignee: Ucloud Technology Co ltd
Priority date: 2020-04-15
Filing date: 2020-04-15
Publication date: 2020-08-11

Abstract

The invention relates to the field of network security, and in particular to a malicious attack detection method, system, device and medium based on a cloud WAF. The cloud WAF-based malicious attack detection method comprises the following steps: after receiving an access request to a service line source station, performing parameter extraction on the access request; judging whether the access request is abnormal based on an HMM model; if the access request is abnormal, judging whether the access request is an attack based on an SVM classifier; and if it is an attack, intercepting the access request. The invention combines the HMM model's suitability for processing continuous dynamic signals and its strong time-series modeling capability with the SVM classifier's strong binary classification capability on small samples, and adds a retraining mechanism. Model decay is thereby effectively slowed, robustness is good, attempts to bypass detection can be effectively prevented, and high accuracy is maintained, with lower false negative and false positive rates and reduced cost.

Description

Malicious attack detection method, system, device and medium based on cloud WAF

Technical Field

The invention relates to the field of network security, and in particular to a malicious attack detection method, system, device and medium based on a cloud WAF.

Background

With the rapid development of Internet technology, the Web has been widely applied to enterprise informatization, e-commerce, e-government and the like. While the network brings convenience and efficiency, network security problems have become increasingly prominent. Attacks on and damage to the Web keep increasing, and statistics show that 75% of attacks target Web applications. However, many enterprises are not yet fully aware of or prepared for this, and many developers lack the relevant experience, which gives hackers an opening. Once an attack succeeds, it causes great harm to individuals or enterprises, such as information leakage, illegal account transfers and server paralysis. How to detect potential malicious Web attacks quickly and accurately has therefore become a very important topic in the field of network security.

Existing solutions to this problem can be roughly divided as follows: by data source, into content-based and behavior-based detection; by detection method, into signature-based and rule-based detection. However, existing detection methods generally require continuous updating of the rule base and are easily bypassed by attackers. Taking string feature filtering as an example, the most common bypass used by hackers is to split keywords with /**/ or <>, for example sel<>ect or sel/**/ect, so such detection methods are prone to false negatives.

Chinese patent application No. CN201811206594.6, "Machine learning-based WAF normal flow modeling method and apparatus", discloses a method that mainly includes: calculating a target switching probability based on characters in a target URL; and establishing a WAF normal flow model from the characters in the target URL and the target switching probability through a machine learning algorithm. Chinese patent application No. CN201910551406.1, "A method and a terminal for implementing a Web application vulnerability detection rule engine", discloses a method that includes: sending a vulnerability scanning HTTP request, constructed based on a rule statement description, to a Web application server; receiving the Web application server's response to the vulnerability scanning HTTP request; and matching the response against the vulnerability scanning strategy information in the rule to obtain a matching result. Chinese patent application No. CN201910924052.0, "A method for filtering SQL injection attacks", provides a filtering-type detection method by setting judgment conditions such as an IP blacklist, keyword analysis and request mode. The latter two methods both rely on rule-based protection, signatures and similar technical means, and suffer from high false negative and false positive rates, high maintenance cost and the like. None of the above approaches has a retraining mechanism, so model accuracy declines over time; in addition, because a rule base must be continuously updated and maintained, they suffer from low performance and a high false negative rate.

Disclosure of Invention

In order to solve the above problems, the present invention aims to provide a malicious attack detection method, system, device and medium based on a cloud WAF. The invention adopts a dual-model mechanism combining unsupervised and supervised learning to better ensure accuracy, uses a retraining mechanism to keep accuracy high over time, and abandons the rule base to reduce labor and maintenance cost.

On one hand, the invention discloses a malicious attack detection method based on cloud WAF, which comprises the following steps:

after receiving an access request to a service line source station, performing parameter extraction on the access request;

judging whether the access request is abnormal or not based on an HMM model;

if the access request is not abnormal, forwarding the access request;

if the access request is abnormal, judging whether the access request is an attack or not based on an SVM classifier;

if the access request is an attack, intercepting the access request, storing the access request into a database, and sending client alarm information;

and if the access request is not an attack, forwarding the access request.

Optionally, the determining whether the access request is abnormal based on the HMM model includes:

receiving a normal access request as white sample data, and extracting parameters of the white sample data;

training to obtain the HMM model based on the white sample data and calculating an abnormal probability threshold;

judging whether the access request is matched with the HMM model;

if not, forwarding the access request;

if so, judging whether the probability value of the access request is smaller than the abnormal probability threshold value or not;

if the access request is not smaller than the abnormal probability threshold, forwarding the access request;

and if the access request is smaller than the abnormal probability threshold, judging whether the access request is an attack or not based on an SVM classifier.

Optionally, the determining whether the anomaly is an attack based on the SVM classifier includes:

after a data set containing normal access request data and malicious access request data is collected through the network and WAF production environment, converting the data set into a digital feature matrix;

training based on the digital feature matrix to obtain an SVM classifier;

and judging whether the access request is an attack or not based on the classification result of the SVM classifier.

Optionally, the parameter extraction includes:

and disassembling the white sample data or the access request, extracting the parameters, and generalizing the parameters.

Optionally, the generalization specifically is:

uppercase and lowercase English letters are generalized to "A";

digits are generalized to "N";

Chinese characters or Chinese punctuation are generalized to "C";

other characters are generalized to "T".

Optionally, the HMM model and the SVM classifier are retrained according to at least one of a time period, a data volume, and a data accuracy, respectively, to update the parameters.

On the other hand, the invention discloses a malicious attack detection system based on cloud WAF, which comprises:

the first extraction module is used for extracting parameters of an access request after receiving the access request to a service line source station;

the anomaly detection module is used for judging whether the access request is abnormal or not based on an HMM model, if not, the access request is forwarded, and if so, whether the access request is an attack or not is judged based on an SVM classifier;

and the attack detection module is used for judging whether the access request is an attack or not based on the SVM classifier, intercepting the access request if the access request is the attack, storing the access request into a database and sending client alarm information, and forwarding the access request if the access request is not the attack.

Optionally, the anomaly detection module includes:

the second extraction module is used for receiving a normal access request as white sample data and extracting parameters of the white sample data;

the first training module is used for training to obtain the HMM model based on the white sample data and calculating an abnormal probability threshold;

the first detection module is used for judging whether the access request is matched with the HMM model or not, if not, the access request is forwarded, and if yes, the access request is forwarded to the second detection module;

the second detection module is used for judging whether the probability value of the access request is smaller than the abnormal probability threshold value or not;

and if the abnormal probability is smaller than the abnormal probability threshold, turning to an attack detection module.

Optionally, the attack detection module includes:

the data set collection module is used for converting a data set into a digital feature matrix after collecting the data set containing normal access request data and malicious access request data through the network and WAF production environment;

the second training module is used for training based on the digital feature matrix to obtain an SVM classifier;

and the third detection module is used for judging whether the access request is an attack or not based on the classification result of the SVM classifier.

In another aspect, the present invention discloses a malicious attack detection device based on cloud WAF, which is characterized in that the device includes a memory storing computer executable instructions and a processor, the processor is configured to execute the instructions to implement a malicious attack detection method based on cloud WAF, and the method includes:

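after receiving an access request to a service line source station, performing parameter extraction on the access request;

judging whether the access request is abnormal or not based on an HMM model;

if the access request is not abnormal, forwarding the access request;

if the access request is abnormal, judging whether the access request is an attack or not based on an SVM classifier;

if the access request is an attack, intercepting the access request;

and if the access request is not an attack, forwarding the access request.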

In another aspect, the present invention discloses a computer storage medium encoded with a computer program, wherein the computer program includes instructions that are executed by one or more computers to implement a cloud WAF-based malicious attack detection method, the method including:

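after receiving an access request to a service line source station, performing parameter extraction on the access request;

judging whether the access request is abnormal or not based on an HMM model;

if the access request is not abnormal, forwarding the access request;

if the access request is abnormal, judging whether the access request is an attack or not based on an SVM classifier;

if the access request is an attack, intercepting the access request;

and if the access request is not an attack, forwarding the access request.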

Compared with the prior art, the invention has the main differences and the effects that:

The invention combines the HMM model's suitability for processing continuous dynamic signals and its strong time-series modeling capability, the SVM classifier's strong binary classification capability on small samples, and the acquired IP information. The anomaly detection step can thus cover a large share of common Web attacks, and the attack detection step then accurately separates out the attacks. Because the HMM model learns from normal data and the SVM classifier needs only small samples, the method is well suited to the field of network security, where black samples are scarce. The dual-model mechanism better guarantees high accuracy and reduces the false negative rate and the false positive rate.

Drawings

FIG. 1 is a block diagram of a hardware architecture of a server of a cloud WAF-based malicious attack detection method according to the present invention;

FIG. 2 is an overall flowchart of a cloud WAF-based malicious attack detection method according to a first embodiment of the present invention;

FIG. 3 is a flowchart illustrating the steps of determining whether an access request is abnormal based on the HMM model in FIG. 2;

FIG. 4 is a detailed flowchart of the step of determining whether the anomaly is an attack based on the SVM classifier in FIG. 2;

FIG. 5 is a block diagram of a cloud WAF-based malicious attack detection system according to a second embodiment of the present invention;

FIG. 6 is a detailed block diagram of the anomaly detection module of FIG. 5;

FIG. 7 is a detailed block diagram of the attack detection module of FIG. 5.

Detailed Description

In order to make the purpose and technical solution of the embodiments of the present invention clearer, the technical solution of the embodiments of the present invention will be clearly and completely described below with reference to the drawings of the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the invention without any inventive step, are within the scope of protection of the invention.

According to an embodiment of the present invention, an embodiment of a cloud WAF-based malicious attack detection method is provided. It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in a different order.

The method embodiment provided by this application may be executed in a server. Fig. 1 is a hardware structure block diagram of a server for the cloud WAF-based malicious attack detection method according to the invention. As shown in fig. 1, the server 100 may include one or more (only one is shown in the figure) processors 101 (the processors 101 may include, but are not limited to, processing devices such as a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a microprocessor (MCU), or a programmable logic device (FPGA)), an input/output interface 102 for interacting with a user, a memory 103 for storing data, and a transmission device 104 for communication functions. It will be understood by those skilled in the art that the structure shown in fig. 1 is merely illustrative and is not intended to limit the structure of the electronic device. For example, server 100 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.

The input/output interface 102 may be connected to one or more displays, touch screens, etc. for displaying data transmitted from the server 100, and may also be connected to a keyboard, a stylus, a touch pad, and/or a mouse, etc. for inputting user instructions such as selection, creation, editing, etc.

The memory 103 may be configured to store a database, a queue, and software programs and modules of application software, such as program instructions/modules corresponding to the cloud WAF-based malicious attack detection method in the embodiment of the present invention, and the processor 101 executes various functional applications and data processing by running the software programs and modules stored in the memory 103, that is, implements the cloud WAF-based malicious attack detection method. The memory 103 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 103 may further include memory located remotely from processor 101, which may be connected to server 100 over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 104 is used to receive or transmit data via a network, which may include various connection types, such as wired, wireless communication links, cloud or fiber optic cables, and so forth. The above-described specific example of the network may include the internet provided by the communication provider of the server 100.

Fig. 2 is an overall flowchart of a cloud WAF-based malicious attack detection method according to a first embodiment of the present invention. As shown in fig. 2, method 200 may include:

step 202, after receiving an access request to a service line source station, performing parameter extraction on the access request;

step 204, judging whether the access request is abnormal or not based on an HMM model;

if the access request is not abnormal, executing step 210 to forward the access request;

if the access request is abnormal, executing step 206 to judge whether the access request is an attack based on the SVM classifier;

if the access request is an attack, executing step 208 to intercept the access request, store the access request into a database, and send client alarm information;

if the access request is not an attack, executing step 210 to forward the access request.
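By way of illustration only, the decision flow of steps 204 to 210 may be sketched in Python as follows. The function name handle_request and the two predicate arguments are hypothetical stand-ins for the HMM-based anomaly check and the SVM-based attack check; they are assumptions of this sketch and not part of the disclosed method.

```python
from typing import Callable


def handle_request(params: str,
                   is_abnormal: Callable[[str], bool],
                   is_attack: Callable[[str], bool]) -> str:
    """Return "forward" or "intercept" following steps 204 to 210."""
    # Step 204: anomaly check (an HMM model in the description).
    if not is_abnormal(params):
        return "forward"        # step 210
    # Step 206: attack check (an SVM classifier in the description).
    if is_attack(params):
        return "intercept"      # step 208 (storing and alerting omitted here)
    return "forward"            # abnormal but not an attack: step 210
```

In this sketch the second predicate is evaluated only for requests that the first stage flags as abnormal, which mirrors the two-stage ordering of the method.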

Optionally, the step 204 of determining whether the access request is abnormal based on the HMM model includes:

step 2041, receiving a normal access request as white sample data, and extracting parameters of the white sample data;

step 2042, training based on the white sample data to obtain an HMM model and calculating an abnormal probability threshold;

step 2043, judging whether the access request is matched with an HMM model;

if not, go to step 210, forward the access request;

if so, executing step 2044 to determine whether the probability value of the access request is less than the abnormal probability threshold value;

if the abnormal probability is not less than the abnormal probability threshold, executing step 210, and forwarding the request to a corresponding service line source station;

if the value is less than the threshold value of the abnormal probability, step 206 is executed, and whether the access request is an attack is judged based on the SVM classifier.

Optionally, the parameter extraction includes: and disassembling the white sample data or the access request, extracting the parameters, and generalizing the parameters.

Optionally, the generalization is specifically:

uppercase and lowercase English letters are generalized to "A";

digits are generalized to "N";

Chinese characters or Chinese punctuation are generalized to "C";

other characters are generalized to "T".

Optionally, in step 206, determining whether the anomaly is an attack based on the SVM classifier includes:

step 2061, after collecting the data set containing the normal access request data and the malicious access request data, converting the data set into a digital feature matrix;

step 2062, training based on the digital feature matrix to obtain an SVM classifier;

step 2063, judging whether the access request is an attack or not based on the classification result of the SVM classifier;

if the access request is an attack, executing step 208 to intercept the access request;

if the access request is not an attack, executing step 210 to forward the access request.

Specifically, each piece of HTTP request data is first processed by parameter parsing, ETL, decoding and the like. The HTTP request data may include: request parameters, the parameter names themselves, the URL path of the request, the HTTP request headers, etc.

In common XSS and SQL injection attacks, the attack payload is mainly concentrated in the request parameters. Taking XSS as an example:

/0_1/include/dialog/select_media.php?userid=%3Cscript%3Ealert(1)%3C/script%3E

The following is part of the normal log data:

/0_1/include/dialog/select_media.php?userid=admin123

/0_1/include/dialog/select_media.php?userid=root

/0_1/include/dialog/select_media.php?userid=maidou0806
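As a minimal sketch of the parameter parsing and decoding described above, the following Python fragment splits a request line into its URL path and its decoded parameter name/value pairs using the standard urllib.parse module. The function name extract_parameters and the shape of the returned dictionary are assumptions made for illustration; request headers are not handled here.

```python
from urllib.parse import urlsplit, parse_qsl


def extract_parameters(request_line: str) -> dict:
    """Split a request line into its path and decoded (name, value) pairs."""
    parts = urlsplit(request_line)
    # parse_qsl percent-decodes names and values, e.g. %3C becomes "<".
    params = parse_qsl(parts.query, keep_blank_values=True)
    return {"path": parts.path, "params": params}


result = extract_parameters(
    "/0_1/include/dialog/select_media.php"
    "?userid=%3Cscript%3Ealert(1)%3C/script%3E"
)
# result["params"] holds the decoded userid value "<script>alert(1)</script>"
```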

Taking the userid field as an example of the generalization operation, the value of userid is used as the observation sequence and is generalized: [a-zA-Z] is generalized to A; [0-9] is generalized to N; Chinese characters and Chinese punctuation are generalized to C; other characters are generalized to T.
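The generalization rules above may be sketched as follows. The function name generalize is hypothetical, and the Unicode ranges used to recognize Chinese characters and Chinese punctuation (CJK Unified Ideographs, CJK Symbols and Punctuation, and Halfwidth and Fullwidth Forms) are an assumption of this sketch, since the description does not state which ranges are used.

```python
def generalize(value: str) -> str:
    """Map each character of a parameter value to A, N, C or T."""
    out = []
    for ch in value:
        if ("a" <= ch <= "z") or ("A" <= ch <= "Z"):
            out.append("A")                     # English letters
        elif "0" <= ch <= "9":
            out.append("N")                     # digits
        elif ("\u4e00" <= ch <= "\u9fff"        # CJK ideographs (assumed range)
              or "\u3000" <= ch <= "\u303f"     # CJK punctuation (assumed range)
              or "\uff00" <= ch <= "\uffef"):   # fullwidth forms (assumed range)
            out.append("C")
        else:
            out.append("T")                     # everything else
    return "".join(out)


print(generalize("admin123"))                   # AAAAANNN
print(generalize("<script>alert(1)</script>"))
```

Under these rules a normal value such as admin123 produces only A and N symbols, while the decoded XSS value produces several T symbols, which is what makes the observation sequences distinguishable to a sequence model.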

A Hidden Markov Model (HMM) is a time-series probabilistic model; by modeling normal HTTP request data as input, it can be used to identify whether arbitrary request data is abnormal. Fig. 3 is a specific flowchart of the step of determining whether an access request is abnormal based on the HMM model in fig. 2. As shown in fig. 3, a certain number of normal access requests are input into the HMM model as white sample data. For example, with the number set to 500, once 500 normal access requests have been collected they are input into the HMM model for training, and an abnormal probability threshold P1 is calculated. The access request is then input into the HMM model, and it is judged whether the parameter name of the access request matches a parameter name of the HMM model. If it does not match, the access request is forwarded to the corresponding service line source station and the next access operation is executed. If it matches, a probability value P2 is obtained and compared with the abnormal probability threshold P1: if P2 is not less than P1, the access request is forwarded to the corresponding service line source station and the next access operation is executed; if P2 < P1, whether the access request is an attack is judged based on the SVM classifier.
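The comparison of P2 against P1 can be illustrated with the following sketch, which scores a generalized observation sequence with the HMM forward algorithm. Several points are assumptions of this sketch and are not taken from the description: the two-state model parameters START, TRANS and EMIT are hand-picked, hypothetical values rather than trained ones; the score is a per-symbol average log-likelihood rather than a raw probability; and P1 is taken as the lowest score among the white samples.

```python
import math


def avg_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | model) / len(obs)."""
    if not obs:
        raise ValueError("empty observation sequence")
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [sum(alpha[j] * trans[j][i] for j in range(n)) * emit[i][obs[t]]
                     for i in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]   # rescale to avoid underflow
    return log_prob / len(obs)


# Hypothetical, hand-picked parameters (not trained): state 0 mostly emits
# letters, state 1 mostly emits digits; C and T are rare in both states.
START = [0.9, 0.1]
TRANS = [[0.8, 0.2], [0.1, 0.9]]
EMIT = [{"A": 0.90, "N": 0.08, "C": 0.01, "T": 0.01},
        {"A": 0.08, "N": 0.90, "C": 0.01, "T": 0.01}]


def score(seq):
    return avg_log_likelihood(seq, START, TRANS, EMIT)


# White samples: generalized forms of admin123, root and maidou0806.
WHITE = ["AAAAANNN", "AAAA", "AAAAAANNNN"]
P1 = min(score(s) for s in WHITE)            # assumed threshold rule


def is_abnormal(seq):
    return score(seq) < P1                   # P2 < P1: hand over to the SVM stage
```

In practice the model parameters would be estimated from the white sample data, for example with the Baum-Welch algorithm, rather than fixed by hand as in this sketch.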

Fig. 4 is a specific flowchart of the step of determining whether the anomaly is an attack based on the SVM classifier in fig. 2. As shown in fig. 4, models are first trained on normal data and malicious data (such as SQL injection and XSS) collected in advance from the network and the WAF production environment, and the model files are saved; the relevant models are later called to classify the abnormal data.

The training phase first obtains a data set of normal access request data and malicious access request data. The malicious request part sample is as follows:

/examples/jsp/cal/feedsplitter.php?format=../../../../../../../../../etc/passwd\x00&debug=1

/phpwebfilemgr/index.php?f=../../../../../../../../../etc/passwd

Both the malicious access request data set and the normal access request data set are lists of irregular, variable-length strings that are difficult for an algorithm to process directly, so they must be processed and converted into a digital feature matrix to train the detection model. The conversion can use techniques such as TF-IDF and word vectors, after which the SVM classifier is obtained by training.
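One possible way to turn variable-length request strings into a numeric feature matrix with TF-IDF is sketched below. The choice of character bigrams as tokens and the smoothed IDF formula are assumptions of this sketch; the description names TF-IDF but does not specify the tokenization or weighting details.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Split a string into overlapping character n-grams."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_matrix(docs, n=2):
    """Return (vocabulary, matrix) with one TF-IDF row per input string."""
    tokenized = [char_ngrams(d, n) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: j for j, tok in enumerate(vocab)}
    # Document frequency: number of strings that contain each n-gram.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    n_docs = len(docs)
    matrix = []
    for toks in tokenized:
        row = [0.0] * len(vocab)
        for tok, count in Counter(toks).items():
            tf = count / len(toks)
            idf = math.log((1 + n_docs) / (1 + df[tok])) + 1.0  # smoothed IDF
            row[index[tok]] = tf * idf
        matrix.append(row)
    return vocab, matrix


vocab, X = tfidf_matrix(["userid=root", "f=../../etc/passwd"])
```

Each row of the resulting matrix has the same length regardless of the length of the original string, which is the property a classifier such as an SVM requires of its input.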

The Support Vector Machine (SVM) is currently one of the most widely used binary classifiers, and shows particular advantages in small-sample, non-linear and high-dimensional pattern recognition. Its basic principle is to find an optimal hyperplane that meets the classification requirement, such that the distance between the hyperplane and the sample points of the two classes is maximized while classification accuracy is ensured. The hyperplane satisfies $\omega^T X + b = 0$, where $\omega$ is the adjustable weight vector, $b$ is the offset, $X$ is the feature vector, and $T$ denotes the matrix transpose. The optimal hyperplane requires the maximum classification margin. The distance between the two parallel hyperplanes is $2/\|\omega\|$, so maximizing the margin requires minimizing $\|\omega\|$, which gives the following minimization problem:

$$\min_{\omega, b} \ \frac{1}{2}\|\omega\|^2$$

So that all samples lie on or outside the margin hyperplanes, the minimization is subject to the constraint $Y_i(\omega^T X_i + b) \ge 1$, $Y_i \in \{-1, 1\}$, $i = 1, 2, \dots, l$, where $Y_i$ represents the sample class and $l$ is the number of samples.
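The margin-maximization principle described above may be illustrated with a small linear SVM. Note that this sketch substitutes a soft-margin formulation (L2-regularized hinge loss minimized by subgradient descent) for the hard-margin problem in the text, because it is short to implement; the learning rate, regularization strength, epoch count and the two-dimensional toy data are all hypothetical values chosen for illustration.

```python
def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=100):
    """Subgradient descent on lam/2 * ||w||^2 + hinge loss (soft-margin SVM)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # The regularization term always shrinks w, which widens the margin.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Constraint y_i (w . x_i + b) >= 1 is violated: move toward it.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane w . x + b = 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data: label +1 stands for attack, -1 for normal.
X_train = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y_train = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X_train, y_train)
```

In the method itself the rows of the digital feature matrix would take the place of the toy vectors used here.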

Whether the access request is an attack is then judged based on the classification result of the SVM classifier. If it is not an attack, the access request is forwarded to the corresponding service line source station and the next access operation is executed. If it is an attack, the access request is intercepted and stored in a database, and client alarm information is sent to warn the user that the access request may be malicious; the access request is also labeled to facilitate later manual review.

The HMM model and the SVM classifier are retrained according to at least one of a time period, a data volume, and a data accuracy rate. Taking data volume as an example, when the amount of data accumulated for a model reaches a threshold, which can be set according to actual requirements, the HMM model and the SVM classifier are retrained automatically and their parameters are updated adaptively. Later maintenance therefore only requires retraining the models on new data, without continuously maintaining a rule base. The retraining mechanism effectively slows model decay, improves robustness, helps prevent attempts to bypass detection, and maintains high accuracy.
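A retraining trigger based on a time period, a data volume, and an accuracy rate could be sketched as follows. The class name RetrainPolicy, the function name should_retrain, and all default threshold values are hypothetical; the description leaves the thresholds to be set according to actual requirements.

```python
from dataclasses import dataclass


@dataclass
class RetrainPolicy:
    # All defaults are illustrative placeholders, not values from the description.
    max_age_seconds: float = 7 * 24 * 3600   # time period
    max_new_samples: int = 10000             # data volume
    min_accuracy: float = 0.95               # data accuracy rate


def should_retrain(policy: RetrainPolicy,
                   age_seconds: float,
                   new_samples: int,
                   accuracy: float) -> bool:
    """Retrain when at least one of the three criteria is met."""
    return (age_seconds >= policy.max_age_seconds
            or new_samples >= policy.max_new_samples
            or accuracy < policy.min_accuracy)
```

The same check could be evaluated separately for the HMM model and the SVM classifier, since the two models may accumulate data and lose accuracy at different rates.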

Fig. 5 is a block diagram of a cloud WAF-based malicious attack detection system according to a second embodiment of the present invention. As shown in fig. 5, the system 300 may include:

the first extraction module 302, after receiving an access request to a service line source station, performs parameter extraction on the access request;

an anomaly detection module 304, which judges whether the access request is abnormal based on the HMM model, if not, forwards the access request, and if so, judges whether the access request is an attack based on the SVM classifier;

and the attack detection module 306 judges whether the access request is an attack or not based on the SVM classifier, intercepts the access request if the access request is the attack, and forwards the access request if the access request is not the attack.

Fig. 6 is a specific structural diagram of the abnormality detection module in fig. 5, and as shown in fig. 6, optionally, the abnormality detection module 304 further includes:

the second extraction module 3041, which receives the normal access request as white sample data and performs parameter extraction on the white sample data;

a first training module 3042, training to obtain the HMM model based on the white sample data and calculating an anomaly probability threshold;

the first detection module 3043 determining whether the access request matches the HMM model, if not, forwarding the access request, and if so, switching to the second detection module;

the second detecting module 3044, which determines whether the probability value of the access request is smaller than the abnormal probability threshold;

if the access request is not less than the abnormal probability threshold value, the access request is forwarded;

Fig. 7 is a specific structural diagram of the attack detection module in fig. 5, and as shown in fig. 7, optionally, the attack detection module 306 further includes:

the data set collection module 3061, after collecting data sets including normal access request data and malicious access request data through the web and the WAF production environment, converts the data sets into a digital feature matrix;

the second training module 3062, training based on the digital feature matrix to obtain the SVM classifier;

the third detection module 3063 determines whether the access request is an attack based on the classification result of the SVM classifier.

The first embodiment is a method embodiment corresponding to the present embodiment, and the present embodiment can be implemented in cooperation with the first embodiment. The related technical details mentioned in the first embodiment are still valid in this embodiment, and are not described herein again in order to reduce repetition. Accordingly, the related-art details mentioned in the present embodiment can also be applied to the first embodiment.

Each method embodiment of the present invention can be implemented by software, hardware, firmware, or the like. Whether the present invention is implemented as software, hardware, or firmware, the instruction code may be stored in any type of computer-accessible memory (e.g., permanent or modifiable, volatile or non-volatile, solid or non-solid, fixed or removable media, etc.). Also, the Memory may be, for example, Programmable Array Logic (PAL), Random Access Memory (RAM), Programmable Read Only Memory (PROM), Read-Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), a magnetic disk, an optical disk, a Digital Versatile Disk (DVD), or the like.

It should be noted that, each unit/module mentioned in each device embodiment of the present invention is a logical unit/module, and physically, one logical unit may be one physical unit, or may be a part of one physical unit, or may be implemented by a combination of multiple physical units, and the physical implementation manner of these logical units itself is not the most important, and the combination of the functions implemented by these logical units is the key to solve the technical problem provided by the present invention. Furthermore, the above-mentioned embodiments of the apparatus of the present invention do not introduce elements that are less relevant for solving the technical problems of the present invention in order to highlight the innovative part of the present invention, which does not indicate that there are no other elements in the above-mentioned embodiments of the apparatus.

It is to be noted that in the claims and the description of the present patent, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the use of the verb "comprise a" to define an element does not exclude the presence of another, same element in a process, method, article, or apparatus that comprises the element.

While the invention has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention.

Claims

1. A malicious attack detection method based on cloud WAF is characterized by comprising the following steps:

judging whether the access request is abnormal or not based on an HMM model;

if not, forwarding the access request;

if the access request is an attack, intercepting the access request;

and if not, forwarding the access request.

2. The method of claim 1, wherein determining whether the access request is anomalous based on an HMM model comprises:

judging whether the access request is matched with the HMM model;

if not, forwarding the access request;

3. The method of claim 1 or 2, wherein the SVM-based classifier determining whether the access request is an attack comprises:

after collecting a data set containing normal access request data and malicious access request data, converting the data set into a digital feature matrix;

training based on the digital feature matrix to obtain an SVM classifier;

4. The method according to any of claims 1-3, wherein the parameters comprise: request parameters, parameter name itself, URL path of the request, http request header.

5. The method according to any one of claims 1-3, wherein the parameter extraction comprises:

6. The method according to any one of claims 1 to 3, wherein the generalization is in particular:

uppercase and lowercase English letters are generalized to "A";

digits are generalized to "N";

Chinese characters or Chinese punctuation are generalized to "C";

other characters are generalized to "T".

7. The method of any of claims 1-3, wherein the HMM model and the SVM classifier are retrained to update the parameters based on at least one of a time period, a data volume, and a data accuracy rate, respectively.

8. The method according to any of claims 1-3, wherein intercepting the access request comprises: storing the access request into a database and sending client alarm information.

9. A cloud WAF-based malicious attack detection system, the system comprising:

10. The system of claim 7, wherein the anomaly detection module comprises:

11. The system of claim 7, wherein the attack detection module comprises:

12. A cloud WAF-based malicious attack detection device, the device comprising a memory storing computer-executable instructions and a processor configured to execute the instructions to implement a cloud WAF-based malicious attack detection method, the method comprising:

judging whether the access request is abnormal or not based on an HMM model;

if not, forwarding the access request;

if the access request is an attack, intercepting the access request;

and if not, forwarding the access request.

13. A computer storage medium encoded with a computer program, the computer program comprising instructions executable by one or more computers to implement a cloud WAF-based malicious attack detection method, the method comprising:

judging whether the access request is abnormal or not based on an HMM model;

if not, forwarding the access request;

if the access request is an attack, intercepting the access request;

and if not, forwarding the access request.