CN107644166A

CN107644166A - A WEB application security protection method based on automatic learning

Info

Publication number: CN107644166A
Application number: CN201710863641.3A
Authority: CN
Inventors: 罗智高
Original assignee: Chengdu Zhidaochuangyu Information Technology Co Ltd
Current assignee: Chengdu Zhidaochuangyu Information Technology Co Ltd
Priority date: 2017-09-22
Filing date: 2017-09-22
Publication date: 2018-01-30

Abstract

The invention discloses a WEB application security protection method based on automatic learning, comprising the following steps. Step 1: screen the logs for non-attack requests. Step 2: from the fields in those logs, generate by machine learning a set of regular expressions with specific rules, forming the whitelist rules. Step 3: match each received request against the regular expression set, and intercept or mark requests that fall outside the whitelist rules. Step 4: identify each marked request; if it is normal, add it to the whitelist rules, and if it is an attack, intercept it. The invention generates whitelist rules through autonomous learning, has low false-negative and false-positive rates in use, and can also protect against unknown vulnerabilities.

Description

A WEB application security protection method based on automatic learning

Technical field

The present invention relates to the field of computer technology, and in particular to a defense method for a WEB application firewall system based on automatic learning.

Background technology

A WEB application system is a transaction-processing system developed with various dynamic WEB technologies and based on the B/S (browser/server) model. WEB security threats are currently intensifying, and for users WEB security is a serious problem. The most common countermeasure today is a firewall, which can filter out traffic to non-service ports and so protect against vulnerabilities in non-WEB services. However, traditional WEB application firewalls all inspect requests against an attack signature database to decide whether a request is normal: the content of a normal request is returned, while an attack request is intercepted and a prompt message is returned. Because a traditional WEB application firewall faces many different types of websites in use, it produces false positives and false negatives. It can only defend against publicly disclosed vulnerabilities, and cannot defend against unknown vulnerabilities until its rules are upgraded.

Summary of the invention

The present invention provides a WEB application security protection method that can learn automatically.

The technical solution adopted by the present invention is a WEB application security protection method based on automatic learning, comprising the following steps:

Step 1: Extract the access logs of the WEB application and select the logs of non-attack requests;

Step 2: From the fields in the logs selected in step 1, generate by machine learning a set of regular expressions with specific rules, forming the whitelist rules;

Step 3: Match each received request against the regular expression set generated in step 2, and intercept or mark requests that are not covered by the whitelist rules;

Step 4: Identify the requests marked in step 3; if a request is normal, add it to the whitelist rules, and if it is an attack, intercept it.

Further, the identification method in step 4 is traditional WAF rule detection or association analysis.

Further, the method of screening non-attack requests in step 1 is to perform keyword matching with a script and filter out attack requests.

Further, the regular expression set in step 2 is generated as follows:

S1: Obtain the longest common substring starting at the beginning of the field strings, and generate a rule from this substring;

S2: Remove the longest common substring, and compute the pairwise similarity of the remaining parts according to string edit distance;

S3: Extract the strings whose similarity to the other strings is below a certain threshold, generate a separate matching rule for them, and splice it with the rule generated in step S1;

S4: Repeat steps S1-S3 on the strings remaining after the extraction in step S3, and merge the result with the rule generated in step S3, until all strings have been traversed.

Further, the fields in step 2 include the URL, Cookie, Referer, and custom logged fields.

The beneficial effects of the invention are as follows：

(1) The present invention generates whitelist rules by automatic learning from the WEB logs of normal access;

(2) The present invention can update the whitelist rule set in real time and can be used for different types of websites;

(3) The present invention has low false-negative and false-positive rates in use, and can also protect against unknown vulnerabilities.

Brief description of the drawings

Fig. 1 is a schematic flow chart of the present invention.

Embodiment

The present invention is further described below with reference to the accompanying drawing and a specific embodiment.

As shown in Fig. 1, a WEB application security protection method based on automatic learning comprises the following steps:

Step 1: Extract the access logs of the WEB application from the WEB server or from a traditional WAF device, and select the logs of non-attack requests;

The screening method is to perform keyword matching with a script and filter out attack requests. Attack logs recorded in the WAF logs can be filtered out, or the requests can be identified manually one by one to determine whether each is an attack.
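The keyword-matching screen described above could be sketched as follows. This is a minimal illustration, not the script used by the invention: the keyword list, the function name screen_non_attack, and the sample URLs are all assumptions made for the example, and the sketch does not decode URL-encoded input.

```python
import re

# Hypothetical keyword list; a real deployment would tune this to its own traffic.
ATTACK_KEYWORDS = ["union select", "<script", "../", "/etc/passwd"]

# One case-insensitive pattern that matches any of the literal keywords.
_ATTACK_RE = re.compile(
    "|".join(re.escape(k) for k in ATTACK_KEYWORDS), re.IGNORECASE
)

def screen_non_attack(urls):
    """Split logged URLs into (non_attack, attack) lists by keyword matching."""
    non_attack, attack = [], []
    for url in urls:
        if _ATTACK_RE.search(url):
            attack.append(url)
        else:
            non_attack.append(url)
    return non_attack, attack
```

Only the non-attack list would be passed on to rule generation in step 2. Keyword matching is coarse, which is why the text also allows manual identification.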

For example, the URL set:

The screening method above filters out the following set of attack-request URLs:

Step 2: From the URL, Cookie, Referer, and other custom logged fields in the logs selected in step 1, generate by machine learning a set of regular expressions with specific rules, forming the whitelist rules;

The regular expression generation method is illustrated below, taking the URL set as an example.

S1: Obtain the longest common substring starting at the beginning of all the URLs, and generate a rule directly from this substring;

For example, the longest such substring in the URL set above is "http://www.xxxx.com/".

The generated rule is http:\/\/www\.xxxx\.com\/;
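Step S1 amounts to taking the longest common prefix of the field strings and escaping it as a literal. A minimal sketch, assuming a hypothetical URL set chosen to be consistent with the rules quoted in this description; the function name common_prefix_rule is also an assumption:

```python
import os
import re

def common_prefix_rule(strings):
    """Return (prefix, rule): the longest prefix shared by all strings,
    and that prefix escaped so it matches literally in a regex."""
    prefix = os.path.commonprefix(list(strings))
    return prefix, re.escape(prefix)

# Hypothetical URL set (not taken from the original text).
urls = [
    "http://www.xxxx.com/news/detailId=7126",
    "http://www.xxxx.com/download",
    "http://www.xxxx.com/list",
]
prefix, rule = common_prefix_rule(urls)

# The parts left over after removing the prefix feed into step S2.
remainders = [u[len(prefix):] for u in urls]
```

Python's re.escape leaves "/" unescaped, whereas the rule above is written with "\/"; both forms match the same strings.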

S2: Remove the longest common substring, and compute the pairwise similarity of the remaining substrings according to string edit distance, obtaining the results shown in Table 1;

Table 1: Similarity calculation results

Strings with low similarity to the other strings are extracted to generate separate rules;

S3: A value below 50.0 in the table above is taken to mean that the string's similarity to the other strings is too low; the corresponding strings are extracted directly into a matching rule;

The rule obtained in this step is (?:download|list); splicing it with the result of step S1 gives the rule http:\/\/www\.xxxx\.com\/(?:download|list);
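The similarity calculation of S2 and the threshold test of S3 might be sketched as below. The text does not give the formula that turns edit distance into a similarity score, so the 0-100 normalization used here is an assumption, as are the function names and the sample strings; only the 50.0 threshold comes from the description.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # delete ca
                           cur[j - 1] + 1,               # insert cb
                           prev[j - 1] + (ca != cb)))    # substitute or match
        prev = cur
    return prev[-1]

def similarity(a, b):
    """Assumed normalization: 100 for identical strings, 0 for fully different."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - edit_distance(a, b) / longest)

def split_by_similarity(strings, threshold=50.0):
    """Return (outliers, rest). A string is an outlier when its similarity
    to every other string is below the threshold."""
    outliers, rest = [], []
    for i, s in enumerate(strings):
        best = max((similarity(s, t) for j, t in enumerate(strings) if j != i),
                   default=0.0)
        (outliers if best < threshold else rest).append(s)
    return outliers, rest
```

Under these assumptions, the outliers would be joined into an alternation such as (?:download|list), and the remaining strings would go through S1-S3 again.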

Repeating steps S1-S3 gives news\/detailId=(?:7126|4512|1231|7793);

This can be further optimized to news\/detailId=\d+;
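The text does not say how this optimization is performed. One plausible reading is that an alternation whose members are all numeric is generalized to a digit class, which the following sketch illustrates; the function name generalize_alternatives is an assumption.

```python
import re

def generalize_alternatives(values):
    r"""Return a regex fragment covering the observed values.

    If every value is purely numeric, generalize to \d+;
    otherwise fall back to a literal non-capturing alternation."""
    if values and all(re.fullmatch(r"\d+", v) for v in values):
        return r"\d+"
    return "(?:" + "|".join(re.escape(v) for v in values) + ")"
```

Generalizing to \d+ lets identifiers that never appeared in the training logs pass the whitelist, at the cost of accepting any number of digits.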

After merging with the rule generated in step S3, the result is:

http:\/\/www\.xxxx\.com\/(?:news\/detailId=\d+|download|list)$

For a new request received by the server, the firewall system attempts to extract its parameters and match them against the rules. Suppose a visitor requests the URL:

http://www.xxxx.com/news/detailId=1126unionselect 1,2,3,4

Because the regular expression "news\/detailId=\d+" only allows the id parameter to be numeric, and this request contains strings such as "union", the regular expression cannot match the trailing part of the string, so the request falls outside the whitelist rules. Depending on the user's settings, the request can be intercepted directly, or it can be marked and analyzed further to determine whether it is an attack request.
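The matching decision in step 3 can be illustrated with the merged rule written in Python regex syntax, with the non-capturing group (?:...) and \d+ spelled out (an assumption about the intended form of the rule). The function name check_request and the mode parameter are hypothetical stand-ins for the user's setting.

```python
import re

# Merged whitelist rule from the example, in Python regex syntax.
WHITELIST = re.compile(
    r"http://www\.xxxx\.com/(?:news/detailId=\d+|download|list)"
)

def check_request(url, mode="mark"):
    """Return 'allow' for whitelisted URLs; otherwise 'intercept' or 'mark'
    depending on the configured mode."""
    if WHITELIST.fullmatch(url):   # the whole URL must match, as with a trailing $
        return "allow"
    return "intercept" if mode == "intercept" else "mark"
```

With fullmatch, any trailing content after the numeric id causes the match to fail, which is the behavior the paragraph above relies on.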

Step 4: Identify the requests marked in step 3; if a request is normal, add it to the whitelist rule set, and if it is an attack, intercept it.

A marked request is analyzed and identified in the context of the other requests from the same IP or from the same logged-in user. The analysis here can be traditional WAF rule detection, association analysis based on a visitor's recent access records, or a combination with other detection methods.

Traditional WAF detection mainly matches a visitor's requests against rules for known vulnerabilities (regular expressions written from the exploitation details of publicly disclosed vulnerabilities). Association analysis examines a visitor's recent access records: for example, if the firewall cannot tell whether a given request is an attack, but the access history shows that the visitor's previous several accesses were all confirmed attacks, the current request is judged to be an attack request. If the result is an attack, the request is intercepted directly; if it is judged to be a normal request, it is added to the whitelist rules.
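The association analysis described above could be sketched as a per-visitor history of recent verdicts. The class name, the window size, and the minimum history length are assumptions; the text only says that a request is judged an attack when the visitor's previous several accesses were all confirmed attacks.

```python
from collections import defaultdict, deque

class AssociationAnalyzer:
    """Keep each visitor's most recent verdicts and use them to judge
    a request the firewall could not classify on its own."""

    def __init__(self, window=5, min_history=3):
        self.min_history = min_history
        # visitor id (e.g. IP) -> recent verdicts, True meaning confirmed attack
        self.history = defaultdict(lambda: deque(maxlen=window))

    def record(self, visitor, was_attack):
        self.history[visitor].append(bool(was_attack))

    def judge(self, visitor):
        """Return 'attack' if all recent accesses were confirmed attacks,
        otherwise 'undetermined' so another detection method can decide."""
        recent = self.history.get(visitor, ())
        if len(recent) >= self.min_history and all(recent):
            return "attack"
        return "undetermined"
```

A request left undetermined here would go on to the other detection methods mentioned in the text, such as traditional WAF rule detection or manual review.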

Further, a manual review can be carried out at the end to edit the generated rules. This step mainly serves to check the correctness of the machine-generated rules and to let the administrator add whitelist entries manually.

The manual review mainly checks whether errors in the automatically generated rules cause false interceptions, as follows:

Check the interception logs for normal access requests that were intercepted (normal requests are distinguished from attacks by the characteristics of common attack types; refer to the OWASP documentation). If a false interception is found, edit the corresponding rule (add the request to the whitelist set).

The present invention overcomes the drawback that attack detection based on the attack signature database of a traditional WEB firewall can produce large numbers of false positives and false negatives. Whitelist rules are generated by analyzing historical WEB logs of normal access, and, depending on the user's settings, only requests covered by the whitelist rules may be allowed through, thereby defending against attacks launched by hackers.

In this document, a regular expression is a logical formula for operating on strings, built from predefined specific characters and combinations of those characters. WAF refers to a WEB application protection system (web application firewall). A vulnerability is a defect in the concrete implementation of hardware, software, or a protocol, or in a system security policy, that allows an attacker to access or damage a system without authorization. Edit distance is the minimum number of edit operations needed to transform one string into another; the permitted edit operations are substituting one character for another, inserting a character, and deleting a character. In general, the smaller the edit distance, the greater the similarity of the two strings. OWASP refers to the Open Web Application Security Project. URL refers to Uniform Resource Locator. Cookie refers to data stored on the user's local terminal. Referer refers to the address of the source page.

Claims

1. A WEB application security protection method based on automatic learning, characterized by comprising the following steps:

Step 1: Extract the access logs of the WEB application and select the logs of non-attack requests;

Step 2: From the fields in the logs selected in step 1, generate by machine learning a set of regular expressions with specific rules, forming the whitelist rules;

Step 3: Match each received request against the regular expression set generated in step 2, and intercept or mark requests that are not covered by the whitelist rules;

Step 4: Identify the requests marked in step 3; if a request is normal, add it to the whitelist rules, and if it is an attack, intercept it.
2. The WEB application security protection method based on automatic learning according to claim 1, characterized in that the identification method in step 4 is traditional WAF rule detection or association analysis.
3. The WEB application security protection method based on automatic learning according to claim 1, characterized in that the method of screening non-attack requests in step 1 is to perform keyword matching with a script and filter out attack requests.
4. The WEB application security protection method based on automatic learning according to claim 1, characterized in that the regular expression set in step 2 is generated as follows:

S1: Obtain the longest common substring starting at the beginning of the field strings, and generate a rule from this substring;

S2: Remove the longest common substring, and compute the pairwise similarity of the remaining parts according to string edit distance;

S3: Extract the strings whose similarity to the other strings is below a certain threshold, generate a separate matching rule for them, and splice it with the rule generated in step S1;

S4: Repeat steps S1-S3 on the strings remaining after the extraction in step S3, and merge the result with the rule generated in step S3, until all strings have been traversed.
5. The WEB application security protection method based on automatic learning according to claim 1, characterized in that the fields in step 2 include the URL, Cookie, Referer, and custom logged fields.