CN107644166A - A web application security protection method based on automatic learning - Google Patents

A web application security protection method based on automatic learning

Info

Publication number
CN107644166A
Authority
CN
China
Prior art keywords
rule
request
web application
attack
white list
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710863641.3A
Other languages
Chinese (zh)
Inventor
罗智高
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Zhidaochuangyu Information Technology Co Ltd
Original Assignee
Chengdu Zhidaochuangyu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Zhidaochuangyu Information Technology Co Ltd
Priority to CN201710863641.3A
Publication of CN107644166A
Current legal status: Pending


Abstract

The invention discloses a web application security protection method based on automatic learning, comprising the following steps. Step 1: screen out the logs of non-attack requests. Step 2: from the fields in the logs, generate a set of regular expressions with specific rules through machine learning, forming whitelist rules. Step 3: match received requests against the regular expression set, and intercept or mark requests that are not covered by the whitelist rules. Step 4: identify the marked requests; if a request is normal, add it to the whitelist rules, and if it is an attack, intercept it. The invention can generate whitelist rules through autonomous learning, has low false-negative and false-positive rates in use, and can also protect against unknown vulnerabilities.

Description

A web application security protection method based on automatic learning
Technical field
The present invention relates to the field of computer technology, and in particular to a defense method for a web application firewall system based on automatic learning.
Background art
A web application system is a transaction processing system developed with various dynamic web technologies and based on the B/S (browser/server) model. Web security threats are currently intensifying, and for users web security is a disaster. The most common countermeasure today is a firewall, which can filter out data on non-service ports and prevent vulnerabilities in non-web services from being exploited. However, traditional web application firewalls all inspect requests against an attack signature database to decide whether a request is normal: for a normal request the requested content is returned, and for an attack request the request is intercepted and a prompt message is returned. Because a traditional web application firewall faces many different types of websites in use, it produces false positives and false negatives. It can only defend against publicly disclosed vulnerabilities, and cannot defend against unknown vulnerabilities until its rules are upgraded.
Summary of the invention
The present invention provides a web application security protection method that is based on, and capable of, automatic learning.
The technical solution adopted by the present invention is a web application security protection method based on automatic learning, comprising the following steps:
Step 1: Extract the access logs of the web application, and screen out the logs of non-attack requests;
Step 2: From the fields in the logs screened in step 1, generate a set of regular expressions with specific rules through machine learning, forming whitelist rules;
Step 3: Match received requests against the regular expression set generated in step 2, and intercept or mark requests that are not covered by the whitelist rules;
Step 4: Identify the requests marked in step 3; if a request is normal, add it to the whitelist rules, and if it is an attack, intercept it.
Further, the identification method in step 4 is traditional WAF rule detection or association analysis.
Further, the method for screening non-attack requests in step 1 is to perform keyword matching with a script and filter out attack requests.
Further, the regular expression set in step 2 is generated as follows:
S1: Obtain the longest common substring starting from the beginning of the field strings, and generate a rule from this substring;
S2: Remove the longest common substring, and compute the pairwise similarity of the remaining parts according to string edit distance;
S3: Extract the strings whose similarity to the other strings is below a certain threshold, generate a separate matching rule for them, and splice it with the rule generated in step S1;
S4: Repeat steps S1-S3 on the strings remaining after the extraction in step S3, and merge the result with the rule generated in step S3, until all strings have been traversed.
Further, the fields in step 2 include the URL, Cookie, Referer and custom-recorded fields.
The beneficial effects of the present invention are:
(1) the present invention generates whitelist rules by automatic learning from the logs of normal web accesses;
(2) the present invention can update the whitelist rule set in real time and can be used for different types of websites;
(3) the present invention has low false-negative and false-positive rates in use, and can also protect against unknown vulnerabilities.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the present invention.
Detailed description
The present invention is further described below with reference to the accompanying drawing and a specific embodiment.
As shown in Fig. 1, a web application security protection method based on automatic learning comprises the following steps:
Step 1: Extract the access logs of the web application from the web server or from a traditional WAF device, and screen out the logs of non-attack requests;
The screening method is to perform keyword matching with a script to filter out attack requests; the attack logs in the WAF logs can be filtered out, or the entries can be identified manually one by one to determine whether they are attacks.
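The keyword screening described above can be sketched roughly as follows. This is only an illustration: the keyword list, the function names and the sample URLs are hypothetical and do not come from the patent, and the sketch ignores URL encoding and other evasions that a real script would have to handle.

```python
# Hypothetical keyword list; a real deployment would tune it to its own traffic.
ATTACK_KEYWORDS = ("union select", "<script", "../", "etc/passwd", "' or ")


def is_attack(url: str) -> bool:
    """Return True if the lower-cased URL contains any attack keyword."""
    lowered = url.lower()
    return any(keyword in lowered for keyword in ATTACK_KEYWORDS)


def screen_non_attack(urls):
    """Split logged URLs into (normal, attack) lists by keyword matching."""
    normal, attack = [], []
    for url in urls:
        (attack if is_attack(url) else normal).append(url)
    return normal, attack
```

Only the `normal` list would be passed on to rule generation in step 2; entries the script cannot classify with confidence would still need the manual check mentioned above.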
For example, the URL set:
The attack request URL set that the above screening method filters out:
Step 2: From the URL, Cookie, Referer and other custom-recorded fields in the logs screened in step 1, generate a set of regular expressions with specific rules through machine learning, forming whitelist rules;
The regular expression generation method is illustrated using the following URL set as an example.
S1: Obtain the longest common substring starting from the beginning of all the URLs, and generate a rule directly from this substring;
For example, the longest such substring in the above URL set is "http://www.xxxx.com/"
The generated rule is http:\/\/www\.xxxx\.com\/;
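Step S1 amounts to taking the longest common prefix of the strings and escaping it for use in a regular expression. A minimal sketch, assuming a small hypothetical URL set consistent with the rules quoted in the text (the helper name `prefix_rule` is also an assumption):

```python
import os
import re


def prefix_rule(urls):
    """Return (common_prefix, escaped_rule) for a list of URL strings."""
    # os.path.commonprefix compares character by character, so it works on
    # arbitrary strings, not only on filesystem paths.
    prefix = os.path.commonprefix(urls)
    return prefix, re.escape(prefix)


# Hypothetical sample set, not the listing from the patent.
urls = [
    "http://www.xxxx.com/download",
    "http://www.xxxx.com/list",
    "http://www.xxxx.com/news/detail?id=7126",
]
prefix, rule = prefix_rule(urls)
```

On recent Python versions `re.escape` leaves `/` and `:` unescaped, so the result looks slightly different from the `\/` form written in the text, but it matches the same strings.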
S2: Remove the longest common substring, and compute the pairwise similarity of the remaining substrings according to string edit distance, obtaining the results shown in Table 1;
Table 1: Similarity calculation results
Strings with low similarity to the other strings are extracted, and separate rules are generated for them;
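The text does not say how edit distance is turned into a similarity score. The sketch below uses the standard Levenshtein distance and, as an assumption, normalizes it to a 0-100 scale by the length of the longer string, which is one common choice and is compatible with a cut-off such as 50.0.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with substitution, insertion and deletion."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 100 * (1 - distance / longer length)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - edit_distance(a, b) / longest)
```

With this normalization, two detail URLs that differ only in a numeric id score well above 50, while unrelated path segments such as `download` and `list` score well below it.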
S3: A value below 50.0 in the table above is taken to mean that a string's similarity to the other strings is too low, and the corresponding strings are extracted directly into a matching rule;
The rule obtained in this step is (?:download|list); splicing it with the result of step S1 gives the rule http:\/\/www\.xxxx\.com\/(?:download|list);
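One way to read step S3 is sketched below: a string is extracted when every one of its pairwise similarity values falls below the threshold, and the extracted strings are joined into a non-capturing alternation appended to the S1 rule. The function names, the "every value below threshold" criterion and the similarity values in the usage example are assumptions made for illustration.

```python
import re


def split_by_threshold(strings, table, threshold=50.0):
    """Split strings into (outliers, remaining) using a pairwise similarity table.

    table[i][j] holds the similarity of strings[i] and strings[j]; a string is
    treated as an outlier when every off-diagonal value in its row is below
    the threshold (an assumed reading of the 50.0 cut-off).
    """
    outliers, remaining = [], []
    for i, s in enumerate(strings):
        row = [value for j, value in enumerate(table[i]) if j != i]
        (outliers if all(v < threshold for v in row) else remaining).append(s)
    return outliers, remaining


def splice(prefix_rule, strings):
    """Join strings into a non-capturing alternation and append it to prefix_rule."""
    return prefix_rule + "(?:" + "|".join(re.escape(s) for s in strings) + ")"
```

For example, with made-up similarity values:

```python
strings = ["download", "list", "news/detail?id=7126", "news/detail?id=4512"]
table = [
    [100.0, 12.5, 15.8, 15.8],
    [12.5, 100.0, 10.5, 10.5],
    [15.8, 10.5, 100.0, 84.2],
    [15.8, 10.5, 84.2, 100.0],
]
outliers, remaining = split_by_threshold(strings, table)
rule = splice(r"http:\/\/www\.xxxx\.com\/", outliers)
```

The two `news/detail` strings stay in `remaining` and are handled by the next pass of S1-S3.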
S4: Repeat steps S1-S3 on the strings remaining after the extraction in step S3, and merge the result with the rule generated in step S3, until all strings have been traversed.
Repeating steps S1-S3 gives news\/detail\?id=(?:7126|4512|1231|7793);
which can be further optimized to news\/detail\?id=\d+;
After merging with the rule generated in step S3, the result is:
http:\/\/www\.xxxx\.com\/(?:news\/detail\?id=\d+|download|list)$.
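The optimization and merge in S4 can be sketched as follows, assuming the rules are meant as non-capturing alternations and that an all-digit value list is generalized to `\d+`. The helper names `generalize` and `merge` are hypothetical.

```python
import re


def generalize(prefix, values):
    r"""Collapse a list of observed values into one regex fragment.

    If every value consists of digits only, use \d+ (the optimization the
    text describes); otherwise fall back to an explicit alternation.
    """
    if values and all(v.isdigit() for v in values):
        return prefix + r"\d+"
    return prefix + "(?:" + "|".join(re.escape(v) for v in values) + ")"


def merge(host_rule, branches):
    """Merge branch rules under a shared host rule and anchor the end with $."""
    return host_rule + "(?:" + "|".join(branches) + ")$"


news = generalize(r"news\/detail\?id=", ["7126", "4512", "1231", "7793"])
rule = merge(r"http:\/\/www\.xxxx\.com\/", [news, "download", "list"])
```

Generalizing to `\d+` trades precision for coverage: ids never seen in the logs are accepted, but anything that is not purely numeric is still rejected.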
Step 3: Match received requests against the regular expression set generated in step 2, and intercept or mark requests that are not covered by the whitelist rules;
For a new request received by the server, the firewall system tries to extract its parameters and match them against the rules. For example, suppose the URL requested by a visitor is:
http://www.xxxx.com/news/detail?id=1126 union select 1,2,3,4
Because the regular expression "news\/detail\?id=\d+" only allows the id parameter to be numeric, and this request contains strings such as "union", the regular expression cannot match the trailing content, so the request falls outside the whitelist rules. Depending on the user's settings, the request can be intercepted directly, or it can be marked and analyzed further to determine whether it is an attack request.
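A minimal sketch of the step 3 decision, assuming the whitelist consists of the single rule derived in the example and that the user setting is a simple flag (`block_unknown`, a hypothetical name):

```python
import re

# Assumed form of the whitelist rule derived in the text.
WHITELIST = [
    re.compile(r"http:\/\/www\.xxxx\.com\/(?:news\/detail\?id=\d+|download|list)"),
]


def check_request(url, block_unknown=False):
    """Return 'allow', 'block' or 'mark' for a requested URL."""
    # fullmatch requires the whole URL to match, so trailing payload text
    # after a valid id is not accepted.
    if any(rule.fullmatch(url) for rule in WHITELIST):
        return "allow"
    return "block" if block_unknown else "mark"
```

With these definitions, `check_request("http://www.xxxx.com/news/detail?id=1126 union select 1,2,3,4")` yields `"mark"` by default and `"block"` when `block_unknown=True`, while the same URL without the injected suffix is allowed.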
Step 4: Identify the requests marked in step 3; if a request is normal, add it to the whitelist rule set, and if it is an attack, intercept it.
A marked request is analyzed and identified in combination with the context of the requests from the same IP or the logged-in user. The analysis here can be traditional WAF rule detection, association analysis based on the visitor's recent access records, or a combination with other detection methods.
Traditional WAF detection mainly matches visitors' requests against rules for known vulnerabilities (regular expressions written from the exploitation information of public vulnerabilities). Association analysis means analyzing the recent access records of a visitor: for example, if the firewall cannot tell whether a given request is an attack, but the access history shows that the visitor's previous several accesses were all confirmed attacks, the current request is judged to be an attack request. If the result is an attack, the request is intercepted directly; if it is judged to be a normal request, it is added to the whitelist rules.
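The association analysis can be illustrated with a small per-visitor history. This is a sketch under assumptions: the history length of three, the in-memory storage and the function names are all hypothetical, and the traditional WAF rule check is represented only by a boolean passed in by the caller.

```python
from collections import defaultdict, deque

HISTORY_SIZE = 3  # hypothetical: how many recent verdicts to keep per visitor

# visitor id (for example an IP address) -> recent verdicts, newest last
history = defaultdict(lambda: deque(maxlen=HISTORY_SIZE))


def record(visitor, verdict):
    """Store a confirmed verdict ('attack' or 'normal') for a visitor."""
    history[visitor].append(verdict)


def identify(visitor, matches_known_attack_rule):
    """Classify a marked request as 'attack' or 'normal'.

    matches_known_attack_rule is the result of a traditional WAF rule check,
    which is outside the scope of this sketch.
    """
    if matches_known_attack_rule:
        return "attack"
    recent = history[visitor]
    # Association analysis: the last few accesses were all confirmed attacks.
    if len(recent) == HISTORY_SIZE and all(v == "attack" for v in recent):
        return "attack"
    return "normal"
```

A request classified as `"normal"` would then be added to the whitelist rule set, and one classified as `"attack"` would be intercepted.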
Further, a manual review can be carried out at the end to edit the generated rules; this step mainly checks the correctness of the machine-generated rules and lets the administrator add whitelist entries manually.
The manual review mainly looks for errors in the automatically generated rules that cause false interceptions, as follows:
Check the interception logs for intercepted requests that are in fact normal accesses (normal requests are distinguished according to the characteristics of common attack types; see the OWASP documentation); if a false interception exists, edit the corresponding rule (add it to the whitelist set).
The present invention overcomes the defect that attack detection based on the attack signature database of a traditional web firewall may produce many false positives and false negatives. By analyzing the logs of historical normal web accesses, it generates whitelist rules and, depending on the user's settings, can allow through only the requests covered by the whitelist rules, thereby defending against attacks launched by hackers.
In this document, a regular expression is a logical formula for operating on strings, built from predefined specific characters and combinations of those characters. WAF refers to a web application protection system (web application firewall). A vulnerability is a flaw in the concrete implementation of hardware, software or a protocol, or in the system security policy, that enables an attacker to access or damage a system without authorization. Edit distance is, between two strings, the minimum number of edit operations required to change one into the other; the permitted edit operations are substituting one character for another, inserting a character, and deleting a character. In general, the smaller the edit distance, the greater the similarity of the two strings. OWASP refers to the Open Web Application Security Project. URL refers to Uniform Resource Locator. A Cookie is data stored on the user's local terminal. Referer refers to the address of the source page.

Claims (5)

  1. A web application security protection method based on automatic learning, characterized by comprising the following steps:
    Step 1: Extract the access logs of the web application, and screen out the logs of non-attack requests;
    Step 2: From the fields in the logs screened in step 1, generate a set of regular expressions with specific rules through machine learning, forming whitelist rules;
    Step 3: Match received requests against the regular expression set generated in step 2, and intercept or mark requests that are not covered by the whitelist rules;
    Step 4: Identify the requests marked in step 3; if a request is normal, add it to the whitelist rules, and if it is an attack, intercept it.
  2. The web application security protection method based on automatic learning according to claim 1, characterized in that the identification method in step 4 is traditional WAF rule detection or association analysis.
  3. The web application security protection method based on automatic learning according to claim 1, characterized in that the method for screening non-attack requests in step 1 is to perform keyword matching with a script and filter out attack requests.
  4. The web application security protection method based on automatic learning according to claim 1, characterized in that the regular expression set in step 2 is generated as follows:
    S1: Obtain the longest common substring starting from the beginning of the field strings, and generate a rule from this substring;
    S2: Remove the longest common substring, and compute the pairwise similarity of the remaining parts according to string edit distance;
    S3: Extract the strings whose similarity to the other strings is below a certain threshold, generate a separate matching rule for them, and splice it with the rule generated in step S1;
    S4: Repeat steps S1-S3 on the strings remaining after the extraction in step S3, and merge the result with the rule generated in step S3, until all strings have been traversed.
  5. The web application security protection method based on automatic learning according to claim 1, characterized in that the fields in step 2 include the URL, Cookie, Referer and custom-recorded fields.
CN201710863641.3A 2017-09-22 2017-09-22 A web application security protection method based on automatic learning Pending CN107644166A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710863641.3A CN107644166A (en) 2017-09-22 2017-09-22 A web application security protection method based on automatic learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710863641.3A CN107644166A (en) 2017-09-22 2017-09-22 A web application security protection method based on automatic learning

Publications (1)

Publication Number Publication Date
CN107644166A (en) 2018-01-30

Family

ID=61111896

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710863641.3A Pending CN107644166A (en) 2017-09-22 2017-09-22 A web application security protection method based on automatic learning

Country Status (1)

Country Link
CN (1) CN107644166A (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101727447A (en) * 2008-10-10 2010-06-09 浙江搜富网络技术有限公司 Generation method and device of regular expression based on URL
CN103166966A (en) * 2013-03-07 2013-06-19 星云融创(北京)信息技术有限公司 Method and device for distinguishing illegal access request to website
CN103428196A (en) * 2012-12-27 2013-12-04 北京安天电子设备有限公司 URL white list-based WEB application intrusion detecting method and apparatus
CN104361283A (en) * 2014-12-05 2015-02-18 网宿科技股份有限公司 Web attack protection method
US20160344696A1 (en) * 2013-03-27 2016-11-24 Fortinet, Inc. Firewall policy management
CN106415507A (en) * 2014-06-06 2017-02-15 日本电信电话株式会社 Log analysis device, attack detection device, attack detection method and program
CN106657006A (en) * 2016-11-17 2017-05-10 北京中电普华信息技术有限公司 Software information safety protection method and device


Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108520180B (en) * 2018-03-01 2020-04-24 中国科学院信息工程研究所 Multi-dimension-based firmware Web vulnerability detection method and system
CN108520180A (en) * 2018-03-01 2018-09-11 中国科学院信息工程研究所 A kind of firmware Web leak detection methods and system based on various dimensions
CN111953638B (en) * 2019-05-17 2023-06-27 北京京东尚科信息技术有限公司 Network attack behavior detection method and device and readable storage medium
CN111953638A (en) * 2019-05-17 2020-11-17 北京京东尚科信息技术有限公司 Network attack behavior detection method and device and readable storage medium
CN110661680A (en) * 2019-09-11 2020-01-07 深圳市永达电子信息股份有限公司 Method and system for detecting data stream white list based on regular expression
CN110661680B (en) * 2019-09-11 2023-03-14 深圳市永达电子信息股份有限公司 Method and system for detecting data stream white list based on regular expression
CN113259303A (en) * 2020-02-12 2021-08-13 网宿科技股份有限公司 White list self-learning method and device based on machine learning technology
EP3886394A4 (en) * 2020-02-12 2021-09-29 Wangsu Science & Technology Co., Ltd. Machine learning technique based whitelist self-learning method and device
CN111835737A (en) * 2020-06-29 2020-10-27 中国平安财产保险股份有限公司 WEB attack protection method based on automatic learning and related equipment thereof
CN111835737B (en) * 2020-06-29 2024-04-02 中国平安财产保险股份有限公司 WEB attack protection method based on automatic learning and related equipment thereof
CN111935133A (en) * 2020-08-06 2020-11-13 北京顶象技术有限公司 White list generation method and device
CN112148842A (en) * 2020-10-13 2020-12-29 厦门安胜网络科技有限公司 Method, device and storage medium for reducing false alarm rate in attack detection
CN113162909A (en) * 2021-03-10 2021-07-23 北京顶象技术有限公司 White list generation method and device based on AI (Artificial Intelligence), electronic equipment and readable medium
CN113660230A (en) * 2021-08-06 2021-11-16 杭州安恒信息技术股份有限公司 Cloud security protection test method, system, computer and readable storage medium
CN113660230B (en) * 2021-08-06 2023-02-28 杭州安恒信息技术股份有限公司 Cloud security protection testing method and system, computer and readable storage medium
CN114039778A (en) * 2021-11-09 2022-02-11 深信服科技股份有限公司 Request processing method, device, equipment and readable storage medium
CN114422206A (en) * 2021-12-29 2022-04-29 北京致远互联软件股份有限公司 JAVA WEB dynamic configuration security defense method
CN114422206B (en) * 2021-12-29 2024-02-02 北京致远互联软件股份有限公司 JAVA WEB dynamic configuration security defense method
CN114500018B (en) * 2022-01-17 2022-10-14 武汉大学 Web application firewall security detection and reinforcement system and method based on neural network
CN114500018A (en) * 2022-01-17 2022-05-13 武汉大学 Web application firewall security detection and reinforcement system and method based on neural network
CN117201194A (en) * 2023-11-06 2023-12-08 华中科技大学 URL classification method, device and system based on character string similarity calculation
CN117201194B (en) * 2023-11-06 2024-01-05 华中科技大学 URL classification method, device and system based on character string similarity calculation


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180130
