CN101471818A - Method and system for detecting web pages containing maliciously injected scripts

Publication number
CN101471818A
Authority
CN
China
Prior art keywords: web page, script, cluster, dynamic content, malicious
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2007103039855A
Other languages
Chinese (zh)
Other versions
CN101471818B (en)
Inventor
叶润国
胡振宇
朱钱杭
李博
骆拥政
牛妍萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Venus Information Technology Co Ltd
Original Assignee
Beijing Venus Information Technology Co Ltd
Application filed by Beijing Venus Information Technology Co Ltd
Priority to CN2007103039855A
Publication of CN101471818A
Application granted
Publication of CN101471818B
Legal status: Expired - Fee Related

Abstract

The invention relates to a method and a system for detecting web pages containing injected malicious scripts, and belongs to the technical field of computer networks. The method comprises the following steps: using a web crawler to traverse and download all web pages of the website to be scanned; performing cluster analysis on the downloaded web pages and extracting a template for each web page cluster; and using the cluster template to detect whether each web page in the cluster contains an injected malicious script. The system comprises a web crawler module, a dynamic-content web page filtering module, a dynamic-content web page clustering module, a web page cluster template extraction module and a malicious injected script detection module.

Description

A method and system for detecting web pages containing maliciously injected scripts
Technical field
The present invention relates to a method and system for detecting web pages containing maliciously injected scripts, and belongs to the technical field of computer networks.
Background art
With the development of Internet and Web technology, the Web no longer provides only static content to Internet users; it can also provide a variety of dynamic Web content services according to user needs. Because Web services are easy to deploy and easy to use, many traditional client/server applications are now being converted into Web-based applications, including applications with very high security requirements such as electronic banking and electronic securities trading.
While Web applications bring convenience to people's lives and work, they have also brought many security problems, of which script injection attacks are the most prominent. The root cause of script injection attacks is a defect in the Web application code: it fails to strictly check and filter user input, so a malicious attacker can inject malicious scripts through user input fields. The injected scripts are delivered to the victim's Web browser in the dynamic-content web pages served by the Web application and are executed there, allowing the attacker to steal the victim's sensitive data or to perform malicious operations in the victim's security context. According to statistics from OWASP, the internationally known open Web security organization, script injection attacks (including cross-site scripting attacks, which belong to the script injection category) ranked first among the top ten Web security issues in 2007. Statistics on script injection attacks from 2002 to 2007 in the international vulnerability database CVE show that the frequency of such incidents is growing year by year.
Script injection vulnerabilities are rooted in defects in the implementation of Web applications, and such programming defects cannot be avoided entirely. Dedicated Web application vulnerability scanners are therefore needed to find potential script injection vulnerabilities in a Web application so that they can be repaired with code patches before hackers exploit them. However, Web application vulnerability scanners cannot find every script injection vulnerability, and they also cannot detect dynamic-content web pages into which hackers have already injected malicious scripts by exploiting such vulnerabilities.
Summary of the invention
The invention provides a method and system for detecting web pages containing maliciously injected scripts. Traditional Web security scanning methods and systems can only find web pages that contain script injection vulnerabilities; they cannot find dynamic-content web pages into which hackers have already successfully injected malicious scripts by exploiting those vulnerabilities. The method and system of the invention overcome this defect and are a useful supplement to existing methods and systems for assessing the information security risk of Web sites. The basic detection principle is as follows: most successful script injection attacks change the Document Object Model (DOM) structure of the template of the dynamic-content web page. If the template of each dynamic-content web page can be extracted, then comparing the DOM structure of a dynamic-content web page with its template reveals whether the page contains a maliciously injected script.
The technical solution adopted by the invention to solve the technical problem is:
A method for detecting web pages containing maliciously injected scripts, comprising the following steps:
Using a web crawler to traverse and download all web pages of the scanned website;
Performing cluster analysis on the downloaded web pages and extracting a web page cluster template;
Using the web page cluster template to detect whether each web page in the cluster contains a maliciously injected script.
Preferably, the step of performing cluster analysis on the downloaded web pages and extracting a web page cluster template comprises the following steps:
Preprocessing the set of downloaded web pages: filtering out the web pages that correspond to requests for static Web objects and keeping only the dynamic-content web pages that correspond to requests for dynamic Web objects;
Clustering the dynamic-content web pages according to their Uniform Resource Locators (URLs) to obtain dynamic-content web page clusters;
For each dynamic-content web page cluster, extracting its common document object model (DOM) tree as the template of that cluster.
Preferably, the step of using the web page cluster template to detect whether each web page in the cluster contains a maliciously injected script comprises the following steps:
For each dynamic-content web page cluster, converting each dynamic web page in the cluster into a DOM tree and comparing it with the cluster template to find every DOM subtree that falls outside the outline of the cluster template;
For each DOM subtree that falls outside the outline of the cluster template, attempting to extract injected scripts from it;
Checking the extracted injected scripts for syntactic correctness; if the syntax is correct, the host dynamic-content web page is confirmed to be a web page containing a maliciously injected script.
Preferably, the step of extracting injected scripts from the set of DOM subtrees that fall outside the outline of the cluster template is any combination of the following 5 script extraction methods:
Extracting JavaScript/VBScript scripts from each <script> tag of the DOM subtree;
Extracting JavaScript/VBScript scripts from the event handler attributes of each HTML tag of the DOM subtree;
Extracting JavaScript/VBScript scripts from specific attribute values of each HTML tag of the DOM subtree;
Extracting JavaScript/VBScript scripts from the CSS defined by each <STYLE> tag of the DOM subtree;
Extracting JavaScript/VBScript scripts from the CSS introduced by the style attribute of each HTML tag of the DOM subtree.
Preferably, the step of checking the syntax of the extracted injected scripts supports two scripting languages: if the extracted script is JavaScript, it is checked against the standard JavaScript grammar specification; if the extracted script is VBScript, it is checked against the standard VBScript grammar specification.
A system for detecting web pages containing maliciously injected scripts, comprising:
A web crawler module; a dynamic-content web page filtering module; a dynamic-content web page clustering module; a web page cluster template extraction module; and a malicious injected script web page detection module.
The web crawler module traverses and downloads all web pages of the scanned website;
The dynamic-content web page filtering module is connected to the web crawler module. It receives the set of web pages downloaded by the web crawler module and preprocesses it, filtering out the web pages that correspond to requests for static Web objects and keeping only the dynamic-content web pages that correspond to requests for dynamic Web objects;
The dynamic-content web page clustering module is connected to the dynamic-content web page filtering module. It receives the filtered set of dynamic-content web pages and clusters them according to their Uniform Resource Locators (URLs) to obtain dynamic-content web page clusters;
The web page cluster template extraction module is connected to the dynamic-content web page clustering module. It receives the web page clusters produced by clustering, converts all dynamic-content web pages in a cluster into the corresponding document object model (DOM) trees, and extracts the maximum common trunk tree of these DOM trees as the template of that cluster;
The malicious injected script web page detection module is connected to the web page cluster template extraction module and the dynamic-content web page clustering module. It uses the web page cluster template to detect whether each web page in a dynamic-content web page cluster contains a maliciously injected script.
Beneficial effects of the invention: the invention uses cluster analysis to cluster all dynamic web pages of the scanned website by their Uniform Resource Locators (URLs), and uses DOM structure analysis to extract a template from each dynamic-content web page cluster obtained by the cluster analysis. Based on the extracted cluster template, it detects web pages containing maliciously injected scripts by checking whether the DOM subtrees of each dynamic-content web page that fall outside the outline of the cluster template contain injected scripts. The method and system of the invention can detect web pages with maliciously injected scripts that previously could be found only through manual analysis by security experts, and are a useful supplement to traditional methods and systems for scanning Web sites for script injection vulnerabilities.
Description of drawings
Fig. 1 shows the steps of the detection method of the invention;
Fig. 2 shows the workflow of a conventional web crawler;
Fig. 3 shows the steps of the cluster analysis performed on the downloaded dynamic-content web pages;
Fig. 4 shows an embodiment of constructing dynamic-content web page clusters;
Fig. 5 shows the steps of extracting a web page cluster template from a dynamic-content web page cluster;
Fig. 6 shows an embodiment of converting a dynamic-content web page into a document object model (DOM) tree;
Fig. 7 shows an embodiment of extracting the maximum common DOM tree from two DOM trees;
Fig. 8 shows an embodiment of extracting, from a dynamic-content web page, the set of DOM subtrees that fall outside the web page cluster template;
Fig. 9 is a workflow diagram of injected script detection on the set of DOM subtrees that fall outside the outline of the web page cluster template;
Fig. 10 is a flowchart of the syntax-correctness check performed on extracted injected scripts;
Fig. 11 is a structural block diagram of the detection system of the invention.
The invention is further described below with reference to the drawings and embodiments.
Embodiment
As shown in Fig. 1, the detection method of the invention comprises the following steps:
101 Use a web crawler to traverse and download all web pages of the scanned website;
102 Perform cluster analysis on the downloaded dynamic-content web pages;
103 Extract the web page cluster template of each dynamic-content web page cluster;
104 Use the web page cluster template to detect all web pages in the cluster that contain maliciously injected scripts;
105 Construct alert events for the web pages containing maliciously injected scripts and raise the alerts.
The web crawler used in the detection method is a known technique. To help those of ordinary skill in the art better understand the detection method of the invention, the crawler workflow is briefly introduced here. As shown in Fig. 2, a web crawler is a program that automatically retrieves web pages from a specified Web site. It comprises a page fetching module 210, a page content analysis module 220, a page storage module 230 and a crawl task queue 240. First, the URLs of one or several initial pages of the scanned Web site 250 are added to the crawl task queue 240 as the first batch of crawl tasks. The page fetching module 210 takes a crawl task from the crawl task queue, simulates a Web browser to send an HTTP request to the Web server, and receives the HTML page content returned by the Web server. The received HTML page content is passed to the page content analysis module 220 for content analysis. The page content analysis module 220 extracts from the received HTML page content all hyperlinks that belong to the scanned website, generates new crawl tasks from them and adds these to the crawl task queue 240. At the same time, the HTML page content that the page content analysis module 220 has analyzed is stored intact in the page storage module 230 for processing by the other modules of the detection system. When the crawl task queue is empty, the crawl is complete.
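The crawler workflow above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patent's implementation: the `crawl` and `LinkExtractor` names, the `fetch` callable and the in-memory `FAKE_SITE` are assumptions introduced here so that the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag (the content analysis step)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_urls, fetch):
    """Breadth-first crawl; `fetch(url)` returns HTML text or None."""
    site = urlparse(start_urls[0]).netloc
    queue = deque(start_urls)      # crawl task queue (240)
    seen = set(start_urls)
    store = {}                     # page storage (230)
    while queue:                   # the crawl ends when the queue is empty
        url = queue.popleft()
        html = fetch(url)          # page fetching (210)
        if html is None:
            continue
        store[url] = html
        parser = LinkExtractor()   # page content analysis (220)
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)
            # Only follow hyperlinks that stay on the scanned site.
            if urlparse(target).netloc == site and target not in seen:
                seen.add(target)
                queue.append(target)
    return store


# Hypothetical in-memory site standing in for real HTTP responses.
FAKE_SITE = {
    "http://example.test/": '<a href="/a.asp?id=1">a</a>'
                            '<a href="http://other.test/x">x</a>',
    "http://example.test/a.asp?id=1": '<a href="/">home</a><a href="b.asp">b</a>',
    "http://example.test/b.asp": "<p>leaf</p>",
}
pages = crawl(["http://example.test/"], FAKE_SITE.get)
```

In a real deployment `fetch` would issue HTTP requests; passing it in as a parameter keeps the queue logic separate from the transport.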
As shown in Fig. 3, in the detection method of the invention, the step of performing cluster analysis on the downloaded dynamic-content web pages comprises the following steps:
310 Data-cleaning preprocessing of the web page set: preprocess the set of downloaded web pages, filter out the web pages that correspond to requests for static Web objects, and keep only the dynamic-content web pages that correspond to requests for dynamic Web objects;
320 Cluster analysis of dynamic-content web pages: cluster the dynamic-content web pages according to their URLs to obtain dynamic-content web page clusters.
The data-cleaning preprocessing step 310 works as follows. For each downloaded HTML page, the subsequent operation is determined by its URL: if the file extension of the Web object requested by the URL indicates a static Web object, the HTML page is filtered out; otherwise, the Web object requested by the URL is treated as a dynamic-content web page and the HTML page is kept for further cluster analysis. The file extensions that indicate a static Web object include, but are not limited to, the following: .gif, .png, .jpg, .html, .htm, .txt, .pdf, .doc, .exe. The list of extensions can also be specified manually.
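A minimal sketch of the extension-based filter in step 310 follows. The names `STATIC_EXTENSIONS`, `is_static` and `keep_dynamic` are illustrative assumptions; the comparison is made case-insensitive so that `.GIF` and `.gif` are treated alike.

```python
import posixpath
from urllib.parse import urlparse

# Default static extensions from the description; callers may pass their own set.
STATIC_EXTENSIONS = {".gif", ".png", ".jpg", ".html", ".htm",
                     ".txt", ".pdf", ".doc", ".exe"}


def is_static(url, static_exts=STATIC_EXTENSIONS):
    """True if the URL path ends in an extension that marks a static object."""
    path = urlparse(url).path                 # drops the ?query part
    ext = posixpath.splitext(path)[1].lower()
    return ext in static_exts


def keep_dynamic(pages):
    """Keep only the pages whose URL does not point at a static object."""
    return {url: html for url, html in pages.items() if not is_static(url)}
```

For example, `is_static("/images/teaching/logo.GIF")` is true while `is_static("/cgi-bin/bbs/printpost.asp?pid=123")` is false, so only the second page would reach the clustering step.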
The cluster analysis step 320, applied to the dynamic-content web pages that remain after data cleaning, works as follows. First, every dynamic-content web page URL is parsed as {subdirectory structure} + {file name} + {parameters}; for example, /cgi-bin/documents/printpost.asp?pid=123 is decomposed into the subdirectory structure "/cgi-bin/documents/", the file name "printpost.asp" and the parameter "pid=123". Then all dynamic-content web pages are clustered by URL subdirectory structure and file name to obtain the initial web page clusters. Finally, for each initial web page cluster, if any web page in the cluster has URL parameters, the cluster is clustered further according to the URL parameter format.
Embodiment 1
Embodiment 1 is an example of the cluster analysis performed on a set of dynamic-content web pages. Suppose the web crawler has downloaded 8 web pages from the scanned website, with the following URLs:
1)/cgi-bin/bbs/printpost.asp?pid=123
2)/cgi-bin/bbs/printpost.asp?pid=140
3)/documents/teaching/chapter1.htm
4)/cgi-bin/authors/authorsdetail.asp?aid=1400
5)/documents/teaching/chapter3.html
6)/images/teaching/logo.GIF
7)/cgi-bin/authors/authorsdetail.asp?aid=1450
8)/documents/pdf/introduction.pdf
First, the web pages that correspond to requests for static Web objects are filtered out according to the file extension of the Web object requested by each URL. Here, the Web objects requested by URLs with the file extensions ".pdf", ".htm", ".GIF" and ".html" are clearly static Web objects, so URL3, URL5, URL6 and URL8 are filtered out, and only the web pages corresponding to URL1, URL2, URL4 and URL7 remain as dynamic-content web pages.
Then these four URLs are clustered by directory structure and file name, giving two initial web page clusters: web page cluster 1 is {URL1, URL2}; web page cluster 2 is {URL4, URL7}.
Finally, each initial web page cluster is clustered again by URL parameter format. It is easy to see that the parameters of the two URLs of web page cluster 1 are "pid=123" and "pid=140", which share the URL parameter format "pid=integer"; therefore the dynamic-content web pages corresponding to URL1 and URL2 belong to the same web page cluster. The parameters of the two URLs of web page cluster 2 are "aid=1400" and "aid=1450", which share the URL parameter format "aid=integer"; therefore the dynamic-content web pages corresponding to URL4 and URL7 belong to the same web page cluster. The dynamic-content web page clusters constructed in this way are shown in Fig. 4, where the printpost.asp node 450 represents dynamic-content web page cluster 1, which contains URL1 and URL2, and the Adetail.asp node 460 represents dynamic-content web page cluster 2, which contains URL4 and URL7.
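The clustering in Embodiment 1 can be sketched as follows. This is an assumption-laden illustration: the function names are hypothetical, the two-stage clustering of the description is collapsed into a single key of (directory, file name, parameter format), which yields the same final clusters, and URL 4 is assumed to carry the .asp extension like URL 7.

```python
import posixpath
from collections import defaultdict
from urllib.parse import parse_qsl, urlparse


def param_format(query):
    """Reduce 'pid=123' to (('pid', 'integer'),) so concrete values are ignored."""
    parts = []
    for name, value in parse_qsl(query, keep_blank_values=True):
        kind = "integer" if value.isdigit() else "string"
        parts.append((name, kind))
    return tuple(sorted(parts))


def cluster_key(url):
    """Split a URL into {subdirectory structure} + {file name} + {parameter format}."""
    parsed = urlparse(url)
    directory, filename = posixpath.split(parsed.path)
    return (directory + "/", filename, param_format(parsed.query))


def cluster_urls(urls):
    """Group URLs that share directory, file name and parameter format."""
    clusters = defaultdict(list)
    for url in urls:
        clusters[cluster_key(url)].append(url)
    return dict(clusters)


dynamic_urls = [
    "/cgi-bin/bbs/printpost.asp?pid=123",
    "/cgi-bin/bbs/printpost.asp?pid=140",
    "/cgi-bin/authors/authorsdetail.asp?aid=1400",
    "/cgi-bin/authors/authorsdetail.asp?aid=1450",
]
clusters = cluster_urls(dynamic_urls)
```

With these four URLs, `clusters` holds two entries, one keyed by `printpost.asp` with the format `pid=integer` and one keyed by `authorsdetail.asp` with the format `aid=integer`.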
As shown in Fig. 5, the steps for extracting the web page cluster template from a dynamic-content web page cluster are as follows:
510 Set the initial value of the web page cluster template to any one dynamic-content web page in the cluster, and convert it into the corresponding document object model (DOM) tree T_m;
520 Take a web page k from the remaining web pages of the cluster and convert it into a DOM tree T_k;
530 Extract the maximum common DOM tree T_g of T_m and T_k, and set the web page cluster template to T_g;
540 Repeat steps 520 and 530 until no web pages remain in the cluster, then output the web page cluster template.
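Steps 510 to 540 amount to folding a "common tree" operation over the pages of a cluster. The sketch below assumes a DOM tree is represented as a nested `(tag, [children])` tuple and interprets "maximum common DOM tree" as the largest common top-down ordered subtree, found by a weighted longest-common-subsequence alignment of child lists. Both the representation and that interpretation are assumptions made for illustration; the patent does not specify the algorithm.

```python
def size(tree):
    """Number of nodes in a (tag, [children]) tree."""
    return 1 + sum(size(child) for child in tree[1])


def common_tree(a, b):
    """Largest common top-down ordered subtree of a and b, or None."""
    if a[0] != b[0]:
        return None
    xs, ys = a[1], b[1]
    # best[i][j] = (node count, matched children) for xs[i:] against ys[j:]
    best = [[(0, [])] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(len(xs) - 1, -1, -1):
        for j in range(len(ys) - 1, -1, -1):
            options = [best[i + 1][j], best[i][j + 1]]
            sub = common_tree(xs[i], ys[j])
            if sub is not None:
                count, kids = best[i + 1][j + 1]
                options.append((count + size(sub), [sub] + kids))
            best[i][j] = max(options, key=lambda option: option[0])
    return (a[0], best[0][0][1])


def extract_template(trees):
    """Fold common_tree over all pages of one cluster."""
    template = trees[0]                          # step 510
    for tree in trees[1:]:                       # step 520
        template = common_tree(template, tree)   # step 530
        if template is None:
            raise ValueError("pages in a cluster do not share a root tag")
    return template                              # step 540


base = ("html", [("head", [("title", [])]), ("body", [("div", [("b", [])])])])
with_script = ("html", [("head", [("title", [])]),
                        ("body", [("div", [("b", [])]), ("script", [])])])
with_font = ("html", [("head", [("title", [])]),
                      ("body", [("font", []), ("div", [("b", [])])])])
template = extract_template([base, with_script, with_font])
```

Here the extra `script` and `font` nodes appear in only one page each, so the extracted `template` equals `base`.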
Embodiment 2
Fig. 6 shows an embodiment of converting the example dynamic-content web page in Table 1 into a document object model (DOM) tree.
Table 1: An example dynamic-content web page
<html>
<head><title>BBS group</title></head>
<body>
<div align=center>
<B>Good Morning.Alice!</B>
</div>
</body>
</html>
In Fig. 6, each HTML tag is represented as a node in the DOM tree, and the nesting of HTML tags is represented as parent-child relationships between nodes and subtrees of the DOM tree.
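The conversion in Embodiment 2 can be sketched with Python's standard `html.parser` module. The sketch keeps only element nodes as `(tag, [children])` tuples under an artificial `#document` root; text nodes and attributes are omitted, and the `DomBuilder` and `html_to_tree` names are illustrative assumptions rather than part of the patent.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class DomBuilder(HTMLParser):
    """Builds a (tag, [children]) tree from the start and end tags of a page."""

    def __init__(self):
        super().__init__()
        self.root = ("#document", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray end tags are ignored.
        for depth in range(len(self.stack) - 1, 0, -1):
            if self.stack[depth][0] == tag:
                del self.stack[depth:]
                break


def html_to_tree(html):
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


TABLE_1 = """<html>
<head><title>BBS group</title></head>
<body>
<div align=center>
<B>Good Morning.Alice!</B>
</div>
</body>
</html>"""
tree = html_to_tree(TABLE_1)
```

Note that `HTMLParser` lowercases tag names, so the `<B>` element of Table 1 becomes the node `b`.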
Embodiment 3
Fig. 7 shows an embodiment of extracting the maximum common DOM tree T_g from the DOM trees T_m and T_k. As shown in Fig. 7, DOM tree T_m contains 8 nodes and DOM tree T_k contains 9 nodes; the extracted maximum common DOM tree T_g contains 7 nodes.
Embodiment 4
Fig. 8 shows an embodiment of extracting the set of DOM subtrees of a dynamic-content web page that fall outside the outline of the web page cluster template. In Fig. 8, the DOM tree of the web page cluster template is denoted T_m and the DOM tree of the dynamic-content web page is denoted T_k. The extracted set of DOM subtrees that fall outside the outline of the cluster template contains two subtrees: a subtree consisting of the single FONT node 831, and a subtree consisting of the SCRIPT node 832 and the TEXT node 833.
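The comparison in Embodiment 4 can be sketched as a walk over the template and the page in parallel. The sketch assumes trees are `(tag, [children])` tuples and matches children greedily by tag in document order, which is a simplification chosen for brevity (an optimal alignment could report fewer excess subtrees when sibling tags repeat). The function name and the sample trees are hypothetical.

```python
def excess_subtrees(template, page):
    """Return the subtrees of `page` that have no counterpart in `template`."""
    excess = []
    page_children = page[1]
    pos = 0
    for t_child in template[1]:
        # Skip page children until one with the template child's tag is found.
        while pos < len(page_children) and page_children[pos][0] != t_child[0]:
            excess.append(page_children[pos])
            pos += 1
        if pos < len(page_children):
            # Matched node: compare its children recursively.
            excess.extend(excess_subtrees(t_child, page_children[pos]))
            pos += 1
    excess.extend(page_children[pos:])   # children beyond the template outline
    return excess


template = ("html", [("body", [("div", [("b", [])])])])
page = ("html", [("body", [
    ("font", []),
    ("div", [("b", [])]),
    ("script", [("#text", [])]),
])])
found = excess_subtrees(template, page)
```

As in Fig. 8, the result contains two subtrees: a lone `font` node, and a `script` node with its text child.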
As shown in Fig. 9, the step of performing injected script detection on the set of DOM subtrees that fall outside the outline of the web page cluster template comprises the following steps:
910 For each DOM subtree in the set, attempt to extract all possible injected scripts from it;
920 Check all extracted injected scripts for syntactic correctness;
930 If a syntactically correct injected script exists, mark the corresponding DOM subtree of the host dynamic-content web page as a script injection region;
940 If the set of DOM subtrees is empty, the detection ends; otherwise return to step 910 and continue.
Step 910, extracting all possible injected scripts from a DOM subtree, is any combination of the following 5 script extraction methods:
A) Extracting JavaScript/VBScript scripts from each <script> tag of the DOM subtree;
B) Extracting JavaScript/VBScript scripts from the event handler attributes of each HTML tag of the DOM subtree;
C) Extracting JavaScript/VBScript scripts from specific attribute values of each HTML tag of the DOM subtree;
D) Extracting JavaScript/VBScript scripts from the CSS defined by each <STYLE> tag of the DOM subtree;
E) Extracting JavaScript/VBScript scripts from the CSS introduced by the style attribute of each HTML tag of the DOM subtree.
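The five extraction methods A) to E) can be sketched over the HTML source of an out-of-template subtree. Everything specific here is an assumption for illustration: the set of script-bearing attributes in `URL_ATTRS`, the treatment of any attribute starting with "on" as an event handler, and the regular expression that recognizes CSS `expression(...)` and `url(javascript:...)` constructs. A production extractor would need a far more complete list.

```python
import re
from html.parser import HTMLParser

SCRIPT_URL = re.compile(r"^\s*(?:javascript|vbscript):(.*)$", re.I | re.S)
CSS_SCRIPT = re.compile(
    r"""(?:expression\(|url\(\s*['"]?\s*(?:javascript|vbscript):)"""
    r"""(.*?)['"]?\s*\)\s*(?=[;}]|$)""",
    re.I | re.S,
)
URL_ATTRS = {"href", "src", "action", "background"}   # assumed attribute list


def scripts_in_css(css):
    """Script fragments embedded in CSS text."""
    return [found for found in CSS_SCRIPT.findall(css) if found]


class ScriptExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.scripts = []
        self._inside = None          # "script" or "style" while inside that tag

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._inside = tag
        for name, value in attrs:
            if not value:
                continue
            if name.startswith("on"):                 # B) event handlers
                self.scripts.append(value)
            elif name in URL_ATTRS:                   # C) specific attribute values
                match = SCRIPT_URL.match(value)
                if match:
                    self.scripts.append(match.group(1))
            elif name == "style":                     # E) style attribute CSS
                self.scripts.extend(scripts_in_css(value))

    def handle_endtag(self, tag):
        if tag == self._inside:
            self._inside = None

    def handle_data(self, data):
        if not data.strip():
            return
        if self._inside == "script":                  # A) <script> body
            self.scripts.append(data.strip())
        elif self._inside == "style":                 # D) <style> block CSS
            self.scripts.extend(scripts_in_css(data))


def extract_scripts(fragment):
    extractor = ScriptExtractor()
    extractor.feed(fragment)
    extractor.close()
    return extractor.scripts


fragment = (
    '<font onmouseover="steal()">hi</font>'
    '<script>alert(1)</script>'
    '<a href="javascript:go()">x</a>'
    '<style>body{background:url(javascript:bg())}</style>'
    '<div style="width:expression(w())"></div>'
)
candidates = extract_scripts(fragment)
```

Each candidate string would then be handed to the syntax-correctness check of step 920; extraction alone does not decide whether a page is flagged.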
Step 920, the syntax-correctness check on extracted injected scripts, supports two scripting languages: if the extracted script is JavaScript, it is checked against the standard JavaScript grammar specification; if it is VBScript, it is checked against the standard VBScript grammar specification. Fig. 10 is a flowchart of the syntax-correctness check. Taking JavaScript injected scripts as an example: first, lexical rules 1001 for JavaScript are defined according to the JavaScript lexical specification, and a lexical analyzer generator produces the corresponding lexical analyzer from the lexical rules 1001; the generator may be the GNU Flex tool or another tool. Next, JavaScript grammar rules 1002 are defined according to the JavaScript grammar specification, and a parser generator produces the corresponding syntax analyzer from the grammar rules 1002; the generator may be GNU Yacc/Bison or another tool. During the check, each extracted JavaScript fragment 1003 is first tokenized by the lexical analyzer 1004, yielding a sequence of lexical tokens; these tokens are then passed to the syntax analyzer 1005, which checks their syntactic correctness. The check for extracted VBScript scripts follows the same workflow as for JavaScript; only the lexical rules and grammar rules differ.
For all injected scripts extracted from a DOM subtree that falls outside the web page cluster template, if at least one script passes the syntax analyzer's correctness check, the corresponding DOM subtree is marked as a script injection region, and an alert event is generated reporting that a maliciously injected script was detected in the host dynamic-content web page.
As shown in Fig. 11, the detection system of the invention comprises a web crawler module 1110, a dynamic-content web page filtering module 1120, a dynamic-content web page clustering module 1130, a web page cluster template extraction module 1140 and a malicious injected script web page detection module 1150. The web crawler module 1110 traverses and downloads all web pages of the scanned website. The dynamic-content web page filtering module 1120 is connected to the web crawler module 1110; it receives the set of web pages downloaded by the web crawler module and preprocesses it, filtering out the web pages that correspond to requests for static Web objects and keeping only the dynamic-content web pages that correspond to requests for dynamic Web objects. The dynamic-content web page clustering module 1130 is connected to the dynamic-content web page filtering module 1120; it receives the filtered set of dynamic-content web pages and clusters them according to their Uniform Resource Locators (URLs) to obtain dynamic-content web page clusters. The web page cluster template extraction module 1140 is connected to the dynamic-content web page clustering module 1130; it receives each web page cluster produced by clustering, converts all dynamic-content web pages in each cluster into the corresponding DOM trees, and extracts the maximum common trunk tree of these DOM trees as the template of that cluster. The malicious injected script web page detection module 1150 is connected to the web page cluster template extraction module 1140 and the dynamic-content web page clustering module 1130; it uses the web page cluster template to detect whether each web page in a dynamic-content web page cluster contains a maliciously injected script.

Claims (7)

1. malevolence injection script web page detection method is characterized in that may further comprise the steps:
1) use spiders traversal and download to be scanned the step of all webpages of website;
2) to downloading the step that webpage carries out cluster analysis;
3) extract the step of the webpage bunch template of each dynamic content webpage bunch;
4) utilize each dynamic content webpage in the webpage bunch template detection bunch whether to comprise the step of malevolence injection script.
2. malevolence injection script web page detection method as claimed in claim 1 is characterized in that, describedly may further comprise the steps downloading the step that webpage carries out cluster analysis:
1) preliminary treatment is carried out in set to web pages downloaded, filters out those and the relevant webpage of static Web object requests, only keeps those dynamic content webpages relevant with the Dynamic Web object requests;
2) according to the webpage uniform resource position mark URL dynamic content webpage is carried out cluster, obtain the dynamic content webpage bunch after the cluster.
3. malevolence injection script web page detection method as claimed in claim 1, it is characterized in that, the step of each dynamic content webpage bunch template of described extraction is: for each dynamic content webpage bunch, extract its common document object model tree as this dynamic content webpage bunch template.
4. malevolence injection script web page detection method as claimed in claim 1 is characterized in that, the described step of utilizing each webpage in the webpage bunch template detection bunch whether to comprise malevolence injection script may further comprise the steps:
1) for each dynamic content webpage bunch, will bunch in each dynamic web page be converted to document object model tree, and compare with this webpage bunch template, find each DOM Document Object Model subtree that exceeds webpage bunch template contours;
2) for each the DOM Document Object Model subtree that exceeds webpage bunch template contours, attempt therefrom to extract injection script;
3) injection script that extracts is carried out grammaticality and detect,, confirm that then its host's dynamic content webpage is the webpage that comprises malevolence injection script if grammer is correct.
5. The method for detecting web pages with injected malicious scripts as claimed in claim 4, characterized in that the step of extracting an injected script from a DOM subtree that extends beyond the outline of the web page cluster template uses any combination of the following five script extraction methods:
1) extracting JavaScript/VBScript scripts from each <script> tag of the DOM tree;
2) extracting JavaScript/VBScript scripts from the event handler functions of each HTML tag of the DOM tree;
3) extracting JavaScript/VBScript scripts from specific attribute values of each HTML tag of the DOM tree;
4) extracting JavaScript/VBScript scripts from the CSS defined by each <STYLE> tag of the DOM tree;
5) extracting JavaScript/VBScript scripts from the CSS introduced by the style attribute of each HTML tag of the DOM tree.
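The five extraction methods can be illustrated with the standard-library `html.parser`. This is a rough sketch, not the patented extractor: it works on raw HTML rather than on a DOM subtree, the attribute list `URL_ATTRS` is an assumption, and the CSS handling only recognises `expression(...)` and `url(javascript:...)` within declarations split naively on `;`.

```python
import re
from html.parser import HTMLParser

# Assumed set of attributes whose value may carry a script URL.
URL_ATTRS = {"href", "src", "action", "background"}
SCRIPT_URL = re.compile(r"^\s*(?:javascript|vbscript):(.*)$", re.I | re.S)
CSS_SCRIPT = re.compile(
    r"(?:expression\(|url\(\s*['\"]?\s*(?:javascript|vbscript):)(.*)\)",
    re.I | re.S,
)


class ScriptExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.scripts = []
        self._inside = None  # "script" or "style" while inside that element

    def _from_css(self, css):
        # Naive split on ';' (breaks on scripts that contain ';').
        for declaration in css.split(";"):
            match = CSS_SCRIPT.search(declaration)
            if match:
                self.scripts.append(match.group(1).strip().strip("'\""))

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._inside = tag
        for name, value in attrs:
            if not value:
                continue
            if name.startswith("on"):      # 2) event handler attributes
                self.scripts.append(value.strip())
            elif name in URL_ATTRS:        # 3) script-bearing attribute values
                match = SCRIPT_URL.match(value)
                if match:
                    self.scripts.append(match.group(1).strip())
            elif name == "style":          # 5) CSS in the style attribute
                self._from_css(value)

    def handle_endtag(self, tag):
        if tag == self._inside:
            self._inside = None

    def handle_data(self, data):
        if self._inside == "script" and data.strip():  # 1) <script> bodies
            self.scripts.append(data.strip())
        elif self._inside == "style":                  # 4) CSS in <style>
            self._from_css(data)


def extract_scripts(html):
    """Return every script fragment found by the five methods, in page order."""
    parser = ScriptExtractor()
    parser.feed(html)
    parser.close()
    return parser.scripts
```

Each extracted fragment would then go to the syntax check of claim 4, step 3; fragments that are not valid script are simply discarded at that stage.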
6. The method for detecting web pages with injected malicious scripts as claimed in claim 4, characterized in that the syntax check of the extracted injected script supports two injected script languages: if the extracted script is a JavaScript script, it is checked against the standard JavaScript syntax specification; if the extracted script is a VBScript script, it is checked against the standard VBScript syntax specification.
7. A system for detecting web pages with injected malicious scripts, characterized by comprising a web crawler module, a dynamic-content web page filtering module, a dynamic-content web page clustering module, a web page cluster template extraction module and a malicious injected script detection module, wherein: the web crawler module traverses and downloads all web pages of the website being scanned; the dynamic-content web page filtering module is connected to the web crawler module, receives the set of web pages downloaded by the web crawler module and preprocesses it, filtering out the web pages associated with static Web object requests and retaining only the dynamic-content web pages associated with dynamic Web object requests; the dynamic-content web page clustering module is connected to the dynamic-content web page filtering module, receives the filtered set of dynamic-content web pages and clusters them according to their uniform resource locators (URLs) to obtain the dynamic-content web page clusters; the web page cluster template extraction module is connected to the dynamic-content web page clustering module, receives each web page cluster obtained by the clustering, converts all dynamic-content web pages in each cluster into their corresponding document object model (DOM) trees, and extracts the largest trunk tree shared by these DOM trees as the template of that cluster; the malicious injected script detection module is connected to the web page cluster template extraction module and the dynamic-content web page clustering module, and uses the web page cluster template to detect whether each web page in the dynamic-content web page cluster contains an injected malicious script.
CN2007103039855A 2007-12-24 2007-12-24 Detection method and system for malevolence injection script web page Expired - Fee Related CN101471818B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2007103039855A CN101471818B (en) 2007-12-24 2007-12-24 Detection method and system for malevolence injection script web page

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2007103039855A CN101471818B (en) 2007-12-24 2007-12-24 Detection method and system for malevolence injection script web page

Publications (2)

Publication Number Publication Date
CN101471818A true CN101471818A (en) 2009-07-01
CN101471818B CN101471818B (en) 2011-05-04

Family

ID=40828961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2007103039855A Expired - Fee Related CN101471818B (en) 2007-12-24 2007-12-24 Detection method and system for malevolence injection script web page

Country Status (1)

Country Link
CN (1) CN101471818B (en)

Cited By (46)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102147842A (en) * 2010-07-23 2011-08-10 卡巴斯基实验室封闭式股份公司 Defense of malware of network resource
CN102222310A (en) * 2011-07-18 2011-10-19 深圳证券信息有限公司 Security information publishing method and platform
CN102446255A (en) * 2011-12-30 2012-05-09 奇智软件(北京)有限公司 Method and device for detecting page tamper
CN102682098A (en) * 2012-04-27 2012-09-19 北京神州绿盟信息安全科技股份有限公司 Method and device for detecting web page content changes
CN102902686A (en) * 2011-07-27 2013-01-30 腾讯科技(深圳)有限公司 Web page detection method and system
CN102999420A (en) * 2011-09-13 2013-03-27 阿里巴巴集团控股有限公司 XSS (Cross Site Scripting) testing method and XSS testing system based on DOM (Document Object Model)
CN102129528B (en) * 2010-01-19 2013-05-15 北京启明星辰信息技术股份有限公司 WEB page tampering identification method and system
CN103428219A (en) * 2013-08-25 2013-12-04 金华比奇网络技术有限公司 Web vulnerability scanning method based on webpage template matching
CN103905422A (en) * 2013-12-17 2014-07-02 哈尔滨安天科技股份有限公司 Method and system for searching for webshell with assistance of local simulation request
CN104063491A (en) * 2011-12-30 2014-09-24 北京奇虎科技有限公司 Method and device for detecting page distortion
US8892687B1 (en) 2013-12-06 2014-11-18 Shape Security, Inc. Client/server security by an intermediary rendering modified in-memory objects
US8954583B1 (en) 2014-01-20 2015-02-10 Shape Security, Inc. Intercepting and supervising calls to transformed operations and objects
US8997226B1 (en) 2014-04-17 2015-03-31 Shape Security, Inc. Detection of client-side malware activity
CN104657271A (en) * 2015-03-03 2015-05-27 成都金盘电子科大多媒体技术有限公司 Automatic testing method for standard conformance of health information shared documents
US9083739B1 (en) 2014-05-29 2015-07-14 Shape Security, Inc. Client/server authentication using dynamic credentials
US9210171B1 (en) 2014-05-29 2015-12-08 Shape Security, Inc. Selectively protecting valid links to pages of a web site
US9225737B2 (en) 2013-03-15 2015-12-29 Shape Security, Inc. Detecting the introduction of alien content
US9225729B1 (en) 2014-01-21 2015-12-29 Shape Security, Inc. Blind hash compression
US9338143B2 (en) 2013-03-15 2016-05-10 Shape Security, Inc. Stateless web content anti-automation
US9405910B2 (en) 2014-06-02 2016-08-02 Shape Security, Inc. Automatic library detection
US9438625B1 (en) 2014-09-09 2016-09-06 Shape Security, Inc. Mitigating scripted attacks using dynamic polymorphism
US9479529B2 (en) 2014-07-22 2016-10-25 Shape Security, Inc. Polymorphic security policy action
US9479526B1 (en) 2014-11-13 2016-10-25 Shape Security, Inc. Dynamic comparative analysis method and apparatus for detecting and preventing code injection and other network attacks
US9544329B2 (en) 2014-03-18 2017-01-10 Shape Security, Inc. Client/server security by an intermediary executing instructions received from a server and rendering client application instructions
US9608975B2 (en) 2015-03-30 2017-03-28 Shape Security, Inc. Challenge-dynamic credential pairs for client/server request validation
US9720814B2 (en) 2015-05-22 2017-08-01 Microsoft Technology Licensing, Llc Template identification for control of testing
CN107590387A (en) * 2017-09-04 2018-01-16 杭州安恒信息技术有限公司 EL expression formula injection loopholes detection method, device and electronic equipment
CN107948120A (en) * 2016-10-12 2018-04-20 阿里巴巴集团控股有限公司 leak detection method and device
US9954893B1 (en) 2014-09-23 2018-04-24 Shape Security, Inc. Techniques for combating man-in-the-browser attacks
CN108133037A (en) * 2018-01-09 2018-06-08 广东电网有限责任公司电力科学研究院 A kind of webpage vulnerability scanning method and system
CN108920955A (en) * 2018-06-29 2018-11-30 北京奇虎科技有限公司 A kind of webpage back door detection method, device, equipment and storage medium
CN108920950A (en) * 2018-06-29 2018-11-30 北京奇虎科技有限公司 A kind of webpage back door detection method, device, equipment and storage medium
CN108985059A (en) * 2018-06-29 2018-12-11 北京奇虎科技有限公司 A kind of webpage back door detection method, device, equipment and storage medium
CN109165333A (en) * 2018-07-12 2019-01-08 电子科技大学 A kind of high speed Theme Crawler of Content method based on web data
US10212130B1 (en) 2015-11-16 2019-02-19 Shape Security, Inc. Browser extension firewall
US10367903B2 (en) 2015-05-21 2019-07-30 Shape Security, Inc. Security systems for mitigating attacks from a headless browser executing on a client computer
US10375026B2 (en) 2015-10-28 2019-08-06 Shape Security, Inc. Web transaction status tracking
US10447726B2 (en) 2016-03-11 2019-10-15 Shape Security, Inc. Mitigating attacks on server computers by enforcing platform policies on client computers
US10536479B2 (en) 2013-03-15 2020-01-14 Shape Security, Inc. Code modification for automation detection
CN110768943A (en) * 2018-09-20 2020-02-07 哈尔滨安天科技集团股份有限公司 Polymorphic URL detection method and device and storage medium
US10567419B2 (en) 2015-07-06 2020-02-18 Shape Security, Inc. Asymmetrical challenges for web security
US10567363B1 (en) 2016-03-03 2020-02-18 Shape Security, Inc. Deterministic reproduction of system state using seeded pseudo-random number generators
US10567386B2 (en) 2015-07-07 2020-02-18 Shape Security, Inc. Split serving of computer code
WO2020057523A1 (en) * 2018-09-18 2020-03-26 华为技术有限公司 Method and device for triggering vulnerability detection
US10868819B2 (en) 2014-09-19 2020-12-15 Shape Security, Inc. Systems for detecting a headless browser executing on a client computer
CN112100083A (en) * 2020-11-13 2020-12-18 北京智慧星光信息技术有限公司 Crawler template change monitoring method and system, electronic equipment and storage medium

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8869281B2 (en) 2013-03-15 2014-10-21 Shape Security, Inc. Protecting against the introduction of alien content
US9825984B1 (en) 2014-08-27 2017-11-21 Shape Security, Inc. Background analysis of web content
US10044750B2 (en) 2015-01-16 2018-08-07 Microsoft Technology Licensing, Llc Code labeling based on tokenized code samples
EP3414695B1 (en) 2016-02-12 2021-08-11 Shape Security, Inc. Reverse proxy computer: deploying countermeasures in response to detecting an autonomous browser executing on a client computer
US9917850B2 (en) 2016-03-03 2018-03-13 Shape Security, Inc. Deterministic reproduction of client/server computer state or output sent to one or more client computers

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100478953C (en) * 2006-09-28 2009-04-15 北京理工大学 Static feature based web page malicious scenarios detection method

Cited By (69)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102129528B (en) * 2010-01-19 2013-05-15 北京启明星辰信息技术股份有限公司 WEB page tampering identification method and system
CN102147842A (en) * 2010-07-23 2011-08-10 卡巴斯基实验室封闭式股份公司 Defense of malware of network resource
CN102147842B (en) * 2010-07-23 2014-12-10 卡巴斯基实验室封闭式股份公司 System of and method for defending a malware of network resource
CN102222310A (en) * 2011-07-18 2011-10-19 深圳证券信息有限公司 Security information publishing method and platform
CN102902686A (en) * 2011-07-27 2013-01-30 腾讯科技(深圳)有限公司 Web page detection method and system
CN102999420A (en) * 2011-09-13 2013-03-27 阿里巴巴集团控股有限公司 XSS (Cross Site Scripting) testing method and XSS testing system based on DOM (Document Object Model)
CN102999420B (en) * 2011-09-13 2016-02-03 阿里巴巴集团控股有限公司 Based on cross site scripting leak method of testing and the system of DOM
CN104063491A (en) * 2011-12-30 2014-09-24 北京奇虎科技有限公司 Method and device for detecting page distortion
CN102446255A (en) * 2011-12-30 2012-05-09 奇智软件(北京)有限公司 Method and device for detecting page tamper
CN102446255B (en) * 2011-12-30 2014-06-25 奇智软件(北京)有限公司 Method and device for detecting page tamper
CN104063491B (en) * 2011-12-30 2018-07-24 北京奇虎科技有限公司 A kind of method and device that the detection page is distorted
CN102682098B (en) * 2012-04-27 2014-05-14 北京神州绿盟信息安全科技股份有限公司 Method and device for detecting web page content changes
CN102682098A (en) * 2012-04-27 2012-09-19 北京神州绿盟信息安全科技股份有限公司 Method and device for detecting web page content changes
US9338143B2 (en) 2013-03-15 2016-05-10 Shape Security, Inc. Stateless web content anti-automation
US10536479B2 (en) 2013-03-15 2020-01-14 Shape Security, Inc. Code modification for automation detection
US9609006B2 (en) 2013-03-15 2017-03-28 Shape Security, Inc. Detecting the introduction of alien content
US9225737B2 (en) 2013-03-15 2015-12-29 Shape Security, Inc. Detecting the introduction of alien content
CN103428219A (en) * 2013-08-25 2013-12-04 金华比奇网络技术有限公司 Web vulnerability scanning method based on webpage template matching
CN103428219B (en) * 2013-08-25 2016-05-18 金华比奇网络技术有限公司 A kind of web vulnerability scanning method based on web page template coupling
US10027628B2 (en) 2013-12-06 2018-07-17 Shape Security, Inc. Client/server security by an intermediary rendering modified in-memory objects
US9270647B2 (en) 2013-12-06 2016-02-23 Shape Security, Inc. Client/server security by an intermediary rendering modified in-memory objects
US8892687B1 (en) 2013-12-06 2014-11-18 Shape Security, Inc. Client/server security by an intermediary rendering modified in-memory objects
CN103905422A (en) * 2013-12-17 2014-07-02 哈尔滨安天科技股份有限公司 Method and system for searching for webshell with assistance of local simulation request
CN103905422B (en) * 2013-12-17 2017-04-26 哈尔滨安天科技股份有限公司 Method and system for searching for webshell with assistance of local simulation request
US8954583B1 (en) 2014-01-20 2015-02-10 Shape Security, Inc. Intercepting and supervising calls to transformed operations and objects
US10652275B2 (en) 2014-01-20 2020-05-12 Shape Security, Inc. Management of calls to transformed operations and objects
US9225729B1 (en) 2014-01-21 2015-12-29 Shape Security, Inc. Blind hash compression
US9544329B2 (en) 2014-03-18 2017-01-10 Shape Security, Inc. Client/server security by an intermediary executing instructions received from a server and rendering client application instructions
US8997226B1 (en) 2014-04-17 2015-03-31 Shape Security, Inc. Detection of client-side malware activity
US10187408B1 (en) 2014-04-17 2019-01-22 Shape Security, Inc. Detecting attacks against a server computer based on characterizing user interactions with the client computing device
US9705902B1 (en) 2014-04-17 2017-07-11 Shape Security, Inc. Detection of client-side malware activity
US9210171B1 (en) 2014-05-29 2015-12-08 Shape Security, Inc. Selectively protecting valid links to pages of a web site
US9083739B1 (en) 2014-05-29 2015-07-14 Shape Security, Inc. Client/server authentication using dynamic credentials
US9621583B2 (en) 2014-05-29 2017-04-11 Shape Security, Inc. Selectively protecting valid links to pages of a web site
US9716702B2 (en) 2014-05-29 2017-07-25 Shape Security, Inc. Management of dynamic credentials
US11552936B2 (en) 2014-05-29 2023-01-10 Shape Security, Inc. Management of dynamic credentials
US9405910B2 (en) 2014-06-02 2016-08-02 Shape Security, Inc. Automatic library detection
US9479529B2 (en) 2014-07-22 2016-10-25 Shape Security, Inc. Polymorphic security policy action
US9438625B1 (en) 2014-09-09 2016-09-06 Shape Security, Inc. Mitigating scripted attacks using dynamic polymorphism
US10868819B2 (en) 2014-09-19 2020-12-15 Shape Security, Inc. Systems for detecting a headless browser executing on a client computer
US9954893B1 (en) 2014-09-23 2018-04-24 Shape Security, Inc. Techniques for combating man-in-the-browser attacks
US9479526B1 (en) 2014-11-13 2016-10-25 Shape Security, Inc. Dynamic comparative analysis method and apparatus for detecting and preventing code injection and other network attacks
CN104657271A (en) * 2015-03-03 2015-05-27 成都金盘电子科大多媒体技术有限公司 Automatic testing method for standard conformance of health information shared documents
CN104657271B (en) * 2015-03-03 2017-05-03 成都金盘电子科大多媒体技术有限公司 Automatic testing method for standard conformance of health information shared documents
US9608975B2 (en) 2015-03-30 2017-03-28 Shape Security, Inc. Challenge-dynamic credential pairs for client/server request validation
US10367903B2 (en) 2015-05-21 2019-07-30 Shape Security, Inc. Security systems for mitigating attacks from a headless browser executing on a client computer
US10798202B2 (en) 2015-05-21 2020-10-06 Shape Security, Inc. Security systems for mitigating attacks from a headless browser executing on a client computer
US9720814B2 (en) 2015-05-22 2017-08-01 Microsoft Technology Licensing, Llc Template identification for control of testing
US10567419B2 (en) 2015-07-06 2020-02-18 Shape Security, Inc. Asymmetrical challenges for web security
US10567386B2 (en) 2015-07-07 2020-02-18 Shape Security, Inc. Split serving of computer code
US11171925B2 (en) 2015-10-28 2021-11-09 Shape Security, Inc. Evaluating and modifying countermeasures based on aggregate transaction status
US10375026B2 (en) 2015-10-28 2019-08-06 Shape Security, Inc. Web transaction status tracking
US10826872B2 (en) 2015-11-16 2020-11-03 Shape Security, Inc. Security policy for browser extensions
US10212130B1 (en) 2015-11-16 2019-02-19 Shape Security, Inc. Browser extension firewall
US10567363B1 (en) 2016-03-03 2020-02-18 Shape Security, Inc. Deterministic reproduction of system state using seeded pseudo-random number generators
US10447726B2 (en) 2016-03-11 2019-10-15 Shape Security, Inc. Mitigating attacks on server computers by enforcing platform policies on client computers
CN107948120B (en) * 2016-10-12 2020-11-24 阿里巴巴集团控股有限公司 Vulnerability detection method and device
CN107948120A (en) * 2016-10-12 2018-04-20 阿里巴巴集团控股有限公司 leak detection method and device
CN107590387A (en) * 2017-09-04 2018-01-16 杭州安恒信息技术有限公司 EL expression formula injection loopholes detection method, device and electronic equipment
CN108133037A (en) * 2018-01-09 2018-06-08 广东电网有限责任公司电力科学研究院 A kind of webpage vulnerability scanning method and system
CN108985059A (en) * 2018-06-29 2018-12-11 北京奇虎科技有限公司 A kind of webpage back door detection method, device, equipment and storage medium
CN108920955A (en) * 2018-06-29 2018-11-30 北京奇虎科技有限公司 A kind of webpage back door detection method, device, equipment and storage medium
CN108985059B (en) * 2018-06-29 2021-09-24 北京奇虎科技有限公司 Webpage backdoor detection method, device, equipment and storage medium
CN108920950A (en) * 2018-06-29 2018-11-30 北京奇虎科技有限公司 A kind of webpage back door detection method, device, equipment and storage medium
CN108920955B (en) * 2018-06-29 2022-03-11 北京奇虎科技有限公司 Webpage backdoor detection method, device, equipment and storage medium
CN109165333A (en) * 2018-07-12 2019-01-08 电子科技大学 A kind of high speed Theme Crawler of Content method based on web data
WO2020057523A1 (en) * 2018-09-18 2020-03-26 华为技术有限公司 Method and device for triggering vulnerability detection
CN110768943A (en) * 2018-09-20 2020-02-07 哈尔滨安天科技集团股份有限公司 Polymorphic URL detection method and device and storage medium
CN112100083A (en) * 2020-11-13 2020-12-18 北京智慧星光信息技术有限公司 Crawler template change monitoring method and system, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN101471818B (en) 2011-05-04

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110504

Termination date: 20161224