CN101471818A

CN101471818A - Method and system for detecting web pages containing maliciously injected scripts

Info

Publication number: CN101471818A
Application number: CNA2007103039855A
Authority: CN
Inventors: 叶润国; 胡振宇; 朱钱杭; 李博; 骆拥政; 牛妍萍
Original assignee: Beijing Venus Information Technology Co Ltd
Current assignee: Beijing Venus Information Technology Co Ltd
Priority date: 2007-12-24
Filing date: 2007-12-24
Publication date: 2009-07-01
Anticipated expiration: 2027-12-24
Also published as: CN101471818B

Abstract

The invention relates to a method and a system for detecting web pages that contain maliciously injected scripts, and belongs to the technical field of computer networks. The method comprises the following steps: using a web crawler to traverse and download all web pages of the website to be scanned; performing cluster analysis on the downloaded web pages and extracting a web page cluster template; and using the web page cluster template to detect whether the web pages in the cluster contain maliciously injected scripts. The system comprises a web crawler module, a dynamic content web page filtering module, a dynamic content web page clustering module, a web page cluster template extraction module and a maliciously injected script detection module.

Description

Method and system for detecting web pages containing maliciously injected scripts

Technical field

The present invention relates to a method and system for detecting web pages containing maliciously injected scripts, and belongs to the technical field of computer networks.

Background technology

With the development of Internet and Web technology, the Web no longer provides Internet users with static content services only; it can also provide a variety of dynamic Web content services according to users' needs. Because Web services are easy to deploy and easy to use, many applications that traditionally used a client/server model are now being converted into Web-based applications, including applications with very high security requirements such as e-banking and online securities trading.

While Web applications bring convenience to people's life and work, they also bring many security problems, of which script injection attacks are the most significant. The root cause of script injection attacks is a defect in the Web application code: it fails to strictly check and filter user input data, so a malicious attacker can inject a malicious script through a user input field. The injected script is delivered to the victim's Web browser through a dynamic content web page served by the Web application and is executed there, allowing the attacker to steal the victim's sensitive data or to perform malicious operations in the victim's security context. According to statistics from OWASP, an internationally known open Web security organization, script injection attacks (including cross-site scripting attacks, which belong to the script injection category) ranked first among the top ten Web security problems in 2007. Statistics on script injection attacks for 2002 to 2007 from the international vulnerability database CVE show that the frequency of such incidents has been increasing year by year.

Script injection vulnerabilities stem from defects in how Web applications are implemented, and such programming defects cannot be avoided entirely. A dedicated Web application security vulnerability scanner is therefore needed to find potential script injection vulnerabilities in a Web application so that they can be repaired with code patches before hackers exploit them. However, such a scanner cannot find every script injection vulnerability, and it cannot detect dynamic content web pages into which a hacker has already injected a malicious script by exploiting such a vulnerability.

Summary of the invention

The invention provides a method and system for detecting web pages containing maliciously injected scripts. Traditional Web security scanning methods and systems can only find web pages that contain a script injection vulnerability; they cannot find dynamic content web pages into which a hacker has already successfully injected a malicious script by exploiting such a vulnerability. The method and system of the invention overcome this shortcoming and are a useful supplement to existing Web site information security risk assessment methods and systems. The basic detection principle is as follows: most successful script injection attacks change the Document Object Model (DOM) structure of the affected page relative to its dynamic content web page template. If the template of each dynamic content web page can be extracted, comparing the DOM structure of a dynamic content web page with its template reveals whether the page contains a maliciously injected script.

The technical solution adopted by the invention to solve this technical problem is as follows:

A method for detecting web pages containing maliciously injected scripts, comprising the following steps:

a step of using a web crawler to traverse and download all web pages of the scanned website;

a step of performing cluster analysis on the downloaded web pages and extracting web page cluster templates;

a step of using the web page cluster template to detect whether each web page in the cluster contains a maliciously injected script.

Preferably, the step of performing cluster analysis on the downloaded web pages and extracting web page cluster templates comprises the following steps:

preprocessing the set of downloaded web pages by filtering out the web pages that correspond to requests for static Web objects and keeping only the dynamic content web pages that correspond to requests for dynamic Web objects;

clustering the dynamic content web pages according to their Uniform Resource Locators (URLs) to obtain dynamic content web page clusters;

for each dynamic content web page cluster, extracting the common DOM tree of its pages as the template of that cluster.

Preferably, the step of using the web page cluster template to detect whether each web page in the cluster contains a maliciously injected script comprises the following steps:

for each dynamic content web page cluster, converting each dynamic web page in the cluster into a DOM tree and comparing it with the cluster template to find every DOM subtree that extends beyond the outline of the cluster template;

for each DOM subtree that extends beyond the outline of the cluster template, attempting to extract injected scripts from it;

performing a syntax check on each extracted injected script; if the syntax is correct, the host dynamic content web page is confirmed to be a web page containing a maliciously injected script.

Preferably, the step of extracting injected scripts from the set of DOM subtrees that extend beyond the outline of the cluster template is any combination of the following five script extraction methods:

extracting JavaScript/VBScript scripts from each <script> tag in the DOM subtree;

extracting JavaScript/VBScript scripts from the event handler attributes of each HTML tag in the DOM subtree;

extracting JavaScript/VBScript scripts from specific attribute values of each HTML tag in the DOM subtree;

extracting JavaScript/VBScript scripts from the CSS defined by each <STYLE> tag in the DOM subtree;

extracting JavaScript/VBScript scripts from the CSS introduced by the Style attribute of each HTML tag in the DOM subtree.

Preferably, the step of performing a syntax check on the extracted injected scripts supports two script languages: if the extracted script is a JavaScript script, it is checked against the standard JavaScript syntax specification; if the extracted script is a VBScript script, it is checked against the standard VBScript syntax specification.

A system for detecting web pages containing maliciously injected scripts, comprising:

a web crawler module; a dynamic content web page filtering module; a dynamic content web page clustering module; a web page cluster template extraction module; and a maliciously injected script web page detection module.

The web crawler module traverses and downloads all web pages of the scanned website.

The dynamic content web page filtering module is connected to the web crawler module. It receives the set of web pages downloaded by the web crawler module and preprocesses it, filtering out the web pages that correspond to requests for static Web objects and keeping only the dynamic content web pages that correspond to requests for dynamic Web objects.

The dynamic content web page clustering module is connected to the dynamic content web page filtering module. It receives the filtered set of dynamic content web pages and clusters them according to their Uniform Resource Locators (URLs) to obtain dynamic content web page clusters.

The web page cluster template extraction module is connected to the dynamic content web page clustering module. It receives the web page clusters obtained by clustering, converts all dynamic content web pages in a cluster into their corresponding DOM trees, and extracts the maximum common trunk tree of these DOM trees as the template of the cluster.

The maliciously injected script web page detection module is connected to the web page cluster template extraction module and to the dynamic content web page clustering module. It uses the web page cluster template to detect whether each web page in the dynamic content web page cluster contains a maliciously injected script.

Beneficial effects of the invention: the invention uses cluster analysis to cluster all dynamic web content of the scanned website by Uniform Resource Locator (URL), and uses DOM structure analysis to extract a web page cluster template from each dynamic content web page cluster obtained by the cluster analysis. Based on the extracted template, web pages containing maliciously injected scripts are detected by checking whether the DOM subtrees of each dynamic content web page that extend beyond the outline of the cluster template contain injected scripts. The method and system of the invention can detect web pages containing maliciously injected scripts that previously could only be found through manual analysis by security experts, and are a useful supplement to traditional script injection vulnerability scanning methods and systems for Web sites.

Description of drawings

Fig. 1 shows the working steps of the method of the invention for detecting web pages containing maliciously injected scripts;

Fig. 2 shows the workflow of a conventional web crawler;

Fig. 3 shows the steps of the cluster analysis performed on the downloaded dynamic content web pages;

Fig. 4 shows an embodiment of constructing dynamic content web page clusters;

Fig. 5 shows the steps of extracting a web page cluster template from a dynamic content web page cluster;

Fig. 6 shows an embodiment of converting a dynamic content web page into a DOM tree;

Fig. 7 shows an embodiment of extracting the maximum common DOM tree from two DOM trees;

Fig. 8 shows an embodiment of extracting from a dynamic content web page the set of DOM subtrees that extend beyond the web page cluster template;

Fig. 9 is a workflow diagram of injected script detection on the set of DOM subtrees that extend beyond the outline of the web page cluster template;

Fig. 10 is a flowchart of the syntax check performed on the extracted injected scripts;

Fig. 11 is a structural block diagram of the system of the invention for detecting web pages containing maliciously injected scripts.

The invention is further described below with reference to the drawings and embodiments.

Embodiment

As shown in Fig. 1, the method of the invention for detecting web pages containing maliciously injected scripts comprises the following steps:

101: use a web crawler to traverse and download all web pages of the scanned website;

102: perform cluster analysis on the downloaded dynamic content web pages;

103: extract the web page cluster template of each dynamic content web page cluster;

104: use the web page cluster template to detect all web pages in the cluster that contain maliciously injected scripts;

105: construct alert events for the web pages containing maliciously injected scripts and raise the alerts.

The web crawler used in the detection method is a known web crawler technology. To help those of ordinary skill in the art better understand the detection method of the invention, the workflow of a web crawler is briefly introduced here. As shown in Fig. 2, a web crawler is a program that automatically retrieves web pages from a specified Web site. It comprises a web page fetching module 210, a web page content analysis module 220, a web page storage module 230 and a web page fetching task queue 240. First, the URLs of one or several initial pages of the scanned Web site 250 are added to the web page fetching task queue 240 as the first batch of fetching tasks. The web page fetching module 210 takes a fetching task from the task queue, emulates a Web browser by sending an HTTP request to the Web server, and receives the HTML page content returned by the Web server. The received HTML page content is sent to the web page content analysis module 220 for content analysis. The web page content analysis module 220 extracts from the received HTML page content all hyperlinks that belong to the scanned website, generates new fetching tasks from them and adds these to the web page fetching task queue 240. At the same time, the HTML page content that the web page content analysis module 220 has analyzed is stored intact in the web page storage module 230 for processing by the other modules of the detection system. When the web page fetching task queue is empty, the crawling task is complete.
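The crawler workflow described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the names `LinkExtractor` and `crawl` are hypothetical, and the HTTP request is abstracted into a caller-supplied `fetch(url)` function (an assumption made so the sketch can run without network access).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag (role of content analysis module 220)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_urls, fetch):
    """Crawl one site; `fetch(url)` returns the HTML text or None (hypothetical)."""
    site = urlparse(start_urls[0]).netloc
    queue = deque(start_urls)   # fetching task queue (240)
    seen = set(start_urls)
    store = {}                  # page storage (230): URL -> HTML content
    while queue:                # the crawl is complete when the queue is empty
        url = queue.popleft()
        html = fetch(url)       # fetching module (210)
        if html is None:
            continue
        store[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            # Only follow hyperlinks that belong to the scanned website.
            if urlparse(absolute).netloc == site and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return store
```

In a real deployment `fetch` would issue the HTTP request; in a test it can simply be the `get` method of a dictionary that maps URLs to HTML strings.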

As shown in Fig. 3, in the detection method of the invention, the step of performing cluster analysis on the downloaded dynamic content web pages comprises the following steps:

310: a data cleaning and preprocessing step for the web page set, in which the set of downloaded web pages is preprocessed by filtering out the web pages that correspond to requests for static Web objects and keeping only the dynamic content web pages that correspond to requests for dynamic Web objects;

320: a dynamic content web page cluster analysis step, in which the dynamic content web pages are clustered according to their URLs to obtain dynamic content web page clusters.

The data cleaning and preprocessing step 310 works as follows. For each downloaded HTML web page, the subsequent operation is determined by its URL: if the file extension of the Web object requested by the URL shows that it is a static Web object, the HTML web page is filtered out; otherwise, the Web object requested by the URL is treated as a dynamic content web page and the HTML web page is kept for further cluster analysis. The file extensions that indicate a static Web object include, but are not limited to, the following: .GIF, .png, .jpg, .html, .htm, .txt, .pdf, .doc and .exe. The list of extensions can also be specified manually.
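A minimal Python sketch of this filtering step is shown below. The function names `is_static` and `keep_dynamic` are hypothetical, and comparing extensions case-insensitively is an assumption that the text does not state.

```python
import posixpath
from urllib.parse import urlparse

# Extensions listed in the text; the set can be overridden by the caller.
STATIC_EXTENSIONS = {".gif", ".png", ".jpg", ".html", ".htm",
                     ".txt", ".pdf", ".doc", ".exe"}


def is_static(url, static_extensions=STATIC_EXTENSIONS):
    """True if the URL's file extension marks a static Web object."""
    path = urlparse(url).path                 # drops the ?query part
    extension = posixpath.splitext(path)[1]   # e.g. ".asp" or ".GIF"
    return extension.lower() in static_extensions


def keep_dynamic(urls):
    """Step 310: keep only URLs that request dynamic Web objects."""
    return [url for url in urls if not is_static(url)]
```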

The cluster analysis step 320, applied to the dynamic content web pages that remain after data cleaning, works as follows. First, the URL of every dynamic content web page is parsed into {subdirectory structure} + {file name} + {parameters}. For example, /cgi-bin/documents/printpost.asp?pid=123 is decomposed into the subdirectory structure "/cgi-bin/documents/", the file name "printpost.asp" and the parameter "pid=123". Then, all dynamic content web pages are clustered by URL subdirectory structure and file name to obtain the initial web page clusters. Finally, for each initial web page cluster, if any web page in the cluster has URL parameters, the cluster is further clustered according to the URL parameter format.

Embodiment 1

Embodiment 1 is an example of the cluster analysis performed on a set of dynamic content web pages. Suppose the web crawler has downloaded 8 web pages from the scanned website, with the following URLs:

1) /cgi-bin/bbs/printpost.asp?pid=123

2) /cgi-bin/bbs/printpost.asp?pid=140

3) /documents/teaching/chapter1.htm

4) /cgi-bin/authors/authorsdetail.asp?aid=1400

5) /documents/teaching/chapter3.html

6) /images/teaching/logo.GIF

7) /cgi-bin/authors/authorsdetail.asp?aid=1450

8) /documents/pdf/introduction.pdf

First, the web pages that correspond to requests for static Web objects are filtered out according to the file extension of the Web object requested by each URL. Here, the URLs whose file extension is ".pdf", ".htm", ".GIF" or ".html" clearly request static Web objects, so URL3, URL5, URL6 and URL8 are filtered out, and only the web pages corresponding to URL1, URL2, URL4 and URL7 remain as dynamic content web pages.

Then, these four URLs are clustered by directory structure and file name, which gives two initial web page clusters: web page cluster 1 is {URL1, URL2}; web page cluster 2 is {URL4, URL7}.

Finally, each initial web page cluster is clustered again by URL parameter format. The parameters of the two URLs in web page cluster 1 are "pid=123" and "pid=140"; they share the URL parameter format "pid=integer", so the dynamic content web pages corresponding to URL1 and URL2 belong to the same web page cluster. The parameters of the two URLs in web page cluster 2 are "aid=1400" and "aid=1450"; they share the URL parameter format "aid=integer", so the dynamic content web pages corresponding to URL4 and URL7 belong to the same web page cluster. The dynamic content web page clusters constructed in this way are shown in Fig. 4, where the printpost.asp node 450 represents dynamic content web page cluster 1, which contains URL1 and URL2, and the Adetail.asp node 460 represents dynamic content web page cluster 2, which contains URL4 and URL7.
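The clustering in this embodiment can be reproduced with a short Python sketch. The names `value_type`, `cluster_key` and `cluster` are hypothetical. Two assumptions are made: URL 4 is taken to be /cgi-bin/authors/authorsdetail.asp?aid=1400, and the two clustering passes (directory plus file name, then parameter format) are folded into a single key, which yields the same final clusters.

```python
import posixpath
import re
from collections import defaultdict
from urllib.parse import parse_qsl, urlparse


def value_type(value):
    """Classify a parameter value; only 'integer' appears in the text."""
    return "integer" if re.fullmatch(r"\d+", value) else "string"


def cluster_key(url):
    """Return (subdirectory structure, file name, parameter format)."""
    parts = urlparse(url)
    directory, filename = posixpath.split(parts.path)
    if not directory.endswith("/"):
        directory += "/"
    params = parse_qsl(parts.query, keep_blank_values=True)
    fmt = "&".join(f"{name}={value_type(value)}" for name, value in params)
    return directory, filename, fmt


def cluster(urls):
    """Step 320: group dynamic content page URLs that share a key."""
    clusters = defaultdict(list)
    for url in urls:
        clusters[cluster_key(url)].append(url)
    return dict(clusters)
```

Applied to URL1, URL2, URL4 and URL7, `cluster` returns two groups keyed by ("/cgi-bin/bbs/", "printpost.asp", "pid=integer") and ("/cgi-bin/authors/", "authorsdetail.asp", "aid=integer").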

As shown in Fig. 5, the steps of extracting a web page cluster template from a dynamic content web page cluster are as follows:

510: set the initial value of the web page cluster template to one of the dynamic content web pages in the cluster, and convert it into the corresponding DOM tree T_m;

520: take a web page k from the remaining web pages of the cluster and convert it into the DOM tree T_k;

530: extract the maximum common DOM tree T_g of the DOM trees T_m and T_k, and set the web page cluster template to T_g;

540: repeat steps 520 and 530 until no web pages remain in the cluster, then output the web page cluster template.
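The folding procedure in steps 510 to 540 can be sketched in Python as follows. The text does not say how the maximum common DOM tree is computed, so this sketch makes assumptions: a DOM tree is a nested tuple `(tag, [children])`, two nodes can only match if their tags are equal and their parents match, and the children are aligned in document order with a longest-common-subsequence style dynamic programme that maximizes the number of shared nodes. All function names are hypothetical.

```python
def size(tree):
    """Number of nodes in a (tag, [children]) tree."""
    return 1 + sum(size(child) for child in tree[1])


def common_tree(a, b):
    """Largest common top-down tree of a and b, or None if the roots differ."""
    if a[0] != b[0]:
        return None
    return (a[0], common_children(a[1], b[1]))


def common_children(xs, ys):
    """Align two child lists in order, maximizing the shared node count."""
    n, m = len(xs), len(ys)
    # best[i][j] = (node count, shared subtrees) for xs[i:] and ys[j:]
    best = [[(0, [])] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            options = [best[i + 1][j], best[i][j + 1]]
            shared = common_tree(xs[i], ys[j])
            if shared is not None:
                count, rest = best[i + 1][j + 1]
                options.append((count + size(shared), [shared] + rest))
            best[i][j] = max(options, key=lambda option: option[0])
    return best[0][0][1]


def extract_template(trees):
    """Steps 510-540: fold the common tree over every page in the cluster."""
    template = trees[0]                          # step 510
    for tree in trees[1:]:                       # steps 520 and 540
        template = common_tree(template, tree)   # step 530
        if template is None:
            raise ValueError("pages in the cluster do not share a root tag")
    return template
```

Because the template can only shrink at each fold, the result does not contain any node that is missing from some page of the cluster.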

Embodiment 2

Fig. 6 shows an embodiment in which the example dynamic content web page in Table 1 is converted into a DOM tree.

Table 1: An example dynamic content web page

<html>

<head><title>BBS group</title></head>

<body>

<div align=center>

<B>Good Morning.Alice!</B>

</div>

</body>

</html>

In Fig. 6, each HTML tag is represented as a node of the DOM tree, and the nesting hierarchy of the HTML tags is represented by the subtree and child node relationships of the DOM tree.
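A conversion of this kind can be sketched with Python's standard `html.parser` module. The classes `Node` and `TreeBuilder`, the function `parse_dom`, the synthetic `#document` root and the `#text` node name are all choices made for this illustration; the patent does not prescribe a parser, and a production system would need an error-tolerant HTML parser.

```python
from html.parser import HTMLParser

# Elements that never have children or an end tag (partial list).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    def __init__(self, tag, attrs=None, text=""):
        self.tag, self.attrs, self.text = tag, dict(attrs or []), text
        self.children = []

    def outline(self):
        """Nested (tag, [children]) view that ignores attributes and text."""
        return (self.tag, [child.outline() for child in self.children])


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]          # chain of currently open elements

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag name, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(Node("#text", text=data.strip()))


def parse_dom(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root
```

For the page in Table 1, `parse_dom(...).outline()` gives an html node with a head child (containing title) and a body child (containing div, which contains b); tag names are lower-cased by the parser.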

Embodiment 3

Fig. 7 shows an embodiment of extracting the maximum common DOM tree T_g from the DOM trees T_m and T_k. As shown in Fig. 7, the DOM tree T_m contains 8 nodes and the DOM tree T_k contains 9 nodes; the extracted maximum common DOM tree T_g contains 7 nodes.

Embodiment 4

Fig. 8 shows an embodiment of extracting the set of DOM subtrees of a dynamic content web page that extend beyond the outline of the web page cluster template. In Fig. 8, the DOM tree corresponding to the web page cluster template is denoted T_m, and the DOM tree corresponding to the dynamic content web page is denoted T_k. The extracted set of DOM subtrees that extend beyond the outline of the cluster template contains two subtrees: one consisting of the single FONT node 831, and one consisting of the SCRIPT node 832 and the TEXT node 833.
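The comparison against the template can be sketched as a recursive walk. This is an illustration under stated assumptions: trees are nested `(tag, [children])` tuples, the function name `excess_subtrees` is hypothetical, and children are matched greedily in document order by tag name, which is simpler than an optimal alignment and can misattribute nodes when siblings with the same tag differ. The example trees are made up for the sketch and are not the trees of Fig. 8.

```python
def excess_subtrees(template, page):
    """Return the subtrees of `page` that lie outside the template outline."""
    extra = []
    template_children = template[1]
    index = 0                      # next unmatched child of the template
    for child in page[1]:
        if index < len(template_children) and template_children[index][0] == child[0]:
            # Same tag at the same position: descend and compare the children.
            extra.extend(excess_subtrees(template_children[index], child))
            index += 1
        else:
            # No counterpart in the template: the whole subtree is excess.
            extra.append(child)
    return extra


# Hypothetical template and page in the spirit of Fig. 8.
TEMPLATE = ("html", [("body", [("div", [("b", [("#text", [])])])])])
PAGE = ("html", [("body", [
    ("div", [("b", [("#text", [])]), ("font", [])]),
    ("script", [("#text", [])]),
])])
```

With these inputs, `excess_subtrees(TEMPLATE, PAGE)` yields a lone font subtree and a script subtree with a text child, mirroring the two kinds of subtree described for Fig. 8.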

As shown in Fig. 9, the step of performing injected script detection on the set of DOM subtrees that extend beyond the outline of the web page cluster template comprises the following steps:

910: for each DOM subtree in the set, attempt to extract all possible injected scripts from it;

920: perform a syntax check on all extracted injected scripts;

930: if a syntactically correct injected script exists, mark the corresponding DOM subtree of the host dynamic content web page as a script injection region;

940: if the set of DOM subtrees is empty, the detection ends; otherwise, return to step 910 and continue.

Step 910, extracting all possible injected scripts from a DOM subtree, is any combination of the following five script extraction methods:

A) extracting JavaScript/VBScript scripts from each <script> tag in the DOM subtree;

B) extracting JavaScript/VBScript scripts from the event handler attributes of each HTML tag in the DOM subtree;

C) extracting JavaScript/VBScript scripts from specific attribute values of each HTML tag in the DOM subtree;

D) extracting JavaScript/VBScript scripts from the CSS defined by each <STYLE> tag in the DOM subtree;

E) extracting JavaScript/VBScript scripts from the CSS introduced by the Style attribute of each HTML tag in the DOM subtree.
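The five extraction methods can be illustrated with the following Python sketch, which works on the HTML source of an excess subtree. Several details are assumptions that the text leaves open: the "specific attributes" of method C are taken to be URL-valued attributes carrying a `javascript:` or `vbscript:` scheme, scripts inside CSS are taken to be `expression(...)` values or script-scheme `url(...)` values, and all names (`ScriptExtractor`, `extract_scripts`, the method labels) are hypothetical. The regular expressions are deliberately simple and would miss obfuscated input.

```python
import re
from html.parser import HTMLParser

URL_ATTRS = {"href", "src", "action", "background"}   # assumed for method C
SCRIPT_URL = re.compile(r"^\s*(?:javascript|vbscript)\s*:(.*)$", re.I | re.S)
CSS_EXPRESSION = re.compile(r"expression\s*\((.*)\)", re.I | re.S)
CSS_URL = re.compile(
    r"url\s*\(\s*['\"]?\s*(?:javascript|vbscript)\s*:([^)'\"]*)", re.I)


def scripts_in_css(css):
    """Scripts embedded in CSS text (used by methods D and E)."""
    found = [match.group(1) for match in CSS_EXPRESSION.finditer(css)]
    found += [match.group(1) for match in CSS_URL.finditer(css)]
    return found


class ScriptExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.scripts = []     # list of (method label, script text)
        self.inside = None    # "script" or "style" while inside that element

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.inside = tag
        for name, value in attrs:
            if not value:
                continue
            if name.startswith("on"):                          # method B
                self.scripts.append(("event", value))
            elif name == "style":                              # method E
                self.scripts += [("style-attr", s) for s in scripts_in_css(value)]
            elif name in URL_ATTRS:                            # method C
                match = SCRIPT_URL.match(value)
                if match:
                    self.scripts.append(("attr", match.group(1)))

    def handle_endtag(self, tag):
        if tag == self.inside:
            self.inside = None

    def handle_data(self, data):
        if self.inside == "script" and data.strip():           # method A
            self.scripts.append(("script-tag", data.strip()))
        elif self.inside == "style":                           # method D
            self.scripts += [("style-tag", s) for s in scripts_in_css(data)]


def extract_scripts(fragment):
    extractor = ScriptExtractor()
    extractor.feed(fragment)
    extractor.close()
    return extractor.scripts
```

Each extracted string is only a candidate; whether it counts as an injected script is decided by the syntax check in step 920.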

Step 920, the syntax check of the extracted injected scripts, supports two script languages: if the extracted script is a JavaScript script, it is checked against the standard JavaScript syntax specification; if the extracted script is a VBScript script, it is checked against the standard VBScript syntax specification. Fig. 10 is a flowchart of the syntax check performed on the extracted injected scripts. Taking a JavaScript injected script as an example: first, the lexical analysis rules 1001 for JavaScript scripts are defined according to the JavaScript lexical specification, and a lexical analyzer generator produces the corresponding lexical analyzer from the lexical analysis rules 1001. The lexical analyzer generator may be the GNU FLEX tool or another tool. Next, the JavaScript grammar rules 1002 are defined according to the JavaScript syntax specification, and a syntax analyzer generator produces the corresponding syntax analyzer from the grammar rules 1002. The syntax analyzer generator may be the GNU YACC/BISON tool or another tool. During the syntax check, each extracted segment of JavaScript script 1003 is first processed by the lexical analyzer 1004 to obtain a sequence of lexical tokens; these tokens are then passed to the syntax analyzer 1005, which checks whether the syntax is correct. The syntax check for extracted VBScript scripts follows the same workflow as for JavaScript scripts; only the lexical analysis rules and grammar rules differ.
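The two-stage flow (lexical analysis producing tokens, then syntax analysis over those tokens) can be illustrated with a hand-written Python sketch. It is not a JavaScript grammar and it does not use Flex or Bison: it accepts only a toy subset (names, numbers, strings, member access, calls, the operators = + - * /, and statements separated by semicolons), and every name in it is hypothetical. A real checker would be generated from the full lexical and grammar rules as the text describes.

```python
import re

TOKEN_SPEC = [
    ("ws", r"\s+"),
    ("number", r"\d+(?:\.\d+)?"),
    ("string", r'"(?:[^"\\\n]|\\.)*"' + r"|'(?:[^'\\\n]|\\.)*'"),
    ("name", r"[A-Za-z_$][\w$]*"),
    ("punct", r"[()\.,;=+\-*/]"),
]
TOKEN = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))
OPERATORS = {"=", "+", "-", "*", "/"}


def tokenize(source):
    """Lexical analysis: turn the script text into (kind, text) tokens."""
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
        if match.lastgroup != "ws":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


class Parser:
    """Syntax analysis: recursive descent over the toy grammar."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("eof", "")

    def take(self, text=None, kind=None):
        token = self.peek()
        if (text is not None and token[1] != text) or \
           (kind is not None and token[0] != kind):
            raise SyntaxError(f"unexpected token {token[1]!r}")
        self.pos += 1
        return token

    def program(self):                 # program := expr (';' expr)* [';']
        while self.peek()[0] != "eof":
            self.expr()
            if self.peek()[0] != "eof":
                self.take(";")

    def expr(self):                    # expr := postfix (operator postfix)*
        self.postfix()
        while self.peek()[0] == "punct" and self.peek()[1] in OPERATORS:
            self.take()
            self.postfix()

    def postfix(self):                 # postfix := primary ('.' name | call)*
        self.primary()
        while self.peek() in (("punct", "."), ("punct", "(")):
            if self.take()[1] == ".":
                self.take(kind="name")
            else:
                if self.peek() != ("punct", ")"):
                    self.expr()
                    while self.peek() == ("punct", ","):
                        self.take()
                        self.expr()
                self.take(")")

    def primary(self):                 # primary := literal | name | '(' expr ')'
        kind, text = self.peek()
        if kind in ("number", "string", "name"):
            self.take()
        elif (kind, text) == ("punct", "("):
            self.take()
            self.expr()
            self.take(")")
        else:
            raise SyntaxError(f"unexpected token {text!r}")


def is_valid_script(source):
    """True if `source` is a non-empty program in the toy grammar."""
    try:
        tokens = tokenize(source)
        Parser(tokens).program()
    except SyntaxError:
        return False
    return bool(tokens)
```

Ordinary page text such as "Good Morning" is rejected because two names in a row do not form a statement, while a string such as "alert(document.cookie);" is accepted; this is the distinction the syntax check relies on.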

For all injected scripts extracted from a DOM subtree that extends beyond the web page cluster template, if at least one segment of injected script passes the syntax check of the syntax analyzer, the corresponding DOM subtree is marked as a script injection region, and an alert event is generated to report that a maliciously injected script has been detected in the host dynamic content web page.

As shown in Fig. 11, the system of the invention for detecting web pages containing maliciously injected scripts comprises a web crawler module 1110, a dynamic content web page filtering module 1120, a dynamic content web page clustering module 1130, a web page cluster template extraction module 1140 and a maliciously injected script web page detection module 1150. The web crawler module 1110 traverses and downloads all web pages of the scanned website. The dynamic content web page filtering module 1120 is connected to the web crawler module 1110; it receives the set of web pages downloaded by the web crawler module and preprocesses it, filtering out the web pages that correspond to requests for static Web objects and keeping only the dynamic content web pages that correspond to requests for dynamic Web objects. The dynamic content web page clustering module 1130 is connected to the dynamic content web page filtering module 1120; it receives the filtered set of dynamic content web pages and clusters them according to their Uniform Resource Locators (URLs) to obtain dynamic content web page clusters. The web page cluster template extraction module 1140 is connected to the dynamic content web page clustering module 1130; it receives each web page cluster obtained by clustering, converts all dynamic content web pages in each cluster into their corresponding DOM trees, and extracts the maximum common trunk tree of these DOM trees as the template of that cluster. The maliciously injected script web page detection module 1150 is connected to the web page cluster template extraction module 1140 and to the dynamic content web page clustering module 1130; it uses the web page cluster template to detect whether each web page in the dynamic content web page cluster contains a maliciously injected script.

Claims

1. A method for detecting web pages containing maliciously injected scripts, characterized by comprising the following steps:

1) a step of using a web crawler to traverse and download all web pages of the scanned website;

2) a step of performing cluster analysis on the downloaded web pages;

3) a step of extracting the web page cluster template of each dynamic content web page cluster;

4) a step of using the web page cluster template to detect whether each dynamic content web page in the cluster contains a maliciously injected script.

2. The method for detecting web pages containing maliciously injected scripts according to claim 1, characterized in that the step of performing cluster analysis on the downloaded web pages comprises the following steps:

1) preprocessing the set of downloaded web pages by filtering out the web pages that correspond to requests for static Web objects and keeping only the dynamic content web pages that correspond to requests for dynamic Web objects;

2) clustering the dynamic content web pages according to their Uniform Resource Locators (URLs) to obtain dynamic content web page clusters.

3. The method for detecting web pages containing maliciously injected scripts according to claim 1, characterized in that the step of extracting the template of each dynamic content web page cluster is: for each dynamic content web page cluster, extracting the common DOM tree of its pages as the template of that cluster.

4. The method for detecting web pages containing maliciously injected scripts according to claim 1, characterized in that the step of using the web page cluster template to detect whether each web page in the cluster contains a maliciously injected script comprises the following steps:

1) for each dynamic content web page cluster, converting each dynamic web page in the cluster into a DOM tree and comparing it with the cluster template to find every DOM subtree that extends beyond the outline of the cluster template;

2) for each DOM subtree that extends beyond the outline of the cluster template, attempting to extract injected scripts from it;

3) performing a syntax check on each extracted injected script; if the syntax is correct, the host dynamic content web page is confirmed to be a web page containing a maliciously injected script.

5. The method for detecting web pages containing maliciously injected scripts according to claim 4, characterized in that the step of extracting injected scripts from a DOM subtree that extends beyond the outline of the cluster template is any combination of the following five script extraction methods:

1) extracting JavaScript/VBScript scripts from each <script> tag in the DOM tree;

2) extracting JavaScript/VBScript scripts from the event handler attributes of each HTML tag in the DOM tree;

3) extracting JavaScript/VBScript scripts from specific attribute values of each HTML tag in the DOM tree;

4) extracting JavaScript/VBScript scripts from the CSS defined by each <STYLE> tag in the DOM tree;

5) extracting JavaScript/VBScript scripts from the CSS introduced by the Style attribute of each HTML tag in the DOM tree.

6. The method for detecting web pages containing maliciously injected scripts according to claim 4, characterized in that the syntax check of the extracted injected scripts supports two script languages: if the extracted script is a JavaScript script, it is checked against the standard JavaScript syntax specification; if the extracted script is a VBScript script, it is checked against the standard VBScript syntax specification.

7. A system for detecting web pages containing maliciously injected scripts, characterized by comprising a web crawler module, a dynamic content web page filtering module, a dynamic content web page clustering module, a web page cluster template extraction module and a maliciously injected script web page detection module, wherein: the web crawler module traverses and downloads all web pages of the scanned website; the dynamic content web page filtering module is connected to the web crawler module, receives the set of web pages downloaded by the web crawler module and preprocesses it, filtering out the web pages that correspond to requests for static Web objects and keeping only the dynamic content web pages that correspond to requests for dynamic Web objects; the dynamic content web page clustering module is connected to the dynamic content web page filtering module, receives the filtered set of dynamic content web pages and clusters them according to their Uniform Resource Locators (URLs) to obtain dynamic content web page clusters; the web page cluster template extraction module is connected to the dynamic content web page clustering module, receives each web page cluster obtained by clustering, converts all dynamic content web pages in each cluster into their corresponding DOM trees, and extracts the maximum common trunk tree of these DOM trees as the template of that cluster; and the maliciously injected script web page detection module is connected to the web page cluster template extraction module and to the dynamic content web page clustering module, and uses the web page cluster template to detect whether each web page in the dynamic content web page cluster contains a maliciously injected script.