CN101692267B - Method and system for detecting large-scale malicious web pages - Google Patents

Info

Publication number
CN101692267B
Authority
CN
China
Prior art keywords
node
analysis
task
webpage
analyzed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009100928870A
Other languages
Chinese (zh)
Other versions
CN101692267A (en
Inventor
梁知音
龚晓锐
韦韬
宋程昱
武新逢
韩心慧
诸葛建伟
邹维
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN2009100928870A
Publication of CN101692267A
Application granted
Publication of CN101692267B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method and a system for detecting malicious web pages at large scale, using a three-layer parallel architecture with layered monitoring safeguards. In the first layer, a plurality of detection servers interconnected over a network are deployed in parallel to form a detection server cluster, and a task set to be analyzed is set up on one of the detection servers. In the second layer, a plurality of analysis nodes are deployed in parallel in each detection server, and a node cluster monitoring module monitors the running state of the analysis nodes. In the third layer, parallel sandbox environments are constructed in each analysis node so that the tasks to be analyzed are detected in parallel. The architecture keeps the detection servers, and the analysis nodes, mutually independent and self-maintaining: dynamically expanding the number of physical hosts or nodes does not affect the overall operation of the system, and the failure of a single analysis node does not affect its overall function. Multiple tasks can be detected in parallel in the same node at the same time, which improves the analysis efficiency of the system.

Description

Method and system for detecting large-scale malicious web pages
Technical field
The invention belongs to the field of computer security and proposes a method and system for detecting malicious web pages at large scale. It adopts a three-layer parallel architecture with layered monitoring safeguards, and combines virtual machine technology with parallel sandbox technology to build an automated analysis environment that runs continuously at large scale and dynamically detects the web Trojan threats contained in web pages.
Background
The Internet has become an important channel for spreading malicious programs. According to the Statistical Report on Internet Development in China issued by the China Internet Network Information Center (CNNIC) in January 2008, China has more than 200 million Internet users, and their dependence on the World Wide Web in daily life keeps growing. China's Internet resources are also growing rapidly: the annual growth rates of domain names, websites, and web pages all exceed 60%. Yet Internet-based security problems have emerged continually over the past two years and are getting worse. In 2007, 90.8% of users' computers were infected with malicious code, and nearly half of users had personal information or accounts stolen or tampered with.
According to statistics, from January to October 2008 more than 81 million computers nationwide (including those of enterprise users) were infected by viruses, over 90% of them through Trojan-hosting web pages. Over the past two years, Trojans have become the main direction of malicious code development: in both 2007 and 2008, Trojans accounted for more than 60% of the new virus samples captured by anti-virus vendors. Planting a Trojan in a web page, exploiting vulnerabilities in the browser and its plug-ins to gain execution privileges, and then hijacking the browser and implanting the Trojan is currently the main way Trojans spread.
Traditional web Trojan detection usually relies on static signatures to scan for and identify the malicious code contained in web pages [see patent: Web page malicious script detection method based on statistical features, patent No. ZL200610152531.8, Beijing Institute of Technology]. However, web Trojan authors often use anti-virus evasion techniques such as page obfuscation and encryption to escape detection by anti-virus software. Therefore, merely issuing an alert that a given web document contains a web Trojan is not enough.
Because browsing a Trojan-hosting web page causes malicious damage to the system, researchers have proposed a dynamic behavior detection method based on the client honeypot approach. This method judges a page by the harmful consequences that the web Trojan causes in the honeypot environment. It is immune to obfuscation, can detect newly emerging web Trojans, and has become a widely used technique for monitoring web Trojans. In deployment, a traditional high-interaction honeypot usually relies on a virtual machine program (such as VMware or QEMU) to build a sandbox. A sandbox provides a test environment for programs that come from untrusted sources, are destructive, or whose intent cannot be determined. The technique is widely used by computer professionals, especially in the anti-virus industry, where the sandbox is an important environment for observing computer viruses. Changes made inside the sandbox do not damage the operating system. In the traditional deployment, one sandbox environment is implemented in one virtual machine node, only a single task can run at a time, analysis efficiency is low, and no explicit safeguard ensures that the system keeps running. [See: Design and implementation of a malicious web page detection system based on client honeypot, Sun Xiaoyan, Wang Yang, Zhu Yuefei, Wu Dongying, Computer Applications, 2007, vol. 27(7); All Your iFRAMEs Point to Us, Niels Provos, Panayiotis Mavrommatis, Moheeb Abu Rajab, Fabian Monrose, 17th USENIX Security Symposium]
Summary of the invention
To improve analysis efficiency, the present invention proposes a malicious web page detection method and system with a three-layer parallel architecture, and safeguards the continuous operation of the system through layered control. The system provides an efficient, automated analysis environment that can run continuously and dynamically detect web Trojan threats in web pages.
The technical scheme of the present invention is summarized as follows:
A method for detecting malicious web pages at large scale, comprising the following steps:
1) A plurality of detection servers are interconnected over a network, and a task set to be analyzed is set up on one of the detection servers;
2) An automatic node distribution module is deployed in each detection server. It uses virtual machines to distribute analysis nodes, deploying a plurality of analysis nodes in parallel; all nodes in the same server constitute a node cluster. A node cluster monitoring module is also deployed. It monitors the running state of each analysis node by accessing files inside the node, and so monitors the state of all nodes in the node cluster; if a node is not running normally, it invokes a script to reset the analysis node;
3) Each analysis node connects to the task set over the internal network to obtain tasks to be analyzed, which realizes distributed task scheduling. A honeypot environment and a Trojan-hosting web page detection module are deployed inside the analysis node, and parallel sandbox environments are built to detect the tasks in parallel. A detection status monitoring module monitors the operation of the Trojan-hosting web page detection module, and detection results are saved in an analysis results database on the server.
The task set to be analyzed is collected by web crawlers or by other means.
The node distribution module obtains the initial image of the analysis node over the network and, using a virtual machine cloning script written against the API provided by the virtual machine program, automatically deploys a plurality of analysis nodes on each physical host.
The node cluster monitoring module uses the API provided by the virtual machine program to read the running state information of the Trojan-hosting web page detection module, as recorded by the detection status monitoring module deployed inside each analysis node, and thereby monitors the state of the analysis nodes. If reading the status log file fails, or the status log file indicates that the detection module deployed in the analysis node has failed, it invokes a script to reset the analysis node.
The honeypot environment deployed inside the analysis node is set up as follows:
Install the operating system environment from installation media for an operating system and browser version that contain vulnerabilities readily exploited by web Trojans;
Install specific versions of browser plug-in software that web Trojans frequently exploit;
Install the automatic update program for the Trojan-hosting web page detection module.
The Trojan-hosting web page detection module deployed inside the analysis node comprises:
Task scheduling module: runs the task scheduler in the analysis node and obtains the task set to be analyzed;
Parallel sandbox control module: builds the parallel sandbox program and captures the system behavior information produced by the browser after it visits the task set;
Log parsing and Trojan-hosting page judgment module: after the browser has visited the task set, analyzes the system behavior information recorded during the visit and decides whether it constitutes Trojan-hosting web page behavior.
The task scheduling method of the task scheduling module comprises the following steps:
1. Read the configuration information;
2. Load the system behavior monitoring driver;
3. If a task to be analyzed is obtained successfully, acquire a vacant sandbox and create a new system behavior log file to record the behavior of the system while the task is visited;
4. Start the browser to visit the web page link in the task set, while the system monitor records accesses to system resources in the sandbox;
5. Call the log parsing and Trojan-hosting page judgment module to parse the log file and judge whether the web page link is a Trojan-hosting web page.
The method of parsing the log file to judge whether a web page link is a Trojan-hosting web page comprises the following steps:
1. Read from the configuration file the whitelist of newly created processes, the list of executable file suffixes, the list of system startup files, and the list of registry keys and values that control system startup;
2. Obtain the name of the created process from the API call's parameter list; if the process name is not in the whitelist, judge the web page to be a Trojan-hosting web page;
3. Obtain the name of the created file from the API call's parameter list; if the file name suffix is in the executable file suffix list, judge the web page to be a Trojan-hosting web page;
4. Obtain the path of the modified file from the API call's parameter list; if the path is in the list of system startup files, judge the web page to be a Trojan-hosting web page;
5. Obtain the created or modified registry key and value name from the API call's parameter list; if they are in the list of registry keys and values that control system startup, judge the web page to be a Trojan-hosting web page.
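The four judgment rules above (steps 2 to 5) can be sketched as a single pass over parsed log events. This is a minimal illustration, not the patented implementation: the `ApiEvent` and `JudgeConfig` types, the event kind names, and the choice to compare everything in lower case are assumptions made for the example, since the patent does not specify a log format.

```python
from dataclasses import dataclass
from pathlib import PureWindowsPath


@dataclass(frozen=True)
class JudgeConfig:
    # All four lists correspond to step 1; the field names are hypothetical.
    process_whitelist: frozenset   # lowercase process names allowed to start
    exe_suffixes: frozenset        # lowercase suffixes such as ".exe"
    startup_files: frozenset       # lowercase paths of system startup files
    startup_registry: frozenset    # lowercase (key, value name) pairs


@dataclass(frozen=True)
class ApiEvent:
    kind: str    # "create_process", "create_file", "modify_file", "set_registry"
    args: tuple  # parameters taken from the logged API call


def hosts_trojan(events, cfg):
    """Return True as soon as one logged event matches rules 2 to 5."""
    for ev in events:
        if ev.kind == "create_process":        # rule 2: process not whitelisted
            name = PureWindowsPath(ev.args[0]).name.lower()
            if name not in cfg.process_whitelist:
                return True
        elif ev.kind == "create_file":         # rule 3: executable file created
            if PureWindowsPath(ev.args[0]).suffix.lower() in cfg.exe_suffixes:
                return True
        elif ev.kind == "modify_file":         # rule 4: startup file modified
            if ev.args[0].lower() in cfg.startup_files:
                return True
        elif ev.kind == "set_registry":        # rule 5: startup registry entry set
            key, value = ev.args[0].lower(), ev.args[1].lower()
            if (key, value) in cfg.startup_registry:
                return True
    return False
```

Returning on the first match mirrors the wording of the rules, where any single matching behavior is enough to judge the page as Trojan-hosting.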
A system for detecting malicious web pages at large scale, characterized by a three-layer parallel architecture:
First layer: a plurality of detection servers are deployed in parallel and interconnected over a network to form a detection server cluster, and a task set to be analyzed is set up on one of the detection servers;
Second layer: an automatic node distribution module is deployed in each detection server; it uses virtual machines to distribute analysis nodes, deploying a plurality of analysis nodes in parallel. All nodes in the same detection server constitute a node cluster. A node cluster monitoring module is also deployed; it monitors the running state of each analysis node by accessing files inside the node, and so monitors the state of all nodes in the node cluster. If a node is not running normally, it invokes a script to reset the analysis node;
Third layer: each analysis node connects to the task set over the internal network to obtain tasks to be analyzed, which realizes distributed task scheduling. A honeypot environment and a Trojan-hosting web page detection module are deployed in the analysis node, and parallel sandbox environments are built to detect the tasks in parallel. A detection status monitoring module monitors the operation of the Trojan-hosting web page detection module, and detection results are saved in an analysis results database on the server.
The scale of the server cluster deployed in parallel expands dynamically as the required scale of analysis grows.
The task set to be analyzed consists of web page links stored centrally in a database or a file.
Compared with the prior art, the invention has the following beneficial effects:
1) In the first layer of the three-layer parallel architecture, the parallel physical hosts (servers) manage themselves independently, and the number of physical hosts can be increased dynamically as the system requires. Task scheduling is placed in the second layer of the parallel architecture: the task scheduling modules distributed across the analysis nodes connect to the database to request tasks, which realizes distributed task scheduling, and the number of nodes on each host can be adjusted to the performance of that host. This architecture keeps the physical hosts, and the analysis nodes, mutually independent and self-maintaining: dynamically expanding the number of physical hosts or nodes does not affect the overall operation of the system, and the failure of a single analysis node does not affect the function of the system as a whole;
2) Under the layered monitoring safeguard strategy proposed by the invention, the running state of the nodes is monitored in real time with the node cluster as the unit, which manages the running state of the analysis nodes effectively. When an analysis node becomes abnormal, the monitoring program can easily reset it using the API provided by the virtual machine program, which keeps the whole analysis environment running stably. The automatic node distribution strategy allows the node running environment to be updated efficiently. The automatic update strategy for the Trojan-hosting web page detection program allows that program to be updated easily. Scheduling control of parallel sandboxes is implemented at the analysis node layer, so multiple URLs can be detected in parallel within the same node at the same time, giving higher analysis efficiency than a traditional single-task dynamic web Trojan detection system;
3) The analysis environment of the invention is a high-interaction honeypot environment carefully deployed around the vulnerabilities that Trojan-hosting pages commonly exploit, with a system configuration similar to that of victim web users. Driving a browser to visit the specified web page link in such an environment triggers the malicious behavior contained in a malicious web page more effectively, and can detect encrypted malicious web pages that anti-virus software cannot. The judgment criteria proposed by the invention decide whether a web page hosts a Trojan according to behaviors inside the sandbox: process creation, downloading of executable files, and modification of system startup files and registry startup entries. The criteria are explicit, the false positive rate is low, and malicious web pages can be identified effectively.
Description of drawings
Fig. 1 is a structural diagram of the malicious web page detection system in large-scale deployment
Fig. 2 is an architecture diagram of a single physical host
Fig. 3 is an architecture diagram of the Trojan-hosting web page detection module in an analysis node
Fig. 4 is a flowchart of task scheduling in the Trojan-hosting web page detection program
Fig. 5 is a flowchart of the node cluster monitoring program
Fig. 6 is a flowchart of Trojan-hosting judgment in the malicious web page detection system
Embodiments
The invention combines virtual machine technology with a lightweight sandbox technology from the field of malicious code analysis that is based on the copy-on-write idea [see A Feather-weight Virtual Machine for Windows Applications, Yang Yu, Fanglu Guo, Susanta Nanda, Lap-chung Lam, Tzi-cker Chiueh, Proceedings of the 2nd international conference on Virtual execution environments]. The invention is described in further detail below with reference to the drawings and specific embodiments:
The "task set" defined in the invention consists of the URLs to be detected, stored centrally in a database;
An "analysis node cluster" as defined in the invention is the set of analysis nodes deployed in a single physical host (detection server): all analysis nodes deployed on one physical host form one analysis node cluster. The number of nodes a cluster can contain depends on the hardware performance of the server and can be configured dynamically.
An "analysis node" as defined in the invention is an analysis machine on which a specific honeypot environment has been deployed. Creating analysis nodes as virtual machines makes maintenance such as resetting the state of an analysis machine easy. The Trojan-hosting web page detection program deployed in the analysis node schedules analysis tasks and judges whether a web page hosts a Trojan.
A "parallel sandbox" as defined in the invention is a controlled execution environment inside the analysis node, scheduled by the Trojan-hosting web page detection program and implemented with kernel-level API hooking. Redirecting the namespace of the resources accessed inside a sandbox isolates sandboxes from one another and from the system, so that a process running inside a sandbox cannot harm the system or affect the behavior of the other parallel sandboxes in the system;
The "automatic node cluster distribution module" as defined in the invention is a program deployed on the physical host to make node distribution more efficient. Using the API provided by the virtual machine software and the settings in a configuration file, it automatically obtains the system image of the analysis node and copies and distributes it.
The "node cluster monitoring module" as defined in the invention is a node cluster monitoring process deployed on the physical host to ensure that the analysis nodes run normally. Using the API provided by the virtual machine software, it monitors the status log file inside each analysis node in real time. If reading the status log file fails, or the status log file indicates that the Trojan-hosting web page detection program deployed in the analysis node has failed, it invokes a script to reset the analysis node according to a predefined strategy, which keeps the analysis node running.
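The namespace redirection mentioned in the "parallel sandbox" definition above can be illustrated for file paths. The sketch below only shows the mapping idea: each sandbox sees its own private copy of a path, so writes in one sandbox never reach the real system or another sandbox. The `C:\Sandbox\sb<N>` layout and the function name are hypothetical; the patent performs the redirection inside kernel-level API hooks and does not describe a directory layout.

```python
from pathlib import PureWindowsPath


def redirect_path(path, sandbox_id, root=r"C:\Sandbox"):
    """Map a path requested inside a sandbox to that sandbox's private copy."""
    p = PureWindowsPath(path)
    drive = p.drive.rstrip(":")                  # "C:" becomes "C"
    rest = p.parts[1:] if p.anchor else p.parts  # drop the "C:\" anchor if present
    # Hypothetical layout: <root>\sb<id>\<drive letter>\<original path>
    return PureWindowsPath(root, f"sb{sandbox_id}", drive, *rest)
```

With this mapping, the same system path requested from sandbox 1 and sandbox 2 resolves to two different locations, which is the isolation property the definition describes.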
The overall system architecture of the invention is shown in Fig. 1 and mainly comprises the following parts:
1) URL database to be analyzed (task set): stores the addresses of web pages to be detected, collected by web crawlers or by other means. These URLs are the task source of the system and are supplied to the Trojan-hosting web page detection server cluster for analysis;
2) Trojan-hosting web page detection server cluster: consists of one or more physical hosts (analysis node clusters). The physical hosts (analysis node clusters) are independent of one another, and each manages its own analysis nodes. Each analysis node is a virtual analysis host on which a specific honeypot environment has been deployed and the Trojan-hosting web page detection program installed. Each node is connected to the URL database over the internal network and obtains URL tasks to detect;
3) Analysis results database: for each Trojan-hosting web page confirmed by in-depth analysis in the detection program, the analysis time and analysis result are recorded and saved in the Trojan-hosting web page database. The data are used for follow-up analysis of Trojan-hosting web pages and to give Internet users reference information on web page safety.
The architecture of a single detection server, that is, an independent analysis node cluster, is shown in Fig. 2. Its deployment mainly comprises the following parts:
1) Automatic node distribution module: the system may need to deploy many (hundreds or even thousands of) analysis servers, so deploying analysis nodes in the servers effectively requires an automatic, efficient node distribution strategy. The system meets this requirement by deploying an analysis node distribution module in each physical host. The module obtains the initial image of the analysis node over the network, then uses a virtual machine cloning script written against the API provided by the virtual machine program to deploy multiple analysis nodes on each server automatically, and automatically generates a virtual machine restart script. Distribution follows the settings in the script, which mainly include the image naming rule, the number of images, and the snapshot name to set; these settings are the same for servers with the same configuration.
2) Node cluster monitoring module: an analysis node cluster monitoring module is deployed in the host OS of the physical host. Using the API provided by the virtual machine program, it reads the detection program's running state information recorded by the detection status monitoring module deployed inside each analysis node, and thereby monitors the state of the analysis nodes. When a node becomes abnormal (reading the status log file fails, or the status log file indicates that the Trojan-hosting web page detection program in the node has failed), it calls the restart script generated automatically by the node distribution module to restart the analysis node, which keeps the node running.
3) Analysis node and its reset script: an analysis node is a virtual analysis machine on which a specific honeypot environment has been deployed. Creating analysis nodes as virtual machines makes them easy to monitor through the API and, when necessary, to maintain through the reset script, for example by resetting their state;
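The distribution settings listed in item 1) (image naming rule, number of images, snapshot name) can be pictured as input to a small planning step that runs before any virtual machine is cloned. The sketch below is an assumption-laden illustration: `plan_clones`, the `{index}` placeholder, and the dictionary keys are invented for the example, and the actual cloning call is omitted because the patent does not name the virtual machine API it uses.

```python
def plan_clones(name_rule, count, snapshot_name):
    """Expand the distribution settings into one entry per analysis node."""
    return [
        {"image": name_rule.format(index=i), "snapshot": snapshot_name}
        for i in range(1, count + 1)
    ]


# Example settings for one server; the values are illustrative only.
for entry in plan_clones("analysis-node-{index:02d}", 8, "clean"):
    print(entry["image"], entry["snapshot"])
```

Because servers with the same configuration share the same settings, the same plan can be reused on each of them.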
The configuration of a single analysis node mainly comprises the following:
1) Deployment of the analysis node honeypot environment:
A) Selection of operating system and browser: install the operating system environment from installation media for a current mainstream operating system and browser version that contain vulnerabilities readily exploited by web Trojans, and keep the operating system and browser versions stable;
B) Selection of commonly vulnerable software: install specific versions of browser plug-in software that web Trojans frequently exploit, and turn off the automatic update option of this software to keep the system version stable;
C) Automatic update of the Trojan-hosting web page detection program: because the system is deployed at large scale, program updates must also be automated. The automatic update method provided by the invention works as follows: when the system restarts, the automatic update program fetches the latest analysis software, which updates the detection program automatically;
2) The architecture of the Trojan-hosting web page detection module is shown in Fig. 3 and mainly comprises the following:
A) Task scheduling module: runs the task scheduler in the analysis node, connects to the URL database through the detection task scheduling interface to obtain the web page links to analyze, and stores analysis results in the analysis results database through the same interface;
B) Parallel sandbox control module: in the client honeypot, a parallel sandbox program built on kernel-level API hooking is deployed. It hooks the memory, process, file system, and registry behavior of processes in the sandbox, and captures the system behavior that the browser produces after visiting a URL.
C) Log parsing and Trojan-hosting judgment module: after the browser has visited a URL, the system behavior information recorded during the visit is analyzed to judge whether the behaviors are specific to Trojan-hosting web pages. Behaviors such as creating an abnormal new process, downloading an executable file without the user's permission, or modifying startup files or registry startup entries indicate that the page the URL points to hosts a Trojan.
D) Detection program monitoring module: the detection status monitoring module monitors the operation of the Trojan-hosting web page detection program. While the program runs normally, it sets the content of the monitoring file to "ok"; when the detection program encounters an error, it sets the content to "error".
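The status file convention in item D) is simple enough to sketch directly. The example below is a minimal illustration under assumptions: the function names and the idea of wrapping one detection step in a helper are invented here, and only the two file contents, "ok" and "error", come from the text.

```python
from pathlib import Path


def write_status(status_file, healthy):
    """Overwrite the monitoring file with "ok" or "error"."""
    Path(status_file).write_text("ok" if healthy else "error", encoding="ascii")


def run_with_status(step, status_file):
    """Run one detection step and record whether it succeeded."""
    try:
        step()
    except Exception:
        write_status(status_file, False)   # detection program hit an error
        return False
    write_status(status_file, True)        # detection program is healthy
    return True
```

A monitor on the host only needs to read this one file to learn the state of the detection program inside the node.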
Following the technical method in the summary of the invention, we implemented a large-scale web Trojan detection and analysis environment. Our specific embodiment, with reference to the drawings, is as follows:
1) Implementation strategy for the detection server layer:
A) Deployment of analysis node clusters: our current implementation uses 64 servers to create analysis node clusters. Each server hosts 8 virtual machine analysis nodes, forming one analysis node cluster, and each node runs 10 parallel sandboxes, so a single server can analyze 80 URLs in parallel;
B) Distribution of analysis nodes: when the environment of the analysis nodes needs to be reset, the analysis node distribution module can be started by remote control. It obtains the initial image of the analysis node over FTP, then uses a virtual machine cloning script written against the API provided by the virtual machine program to copy multiple analysis nodes onto each server and take snapshots automatically, and finally generates the virtual machine startup script;
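The deployment figures in item A) multiply out as follows. The helper below is only a worked calculation; the function name is hypothetical, and the cluster-wide total is derived from the stated figures rather than quoted from the patent.

```python
def parallel_capacity(servers, nodes_per_server, sandboxes_per_node):
    """Return (URLs analyzed in parallel per server, URLs in parallel overall)."""
    per_server = nodes_per_server * sandboxes_per_node
    return per_server, servers * per_server


per_server, total = parallel_capacity(servers=64, nodes_per_server=8, sandboxes_per_node=10)
print(per_server, total)  # 80 5120
```

The per-server value matches the 80 parallel URLs stated in the text; across all 64 servers the same arithmetic gives 5120 URLs in parallel.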
2) The monitoring flow for an analysis node cluster is shown in Fig. 5. The monitoring method is as follows:
A) Start the node cluster monitoring module;
B) Read the node cluster configuration, including the storage path of each node's virtual machine files, the detection interval, and the name of the snapshot image to load on reset;
C) Visit each analysis node in turn:
C1. If the virtual machine is not running, the monitoring log file cannot be read; the analysis node is considered abnormal and is reset;
C2. Read the monitoring log file of the Trojan-hosting web page analysis program; if the file does not exist, the detection program is considered abnormal and the analysis node is reset;
C3. Read the content of the monitoring log file; if the content is "ok", the analysis machine is considered normal; if the content is "error", the analysis program is considered abnormal and the analysis node is reset;
D) After the last analysis machine has been checked, sleep for a period of time (5 minutes in our implementation), then repeat the polling of step C.
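The polling flow in steps C and D can be sketched as a loop over nodes. This is a minimal sketch, not the patented program: the virtual machine operations are passed in as plain callables (`is_running`, `read_status`, `reset`) because the patent does not name the virtual machine API, and treating any content other than "ok" as abnormal is an assumption, since the text only defines "ok" and "error".

```python
import time


def needs_reset(node, is_running, read_status):
    """Apply checks C1 to C3 to one analysis node."""
    if not is_running(node):          # C1: virtual machine is not running
        return True
    content = read_status(node)       # expected to return None if the file is missing
    if content is None:               # C2: monitoring log file does not exist
        return True
    return content.strip() != "ok"    # C3: assumption: anything but "ok" is abnormal


def poll_once(nodes, is_running, read_status, reset):
    """Check every node once (step C) and reset the abnormal ones."""
    reset_nodes = []
    for node in nodes:
        if needs_reset(node, is_running, read_status):
            reset(node)
            reset_nodes.append(node)
    return reset_nodes


def monitor(nodes, is_running, read_status, reset,
            interval_s=300, rounds=None, sleep=time.sleep):
    """Repeat the poll with a sleep in between (step D); rounds=None runs forever."""
    done = 0
    while rounds is None or done < rounds:
        poll_once(nodes, is_running, read_status, reset)
        sleep(interval_s)             # 5 minutes in the described implementation
        done += 1
```

Passing the virtual machine operations as parameters keeps the loop testable without a hypervisor, and the `rounds` and `sleep` parameters exist only so the loop can be exercised in a test.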
3) Implementation strategy for the analysis node:
A) For the operating system and browser, choose Windows XP SP2, currently the main target operating system of Trojan-hosting web pages, and IE version 6.0.2900.2180.xpsp_sp2_rtm.040803-2158. This system configuration contains the MS06-014 vulnerability that web Trojans frequently exploit;
B) Install specific versions of browser plug-in software that web Trojans frequently exploit, including Adobe Flash Player and RealPlayer, and turn off their automatic update options to keep the software versions stable;
C) Configure the analysis node's network to obtain its IP address dynamically via DHCP;
D) Install the precompiled automatic update program in the system and add a shortcut to it to the system startup folder, so that the update program runs automatically when the system restarts and fetches the latest version of the Trojan-hosting web page detection program from the server.
4) The task scheduling flow of the trojan-injected web page detection program is shown in Figure 4 and mainly comprises the following steps:
A) Start the trojan-injected web page detection program;
B) Read the configuration information, including the number of parallel sandboxes to enable (set to 10 here), the configuration of the task source database server, and the running time of a single task (set to 5 minutes here);
C) Load the system behavior monitoring driver;
D) Access the database to obtain a URL; on success, acquire a free sandbox and create a new system behavior log file to record the behavior of the system while this URL is visited;
E) Start the browser to visit the link to be analyzed; the system monitor hooks the system APIs and records accesses to system resources inside the sandbox, such as processes, files and the registry;
F) When the running time of the task reaches the time set in the configuration, stop the sandbox and close the log file;
G) Parse the log file to determine whether this URL is trojan-injected.
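The scheduling flow in steps B and D to G can be sketched as below. This is an illustrative sketch under stated assumptions: a thread pool stands in for the pool of parallel sandboxes, and `visit` and `judge` are hypothetical callables standing in for the sandboxed browser visit and the log parser. None of these names come from the patent, and loading the monitoring driver (step C) and the task database are omitted.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class SchedulerConfig:
    sandbox_count: int = 10       # number of parallel sandboxes (step B)
    task_seconds: float = 300.0   # running time of one task: 5 minutes (step B)


def analyze_url(
    url: str,
    cfg: SchedulerConfig,
    visit: Callable[[str, float], List[str]],
    judge: Callable[[List[str]], bool],
) -> bool:
    """Steps D-G for one URL: record behavior in a sandbox, then judge the log."""
    # Steps E/F: visit the URL; the callable is expected to stop at the time limit
    # and return the recorded behavior log lines.
    log_lines = visit(url, cfg.task_seconds)
    # Step G: parse the log and decide whether the page is trojan-injected.
    return judge(log_lines)


def run_tasks(
    urls: Iterable[str],
    cfg: SchedulerConfig,
    visit: Callable[[str, float], List[str]],
    judge: Callable[[List[str]], bool],
) -> Dict[str, bool]:
    """Analyze URLs with at most cfg.sandbox_count tasks running at once."""
    with ThreadPoolExecutor(max_workers=cfg.sandbox_count) as pool:
        futures = {u: pool.submit(analyze_url, u, cfg, visit, judge) for u in urls}
        return {u: f.result() for u, f in futures.items()}
```

Bounding the worker count by the configured sandbox count mirrors the "acquire a free sandbox" step: a URL waits until one of the sandboxes becomes available.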
5) The flow for parsing the log file and determining whether the URL is trojan-injected is shown in Figure 6; the implementation strategy is as follows:
A) Read from the configuration file the whitelist of newly created processes, the list of executable file suffixes, the list of system startup files, and the list of registry keys and their value names that control system startup;
B) Check whether the log file exists; if the file does not exist or its size is abnormal, judge this analysis as failed and exit parsing; otherwise continue;
C) Read a line of the file, obtain the API information and the list of API call parameters, and determine the type of the API: if the API function creates a process, go to step d; if it creates a file, go to step e; if it modifies a file, go to step f; if it creates or modifies a registry key, go to step g;
D) Obtain the name of the created process from the API call parameter list and check whether it is in the whitelist; if it is not, judge the web page as trojan-injected;
E) Obtain the name of the created file from the API call parameter list and check whether its suffix is in the executable file suffix list provided by the configuration file; if it is, judge the web page as trojan-injected;
F) Obtain the path of the modified file from the API call parameter list and check whether it is in the system startup file paths provided by the configuration file; if it is, judge the web page as trojan-injected;
G) Obtain the created or modified registry key and value name from the API call parameter list and check whether they are in the list of startup-controlling registry keys and value names provided by the configuration file; if they are, judge the web page as trojan-injected.
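Rules d to g amount to a small rule table applied to each log line, which the following sketch illustrates. The log line format (`api_type|param1|param2`), the API type names and the `Rules` structure are all assumptions made for illustration; the patent does not specify the log format. The existence and size check of step b is omitted.

```python
import ntpath
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class Rules:
    """Lists read from the configuration file (step a); entries are lowercase."""
    process_whitelist: FrozenSet[str]
    exe_suffixes: FrozenSet[str]
    startup_files: FrozenSet[str]
    startup_registry: FrozenSet[Tuple[str, str]]


def is_malicious_event(line: str, rules: Rules) -> bool:
    """Apply steps c-g to one log line of the assumed form 'api|param|...'."""
    api, *params = line.rstrip("\n").split("|")
    if not params:
        return False
    if api == "create_process":    # step d: process name not in whitelist
        return ntpath.basename(params[0]).lower() not in rules.process_whitelist
    if api == "create_file":       # step e: created file has an executable suffix
        return ntpath.splitext(params[0])[1].lower() in rules.exe_suffixes
    if api == "modify_file":       # step f: modified file is a startup file
        return params[0].lower() in rules.startup_files
    if api == "set_registry" and len(params) >= 2:   # step g: startup registry value
        return (params[0].lower(), params[1].lower()) in rules.startup_registry
    return False


def judge_log(lines: Iterable[str], rules: Rules) -> bool:
    """Return True if any recorded event marks the page as trojan-injected."""
    return any(is_malicious_event(line, rules) for line in lines)
```

A possible use, with purely illustrative rule values:

```python
rules = Rules(
    process_whitelist=frozenset({"iexplore.exe"}),
    exe_suffixes=frozenset({".exe", ".dll"}),
    startup_files=frozenset({r"c:\autoexec.bat"}),
    startup_registry=frozenset(
        {(r"hklm\software\microsoft\windows\currentversion\run", "updater")}
    ),
)
judge_log([r"create_file|C:\tmp\payload.exe"], rules)
```

`ntpath` is used so that Windows-style paths are split correctly regardless of the platform on which the parser runs.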
As described above, the present invention uses high-interaction honeypot host technology and parallel sandbox technology, targeting the behavioral characteristics of web trojans, to propose a method for building a large-scale malicious web page detection system. The method of the present invention has been applied in the trojan-injected web page detection platform of the Information Security Engineering Center of Peking University, where it has run continuously for several months and detected on the order of ten thousand malicious web pages, which achieves the purpose of the invention well. The present invention has good practicability and prospects for widespread application.
Although specific embodiments and drawings have been disclosed to explain the purpose of the invention, they are intended to help in understanding the content of the present invention and implementing it accordingly. Those skilled in the art will appreciate that various replacements, changes and modifications are possible without departing from the spirit and scope of the invention and the appended claims. Therefore, the present invention should not be limited to the content disclosed in the preferred embodiment and the drawings; the scope of protection is defined by the claims.

Claims (10)

1. A large-scale malicious web page detection method, comprising the steps of:
1) interconnecting a plurality of detection servers through a network, and setting up a set of tasks to be analyzed on one of the detection servers;
2) deploying an automatic node distribution module in each detection server, which uses virtual machines to distribute analysis nodes and deploys a plurality of analysis nodes in parallel, all nodes in the same server forming a node cluster; deploying a node cluster monitoring module, which monitors the running status of the analysis nodes by accessing files inside the analysis nodes, thereby monitoring the state of all nodes in the node cluster, and which invokes a script program to reset an analysis node if that node is not running normally;
3) the analysis nodes connecting to the set of tasks to be analyzed through an internal network to obtain tasks to be analyzed, realizing distributed scheduling of tasks; deploying a honeypot environment and a trojan-injected web page detection module inside each analysis node, and building a parallel sandbox environment to detect the tasks to be analyzed in parallel; monitoring the running of the trojan-injected web page detection module through a detection state monitoring module, and saving the detection results in an analysis result database on the server; wherein the trojan-injected web page detection module comprises:
a task scheduling module, which runs a task scheduling program in the analysis node and obtains the set of tasks to be analyzed;
a parallel sandbox control module, which builds parallel sandbox programs and captures the system behavior information produced after the browser visits the set of tasks to be analyzed;
a log parsing and trojan-injected web page judgment module, which, after the browser visits the set of tasks to be analyzed, analyzes the system behavior information recorded during the visit and determines whether trojan-injected web page behavior is present.
2. the method for claim 1 is characterized in that, described task-set to be analyzed is collected by spiders or other modes.
3. the method for claim 1, it is characterized in that, the automatic distribution module of described node obtains the initial mirror image of analysis node by network, and the virtual machine that the api interface that utilizes virtual machine program to provide is write duplicates script, automatically at every a plurality of analysis nodes of physical host deploy.
4. the method for claim 1, it is characterized in that, the api interface that described node cluster monitoring module application virtual machine program is provided, read the extension horse webpage detection module running state information of the inner detected state monitoring module record of disposing of each analysis node, realization is to the condition monitoring of analysis node, fail when obtaining the state journal file, or the extension horse webpage detection module inefficacy of disposing in the state journal file prompting analysis node, then invoke script program replacement analysis node.
5. the method for claim 1 is characterized in that, comprises at the inner honey jar environment of disposing of analysis node:
Selection contains easily the operating system of the leak that is utilized by webpage Trojan horse and the mounting disc of browser version comes the installing operating system environment;
Select the browser plug-in software that is often utilized of particular version to install by webpage Trojan horse;
The automatic refresh routine of hanging horse webpage detection module is installed.
6. the method for claim 1 is characterized in that, the method for scheduling task of described task scheduling modules may further comprise the steps:
6.1. read configuration information;
6.2. loading system behavior monitoring driver;
6.3. obtain task-set to be analyzed success, then obtain a vacant sandbox, create the behavior of system during the new system action journal file record access task-set to be analyzed;
6.4. start the web page interlinkage in the browser access task-set to be analyzed, the visit information of the system resource in the system monitor record sandbox;
Resolve and hang horse webpage judge module 6.5. call daily record, resolve journal file and judge whether this web page interlinkage hangs the horse webpage.
7. the method for claim 1 is characterized in that, daily record is resolved and hung horse webpage judge module and judges whether web page interlinkage is that the method for hanging the horse webpage may further comprise the steps:
7.1. from configuration file, read registry entry and variable list thereof that suffix name list, system start-up file list and the control system of white list, the executable file of newly-built process start;
7.2. from the parameter list of API Calls, obtain the establishment process name, if process name not in white list, judges promptly that this webpage is for hanging the horse webpage;
7.3. obtain the establishment filename from the parameter list of API Calls,, judge that promptly this webpage is for hanging the horse webpage if the filename suffix is included in the EXENAME suffix list;
7.4. obtain the revised file path from the parameter list of API Calls,, judge that promptly this webpage is for hanging the horse webpage if the system start-up filename is included in the system start-up file path;
7.5. obtain registry entry and the variable name of creating or revising from the parameter list of API Calls,, judge that promptly this webpage is for hanging the horse webpage if registry entry and variable name are included in the registry entry/variable list of control system startup.
8. A large-scale malicious web page detection system, characterized in that it adopts a three-layer parallel architecture:
the first layer comprises a plurality of detection servers deployed in parallel and a set of tasks to be analyzed set up on one of the detection servers, the detection servers being interconnected through a network to form a detection server cluster;
the second layer comprises, in each detection server, an automatic node distribution module, a node cluster formed of nodes, and a node cluster monitoring module; the automatic node distribution module uses virtual machines to distribute analysis nodes and deploys a plurality of analysis nodes in parallel; the node cluster monitoring module monitors the running status of the analysis nodes by accessing files inside the analysis nodes, thereby monitoring the state of all nodes in the node cluster, and invokes a script program to reset an analysis node if that node is not running normally;
the third layer comprises, in each analysis node, a honeypot environment, a trojan-injected web page detection module and a detection state monitoring module; the analysis node connects to the set of tasks to be analyzed through an internal network to obtain tasks to be analyzed, realizing distributed scheduling of tasks; the trojan-injected web page detection module builds a parallel sandbox environment to detect the tasks to be analyzed in parallel; the detection state monitoring module monitors the running of the trojan-injected web page detection module, and the detection results are saved in an analysis result database on the server.
9. The system of claim 8, characterized in that the scale of the detection server cluster deployed in parallel is expanded dynamically as the analysis scale grows.
10. The system of claim 8, characterized in that the set of tasks to be analyzed consists of web page links stored centrally in a database or a file.
CN2009100928870A 2009-09-15 2009-09-15 Method and system for detecting large-scale malicious web pages Expired - Fee Related CN101692267B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009100928870A CN101692267B (en) 2009-09-15 2009-09-15 Method and system for detecting large-scale malicious web pages

Publications (2)

Publication Number Publication Date
CN101692267A CN101692267A (en) 2010-04-07
CN101692267B true CN101692267B (en) 2011-09-07

Family

ID=42080951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009100928870A Expired - Fee Related CN101692267B (en) 2009-09-15 2009-09-15 Method and system for detecting large-scale malicious web pages

Country Status (1)

Country Link
CN (1) CN101692267B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1920832A (en) * 2006-09-28 2007-02-28 北京理工大学 Linkage analysis based web page Trojan track technique
CN101515318A (en) * 2009-04-03 2009-08-26 深圳市腾讯计算机系统有限公司 Method and device for identifying vbs webpage Trojan horse

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
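Sun Xiaoyan et al. Design and implementation of a malicious web page detection system based on client honeypots. Journal of Computer Applications (《计算机应用》). 2007, Vol. 27 (No. 7), pp. 1613-1615. *
Liang Zhiyin et al. Research on identification techniques for high-interaction honeypot hosts. National Symposium on Network and Information Security Technology 2007 (《全国网络与信息安全技术研讨会'2007》). 2007, pp. 197-203. *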

Also Published As

Publication number Publication date
CN101692267A (en) 2010-04-07

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110907

Termination date: 20180915
