CN103279710B

CN103279710B - Method and system for detecting malicious codes of Internet information system

Info

Publication number: CN103279710B
Application number: CN201310125571.3A
Authority: CN
Inventors: 杨永滨; 陈剑锋
Original assignee: SHENZHEN E-LINK INFORMATION TECHNOLOGY CO LTD
Current assignee: Shenzhen Yilingke Network Security Co ltd
Priority date: 2013-04-12
Filing date: 2013-04-12
Publication date: 2016-04-13
Anticipated expiration: 2033-04-12
Also published as: CN103279710A

Abstract

The invention discloses a method and a system for detecting malicious code in an Internet information system. The method comprises: checking the website to be monitored at preset intervals and capturing its home page content together with the content of the first and second layers of pages linked from the home page; comparing each captured page with the previously captured version, finding the differences between the two, and determining from those differences whether malicious code is present; outputting the pages that contain malicious code and marking the malicious code segments; and finally confirming the detected malicious code manually a second time and recording the confirmed result. The method thereby checks page content multiple times, supports comprehensive, step-by-step, multi-level content detection, and improves the completeness and accuracy of detection; it also improves detection performance and efficiency and lowers the requirements on the software and hardware environment.

Description

Method and system for detecting malicious code in an Internet information system

Technical field

The present invention relates to the technical field of malicious code detection, and in particular to a method and system for detecting malicious code in an Internet information system.

Background art

Malicious code (unwanted code) is code that serves no useful purpose but can introduce danger; the safest definition treats all unnecessary code as malicious.

There are many approaches to analyzing malicious code. Traditional analysis methods generally fall into three kinds: analysis based on code signatures, analysis based on semantics, and analysis based on code behavior. All of these methods have certain limitations:

Manual detection: open the web page, right-click to view the source, and check, according to the type of web page virus, whether the source contains malicious code. This method is very limited.

Signature-based detection: this is the oldest and most widely used method. Samples of malicious code are analyzed to extract their unique instruction sequences (signatures). When the detection software scans a file, it compares the file against the signature library and judges whether any file fragment matches a known signature. Scripts planted by web page trojans are detected in this way, as script viruses. However, page scripts can be transformed and encrypted in far more ways than traditional PE-format viruses, so they are also harder to detect.
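
As a minimal sketch of the signature comparison described above, the following Python snippet checks a page against a small signature library. The library contents and the function name `scan_for_signatures` are hypothetical and chosen only for illustration; a real signature library would be far larger and would not rely on plain substring matching.

```python
from typing import Dict, List

# Hypothetical signature library: signature name -> byte pattern.
# The patterns are illustrative only and are not taken from the patent.
SIGNATURES: Dict[str, bytes] = {
    "hidden-iframe": b'<iframe width="0" height="0"',
    "unescape-eval": b"eval(unescape(",
}


def scan_for_signatures(data: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the sorted names of all signatures whose pattern occurs in data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)
```

A trivially reformatted variant such as `eval (unescape(` no longer matches the second pattern, which illustrates why transformed or encrypted page scripts are hard to catch with this approach.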

Heuristic detection: the idea is to set thresholds on the characteristics of malicious code; when the scanner analyzes a file and finds that its characteristic values are sufficiently similar to those of malicious code, the file is treated as malicious. For example, a given piece of malicious code typically makes fixed calls to particular kernel functions (especially functions related to the process list, the registry, and the system service list), and the order in which these functions appear in the code usually follows a pattern. The malicious code can therefore be analyzed through the names of the kernel functions it calls and the number of times it calls them.
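
The threshold idea can be sketched as a weighted score over the functions a sample calls. The weights, the threshold value, and the names `heuristic_score` and `looks_malicious` are assumptions made for this illustration, not values from the patent.

```python
from typing import Dict, Iterable

# Hypothetical weights for calls that touch the registry, services, or processes.
SUSPICIOUS_CALL_WEIGHTS: Dict[str, int] = {
    "RegSetValueEx": 3,
    "CreateService": 3,
    "EnumProcesses": 2,
    "WriteProcessMemory": 4,
}


def heuristic_score(called_functions: Iterable[str],
                    weights: Dict[str, int] = SUSPICIOUS_CALL_WEIGHTS) -> int:
    """Sum the weights of every observed call; unknown calls count as 0."""
    return sum(weights.get(name, 0) for name in called_functions)


def looks_malicious(called_functions: Iterable[str], threshold: int = 6) -> bool:
    """Treat the sample as malicious once its score reaches the threshold."""
    return heuristic_score(called_functions) >= threshold
```

Because repeated calls are summed, the score reflects both which functions are called and how often, matching the name-and-count analysis described above.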

Behavior-based detection: this includes behavior-based exact matching and fuzzy matching. Exact matching mainly targets relatively direct malicious behavior, such as adding entries to the registry startup items or modifying content under the system folder. Fuzzy matching is the main means of discrimination. Most of the API functions that a malicious program calls at run time are also used by ordinary programs, but comparison shows that malicious programs call certain specific or rarely used API functions with abnormal frequency, or call related functions in particular combinations. Fuzzy matching bases its judgment on this, and it can be combined with heuristic detection.

To address the shortcomings of the above techniques, the prior art has adopted some new techniques based on statistics and feature analysis, as well as virtual machine technology. The main new techniques are:

Client honeypot technology

Web page viruses hide in normal Web traffic, so traditional port-based firewalls have difficulty stopping their spread. Content-based (payload) firewalls or intrusion detection systems (IDS) can detect known web page viruses, but web page viruses are updated very quickly and commonly use obfuscation or encryption, so traditional security devices cannot detect them effectively. To collect information about potential threats, discover new tools, determine attack signatures, and study attackers' motives, honeypot technology emerged: carefully arranged network traps are used to attract attacks. A traditional honeypot is mainly a server-side honeypot, but web page viruses run on the client side. Lance Spitzner therefore first proposed the concept of the client honeypot (client-side honeypot, or honeyclient).

Unlike a traditional honeypot, a client honeypot targets the security vulnerabilities that client software may contain. It actively opens client software to visit servers, monitors for abnormal behavior, and traces and analyzes unknown malicious programs, for the purposes of research and protection. Client honeypots mainly target Web browsers and e-mail clients, so they need a data source and face the challenge of achieving broad network coverage. To solve this, a client honeypot combines the honeypot with a crawler (spider), using the crawler to gather URLs in order to find malware that might be executed by the client software. Essentially all kinds of client honeypot include three consecutive processing steps: first, all objects to be processed are put into a queue; then the client requests the objects in the queue; finally, the objects are analyzed to determine whether they contain malicious components. The object queue may grow while objects are being requested and processed.
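
The three steps (queue, request, analyze) can be sketched as a simple loop. The callables `fetch` and `analyze` are hypothetical stand-ins for the real client software and the behavior monitor, and `run_honeyclient` is a name invented for this sketch.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple


def run_honeyclient(seeds: Iterable[str],
                    fetch: Callable[[str], Tuple[str, List[str]]],
                    analyze: Callable[[str], bool],
                    limit: int = 100) -> List[str]:
    """Process queued URLs and return those whose content is judged malicious."""
    queue = deque(seeds)          # step 1: put the pending objects into a queue
    seen = set(queue)
    flagged: List[str] = []
    processed = 0
    while queue and processed < limit:
        url = queue.popleft()
        content, links = fetch(url)   # step 2: the client requests the object
        processed += 1
        if analyze(content):          # step 3: analyze it for malicious components
            flagged.append(url)
        for link in links:            # the queue grows while objects are processed
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return flagged
```

The `limit` parameter is an added safeguard so that the sketch terminates even when the crawler keeps discovering new URLs.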

Sandbox filtering technique

The main technical problem for a gateway-level security product that blocks malicious web pages is how to judge whether a web page is malicious. The malicious code in most malicious web pages today is written in JavaScript. This JavaScript uses heap spraying to trigger vulnerabilities in local ActiveX controls and then downloads and runs a trojan, and it is generally obfuscated and encrypted to evade detection. The following is a piece of JavaScript code from a real malicious web page:

Faced with obfuscated and encrypted JavaScript code, simply searching for keywords to identify malicious web pages fails. In this case the most effective approach is to actually parse and execute the JavaScript in the web page in a virtual environment, using built-in HTML and JavaScript engines, and to track the behavior of the JavaScript during parsing and execution, such as creating ActiveX controls or allocating large amounts of memory in a short time, so that malicious web pages are identified accurately. This approach is called sandbox detection, and in theory its detection rate is very high.

In practice, however, the detection program's built-in HTML and JavaScript engines are likely to be functionally incomplete, or some of their behavior deviates from that of a real browser; in addition, the running environment differs from a real client machine. In short, there will always be some differences from a browser, and the authors of malicious web pages can exploit those differences to evade the detection program's tracking. That is, before running its malicious code, the malicious page first checks whether it is running in a real browser. If it is not, it does nothing, and the detection program's built-in HTML and JavaScript engines cannot discover that the page is malicious, because the malicious code never runs. Conversely, when the page finds that it is running in a real browser, it runs the malicious code. Several possible techniques are described below:

1. In the DOM, some objects have many aliases, for example:

document.location, window.location, and document.URL all give the address of the current page;

window, window.window, window.self, window.parent (in a top-level page), and window.self.self.self.self are equivalent;

Any global variable automatically becomes a member of window.

A malicious web page can use this to detect whether it is running in a real browser. If the JavaScript engine implemented in the security product does not fully implement these characteristics of DOM calls, the malicious code can detect that, allowing the malicious page to escape detection.

2. Some features of the HTML <META> tag can be used to test whether the current running environment is a sandbox or a browser. For example, a meta tag may use the HttpOnly attribute when setting Set-Cookie. The specification provides that once the HttpOnly attribute is used, the cookie set by this meta tag cannot be accessed by scripts in the page. If the security product's JavaScript engine implements some meta features incompletely, a malicious page may exploit this to escape detection.

3. Image is a built-in JavaScript object. An object can be created with the statement var img = new Image(), and after the Image object is created, the statement img.src = "http://www.exist.com/a.jpg" fetches a picture from the network. When the browser reaches this statement, it sends an HTTP request to www.exist.com for the picture a.jpg. If the picture is obtained successfully, the browser calls the img object's onload() method; if the picture does not exist on www.exist.com, or www.exist.com itself does not exist, the browser calls the img object's onerror() method. A malicious web page can use these characteristics to judge whether the current running environment is a sandbox or a browser.

4. When a syntax error or an infinitely recursive function call occurs in JavaScript code, the browser calls window.onerror(). By deliberately introducing a syntax error or infinite recursion, a malicious web page can judge whether the current running environment is a sandbox or a browser. If the security product's sandbox handles errors incompletely, for example by stopping parsing when it meets a syntax error instead of calling window.onerror as a browser would, a malicious page may exploit this to escape detection.

Many other methods can also be used to detect whether the current running environment is a browser or a sandbox, such as testing Ajax features, the order in which events are processed, plug-ins, and the same-origin policy.

The above analysis shows that, to detect malicious web pages by sandbox detection, it is very important to simulate the key characteristics of a browser as closely as possible.

However, both the new techniques and the traditional ones have certain shortcomings:

1. Neither the traditional nor the new techniques perform in-depth analysis of website page content by level and category;

2. Neither the traditional nor the new techniques record the malicious code that has appeared on a specific page or perform behavioral analysis on it;

3. In both the traditional and the new techniques, the detection means are not comprehensive enough: they detect from only one aspect, leaving technical blind spots;

4. Neither the traditional nor the new techniques can locate where the malicious code resides, or accurately determine its cause and possible consequences.

In view of this, the prior art needs to be improved.

Summary of the invention

In view of the deficiencies of the prior art, the object of the present invention is to provide a method and system for detecting malicious code in an Internet information system, intended to solve problems in prior-art malicious code detection such as insufficiently comprehensive detection means and inaccurate location of malicious code.

The technical solution of the present invention is as follows:

A method for detecting malicious code in an Internet information system, wherein the detection method comprises the following steps:

A. checking the website to be monitored at predetermined intervals, and capturing the home page content of the website and the content of the first and second layers of pages linked from the home page;

B. comparing each captured page with the previously captured page, finding the differences between the two, and determining from the differences whether malicious code is present; the comparison comprises, in order: page property comparison, page feature code comparison, page element comparison, page JS code comparison, and page content comparison;

C. outputting the pages that contain malicious code, and marking the malicious code segments present;

D. manually confirming the malicious code of the detected page a second time, and recording the confirmed result.

The method for detecting malicious code in an Internet information system, wherein, in step A, the captured page content is stored on hard disk in its original form, and the page content includes JS scripts, links, pictures, and text content.

The method for detecting malicious code in an Internet information system, wherein, after step A and before step B, the method further comprises:

B0. comparing the captured page with the pre-stored pages to judge whether the page is a newly added page; if so, recording it as a newly added page; otherwise, proceeding to step B.

The method for detecting malicious code in an Internet information system, wherein step C further comprises marking a repair method for the malicious code.

A system for detecting malicious code in an Internet information system, wherein the detection system comprises:

an acquiring unit, configured to check the website to be monitored at predetermined intervals and to capture the home page content of the website and the content of the first and second layers of pages linked from the home page;

a comparing unit, configured to compare each captured page with the previously captured page, find the differences between the two, and determine from the differences whether malicious code is present; the comparison comprises, in order: page property comparison, page feature code comparison, page element comparison, page JS code comparison, and page content comparison;

an output unit, configured to output the pages that contain malicious code and to mark the malicious code segments present;

a manual detection unit, configured to manually confirm the malicious code of the detected page a second time and to record the confirmed result.

Beneficial effects:

The method and system of the present application for detecting malicious code in an Internet information system have the following advantages:

1. Page content is checked multiple times: page properties, feature codes, page elements, and page content are compared; comprehensive, step-by-step, multi-level content detection is supported, which improves the completeness of content detection and the accuracy of detection;

2. The data volume in website malicious code detection is massive. Step-by-step, multi-stage detection improves detection performance and efficiency and lowers the requirements on the hardware environment;

3. Newly added or changed pages can be identified, and the source of a page content change can be traced;

4. A complete framework for page content detection is established that is compatible with existing plug-in polling, core embedding, and event-triggering techniques; the specific cause of a web content change is located precisely from the page feature code library, the page elements, and the dynamic/static content of the page, forming an experience library.

Brief description of the drawings

Fig. 1 is a flowchart of the method of the present invention for detecting malicious code in an Internet information system.

Fig. 2 is a structural block diagram of the system of the present invention for detecting malicious code in an Internet information system.

Detailed description

Because website malicious code is generally nested in page elements, page JS code, and page content, detecting whether a website contains malicious code starts by judging whether a page is newly added or its content has changed, and then judging on that basis whether the page contains malicious code. Based on this idea, the present invention provides a method and system for detecting malicious code in an Internet information system. To make the object, technical solution, and effects of the present invention clearer and more definite, the invention is described in further detail below. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.

Referring to Fig. 1, which is a flowchart of the method of the present invention for detecting malicious code in an Internet information system. As shown, the detection method comprises the following steps:

S1. checking the website to be monitored at predetermined intervals, and capturing the home page content of the website and the content of the first and second layers of pages linked from the home page;

S2. comparing each captured page with the previously captured page, finding the differences between the two, and determining from the differences whether malicious code is present; the comparison comprises, in order: page property comparison, page feature code comparison, page element comparison, page JS code comparison, and page content comparison;

S3. outputting the pages that contain malicious code, and marking the malicious code segments present;

S4. manually confirming the malicious code of the detected page a second time, and recording the confirmed result.

Each of the above steps is described in detail below:

In step S1, the website to be monitored is checked at predetermined intervals, and the home page content of the website and the content of the first and second layers of pages linked from the home page are captured. First, the acquisition rules for capturing website content are set. In the prior art, a web crawler finds web pages through the link addresses in pages and keeps looping until all pages of the website have been captured. In a concrete implementation, to obtain website content faster, the preset acquisition rules can skip pages whose content does not need to be captured, reducing the capture workload. The acquisition rule used in this method is: capture once every 30 minutes, with a crawl depth covering the home page of the website under test and the first and second layers of pages linked from the home page. The interval can of course be made longer or shorter as required, and, depending on actual detection needs, the crawl depth can be limited to the home page alone or extended to all pages of the website. Further, the captured page content is stored on hard disk in its original form, and the page content includes JS scripts, links, pictures, and text content.
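
A depth-limited capture of this kind might be sketched as follows. The `fetch` callable is a hypothetical stand-in for the HTTP download, `crawl_site` and `LinkCollector` are names invented here, and restricting the crawl to links on the same host is an assumption of the sketch rather than something the text specifies.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urljoin, urlparse

CRAWL_INTERVAL_SECONDS = 30 * 60  # the 30-minute interval; the scheduler itself is omitted


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_site(home_url: str, fetch: Callable[[str], str], max_depth: int = 2) -> Dict[str, str]:
    """Capture the home page plus pages up to max_depth link layers below it."""
    site = urlparse(home_url).netloc
    pages: Dict[str, str] = {}
    frontier = [home_url]
    for depth in range(max_depth + 1):
        next_frontier: List[str] = []
        for url in frontier:
            if url in pages:
                continue
            html = fetch(url)
            pages[url] = html              # keep the raw content for later comparison
            if depth == max_depth:
                continue                   # do not follow links below the last layer
            collector = LinkCollector()
            collector.feed(html)
            for href in collector.links:
                target = urljoin(url, href)
                if urlparse(target).netloc == site and target not in pages:
                    next_frontier.append(target)
        frontier = next_frontier
    return pages
```

Setting `max_depth=0` captures only the home page, which corresponds to the shallower detection depth mentioned above.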

In step S2, each captured page is compared with the previously captured page, the differences between the two are found, and whether malicious code is present is determined from the differences. The comparison comprises, in order: page property comparison, page feature code comparison, page element comparison, page JS code comparison, and page content comparison.

The page property comparison comprises the following steps:

If the properties of the current page differ from those of the previous page, proceed to the next detection step. If the page properties are the same, compare them against the historical versions in the page property library of the experience library (the experience library is described later). If all historical versions are the same, the page has not changed. Otherwise, find the most recent differing historical version and check whether its page content change has been confirmed: if it has been confirmed, the page content has not changed and detection ends; otherwise, proceed to the next detection step.
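
The decision logic of this stage can be sketched as a small function. The type `HistoryVersion`, the function `stage_decision`, and the assumption that history is ordered from oldest to newest are all illustrative choices, not details given in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HistoryVersion:
    value: str        # stored property value (or fingerprint) of a historical version
    confirmed: bool   # whether the change to this version was already confirmed


def stage_decision(current: str, previous: str, history: List[HistoryVersion]) -> str:
    """Return "next" to continue with the next stage, or "unchanged" to stop."""
    if current != previous:
        return "next"
    # Same as the last capture: consult the historical versions (oldest first).
    differing = [version for version in history if version.value != current]
    if not differing:
        return "unchanged"
    latest_different = differing[-1]
    return "unchanged" if latest_different.confirmed else "next"
```

The feature code and page element stages described below follow the same pattern, so the same function could be reused with a different `value` for each stage.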

The page feature code comparison comprises the following steps:

The page feature code is the page's MD5 code. If the MD5 code of the current page differs from that of the previous page, proceed to the next detection step. If they are the same, compare against the historical versions in the page feature code library of the experience library; if those are also the same, detection ends. Otherwise, find the most recent page in the experience library with a differing feature code and check its change status to see whether it has been confirmed: if it has been confirmed, detection ends; otherwise, proceed to the next detection step.
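
Computing the feature code is straightforward with Python's standard `hashlib` module. The function names below are hypothetical; hashing the raw page bytes, rather than some normalized form, is an assumption of this sketch.

```python
import hashlib


def page_feature_code(raw_page: bytes) -> str:
    """Return the MD5 hex digest of the raw page, used as a change fingerprint."""
    return hashlib.md5(raw_page).hexdigest()


def feature_code_changed(previous_page: bytes, current_page: bytes) -> bool:
    """True when the two captures have different MD5 feature codes."""
    return page_feature_code(previous_page) != page_feature_code(current_page)
```

MD5 serves here only to detect that a page differs from an earlier capture; it says nothing about where the change is, which is why the later stages look at elements, scripts, and content.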

The page element comparison comprises the following steps:

Page elements are the set of HTML elements that build the page framework, including html, body, head, font, table, tr, td, and so on. If the page elements of the current page differ from those of the previous page, proceed to the next detection step. If they are the same, compare against the historical versions of the page elements in the experience library; if those are also the same, detection ends. Otherwise, find the page corresponding to the most recent differing page elements in the experience library and check its change status to see whether it has been confirmed: if it has been confirmed, detection ends; otherwise, proceed to the next detection step.
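
One way to compare page elements is to count the tags in each capture and report the difference. Representing the element set as tag counts is an assumption of this sketch, and `ElementCounter`, `page_elements`, and `element_changes` are hypothetical names.

```python
from collections import Counter
from html.parser import HTMLParser
from typing import Dict


class ElementCounter(HTMLParser):
    """Count how often each HTML start tag occurs in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def page_elements(html: str) -> Counter:
    parser = ElementCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def element_changes(previous_html: str, current_html: str) -> Dict[str, int]:
    """Return tag -> change in count for every tag whose count differs."""
    before, after = page_elements(previous_html), page_elements(current_html)
    return {tag: after[tag] - before[tag]
            for tag in sorted(set(before) | set(after))
            if after[tag] != before[tag]}
```

An injected `iframe` or `script` element shows up as a positive entry in the result even when the visible text of the page is unchanged.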

The page JS code comparison comprises the following steps:

If the JS code of the current page differs from that of the previous page, the web page content is considered to have changed: the content tampering library is called, the data from each stage of the detection process is gathered, the reason for the page change is output, and detection ends. If the current page is the same as the previous page, the web page is considered unchanged and detection ends.
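
Extracting and comparing the JS code of two captures could look like the sketch below. Treating an external script as a `src:<url>` entry, and the names `ScriptCollector`, `page_scripts`, and `added_scripts`, are assumptions made for illustration.

```python
from html.parser import HTMLParser
from typing import List


class ScriptCollector(HTMLParser):
    """Collect inline script bodies and external script sources, in page order."""

    def __init__(self) -> None:
        super().__init__()
        self.scripts: List[str] = []
        self._buffer: List[str] = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self._buffer = []
            src = dict(attrs).get("src")
            if src:
                self.scripts.append("src:" + src)

    def handle_data(self, data):
        if self._in_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            self._in_script = False
            code = "".join(self._buffer).strip()
            if code:
                self.scripts.append(code)


def page_scripts(html: str) -> List[str]:
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    return parser.scripts


def added_scripts(previous_html: str, current_html: str) -> List[str]:
    """Return the scripts present in the current capture but not in the previous one."""
    before = set(page_scripts(previous_html))
    return [script for script in page_scripts(current_html) if script not in before]
```

A non-empty result corresponds to the "JS code differs" branch above, and the returned entries are natural candidates for the change reason that is output.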

The page content comparison comprises the following steps:

If the content of the current page differs from that of the previous page, the web page content is considered to have changed: the malicious code library is called to detect whether the change is malicious code, and detection ends. If the content of the current page is the same as that of the previous page, the web page is considered unchanged and free of malicious code, and detection ends.
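
The content stage can be sketched with the standard `difflib` module: find the lines added since the previous capture, then check them against a malicious code library. The two regular expressions stand in for that library and are purely illustrative; `added_lines` and `malicious_changes` are hypothetical names.

```python
import difflib
import re
from typing import List, Pattern

# Hypothetical malicious code library; the patterns are examples only.
MALICIOUS_PATTERNS: List[Pattern[str]] = [
    re.compile(r"<iframe[^>]*(width|height)\s*=\s*[\"']?0", re.IGNORECASE),
    re.compile(r"eval\s*\(\s*unescape\s*\(", re.IGNORECASE),
]


def added_lines(previous: str, current: str) -> List[str]:
    """Return the lines that appear in current but not in previous."""
    diff = difflib.ndiff(previous.splitlines(), current.splitlines())
    return [line[2:] for line in diff if line.startswith("+ ")]


def malicious_changes(previous: str, current: str,
                      patterns: List[Pattern[str]] = MALICIOUS_PATTERNS) -> List[str]:
    """Return the added lines that match any pattern in the malicious code library."""
    return [line for line in added_lines(previous, current)
            if any(pattern.search(line) for pattern in patterns)]
```

Checking only the added lines keeps the library lookup proportional to the size of the change rather than the size of the page.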

In step S3, the pages that contain malicious code are output and the malicious code segments present are marked. Specifically, the address (URL) of the page where the change occurred and the [click] website address are output, and the malicious code segments present are marked; further, a repair method can also be marked.
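
A report entry for step S3 might carry the page URL, the location of the snippet, and a suggested repair. The `Finding` structure, the `mark_snippets` function, and the default repair text are hypothetical; the text does not define an output format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    url: str            # address of the page where the change occurred
    line_number: int    # 1-based line of the page source containing the snippet
    snippet: str        # the malicious code segment being marked
    suggested_fix: str  # the repair method attached to the finding


def mark_snippets(url: str, page_text: str, snippets: List[str],
                  suggested_fix: str = "Remove the injected code and restore the last confirmed version."
                  ) -> List[Finding]:
    """Locate each malicious snippet in the page source and build a report entry."""
    findings: List[Finding] = []
    for number, line in enumerate(page_text.splitlines(), start=1):
        for snippet in snippets:
            if snippet in line:
                findings.append(Finding(url, number, snippet, suggested_fix))
    return findings
```

The resulting list is also a convenient input for the manual confirmation in step S4, since each entry points the reviewer at a specific line.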

In step S4, the malicious code of the detected page is confirmed manually a second time and the confirmed result is recorded. To ensure detection accuracy, the malicious code on the detected page must also be confirmed manually; the confirmed result is recorded at the same time (and subsequently added to the experience library).

Further, after step S1 and before step S2, the method also comprises:

S20. Compare the captured page with the pre-stored pages (which may be stored in the experience library) to judge whether the page is a newly added page; if so, record it as a newly added page; otherwise, proceed to step S2. To implement this, page content must be stored: with the page as the unit, the current version, the previous version, and the historical versions of the page and related content are stored. The stored items include the parent page, page properties, page feature code, page elements, and page content, and the number of times the website has been checked is recorded.
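
The per-page storage and the new-page check could be modeled as below. `PageVersion` and `PageStore` are hypothetical names, and keeping everything in memory is a simplification; the text only requires that these items be stored per page.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PageVersion:
    parent_url: Optional[str]     # parent page that links to this page
    properties: Dict[str, str]    # page properties
    feature_code: str             # page feature code (MD5)
    elements: List[str]           # page elements
    content: str                  # page content


@dataclass
class PageStore:
    versions: Dict[str, List[PageVersion]] = field(default_factory=dict)
    detection_count: int = 0      # number of times the website has been checked

    def record(self, url: str, version: PageVersion) -> bool:
        """Store a captured version; return True if the page is newly added."""
        is_new = url not in self.versions
        self.versions.setdefault(url, []).append(version)
        return is_new

    def previous(self, url: str) -> Optional[PageVersion]:
        """Return the version captured before the current one, if any."""
        history = self.versions.get(url, [])
        return history[-2] if len(history) >= 2 else None
```

When `record` returns `True` the page is logged as newly added; otherwise the version returned by `previous` is the one to compare against in step S2.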

The present invention also provides a system for detecting malicious code in an Internet information system. As shown in Fig. 2, the detection system comprises:

an acquiring unit 100, configured to check the website to be monitored at predetermined intervals and to capture the home page content of the website and the content of the first and second layers of pages linked from the home page;

a comparing unit 200, configured to compare each captured page with the previously captured page, find the differences between the two, and determine from the differences whether malicious code is present; the comparison comprises, in order: page property comparison, page feature code comparison, page element comparison, page JS code comparison, and page content comparison;

an output unit 300, configured to output the pages that contain malicious code and to mark the malicious code segments present;

a manual detection unit 400, configured to manually confirm the malicious code of the detected page a second time and to record the confirmed result.

The manual detection unit is further configured to put the detected pages and the malicious code they contain into the experience library, which comprises: a page property library, a page feature code library, a page element library, and a malicious code library.

The functions of each part of the above system have been described in detail in the method above and are not repeated here.

In summary, the method and system of the present application for detecting malicious code in an Internet information system have the following advantages: 1. Page content is checked multiple times: page properties, feature codes, page elements, and page content are compared; comprehensive, step-by-step, multi-level content detection is supported, which improves the completeness of content detection and the accuracy of detection. 2. The data volume in website malicious code detection is massive; step-by-step, multi-stage detection improves detection performance and efficiency and lowers the requirements on the hardware environment. 3. Newly added or changed pages can be identified, and the source of a page content change can be traced. 4. A complete framework for page content detection is established that is compatible with existing plug-in polling, core embedding, and event-triggering techniques; the specific cause of a web content change is located precisely from the page feature code library, the page elements, and the dynamic/static content of the page, forming an experience library.

It should be understood that the application of the present invention is not limited to the above examples. Those of ordinary skill in the art can make improvements or modifications based on the above description, and all such improvements and modifications shall fall within the scope of protection of the appended claims.

Claims

1. A method for detecting malicious code in an Internet information system, characterized in that the detection method comprises the following steps:

D. manually confirming the malicious code of the detected page a second time, and recording the confirmed result;

The page content captured in step A is stored on hard disk in its original form, and the page content includes JS scripts, links, pictures, and text content;

The page JS code comparison comprises the following steps: if the JS code of the current page differs from that of the previous page, the web page content is considered to have changed, the content tampering library is called, the data from each stage of the detection process is gathered, the reason for the page change is output, and detection ends; if the current page is the same as the previous page, the web page is considered unchanged and detection ends;

The page content comparison comprises the following steps: if the content of the current page differs from that of the previous page, the web page content is considered to have changed, the malicious code library is called to detect whether the change is malicious code, and detection ends; if the content of the current page is the same as that of the previous page, the web page is considered unchanged and free of malicious code, and detection ends.

2. The method for detecting malicious code in an Internet information system according to claim 1, characterized in that, after step A and before step B, the method further comprises:

3. The method for detecting malicious code in an Internet information system according to claim 1, characterized in that step C further comprises marking a repair method for the malicious code.

4. A system for detecting malicious code in an Internet information system, characterized in that the detection system comprises:

a manual detection unit, configured to manually confirm the malicious code of the detected page a second time and to record the confirmed result;

5. The system for detecting malicious code in an Internet information system according to claim 4, characterized in that the manual detection unit is further configured to put the detected pages and the malicious code they contain into an experience library, which comprises: a page property library, a page feature code library, a page element library, and a malicious code library.