CN102833240B - Malicious code capture method and system

Info

Publication number: CN102833240B
Application number: CN201210294945.XA
Authority: CN
Inventors: 云晓春; 李书豪; 张永铮; 臧天宁; 王一鹏
Original assignee: Institute of Information Engineering of CAS
Current assignee: Institute of Information Engineering of CAS
Priority date: 2012-08-17
Filing date: 2012-08-17
Publication date: 2016-02-03
Anticipated expiration: 2032-08-17
Also published as: CN102833240A

Abstract

The present invention relates to a malicious code capture method and system. The malicious code capture method comprises: obtaining mail data from multiple e-mail data sources; parsing the mail data, recording as suspicious files those files in the mail data that cannot be excluded under a configured miss rate (false negative rate), and saving the suspicious files to a suspicious file database; and detecting the suspicious files using a malicious code feature database and manual detection, and saving the suspicious files whose detection result is abnormal to a malicious code sample database. The method and system can be applied in honeypot and honeynet systems, where they widen the range of objects that can be captured and improve the ability to capture malicious code.

Description

Malicious code capture method and system

Technical field

The present invention relates to the technical field of network information security, and in particular to a malicious code capture method and system.

Background art

Malicious code such as network worms, Trojan horses and botnets emerges continuously and does serious damage to network information security. To analyze and detect malicious code more effectively, defenders must first study how to obtain malicious code from the Internet at scale, which gave rise to honeypot and honeynet technology. Honeypot technology means that a defender provides virtual or real hosts, servers and other intelligent terminals, or simulates related services, for attackers to scan and intrude upon, and thereby obtains the malicious code involved. A honeynet is a network with a certain topology made up of several related honeypots; it can be regarded as a large-scale, distributed honeypot deployment. In general, a honeypot is not used as a normal host, server or terminal. Its main purpose is to attract intrusions so that the captured attack information can be analyzed and detected, defense policies can be designed accordingly, and the harm done by attackers can be stopped or reduced.

Traditional honeypots fall into three classes: virtual honeypots, virtual machine honeypots and physical honeypots. A virtual honeypot lures attackers by simulating network topology, operating systems, network services and the like. It uses few resources, but its interaction capability is low, so it can capture only low-interaction malicious code such as some network worms. A virtual machine honeypot lures attackers by building certain vulnerabilities or weaknesses into a virtual machine. It saves resources, supports a degree of interaction and yields more complete attack information, but attackers can easily discover it with virtual machine detection techniques, at which point it no longer captures malicious code. A physical honeypot uses real equipment with certain vulnerabilities or weaknesses built in. It can interact with attackers at a high level and is hard to detect, but it is expensive and unsuitable for large-scale deployment.

In honeypot and honeynet technology, one of the key problems is how to capture more malicious code per unit time, and this problem is closely tied to how malicious code spreads. In general, malicious code spreads in two broad ways: by exploiting vulnerabilities, or by social engineering. Vulnerability-based propagation needs no interaction with the victim, and most traditional honeypot techniques are designed around it. Social-engineering propagation analyzes and exploits human weaknesses such as instinctive reactions, curiosity, trust and greed to deceive the victim into allowing the intrusion, so it requires the user's participation. As network services and applications develop, social-engineering propagation is becoming more varied and more complex. In recent years more and more malicious code has adopted it, for example Koobface, Stuxnet and Zeus.

Existing honeypots capture vulnerability-propagated malicious code well, but they lack efficient methods for capturing some malicious code that spreads by social engineering, especially malicious code that spreads by social engineering over an Email network. Such malicious code spreads along the links of the Email network: trap mail that carries the malicious code, or a way to reach it, is pushed to user mailboxes to induce the victim to execute the malicious code in the trap mail, or to fetch the malicious code in the way the mail provides (for example, by downloading it through a web link) and execute it, thereby intruding into the victim's computer. Such malicious code often uses the victim's mailbox to send trap mail to the victim's Email contacts, infecting still more Email users.

It can be seen that such malicious code may spread by exploiting the trust relationships between users in an Email network. An Email network is a social network formed by mailbox users through e-mail contact, and it is also an important application of complex networks. Researchers usually abstract a complex network into a graph for analysis. For the Email network above, each user mailbox is represented by a vertex, and the mail exchanged between users and its quantity are represented by edges and edge weights (if no mail passes between two users, there is no edge between the corresponding vertices). In social networks the average distance is small, the clustering coefficient is large, and the node degree follows an exponential distribution.

However, existing honeypot and honeynet technology has not yet fully considered malicious code that spreads by social engineering over Email networks, nor has it used the social network characteristics above in its design to capture more types of propagating malicious code. Existing honeypot and honeynet technology therefore cannot capture such malicious code effectively at scale.

Summary of the invention

The technical problem to be solved by the present invention is to provide a malicious code capture method and system that improve the ability to capture malicious code spreading by social engineering over Email networks.

To solve the above technical problem, the present invention proposes a malicious code capture method, comprising:

Obtaining mail data from multiple e-mail data sources;

Parsing the mail data, recording as suspicious files those files in the mail data that cannot be excluded under a configured miss rate (false negative rate), and saving the suspicious files to a suspicious file database;

Detecting the suspicious files using a malicious code feature database and manual detection, and saving the suspicious files whose detection result is abnormal to a malicious code sample database.

Further, the above malicious code capture method may also have the following feature, further comprising:

Obtaining a malicious code sample from the malicious code sample database, running the sample in a sandbox, recording the feature information of the sample and saving it to the malicious code feature database.

Further, the above malicious code capture method may also have the following feature: before obtaining mail data from multiple e-mail data sources, the method further comprises:

Allocating and optimizing the use of virtual honeypot resources with a selection and deployment algorithm for e-mail terminal virtual honeypots, wherein the algorithm abstracts the e-mail network through which malicious code spreads into a weighted directed graph model of a social network, made of vertices and edges and having small-world characteristics, in which a vertex represents an e-mail account, an edge represents mail exchanged between e-mail accounts, the weight of an edge represents the number of mails exchanged within a given period, the in-degree of a vertex represents the number of senders to that vertex within a given period, and the out-degree represents the number of recipients of that vertex within a given period.

Further, the above malicious code capture method may also have the following feature: obtaining mail data from multiple e-mail data sources comprises:

Obtaining e-mail data source information according to preset first configuration information, and extracting the kind of each e-mail data source, the kinds being automatically registered e-mail terminals, volunteer e-mail accounts, or privacy-stripped mail data provided by relevant organizations;

If the e-mail data source is an automatically registered e-mail terminal or a volunteer e-mail account, then when the polling period of the e-mail data source expires, obtaining the pending mail in the e-mail data source and writing the mail data of the pending mail into a mass mail raw information database, the mail data comprising the header information of the pending mail's source code, summary information generated from the header information, the source code text of the pending mail, and the accessible files of the pending mail;

If the e-mail data source is privacy-stripped mail data provided by a relevant organization, then standardizing the data of the e-mail data source, removing the private information of the pending mail, and writing the mail data of the pending mail into the mass mail raw information database, the mail data comprising the header information of the pending mail's source code, summary information generated from the header information, the source code text of the pending mail, and the accessible files of the pending mail.

Further, the above malicious code capture method may also have the following feature: the header information of the pending mail's source code comprises language, encoding type, attachment type, sender IP address information, recipient IP address information, and IP-based mail routing information.

Further, the above malicious code capture method may also have the following feature: the summary information comprises the recipient, sender and subject of the Email, the body length, whether there is an attachment, and the e-mail data source type.

Further, the above malicious code capture method may also have the following feature: parsing the mail data, recording as suspicious files those files in the mail data that cannot be excluded under the configured miss rate, and saving the suspicious files to the suspicious file database comprises:

Initializing according to preset second configuration information;

Parsing the header information and source code text of the pending mail, and saving abnormal header information and/or source code text to the malicious code sample database;

Filtering the accessible files of the pending mail according to the configured miss rate, excluding normal files, and saving the files that cannot be excluded to the suspicious file database as suspicious files.

Further, the above malicious code capture method may also have the following feature: detecting the suspicious files using the malicious code feature database and manual detection, and saving the suspicious files whose detection result is abnormal to the malicious code sample database comprises:

Detecting the suspicious files according to the malicious code feature information stored in the malicious code feature database, and saving the suspicious files that contain such feature information to the malicious code sample database;

For suspicious files that cannot be judged from the malicious code feature information, performing manual detection with a preset expert system, and saving to the malicious code feature database the new feature information produced during manual detection by suspicious files judged to be malicious code.

To solve the above technical problem, the present invention also proposes a malicious code capture system, comprising:

An acquisition module, for obtaining mail data from multiple e-mail data sources;

A parsing module, for parsing the mail data obtained by the acquisition module, recording as suspicious files those files in the mail data that cannot be excluded under a configured miss rate, and saving the suspicious files to a suspicious file database;

A detection module, for detecting the suspicious files output by the parsing module using a malicious code feature database and manual detection, and saving the suspicious files whose detection result is abnormal to a malicious code sample database.

Further, the above malicious code capture system may also have the following feature, further comprising:

A sandbox module, for obtaining a malicious code sample from the malicious code sample database, running the sample in a sandbox, recording the feature information of the sample and saving it to the malicious code feature database.

Further, the above malicious code capture system may also have the following feature, further comprising, connected to the acquisition module:

An algorithm selection module, for allocating and optimizing the use of virtual honeypot resources with a selection and deployment algorithm for e-mail terminal virtual honeypots, wherein the algorithm abstracts the e-mail network through which malicious code spreads into a weighted directed graph model of a social network, made of vertices and edges and having small-world characteristics, in which a vertex represents an e-mail account, an edge represents mail exchanged between e-mail accounts, the weight of an edge represents the number of mails exchanged within a given period, the in-degree of a vertex represents the number of senders to that vertex within a given period, and the out-degree represents the number of recipients of that vertex within a given period.

Further, the above malicious code capture system may also have the following feature: the acquisition module comprises:

An extraction unit, for obtaining e-mail data source information according to preset first configuration information and extracting the kind of each e-mail data source, the kinds being automatically registered e-mail terminals, volunteer e-mail accounts, or e-mail data provided by relevant organizations;

A first acquiring unit, for, when the e-mail data source is an automatically registered e-mail terminal or a volunteer e-mail account and the polling period of the e-mail data source expires, obtaining the new mail in the e-mail data source and writing the mail data of the pending mail into a mass mail raw information database, the mail data comprising the header information of the pending mail's source code, summary information generated from the header information, the source code text of the pending mail, and the accessible files of the pending mail;

A second acquiring unit, for, when the e-mail data source is e-mail data provided by a relevant organization, standardizing the data of the e-mail data source, removing the private information of the pending mail, and writing the mail data of the pending mail into the mass mail raw information database, the mail data comprising the header information of the pending mail's source code, summary information generated from the header information, the source code text of the pending mail, and the accessible files of the pending mail.

Further, the above malicious code capture system may also have the following feature: the header information of the pending mail's source code comprises language, encoding type, attachment type, sender IP address information, recipient IP address information, and IP-based mail routing information.

Further, the above malicious code capture system may also have the following feature: the summary information comprises the recipient, sender and subject of the Email, the body length, whether there is an attachment, and the e-mail data source type.

Further, the above malicious code capture system may also have the following feature: the parsing module comprises:

A first initialization unit, for initializing according to preset second configuration information;

A parsing unit, for parsing the header information and source code text of the pending mail, and saving abnormal header information and/or source code text to the malicious code sample database;

A filter unit, for filtering the accessible files of the pending mail according to the configured miss rate, excluding normal files, and saving the files that cannot be excluded to the suspicious file database as suspicious files.

Further, the above malicious code capture system may also have the following feature: the detection module comprises:

A second initialization unit, for initializing according to preset second configuration information;

A first detecting unit, for detecting the suspicious files according to the malicious code feature information stored in the malicious code feature database, and saving the suspicious files that contain such feature information to the malicious code sample database;

A second detecting unit, for performing manual detection with a preset expert system on suspicious files that cannot be judged from the malicious code feature information, and saving to the malicious code feature database the new feature information produced during manual detection by suspicious files judged to be malicious code.

The malicious code capture method and system of the present invention can be applied in honeypot and honeynet systems, where they widen the range of objects that can be captured and improve the ability to capture malicious code.

Brief description of the drawings

Fig. 1 is a flow chart of the acquisition step of the malicious code capture method in an embodiment of the present invention;

Fig. 2 is a flow chart of the parsing step of the malicious code capture method in an embodiment of the present invention;

Fig. 3 is a flow chart of the detection step of the malicious code capture method in an embodiment of the present invention;

Fig. 4 is a structural diagram of the malicious code capture system in an embodiment of the present invention.

Detailed description

The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the present invention and are not intended to limit its scope.

The present invention uses a selection and deployment algorithm for Email terminal virtual honeypots to allocate and optimize the use of virtual honeypot resources. This algorithm (that is, the selection and deployment algorithm for Email terminal virtual honeypots; the same below) abstracts the Email network through which malicious code spreads into a weighted directed graph model of a social network (denoted G = <V, E>), made of vertices and edges and having small-world characteristics. A vertex (denoted v_i) represents an Email account; an edge (denoted e_k) represents mail exchanged between Email accounts; the weight of an edge (denoted w(e_k)) represents the number of mails exchanged within a given period; the in-degree of a vertex (denoted id(v_i)) represents the number of senders to that vertex within a given period; and the out-degree (denoted od(v_i)) represents the number of recipients of that vertex within a given period. The main idea of the algorithm is: perform cluster analysis on the Email network; find the subnets with a high average clustering coefficient; obtain the available Email accounts in those subnets; compute a composite evaluation index as a weighted combination of three criteria, namely the in-degree of the vertex, its activity (the number of mails sent per unit time) and its clustering coefficient (the weights are configured and adjusted by the administrator); sort the Email accounts in descending order of the index; and take a predetermined number of Email accounts (configured and adjusted by the administrator) and add them to the virtual honeypot set. For example, based on the attributes of the actual Email network, the administrator can select nodes with high in-degree, activity and clustering coefficient (for example, terminals in the top 30% on all three indicators) and compute the composite evaluation index for them.
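The ranking part of this algorithm can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `select_honeypots`, the default weights and the max-normalization of each indicator are hypothetical choices that the text does not specify, the local clustering coefficient is computed on undirected neighbor sets, and the preceding cluster analysis that picks out high-clustering subnets is omitted.

```python
from collections import defaultdict
from itertools import combinations


def select_honeypots(edges, weights=(0.4, 0.3, 0.3), top_n=2):
    """Rank accounts by a weighted score of in-degree, activity and clustering.

    edges: iterable of (sender, recipient, mail_count) for one time window.
    weights and top_n stand in for the administrator-configured values.
    """
    senders_of = defaultdict(set)   # account -> distinct senders (in-degree)
    sent = defaultdict(int)         # account -> mails sent (activity)
    neighbors = defaultdict(set)    # undirected adjacency, used for clustering
    for src, dst, count in edges:
        senders_of[dst].add(src)
        sent[src] += count
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    def clustering(node):
        # Fraction of neighbor pairs that are themselves connected.
        nbrs = neighbors[node]
        if len(nbrs) < 2:
            return 0.0
        pairs = list(combinations(sorted(nbrs), 2))
        linked = sum(1 for a, b in pairs if b in neighbors[a])
        return linked / len(pairs)

    nodes = sorted(neighbors)
    raw = {n: (len(senders_of[n]), sent[n], clustering(n)) for n in nodes}
    # Assumption: normalize each indicator by its maximum so weights compare.
    maxima = [max(v[i] for v in raw.values()) or 1 for i in range(3)]
    score = {
        n: sum(w * v[i] / maxima[i] for i, w in enumerate(weights))
        for n, v in raw.items()
    }
    ranked = sorted(nodes, key=lambda n: (-score[n], n))
    return ranked[:top_n], score
```

Calling `select_honeypots(edges, top_n=k)` returns the `k` highest-scoring accounts together with the score table, so an administrator could inspect how a change in weights shifts the ranking.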

The present invention proposes a malicious code capture method comprising the following steps:

Step one, obtain mail data from multiple e-mail data sources;

Step one is called the acquisition step.

There can be three kinds of e-mail data source: automatically registered Email terminals, volunteer Email accounts, and privacy-stripped mail data provided by relevant organizations (for example, mail service providers). The first two can be classed as Email account information, and the last is called cooperative mail data. "Privacy stripping" means that, provided suspicious files can still essentially be obtained, information about real people and matters that the mail data may involve is replaced automatically or semi-automatically, to protect personal privacy and sensitive information.

Email terminals are used as virtual honeypots, and the virtual honeypots are deployed using the small-world characteristics of the Email network, forming a virtual honeynet with a special topology that can capture more malicious code more effectively. A "small-world network" is a type of graph studied in network dynamics in which most pairs of nodes can be linked through a small number of other nodes. The main small-world characteristics are the clustering coefficient, the average path length and the node degree distribution.

In the embodiment of the present invention, mail data may comprise the header information of the Email source code, summary information generated from the header information, the Email source code text, and the Email accessible files. Email accessible files are files that can be extracted directly from the Email, for example pictures embedded in the Email body, files downloadable through hyperlinks in the Email, and Email attachment files. The summary information is a digest of the whole mail; its content may include the mail word count, the Email attachment size, the Email attachment storage location, and so on.
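As an illustration of generating summary information from mail source code, the following Python sketch uses the standard-library `email` package. The function name `build_summary` and the dictionary keys are hypothetical; the fields follow the list the document gives elsewhere for summary information (recipient, sender, subject, body length, presence of attachments, data source type), and treating any part with a file name as an attachment is a simplifying assumption.

```python
from email import message_from_string


def build_summary(raw_source, source_type):
    """Build a summary record from the raw source text of one mail."""
    msg = message_from_string(raw_source)
    body_length = 0
    has_attachment = False
    for part in msg.walk():
        if part.is_multipart():
            continue  # container parts carry no content of their own
        if part.get_filename():
            # Assumption: a part with a file name counts as an attachment.
            has_attachment = True
        elif part.get_content_type() == "text/plain":
            body_length += len(part.get_payload(decode=True) or b"")
    return {
        "recipient": msg.get("To", ""),
        "sender": msg.get("From", ""),
        "subject": msg.get("Subject", ""),
        "body_length": body_length,      # in bytes after transfer decoding
        "has_attachment": has_attachment,
        "source_type": source_type,      # e.g. "volunteer" (hypothetical label)
    }
```

The record could then be written to the raw information database alongside the source text; the storage layer is left out here because the document does not describe it.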

Fig. 1 is a flow chart of the acquisition step of the malicious code capture method in an embodiment of the present invention. As shown in Fig. 1, in this embodiment the acquisition step (step one) may comprise the following sub-steps:

Step 101, obtain e-mail data source information;

Specifically, e-mail data source information can be obtained according to the first configuration information, and the kind of e-mail data source (Email account information or cooperative mail data) can be extracted. The first configuration information is preset by the administrator. It may include the address of the mass mail raw information database, the access account and password, the list of Email terminal account information, and so on.

Step 102, determine whether the data source is Email account information; if so, go to step 103; otherwise go to step 111;

Step 103, obtain the last access time of the target account;

Here, the target account is the Email account of step 102.

Step 104, determine whether the polling period of the target account has been reached; if so, go to step 105; otherwise go to step 109;

Step 105, access the target account;

Step 106, determine whether the target account has new mail; if so, go to step 107; otherwise go to step 109;

The new mail of the target account is the target mail mentioned below, that is, the pending mail.

Step 107, generate the summary information of the target mail;

The target mail is the pending mail; the same below.

Step 108, write the source code text and summary information of the target mail into the mass mail raw information database;

At the same time, the accessible files of the target mail can be stored in the file system associated with the database, and the file paths of those accessible files can be written into the mass mail raw information database.

Step 109, determine whether this is the last target account; if so, go to step 118; otherwise go to step 110;

Step 110, move to the next account and go to step 103;

Step 111, determine whether the data source is cooperative mail data; if so, go to step 112; otherwise go to step 118;

Step 112, standardize the multi-source mail data;

Here, "standardization" means processing the mail data from different mail data sources in a uniform way, extracting and generating mail source code in a uniform format that the system can recognize (for example, the eml file format).

Step 113, move to the target mail;

Step 114, remove the private information of the target mail;

Step 115, generate the summary information of the target mail;

The summary information may include the recipient, sender and subject of the Email, the body length, whether there is an attachment, the e-mail data source type, and so on.

Step 116, write the target mail and its summary information into the mass mail raw information database;

Step 117, determine whether there is any unprocessed mail; if so, go to step 113; otherwise go to step 118;

Step 118, end.

Step one takes multiple e-mail data sources as system input, so malicious code that spreads over Email networks can be captured at scale.

Step two, parse the mail data, record as suspicious files those files in the mail data that cannot be excluded under the configured miss rate, and save the suspicious files to the suspicious file database;

Step two is called the parsing step.

Fig. 2 is a flow chart of the parsing step of the malicious code capture method in an embodiment of the present invention. As shown in Fig. 2, in this embodiment the parsing step (step two) may comprise the following sub-steps:
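The account-polling branch of the acquisition step (steps 103 to 110) can be sketched as a simple loop. This is a minimal Python sketch under stated assumptions: the `Account` class, its `inbox` list (which stands in for a real mailbox connection) and the list used as the raw information database are all hypothetical, and the summary is reduced to a length field for brevity.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Account:
    name: str
    poll_period: float               # seconds between visits
    last_visit: float = 0.0          # time of the last access (step 103)
    inbox: list = field(default_factory=list)  # stand-in for a real mailbox


def poll_accounts(accounts, raw_db, now=None):
    """Visit every account whose polling period has elapsed."""
    now = time.time() if now is None else now
    for acct in accounts:                       # steps 109-110: next account
        if now - acct.last_visit < acct.poll_period:
            continue                            # step 104: period not reached
        acct.last_visit = now                   # step 105: access the account
        while acct.inbox:                       # step 106: any new mail?
            source = acct.inbox.pop(0)
            raw_db.append({                     # steps 107-108: summarize, store
                "account": acct.name,
                "source": source,
                "summary": {"length": len(source)},
            })
    return raw_db
```

Passing `now` explicitly makes the loop deterministic for testing; a deployment would call it on a timer and replace the `inbox` list with a real mail retrieval client.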

Step 201, initialize according to the second configuration information;

Initialization in this step means the initialization of parameters, resources and the like.

The second configuration information is preset by the administrator. It may include the number of mails to parse, the number of servers that can be deployed, database address information, and so on. The second configuration information also contains the "number of crawler programs". In the present invention this number is greater than 1, so the invention uses parallel crawler technology.

Step 202, parse the mail header information;

The mail header information may include language, encoding type, attachment type, sender IP address information, recipient IP address information, IP-based mail routing information, and so on.

Step 203, determine whether the header information is abnormal; if so, go to step 204; otherwise go to step 205;

Specifically, any mail whose header format does not conform to the standard format of the mail protocol is treated as abnormal. Header anomalies take many forms; for example, header information can be judged abnormal in the following cases: the header contains forged information, the sender IP has been tampered with, the sender address has been tampered with, the sender name has been tampered with, and so on.

Step 204, record the anomaly information, write it to the suspicious file database, and go to step 205;

Step 205, parse the mail content information;

The mail content is the message body.

Step 206, determine whether there is a link to an accessible file; if so, go to step 209; otherwise go to step 207;

Accessible files include pictures embedded in the message body, files downloadable through hyperlinks, files in attachments, and so on.

Step 207, determine whether the mail content is abnormal; if so, go to step 208; otherwise go to step 213;

Abnormal mail content includes forged information in the body, forged information in attachments, and so on. Content anomalies fall mainly into body forgery and attachment forgery. Body forgery includes forgery of mail header information and forgery of mail body content; attachment forgery takes many forms, including bundling an executable file with an attachment file, hiding illegitimate data by steganography, and tampering with a normal file format.

Step 208, record the anomaly information, write it to the suspicious file database, and go to step 213;

Step 209, extract or crawl the accessible files;

As steps 206, 207 and 209 show, the acquisition mode of the present invention is an active acquisition mode that supports interaction.

Step 210, determine whether the accessible file from step 209 can be judged to be a normal file; if so, go to step 212; otherwise go to step 211;

This step performs a preliminary filtering of the accessible files and can exclude normal files at a relatively high miss rate. Files that cannot be excluded are suspicious files; they may be malicious code and need further detection. The "miss rate" is set in the configuration by the administrator. Its value lies in the interval (0, 1), and it is generally set above 50% so as to keep the false positive rate as low as possible.

Step 211, store the file in the suspicious file database and go to step 213;

Step 212, delete the target file;

Step 213, end.
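The header check of step 203 is described only in general terms, so the following Python sketch shows two simple heuristics rather than the patent's actual rules. The function name `header_anomalies` is hypothetical. The first check relies on the fact that the Internet Message Format requires `From` and `Date` header fields; the second, comparing the `From` and `Return-Path` domains, is an assumed heuristic that legitimate mailing-list traffic can also trigger.

```python
from email import message_from_string
from email.utils import parseaddr


def header_anomalies(raw_source):
    """Return a list of header problems found in one mail source."""
    msg = message_from_string(raw_source)
    problems = []
    # The message format standard makes these two header fields mandatory.
    for name in ("From", "Date"):
        if msg.get(name) is None:
            problems.append(f"missing {name} header")
    # Assumed heuristic: a mismatch between the visible sender domain and
    # the envelope return path may indicate a forged sender address.
    from_addr = parseaddr(msg.get("From", ""))[1]
    return_path = parseaddr(msg.get("Return-Path", ""))[1]
    if from_addr and return_path:
        from_domain = from_addr.rsplit("@", 1)[-1].lower()
        return_domain = return_path.rsplit("@", 1)[-1].lower()
        if from_domain != return_domain:
            problems.append("From and Return-Path domains differ")
    return problems
```

An empty list corresponds to the "not abnormal" branch of step 203; a non-empty list is what step 204 would record.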

The embodiment shown in Fig. 2 is based on parallel crawler technology and mass mail parsing technology. It obtains suspicious malicious code samples from the relevant Email terminals through an active acquisition mode that supports interaction, which makes up for the passive way in which prior-art honeypot and honeynet technology obtains malicious code and strengthens the capture effort. "Parallel crawler technology" means running several crawler programs on one server at the same time, and starting several such servers at once, all sharing one database. "Active acquisition that supports interaction" means that the behavior of an Email user can be simulated in order to obtain malicious code that spreads by social engineering, for example by identifying hyperlinks in a mail and accessing the associated files, or by interacting with the malicious source to obtain suspicious files.
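The extraction of accessible files (steps 206 and 209) and the parallel crawling described above can be sketched together in Python. The names `accessible_items` and `crawl_links` are hypothetical, the URL pattern is a deliberately simple assumption, threads within one process stand in for the several crawler programs and servers, and a lock-protected dictionary stands in for the shared database. The download function `fetch` is injected by the caller, so the sketch itself performs no network access.

```python
import re
import threading
from concurrent.futures import ThreadPoolExecutor
from email import message_from_string

# Assumption: a simple pattern for http/https links in text parts.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def accessible_items(raw_source):
    """Return (attachments, links) found in one mail source."""
    msg = message_from_string(raw_source)
    attachments, links = [], []
    for part in msg.walk():
        if part.is_multipart():
            continue
        payload = part.get_payload(decode=True) or b""
        filename = part.get_filename()
        if filename:
            attachments.append((filename, payload))
        elif part.get_content_maintype() == "text":
            text = payload.decode("utf-8", errors="replace")
            links.extend(URL_RE.findall(text))
    return attachments, links


def crawl_links(links, fetch, workers=4):
    """Download links concurrently; fetch(url) -> bytes is caller-supplied."""
    store = {}                 # stand-in for the shared database
    lock = threading.Lock()

    def worker(url):
        try:
            data = fetch(url)
        except Exception:
            return             # unreachable links are skipped in this sketch
        with lock:
            store[url] = data

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(worker, links))
    return store
```

Because downloaded content may be live malicious code, a real crawler would run in an isolated environment; that isolation is outside the scope of this sketch.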

Step 3: detect the suspicious files obtained in step 2 using the malicious code feature database and manual detection, and save the suspicious files whose detection result is abnormal into the malicious code sample database;

Step 3 is referred to as the detection step.

Fig. 3 is a flow chart of the detection step of the malicious code capture method in an embodiment of the present invention. As shown in Fig. 3, in this embodiment the detection step (i.e. step 3) may specifically comprise the following sub-steps:

Step 301: initialize according to the third configuration information;

The third configuration information is preset by the administrator. Its content may include the number of merge-processing processes, the number of malicious code detection servers, the analysis and detection time of the expert system, and so on.

Step 302: merge the suspicious files;

Here, "merge processing" is short for "consolidate and merge" processing. It uses means such as the merging of similar files and hash-based file comparison and deduplication, in order to reduce storage space and save subsequent computation.
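The hash-based deduplication part of the merge processing could, for example, be sketched as below. SHA-256 and the function name `merge_duplicates` are assumptions; the description only says that a hash algorithm is used. The sketch removes exact duplicates only and does not attempt the merging of merely similar files.

```python
import hashlib
from typing import Dict, Iterable, List


def merge_duplicates(files: Iterable[bytes]) -> List[bytes]:
    """Keep the first copy of each distinct file content, in input order."""
    seen: Dict[str, bytes] = {}
    for data in files:
        digest = hashlib.sha256(data).hexdigest()
        # setdefault stores the content only the first time a digest appears.
        seen.setdefault(digest, data)
    return list(seen.values())
```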

Step 303: malicious code detection based on the feature database (meaning the malicious code feature database, likewise below);

Detection based on the feature database may specifically proceed as follows: if a suspicious file contains a signature from the malicious code feature database, the suspicious file is malicious code; if it contains no signature from the malicious code feature database, it is not malicious code.
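The containment test described above can be illustrated with a short Python sketch. Treating a signature as a plain byte string and the name `match_signature` are assumptions made here; a production feature database would typically use a more efficient multi-pattern matcher.

```python
from typing import Iterable, Optional


def match_signature(data: bytes, signatures: Iterable[bytes]) -> Optional[bytes]:
    """Return the first signature contained in the file, or None if none is."""
    for signature in signatures:
        # Empty signatures are skipped because b"" is contained in every file.
        if signature and signature in data:
            return signature
    return None
```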

Step 304: determine whether a verdict can be reached on the suspicious file; if so, perform step 305, otherwise perform step 307;

Step 305: determine whether the target file is malicious code; if so, perform step 306, otherwise perform step 312;

Step 306: store the file in the malicious code sample database;

In this step, the file stored in the malicious code sample database is a suspicious file that has been judged to be malicious code on the basis of the malicious code feature database.

Step 307: analysis and detection based on the expert system;

The "expert system" referred to in the present invention extends the expert system in the traditional sense. Its core is a number of security experts with experience in malicious code analysis. Taking the suspicious files of this method as input, it judges whether they are malicious code by means of manually assisted reverse analysis and behavior analysis, thereby making up for the shortcomings of automated detection and finding unknown malicious code that automated detection cannot detect.

Step 308: determine whether the target file is malicious code; if so, perform step 309, otherwise perform step 312;

Step 309: store the file in the malicious code sample database;

Step 310: determine whether the malicious code has a new feature; if so, perform step 311, otherwise perform step 312;

Step 311: optimize the malicious code feature database, then perform step 303;

If, during detection based on the expert system, the target malicious code yields a new signature, that signature is stored in the malicious code feature database so as to optimize the database and improve detection accuracy and efficiency.

Step 312: end.
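The control flow of steps 303 to 312 might be sketched in Python as follows. The `Verdict` enumeration, the `scan` and `expert_review` callables and the use of in-memory sets for the two databases are all assumptions made for this sketch. In particular `expert_review` merely stands in for the manual analysis by security experts, and the guard against an already known signature is added here only so that the loop back to step 303 terminates.

```python
from enum import Enum
from typing import Callable, Optional, Set, Tuple


class Verdict(Enum):
    MALICIOUS = "malicious"
    BENIGN = "benign"
    UNDECIDED = "undecided"


def detect(
    data: bytes,
    signatures: Set[bytes],
    scan: Callable[[bytes, Set[bytes]], Verdict],
    expert_review: Callable[[bytes], Tuple[bool, Optional[bytes]]],
    samples: Set[bytes],
) -> Verdict:
    while True:
        verdict = scan(data, signatures)                   # step 303
        if verdict is not Verdict.UNDECIDED:               # step 304
            if verdict is Verdict.MALICIOUS:               # step 305
                samples.add(data)                          # step 306
            return verdict                                 # step 312
        is_malicious, new_signature = expert_review(data)  # step 307
        if not is_malicious:                               # step 308
            return Verdict.BENIGN                          # step 312
        samples.add(data)                                  # step 309
        if new_signature is None or new_signature in signatures:  # step 310
            return Verdict.MALICIOUS                       # step 312
        signatures.add(new_signature)                      # step 311, back to 303
```

Because the sample store is a set, the second pass through step 303 after a database update does not store the same file twice.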

Step 4: obtain a malicious code sample from the malicious code sample database, run the sample in a sandbox, record the feature information of the sample, and save it to the malicious code feature database.

The sandbox mentioned in step 4 may be any sandbox. In a preferred embodiment of the present invention, a lightweight sandbox may be used, which saves computing resources to some extent.

The malicious code capture method of the present invention can be implemented as computer programs. These programs can be developed in C/C++ or Python, with the front-end interface developed in PHP and JavaScript, MySQL used to build the relational database, and a user-defined file storage scheme used to hold the related large data.

The malicious code capture method of the present invention has the following beneficial effects:

1) Email terminals are chosen as virtual honeypots to form a distributed virtual honeynet, which greatly reduces the cost of building and deploying a honeynet and allows more email-borne malicious code to be captured quickly and efficiently;

2) A crawler-based deep interactive acquisition mode and massive-mail parsing are adopted, which makes up for the passive mode of honeypot and honeynet technologies and allows more complex email-borne malicious code to be captured;

3) Multiple mail data sources are taken and processed as input, which greatly increases the range and comprehensiveness of the email-borne malicious code captured.

The malicious code capture method of the present invention can be applied in honeypot and honeynet systems, where it can widen the coverage of captured objects and improve the ability to capture malicious code.

The present invention also proposes a malicious code capture system for implementing the above malicious code capture method.

Fig. 4 is a structural diagram of the malicious code capture system in an embodiment of the present invention. As shown in Fig. 4, in this embodiment the malicious code capture system comprises an acquisition module 410, a parsing module 420, a detection module 430 and a sandbox module 440 connected in sequence. The acquisition module 410 obtains mail data from multiple email data sources. The parsing module 420 parses the mail data obtained by the acquisition module 410, records as suspicious files the files in the mail data that cannot be excluded at the set false-negative rate, and saves these suspicious files to the suspicious file database. The detection module 430 detects the suspicious files output by the parsing module 420 using the malicious code feature database and manual detection, and saves the suspicious files whose detection result is abnormal into the malicious code sample database. The sandbox module 440 obtains a malicious code sample from the malicious code sample database, runs the sample in a sandbox, records the feature information of the sample, and saves it to the malicious code feature database.

In other embodiments of the present invention, the malicious code capture system may also omit the sandbox module 440.

In other embodiments of the present invention, the malicious code capture system may further include an algorithm selection module connected to the acquisition module 410, which uses a selection and deployment algorithm for email-terminal virtual honeypots to allocate and optimize the use of virtual honeypot resources. The selection and deployment algorithm is as follows: the email network over which malicious code propagates is abstracted as a weighted directed graph model of a social network, made up of nodes and edges and exhibiting small-world characteristics. In this model, a node represents an email account, an edge represents the mail exchanged between email accounts, the weight of an edge represents the number of mails exchanged within a given period, the in-degree of a node represents the number of senders to that node within the period, and the out-degree represents the number of recipients of that node within the period.
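The weighted directed graph model described above could be represented, for illustration, by the following Python data structure. The class name `MailGraph` and its methods are assumptions made for this sketch; it only builds the model for the mails of one time window and does not implement the selection and deployment of virtual honeypots, which the description does not detail here.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set, Tuple


class MailGraph:
    """Email accounts as nodes, exchanged mail as weighted directed edges."""

    def __init__(self) -> None:
        # (sender, recipient) -> number of mails within the observed period
        self.weight: DefaultDict[Tuple[str, str], int] = defaultdict(int)
        self._senders: Dict[str, Set[str]] = {}
        self._recipients: Dict[str, Set[str]] = {}

    def add_mail(self, sender: str, recipient: str) -> None:
        self.weight[(sender, recipient)] += 1
        self._senders.setdefault(recipient, set()).add(sender)
        self._recipients.setdefault(sender, set()).add(recipient)

    def in_degree(self, account: str) -> int:
        # Number of distinct senders that wrote to this account.
        return len(self._senders.get(account, ()))

    def out_degree(self, account: str) -> int:
        # Number of distinct recipients this account wrote to.
        return len(self._recipients.get(account, ()))
```

For instance, after two mails from a to b, one from c to b and one from b to a, the edge (a, b) has weight 2, the in-degree of b is 2 and the out-degree of a is 1.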

The acquisition module 410 may further comprise an extraction unit, a first acquisition unit and a second acquisition unit. The extraction unit obtains email data source information according to preset first configuration information and extracts the kind of each email data source, the kind being an automatically registered email terminal, a volunteer email account, or email data provided by a relevant organization. When the email data source is an automatically registered email terminal or a volunteer email account, the first acquisition unit, on expiry of the polling period of that data source, obtains the new mail in the data source and writes the mail data of each mail to be processed into the massive-mail raw information database; the mail data comprises the header information of the source code of the mail to be processed, summary information generated from that header information, the body of the mail source code, and the attachment files of the mail. When the email data source is email data provided by a relevant organization, the second acquisition unit normalizes the data in the data source, removes the private information from each mail to be processed, and writes the mail data of each mail to be processed into the massive-mail raw information database; here too the mail data comprises the header information of the source code of the mail to be processed, summary information generated from that header information, the body of the mail source code, and the attachment files of the mail.

The header information of the source code of the mail to be processed may include the language, encoding type, attachment type, sender IP address information, recipient IP address information, IP-based mail routing information, and so on.

The summary information may include the recipient, sender, subject and body length of the email, whether it has an attachment, the type of the email data source, and so on.
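As a hedged illustration of how such summary information might be generated from one raw mail, the following Python sketch uses the standard `email` module. The function name `summarize`, the dictionary keys and the choice to measure the body length in bytes of the decoded plain-text parts are assumptions, not details given in the description.

```python
from email import message_from_string
from typing import Any, Dict


def summarize(raw_mail: str, source_type: str) -> Dict[str, Any]:
    """Build the summary fields listed above for one raw mail."""
    msg = message_from_string(raw_mail)
    body_length = 0
    has_attachment = False
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        if part.get_filename():
            has_attachment = True
        elif part.get_content_type() == "text/plain":
            # Assumption: body length counts decoded plain-text bytes.
            body_length += len(part.get_payload(decode=True) or b"")
    return {
        "recipient": msg.get("To"),
        "sender": msg.get("From"),
        "subject": msg.get("Subject"),
        "body_length": body_length,
        "has_attachment": has_attachment,
        "source_type": source_type,  # e.g. the kind of email data source
    }
```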

The parsing module 420 may further comprise a first initialization unit, a parsing unit and a filtering unit. The first initialization unit performs initialization according to preset second configuration information. The parsing unit parses the header information and the source code body of the mail to be processed, and saves any abnormal header information and/or source code body to the malicious code sample database. The filtering unit filters the attachment files of the mail to be processed according to the set false-negative rate and excludes the normal files; the files that cannot be excluded are saved to the suspicious file database as suspicious files.

The detection module 430 may further comprise a second initialization unit, a first detection unit and a second detection unit. The second initialization unit performs initialization according to preset second configuration information. The first detection unit detects the suspicious files according to the malicious code feature information stored in the malicious code feature database, and saves the suspicious files that contain such feature information to the malicious code sample database. For suspicious files that cannot be decided from the malicious code feature information, the second detection unit carries out manual detection by means of the preset expert system, and saves to the malicious code feature database the new feature information produced during manual detection by the suspicious files judged to be malicious code.

The workflow of the malicious code capture system of the present invention is the same as that of the malicious code capture method described above and is not repeated here.

Each term used for the malicious code capture system of the present invention has the same meaning as the identical term in the description of the malicious code capture method, so the terms are not explained again.

The malicious code capture system of the present invention has the following beneficial effects:

The malicious code capture system of the present invention can be applied in honeypot and honeynet systems, where it can widen the coverage of captured objects and improve the ability to capture malicious code.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A malicious code capture method, characterized by comprising:

obtaining mail data from multiple email data sources;

detecting said suspicious files using the malicious code feature database and manual detection, and saving the suspicious files whose detection result is abnormal into said malicious code sample database; specifically comprising:

2. The malicious code capture method according to claim 1, characterized by further comprising:

3. The malicious code capture method according to claim 1, characterized in that, before said obtaining mail data from multiple email data sources, the method further comprises:

4. The malicious code capture method according to claim 1, characterized in that said obtaining mail data from multiple email data sources comprises:

5. The malicious code capture method according to claim 4, characterized in that the header information of the source code of said mail to be processed comprises the language, encoding type, attachment type, sender IP address information, recipient IP address information, and IP-based mail routing information.

6. The malicious code capture method according to claim 4, characterized in that said summary information comprises the recipient, sender, subject and body length of the email, whether it has an attachment, and the type of the email data source.

7. The malicious code capture method according to claim 4, characterized in that said parsing the mail data, recording as suspicious files the files in said mail data that cannot be excluded at the set false-negative rate, and saving the suspicious files to the suspicious file database comprises:

parsing the header information and the source code body of said mail to be processed, and saving any abnormal header information and/or source code body to the malicious code sample database.

8. A malicious code capture system, characterized by comprising:

an acquisition module, configured to obtain mail data from multiple email data sources;

a detection module, configured to detect the suspicious files output by said parsing module using the malicious code feature database and manual detection, and to save the suspicious files whose detection result is abnormal into said malicious code sample database; said detection module comprises:

9. The malicious code capture system according to claim 8, characterized by further comprising:

10. The malicious code capture system according to claim 8, characterized by further comprising, connected to said acquisition module:

11. The malicious code capture system according to claim 8, characterized in that said acquisition module comprises:

12. The malicious code capture system according to claim 11, characterized in that the header information of the source code of said mail to be processed comprises the language, encoding type, attachment type, sender IP address information, recipient IP address information, and IP-based mail routing information.

13. The malicious code capture system according to claim 11, characterized in that said summary information comprises the recipient, sender, subject and body length of the email, whether it has an attachment, and the type of the email data source.

14. The malicious code capture system according to claim 11, characterized in that said parsing module comprises:

a parsing unit, configured to parse the header information and the source code body of said mail to be processed, and to save any abnormal header information and/or source code body to the malicious code sample database.