CN112688926A

CN112688926A - Method, system and device for detecting spear type phishing mails based on attachments

Info

Publication number: CN112688926A
Application number: CN202011504518.0A
Authority: CN
Inventors: 丁雄; 范渊; 刘博
Original assignee: Hangzhou Dbappsecurity Technology Co Ltd
Current assignee: DBAPPSecurity Co Ltd; Hangzhou Dbappsecurity Technology Co Ltd
Priority date: 2020-12-18
Filing date: 2020-12-18
Publication date: 2021-04-20

Abstract

The application discloses a method, a system, a device and a computer-readable storage medium for detecting attachment-based spear phishing mail. Mails that contain attachments are screened out first, which reduces the number of mails the detection system must examine further and improves detection efficiency. A first feature vector is then extracted from each such mail and given a primary analysis by an attachment classifier. Only if the first feature vector is judged malicious is the attachment run in a high-countermeasure sandbox, which simulates the attachment's operating environment and extracts its dynamic feature vector. This reduces the volume of data sent to dynamic analysis while capturing the attachment's characteristics more comprehensively. Finally, a spear classifier performs a secondary analysis on the first feature vector and the dynamic feature vector together, which further improves the accuracy of the verdict on the target mail, avoids missed detections and improves detection accuracy.

Description

Method, system and device for detecting spear type phishing mails based on attachments

Technical Field

The invention relates to the field of network security, and in particular to an attachment-based spear phishing mail detection method, system, device and computer-readable storage medium.

Background

Phishing mail is an attack technique based on social engineering, and spear phishing mail is an advanced form of phishing mail. Compared with ordinary phishing mails, spear phishing mails are highly targeted, which makes them far more harmful. In recent years the number of spear phishing mails has grown steadily, posing a serious threat to the security of cyberspace.

In attachment-based spear phishing, an attacker attaches a malicious file to the mail. When the recipient opens the attachment, the malicious code executes automatically and exploits the relevant vulnerabilities to carry out follow-up malicious operations such as host control, information theft and lateral movement. Unlike the attachments used in ordinary phishing, spear phishing attachments often carry 0-day exploits that attackers have spent considerable effort discovering, or money buying, and then carefully crafted. Traditional signature-based detection fails against such high-level threats.

At present there is little targeted research on detecting attachment-based spear phishing mail, and common approaches simply reuse generic malicious code detection. The first approach is static signature matching. Its biggest problem is that signatures must be produced by manual analysis, which cannot keep pace with the growing number of malicious code samples and variants. The second approach is dynamic analysis: the attachment is run in a simulated environment such as a sandbox, and the result is then classified by machine learning. With huge mail volumes this approach has three problems: sandbox analysis consumes a lot of time; the accuracy and recall of current machine learning classification are low; and malicious attachments that can detect sandboxes evade an ordinary sandbox, which then fails to capture their dynamic behavior.

Consequently, current attachment-based spear phishing mail detection schemes suffer from a high false alarm rate, low recall and low efficiency.

A method for detecting attachment-based spear phishing mail with high recall, a low false alarm rate and high efficiency is therefore needed.

Disclosure of Invention

In view of this, the present invention provides an attachment-based spear phishing mail detection method, system, device and computer-readable storage medium with high recall, a low false alarm rate and high efficiency. The specific scheme is as follows:

An attachment-based spear phishing mail detection method comprises the following steps:

detecting whether the target mail comprises an attachment or not;

if the target mail comprises the attachment, extracting a first feature vector in the target mail, wherein the first feature vector comprises a reputation feature vector, a habit feature vector and an attachment feature vector of the attachment;

analyzing the first feature vector by using an attachment classifier, and judging whether the first feature vector is malicious;

if the first feature vector is malicious, extracting a dynamic feature vector of an attachment in the target mail by using a high-countermeasure sandbox;

analyzing the first characteristic vector and the dynamic characteristic vector by using a spear classifier, and judging whether the target mail is malicious or not;

if the target mail is malicious, raising an alarm;

the attachment classifier is obtained by training historical first feature vectors of historical attachments in historical mails in advance, and the spear classifier is obtained by training historical first feature vectors and historical dynamic feature vectors in historical mails in advance.
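The two-stage cascade in the steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: every stage (attachment check, feature extraction, both classifiers, the sandbox run) is passed in as a hypothetical callable, and the 0.5 decision threshold follows the example given for the spear classifier in step S15.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

FeatureVector = Sequence[float]


@dataclass
class Verdict:
    malicious: bool
    stage: str                          # stage that produced the final decision
    confidence: Optional[float] = None  # only set by the spear classifier


def detect(
    mail: object,
    has_attachment: Callable[[object], bool],
    extract_first: Callable[[object], FeatureVector],
    attachment_classifier: Callable[[FeatureVector], bool],
    run_sandbox: Callable[[object], FeatureVector],
    spear_classifier: Callable[[FeatureVector, FeatureVector], float],
    threshold: float = 0.5,
) -> Verdict:
    # Mails without attachments are released without further analysis.
    if not has_attachment(mail):
        return Verdict(False, "no_attachment")
    # Static first feature vector: reputation + habit + attachment features.
    first = extract_first(mail)
    # Cheap first pass; mails judged benign never reach the sandbox.
    if not attachment_classifier(first):
        return Verdict(False, "attachment_classifier")
    # Expensive dynamic analysis, only for suspicious mails.
    dynamic = run_sandbox(mail)
    # Second pass on static + dynamic features; confidence in [0, 1].
    confidence = spear_classifier(first, dynamic)
    return Verdict(confidence > threshold, "spear_classifier", confidence)
```

A caller would raise the alarm whenever `verdict.malicious` is true. Injecting the stages as callables keeps the control flow, in particular the rule that only mails flagged by the attachment classifier reach the sandbox, separate from any concrete model or sandbox product.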

Optionally, the process of extracting the first feature vector in the target email includes:

calling a feature extraction model;

and extracting the first feature vector of the attachment in the target mail by using the feature extraction model and the mail metadata.

Optionally, the process of extracting the dynamic feature vector of the attachment in the target mail by using the high-countermeasure sandbox comprises:

simulating the running of the attachment in the target mail in an emulated environment by using the high-countermeasure sandbox, monitoring the runtime information of the attachment, and obtaining the dynamic feature vector of the attachment.

Optionally, the process of analyzing the first feature vector and the dynamic feature vector by using a spear classifier to judge whether the target mail is malicious comprises:

uniformly encoding the first feature vector and the dynamic feature vector to obtain a comprehensive feature vector;

and analyzing the comprehensive feature vector by using the spear classifier, and judging whether the target mail is malicious.

The invention also discloses an attachment-based spear phishing mail detection system, comprising:

the attachment detection module is used for detecting whether the target mail comprises an attachment or not;

the first feature extraction module is used for extracting a first feature vector in the target mail if the target mail comprises an attachment, wherein the first feature vector comprises a reputation feature vector, a habit feature vector and an attachment feature vector of the attachment;

the first malicious analysis module is used for analyzing the first feature vector by using an attachment classifier and judging whether the first feature vector is malicious;

the second feature extraction module is used for extracting the dynamic feature vector of the attachment in the target mail by using a high-countermeasure sandbox if the first malicious analysis module judges that the first feature vector is malicious;

the second malicious analysis module is used for analyzing the first feature vector and the dynamic feature vector by using a spear classifier and judging whether the target mail is malicious or not;

the alarm module is used for raising an alarm if the second malicious analysis module judges that the target mail is malicious.

Optionally, the first feature extraction module includes:

the extraction model calling unit is used for calling the feature extraction model;

and the first feature extraction unit is used for extracting the first feature vector of the attachment in the target mail by using the feature extraction model and the mail metadata.

Optionally, the second feature extraction module is specifically configured to simulate environmental operation of the attachment in the target email by using the high-countermeasure sandbox, monitor operation information of the attachment, and obtain the dynamic feature vector of the attachment.

Optionally, the second malicious analysis module includes:

the unified coding unit is used for uniformly encoding the first feature vector and the dynamic feature vector to obtain a comprehensive feature vector;

and the second malicious analysis unit is used for analyzing the comprehensive feature vector by using the spear classifier and judging whether the target mail is malicious.

The invention also discloses an attachment-based spear phishing mail detection device, comprising:

a memory for storing a computer program;

a processor for executing the computer program to implement the attachment-based spearphishing mail detection method as described above.

The invention also discloses a computer readable storage medium, on which a computer program is stored, which, when executed by a processor, implements an attachment-based spearphishing mail detection method as described above.

In the invention, the attachment-based spear phishing mail detection method comprises: detecting whether the target mail includes an attachment; if it does, extracting a first feature vector from the target mail, the first feature vector comprising a reputation feature vector, a habit feature vector and an attachment feature vector of the attachment; analyzing the first feature vector with an attachment classifier to judge whether it is malicious; if it is, extracting the dynamic feature vector of the attachment in the target mail with a high-countermeasure sandbox; analyzing the first feature vector and the dynamic feature vector with a spear classifier to judge whether the target mail is malicious; and raising an alarm if it is. The attachment classifier is trained in advance on historical first feature vectors of historical attachments in historical mails, and the spear classifier is trained in advance on historical first feature vectors and historical dynamic feature vectors in historical mails.

Screening out mails that include attachments first reduces the number of mails the detection system must examine further and improves detection efficiency. The first feature vector of each such mail is extracted and given a primary analysis by the attachment classifier. Only when the first feature vector is malicious does the high-countermeasure sandbox simulate the attachment's operating environment to extract its dynamic feature vector, which keeps the volume of dynamically analyzed data small while extracting the attachment's feature information more comprehensively. Finally, the spear classifier analyzes the first feature vector and the dynamic feature vector again, realizing a secondary analysis that further improves the accuracy of the verdict on the target mail, avoids missed detections and improves detection accuracy.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

FIG. 1 is a flow chart of a method for detecting a spearphishing mail based on an attachment according to an embodiment of the invention;

FIG. 2 is a flow chart of another method for detecting a spearphishing mail based on an attachment according to an embodiment of the invention;

FIG. 3 is a schematic structural diagram of an attachment-based spear phishing mail detection system according to an embodiment of the invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

An embodiment of the invention discloses an attachment-based spear phishing mail detection method. As shown in FIG. 1, the method comprises the following steps:

S11: Detect whether the target mail includes an attachment.

Specifically, a mail system that needs security detection contains a large number of mails, most of which carry no attachment and pose a low risk. To improve efficiency and focus detection on high-risk mails with attachments, the method first judges whether the target mail to be detected includes an attachment. If it does not, the mail is skipped and released without detection, which saves a large amount of detection time. Detecting only target mails that include attachments improves overall detection efficiency, and it also makes it easier to tune the operating parameters of the subsequent attachment classifier, spear classifier and high-countermeasure sandbox for this narrower input, making them more targeted.
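As a rough sketch of step S11, Python's standard `email` package can parse a raw message and report whether it has any attachment parts. Treating every non-body MIME part returned by `EmailMessage.iter_attachments()` as an attachment is an assumption of this sketch; a production filter may also need to handle inline images, nested messages and malformed MIME.

```python
from email import policy
from email.parser import BytesParser
from typing import List, Tuple


def parse(raw: bytes):
    # policy.default makes the parser return EmailMessage objects,
    # which provide iter_attachments().
    return BytesParser(policy=policy.default).parsebytes(raw)


def list_attachments(raw: bytes) -> List[Tuple[str, bytes]]:
    """Return (filename, decoded payload) for each non-body MIME part."""
    msg = parse(raw)
    result = []
    for part in msg.iter_attachments():
        name = part.get_filename() or ""
        # decode=True undoes base64 / quoted-printable; None for multiparts.
        data = part.get_payload(decode=True) or b""
        result.append((name, data))
    return result


def has_attachment(raw: bytes) -> bool:
    # S11: mails for which this is False are released immediately.
    return len(list_attachments(raw)) > 0
```

For a non-multipart message `iter_attachments()` yields nothing, so plain-text mails fall through to the release branch without further parsing.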

S12: If the target mail includes the attachment, extract a first feature vector from the target mail, where the first feature vector comprises a reputation feature vector, a habit feature vector and an attachment feature vector of the attachment.

Specifically, the mail metadata of the target mail is extracted, and information such as the attachments is obtained from it. The delivery habits of the sender and the recipient can be summarized from the sender's IP address, the sender's mailbox address, the recipient's name and the recipient's address: for example, whether the sender and recipient mailbox addresses share the same domain suffix, how often the sender and recipient communicate, and whether the sender or recipient has previously sent malicious mail. Analyzing this information yields the reputation feature vector and the habit feature vector of the mail.
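The habit and reputation features described above could be computed from a log of previously seen mails roughly as below. The `MailRecord` log format, the choice of four features (shared domain suffix, sender/recipient pair count, whether the sender was previously flagged, whether the sender's IP was seen before) and their split into "habit" and "reputation" are illustrative assumptions; the source fixes neither a schema nor a feature order.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class MailRecord:
    """Hypothetical entry in a log of previously delivered mails."""
    sender: str
    recipient: str
    sender_ip: str
    malicious: bool  # whether this historical mail was judged malicious


def domain_suffix(address: str) -> str:
    # "Alice@Example.com" -> "example.com"
    return address.rsplit("@", 1)[-1].lower()


def habit_reputation_features(sender: str, recipient: str, sender_ip: str,
                              history: Iterable[MailRecord]) -> List[float]:
    sender, recipient = sender.lower(), recipient.lower()
    from_sender = [r for r in history if r.sender.lower() == sender]
    # Habit: same mailbox suffix, and how often this pair has communicated.
    same_suffix = domain_suffix(sender) == domain_suffix(recipient)
    pair_count = sum(1 for r in from_sender if r.recipient.lower() == recipient)
    # Reputation: has this sender sent malicious mail, and is the IP familiar?
    sender_flagged = any(r.malicious for r in from_sender)
    ip_seen = any(r.sender_ip == sender_ip for r in from_sender)
    return [float(same_suffix), float(pair_count),
            float(sender_flagged), float(ip_seen)]
```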

Specifically, all attachments can be extracted from the mail metadata and the number of attachments counted; the file name length of each attachment is counted and the average file name length calculated; the size of each attachment is counted and the average attachment size calculated; and a pre-established database is queried for a malicious score of each attachment, equal to the ratio of the number of analysis engines that judge the attachment malicious to the total number of analysis engines, with the highest score taken as the attachment score of the mail. The file attribute category of each attachment can also be obtained; the embodiment of the invention uses the MIME (Multipurpose Internet Mail Extensions) type as an example. The following can also be extracted: whether an attachment of type PE (Portable Executable) is packed (shelled), where with multiple attachments the mail is flagged if any attachment is packed; whether a PE attachment matches a preset feature code library, where with multiple attachments the mail is flagged if any attachment matches; whether an attachment of an Office file type contains a macro object, where with multiple attachments the mail is flagged if any attachment contains one; and whether the real attribute category of an attachment is consistent with the attribute category displayed by the mail, where with multiple attachments the mail is flagged if any attachment is inconsistent. In summary, the attachment feature vector is assembled from these preset judgment models, and its content may include the number of attachments, the average file name length, the average attachment size, the malicious score of the attachments, whether an attachment is packed, whether an attachment matches the preset feature code library, whether a macro object is contained, whether the real attribute category of an attachment is consistent with the attribute category displayed by the mail, and the like.
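A minimal sketch of how the per-attachment facts above might be aggregated into one attachment feature vector per mail. The `AttachmentInfo` record and the feature order are hypothetical; the aggregation rules follow the text: a count and averages for the numeric fields, the highest engine ratio as the malicious score, and an "any attachment" rule for each boolean flag.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class AttachmentInfo:
    """Hypothetical per-attachment facts; how each flag is obtained is out of scope."""
    filename: str
    size: int                 # bytes
    malicious_engines: int    # engines that flagged the file
    total_engines: int        # engines that analysed the file
    packed: bool              # PE file with a packer ("shell")
    signature_hit: bool       # matched the preset feature code library
    has_macro: bool           # Office file containing a macro object
    type_mismatch: bool       # real type differs from the displayed type


def attachment_feature_vector(atts: Sequence[AttachmentInfo]) -> List[float]:
    if not atts:
        raise ValueError("mail has no attachments; it should have been released at S11")
    n = len(atts)
    avg_name_len = sum(len(a.filename) for a in atts) / n
    avg_size = sum(a.size for a in atts) / n
    # Score = malicious engines / total engines; the mail keeps the highest one.
    score = max(a.malicious_engines / a.total_engines if a.total_engines else 0.0
                for a in atts)
    # Boolean flags: with several attachments, one positive is enough.
    flags = [any(a.packed for a in atts), any(a.signature_hit for a in atts),
             any(a.has_macro for a in atts), any(a.type_mismatch for a in atts)]
    return [float(n), avg_name_len, avg_size, score] + [float(f) for f in flags]
```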

S13: Analyze the first feature vector by using the attachment classifier, and judge whether the first feature vector is malicious.

Specifically, the attachment classifier is trained in advance on historical first feature vectors of historical attachments in historical mails, which enables it to analyze the first feature vector.

Further, the attachment classifier's judgment of the attachment feature information may include querying, through a third-party platform such as the VirusTotal malicious code analysis website, the score with which each attachment is analyzed as malicious, and taking the highest malicious score as the reputation score of the attachment, that is, the reputation feature vector. A corresponding scoring model could of course be built in advance, but to reduce workload and improve efficiency, the third-party platform's data can be used directly to obtain the reputation feature vector of the attachment.
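If the per-engine verdicts come from a third-party service, the score reduces to a ratio of engine counts. The sketch below assumes the reports have already been fetched and follow the shape of a VirusTotal API v3 file object, where `data.attributes.last_analysis_stats` maps verdict categories to engine counts. Counting every category in the denominator is an assumption, since the text only says "total number of analysis engines".

```python
from typing import Iterable


def engine_ratio(report: dict) -> float:
    """Share of engines that flagged one attachment as malicious."""
    stats = report["data"]["attributes"]["last_analysis_stats"]
    total = sum(stats.values())  # assumption: every verdict category counts
    return stats.get("malicious", 0) / total if total else 0.0


def attachment_reputation(reports: Iterable[dict]) -> float:
    # The highest per-attachment score becomes the mail's score.
    return max((engine_ratio(r) for r in reports), default=0.0)
```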

Specifically, the attachment classifier checks whether an attachment recorded in the attachment feature vector is packed, whether it matches the preset feature code library, whether it contains a macro object, and whether its real attribute category is consistent with the attribute category displayed by the mail, and uses these checks to judge whether the target mail is malicious.

Packing (adding a shell) is a common way for malicious code to evade antivirus engines; the preset feature code library is a collection of currently known malicious code signatures and can be updated regularly; writing malicious code into the macro object of an Office file is a common attack technique; and hiding the real file extension is another common technique, for example an attachment that is really an EXE-type PE file while the mail displays it as a TXT text file, tricking the recipient into opening and executing it. Therefore, once any of these conditions occurs, the attachment classifier considers the attachment feature vector malicious.
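The four rule checks could be approximated with standard-library code like the following. These are simplified heuristics chosen for illustration, not the judgment models of the source: the real file type is inferred from a handful of well-known magic-byte signatures, macro presence is only checked for OOXML (ZIP-based) Office files via a `vbaProject.bin` part, packing is guessed from byte entropy with an assumed threshold of 7.2 bits per byte, and the feature code library is modeled as a plain list of byte patterns.

```python
import io
import math
import zipfile
from collections import Counter
from typing import Iterable

# A few well-known magic-byte signatures; far from exhaustive.
MAGIC = [
    (b"MZ", "pe"),
    (b"%PDF", "pdf"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole"),  # legacy .doc/.xls/.ppt
    (b"PK\x03\x04", "zip"),                        # .zip and OOXML files
]
# Displayed extensions assumed consistent with each real type.
EXPECTED_EXT = {
    "pe": {"exe", "dll", "scr", "com"},
    "pdf": {"pdf"},
    "ole": {"doc", "xls", "ppt", "msg"},
    "zip": {"zip", "docx", "docm", "xlsx", "xlsm", "pptx", "pptm", "jar"},
}


def real_type(data: bytes) -> str:
    for magic, kind in MAGIC:
        if data.startswith(magic):
            return kind
    return "unknown"


def type_mismatch(filename: str, data: bytes) -> bool:
    """True if the displayed extension hides the real file type."""
    kind = real_type(data)
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return kind != "unknown" and ext not in EXPECTED_EXT[kind]


def ooxml_has_macro(data: bytes) -> bool:
    # Macro-enabled OOXML documents carry their VBA project as vbaProject.bin.
    if real_type(data) != "zip":
        return False
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return any(n.lower().endswith("vbaproject.bin") for n in zf.namelist())
    except zipfile.BadZipFile:
        return False


def looks_packed(data: bytes, threshold: float = 7.2) -> bool:
    """Entropy heuristic: packed PE payloads look close to random bytes."""
    if real_type(data) != "pe":
        return False
    n = len(data)
    entropy = -sum(c / n * math.log2(c / n) for c in Counter(data).values())
    return entropy > threshold


def signature_hit(data: bytes, signatures: Iterable[bytes]) -> bool:
    # Stand-in for the preset feature code library: raw byte patterns.
    return any(sig in data for sig in signatures)
```

Entropy alone produces false positives on compressed or encrypted but benign payloads, which is one reason the text feeds such flags into a trained classifier rather than blocking on any single rule.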

It can be understood that if the attachment classifier judges the first feature vector to be not malicious, inspection of the target mail can end and the mail is released without further, more complex detection, which shortens detection time and improves detection efficiency.

S14: If the first feature vector is malicious, extract the dynamic feature vector of the attachment in the target mail by using the high-countermeasure sandbox.

Specifically, the high-countermeasure sandbox deeply simulates the runtime state of the attachment in the target mail within its environment, reproducing the processes, tasks and steps the attachment executes at runtime, for example its creation, modification and deletion of files. Running the attachment in the sandbox exposes any malicious code it contains; every operation the attachment performs, including all operations performed by the malicious code, is recorded, and feature information is extracted from these operations to obtain the dynamic feature vector of the attachment.

S15: Analyze the first feature vector and the dynamic feature vector by using the spear classifier, and judge whether the target mail is malicious.

Specifically, the spear classifier performs a secondary analysis on the first feature vector and the dynamic feature vector produced by the attachment classifier stage and the high-countermeasure sandbox, and makes the final maliciousness judgment on the target mail.

Specifically, the spear classifier obtains a confidence level by analyzing the first feature vector and the dynamic feature vector, and judges from it whether the target mail is malicious. For example, the confidence level is a floating-point number between 0 and 1: a value greater than 0.5 indicates an attachment-based spear phishing mail, and the closer the value is to 1, the higher the probability.
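The source does not name the model behind the spear classifier. As a stand-in, the sketch below trains a plain logistic regression with stochastic gradient descent, which naturally yields a confidence in (0, 1) that can be thresholded at 0.5 as described. The learning rate, epoch count and toy training data are assumptions for illustration; in the described method the rows of `X` would be historical first and dynamic feature vectors.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.1, epochs: int = 500, seed: int = 0) -> List[float]:
    """SGD on the log-loss; returns weights with the bias at index 0."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            g = confidence(w, X[i]) - y[i]  # d(log-loss)/d(logit)
            w[0] -= lr * g
            for j, xj in enumerate(X[i]):
                w[j + 1] -= lr * g * xj
    return w


def confidence(w: Sequence[float], x: Sequence[float]) -> float:
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def is_spear_phishing(w: Sequence[float], x: Sequence[float]) -> bool:
    # "Greater than 0.5" marks an attachment-based spear phishing mail.
    return confidence(w, x) > 0.5
```

A linear model is only a baseline; any classifier that emits a calibrated probability would fit the same thresholding rule.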

It can be understood that if the target mail is not malicious, it is skipped and treated as a safe mail, and no alarm, interception or similar operation is performed on it.

S16: If the target mail is malicious, raise an alarm.

It can be understood that if the target mail is malicious, an alarm is raised to remind the user not to open the mail or its attachments. Alternatively, once the mail is confirmed to be malicious, it can be blocked directly or classified as spam so that the user does not see it directly, which reduces the chance that the user opens its attachments.

In this embodiment, therefore, screening out mails that include attachments first reduces the number of mails the detection system must examine further and improves detection efficiency. The attachment classifier performs a preliminary analysis of the extracted first feature vector, and only if that vector is malicious does the high-countermeasure sandbox simulate the attachment's operating environment to extract its dynamic feature vector. This reduces the volume of data to be analyzed dynamically while extracting the attachment's feature information more comprehensively. Finally, the spear classifier performs a secondary analysis on the first feature vector and the dynamic feature vector, which further improves the accuracy of the analysis of the target mail, avoids missed detections and improves detection accuracy.

An embodiment of the invention discloses a specific attachment-based spear phishing mail detection method. Compared with the previous embodiment, this embodiment further explains and optimizes the technical solution. Referring to FIG. 2, specifically:

S21: Detect whether the target mail includes an attachment.

S22: If the target mail includes an attachment, invoke the feature extraction model.

Specifically, an external feature extraction model can be called, for example one provided by a third-party platform such as VirusTotal. Using an existing, already-built feature extraction model reduces the workload of the whole mail detection system and also helps make feature extraction comprehensive and accurate.

S23: Extract the first feature vector of the attachment in the target mail by using the feature extraction model and the mail metadata.

S24: Analyze the first feature vector by using the attachment classifier, and judge whether the first feature vector is malicious.

S25: If the first feature vector is malicious, simulate the running of the attachment in the target mail by using the high-countermeasure sandbox, monitor the runtime information of the attachment, and obtain the dynamic feature vector of the attachment.

Specifically, the high-countermeasure sandbox simulates the running of the attachment in its environment and monitors the related behaviors, which include: process-related behaviors during the attachment's run, including but not limited to the number of processes, the number of newly created processes and the number of created remote threads; file-related behaviors, including but not limited to the numbers of created, deleted and modified files and whether any file attribute was modified; registry-related behaviors, including but not limited to the numbers of created, deleted and modified registry entries and whether a key registry value was modified; network-related behaviors, including but not limited to whether a network connection exists, whether there is file download behavior and whether there is file upload behavior; and startup-item behaviors, including but not limited to whether a startup item was newly added. The high-countermeasure sandbox applies this feature extraction to each attachment of the target mail to obtain the dynamic feature vector of the attachment.
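The behavior counts listed above map naturally onto a fixed-length vector. The sketch assumes a hypothetical sandbox log of `(category, action, target)` tuples with made-up action names such as `create`, `remote_thread` and `set`, and treats the `Run` autostart keys under HKLM and HKCU as the "key registry values". None of these names come from the source or from a specific sandbox product.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical sandbox event: (category, action, target).
Event = Tuple[str, str, str]

# Assumed "key" registry locations: the classic Run autostart keys.
KEY_REGISTRY_PREFIXES = (
    r"hklm\software\microsoft\windows\currentversion\run",
    r"hkcu\software\microsoft\windows\currentversion\run",
)


def dynamic_feature_vector(events: Iterable[Event]) -> List[float]:
    events = list(events)
    counts = Counter((cat, action) for cat, action, _ in events)

    def n(cat: str, action: str) -> float:
        return float(counts[(cat, action)])

    def flag(cat: str, action: str) -> float:
        return float(counts[(cat, action)] > 0)

    # Distinct process names seen in any process event.
    processes = {t for cat, _, t in events if cat == "process"}
    key_registry_touched = any(
        cat == "registry" and action in ("create", "delete", "set")
        and target.lower().startswith(KEY_REGISTRY_PREFIXES)
        for cat, action, target in events)
    return [
        # Process behaviors.
        float(len(processes)), n("process", "create"), n("process", "remote_thread"),
        # File behaviors.
        n("file", "create"), n("file", "delete"), n("file", "modify"),
        flag("file", "set_attributes"),
        # Registry behaviors.
        n("registry", "create"), n("registry", "delete"), n("registry", "set"),
        float(key_registry_touched),
        # Network behaviors.
        flag("network", "connect"), flag("network", "download"), flag("network", "upload"),
        # Startup items.
        flag("startup", "add"),
    ]
```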

S26: Uniformly encode the first feature vector and the dynamic feature vector to obtain a comprehensive feature vector.

Specifically, to analyze and evaluate the different kinds of feature vectors together, the first feature vector and the dynamic feature vector are encoded in a unified way, so that the spear classifier can perform an in-depth analysis of both on a common encoding.
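The source does not define the unified encoding. One plausible reading, sketched below as an assumption, is to concatenate the numeric parts of both vectors, min-max scale each column using bounds fitted on historical vectors, and one-hot encode the categorical MIME type against a small hypothetical vocabulary with an "other" slot.

```python
from typing import List, Sequence, Tuple

# Hypothetical MIME vocabulary; anything else goes to a final "other" slot.
MIME_VOCAB = [
    "application/pdf",
    "application/msword",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/x-msdownload",
    "application/zip",
]


def one_hot_mime(mime: str) -> List[float]:
    vec = [0.0] * (len(MIME_VOCAB) + 1)
    idx = MIME_VOCAB.index(mime) if mime in MIME_VOCAB else len(MIME_VOCAB)
    vec[idx] = 1.0
    return vec


def fit_bounds(history: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Per-column (min, max) over historical numeric vectors."""
    return [(min(col), max(col)) for col in zip(*history)]


def unify(first: Sequence[float], dynamic: Sequence[float], mime: str,
          bounds: Sequence[Tuple[float, float]]) -> List[float]:
    numeric = list(first) + list(dynamic)
    if len(numeric) != len(bounds):
        raise ValueError("bounds must cover every numeric feature")
    scaled = []
    for x, (lo, hi) in zip(numeric, bounds):
        # Min-max scale, clipped so unseen extremes stay in [0, 1].
        v = (x - lo) / (hi - lo) if hi > lo else 0.0
        scaled.append(min(max(v, 0.0), 1.0))
    return scaled + one_hot_mime(mime.lower())
```

Fitting the bounds on the same historical data used to train the spear classifier keeps training and inference on one scale.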

S27: Analyze the comprehensive feature vector by using the spear classifier, and judge whether the target mail is malicious.

S28: If the target mail is malicious, raise an alarm.

Correspondingly, an embodiment of the invention also discloses an attachment-based spear phishing mail detection system which, as shown in FIG. 3, comprises:

an attachment detection module 11, configured to detect whether an attachment is included in the target email;

a first feature extraction module 12, configured to extract a first feature vector in the target email if the target email includes an attachment, where the first feature vector includes a reputation feature vector, a habit feature vector, and an attachment feature vector of the attachment;

a first malicious analysis module 13, configured to analyze the first feature vector by using an attachment classifier, and determine whether the first feature vector is malicious;

a second feature extraction module 14, configured to, if the first malicious analysis module 13 determines that the first feature vector is malicious, extract a dynamic feature vector of an attachment in the target email by using a high-countermeasure sandbox;

the second malicious analysis module 15 is configured to analyze the first feature vector and the dynamic feature vector by using a spear classifier, and determine whether the target email is malicious or not;

an alarm module 16, configured to raise an alarm if the second malicious analysis module 15 determines that the target email is malicious.

Specifically, the attachment-based spear phishing mail detection system of this embodiment can be regarded as a peripheral detection system of a mail system, which the mail system calls through an external interface to detect mails.

Specifically, the first feature extraction module 12 may include an extraction model calling unit and a first feature extraction unit.

Specifically, the second feature extraction module 14 may be specifically configured to utilize the high-countermeasure sandbox to simulate environmental operation of the attachment in the target email, and monitor operation information of the attachment to obtain the dynamic feature vector of the attachment.

Specifically, the second malicious analysis module 15 may include a unified coding unit and a second malicious analysis unit.

In addition, an embodiment of the invention also discloses an attachment-based spear phishing mail detection device, comprising:

a memory for storing a computer program;

a processor for executing a computer program to implement the attachment-based spearphishing mail detection method as described above.

In addition, the embodiment of the invention also discloses a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and when being executed by a processor, the computer program realizes the attachment-based spearphishing mail detection method.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The technical content provided by the present invention is described in detail above, and the principle and the implementation of the present invention are explained in this document by applying specific examples, and the above description of the examples is only used to help understanding the method of the present invention and the core idea thereof; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. An attachment-based spear phishing mail detection method, characterized by comprising the following steps:

detecting whether the target mail comprises an attachment or not;

if the target mail is malicious, raising an alarm;

2. The method of claim 1, wherein the extracting the first feature vector of the target mail comprises:

calling a feature extraction model;

3. The method of claim 2, wherein extracting the dynamic feature vector of the attachment in the target mail with the high-countermeasure sandbox comprises:

4. The method for detecting spearphishing mail according to claim 3, wherein the analyzing the first feature vector and the dynamic feature vector by the spear classifier to determine whether the target mail is malicious comprises:

5. An attachment-based spearphishing mail detection system comprising:

6. An attachment-based spearphishing mail detection system as claimed in claim 5 wherein said first feature extraction module comprises:

7. The system of claim 6, wherein the second feature extraction module is specifically configured to simulate environmental operation of the attachment in the target email by using the high-countermeasure sandbox, monitor operation information of the attachment, and obtain the dynamic feature vector of the attachment.

8. An attachment-based spearphishing mail detection system as claimed in claim 7 wherein said second malicious analysis module comprises:

9. An attachment-based spear phishing mail detection device, comprising:

a memory for storing a computer program;

a processor for executing the computer program to implement the attachment-based spearphishing mail detection method of any of claims 1 to 4.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the attachment-based spearphishing mail detection method according to any of claims 1 to 4.