CN111600788A

CN111600788A - Method and device for detecting harpoon mails, electronic equipment and storage medium

Info

Publication number: CN111600788A
Application number: CN202010364548.XA
Authority: CN
Inventors: 蒲大峰
Original assignee: Sangfor Technologies Co Ltd
Current assignee: Sangfor Technologies Co Ltd
Priority date: 2020-04-30
Filing date: 2020-04-30
Publication date: 2020-08-28

Abstract

The embodiment of the invention discloses a method and a device for detecting a harpoon mail, electronic equipment and a computer storage medium, wherein the method comprises the following steps: acquiring a mail to be processed, and analyzing document data corresponding to the mail to be processed; extracting specific features in the document data, inputting the specific features into a detection model for detection to obtain a detection result, wherein the specific features comprise document features and/or picture features; determining whether the mail to be processed is a harpoon mail according to the detection result; therefore, the method and the device can accurately identify the harpoon mails with malicious behaviors, and effectively improve the detection rate of the harpoon mails.

Description

Method and device for detecting harpoon mails, electronic equipment and storage medium

Technical Field

The invention relates to the technical field of information security detection, in particular to a method and a device for detecting a harpoon mail, an electronic device and a computer storage medium.

Background

A harpoon (spear-phishing) attachment attack is a specific variant of phishing; it differs from conventional phishing in that it uses a malicious attachment to an email. All types of phishing mail are social engineering attacks targeted at a specific industry, company, or individual. In this scenario, an attacker typically attaches a document to the phishing mail, and the attack usually relies on the user opening the attachment in order to execute.

In the related art, two approaches, dynamic execution and static scanning, are mainly used to defend against this attack. However, both have a weak capability to detect and remove harpoon mails: missed reports and false reports occur easily, harpoon mails with malicious behaviors cannot be accurately identified, and the detection rate of harpoon mails is consequently reduced.

Disclosure of Invention

The embodiment of the invention discloses a method, a device, electronic equipment and a computer storage medium for detecting a harpoon mail.

In order to achieve the above purpose, the technical solution of the embodiment of the present invention is realized as follows:

in a first aspect, an embodiment of the present invention provides a method for detecting a harpoon mail, where the method includes:

acquiring a mail to be processed, and analyzing document data corresponding to the mail to be processed;

extracting specific features in the document data, inputting the specific features into a detection model for detection to obtain a detection result, wherein the specific features comprise document features and/or picture features;

and determining whether the mail to be processed is a harpoon mail according to the detection result.

Optionally, the analyzing the document data corresponding to the mail to be processed includes:

analyzing the mail to be processed by adopting an electronic mail protocol to obtain an attachment document of the mail to be processed;

and screening the document data corresponding to the mail to be processed from the attached document.

Optionally, the screening of document data corresponding to the mail to be processed from the attached document includes:

when the type of the attachment document is a compressed packet, decompressing the compressed packet;

and screening the document data corresponding to the mail to be processed from the decompressed attachment document.

Optionally, the method further includes:

extracting specific characteristics of normal sample data to obtain first data;

extracting specific characteristics of the malicious sample data to obtain second data;

the normal sample data represents document data in a normal mail, and the malicious sample data represents document data in a harpoon mail;

and establishing the detection model according to the first data and the second data.

Optionally, the document class features include at least one of: the size of the document, the number of pages of the document, the number of rows or columns of the document content, whether the document content contains click-inducing text, language information of the document content, and program code information in the document content; the picture class features include at least one of: the number of pictures in the document content, whether the pictures contain text, the number of characters in the pictures, and whether the text in the pictures contains click-inducing text.

Optionally, the method further includes:

determining whether the mail to be processed is a harpoon mail according to source Internet Protocol (IP) information of the mail to be processed under the condition that the detection result shows that the document data is similar to the malicious sample data;

and under the condition that the detection result shows that the document data is similar to the normal sample data, determining that the mail to be processed is a non-harpoon mail.

Optionally, the determining, according to the source IP information of the to-be-processed email, whether the to-be-processed email is a harpoon email includes:

determining the mail to be processed with the source IP address in the set area as a harpoon mail according to the source IP information;

and determining the mail to be processed with the source IP address in the non-set area as a highly suspicious mail according to the source IP information.

In a second aspect, an embodiment of the present invention provides a harpoon mail detection apparatus, including:

the analysis module is used for acquiring the mail to be processed and analyzing the document data corresponding to the mail to be processed;

the detection module is used for extracting specific features in the document data, inputting the specific features into a detection model for detection, and obtaining a detection result, wherein the specific features comprise document features and/or picture features;

and the determining module is used for determining whether the mail to be processed is a harpoon mail according to the detection result.

Optionally, the parsing module is further configured to:

Optionally, the detection module is further configured to:

extracting specific characteristics of normal sample data to obtain first data;

Optionally, the determining module is further configured to:

under the condition that the detection result shows that the document data is similar to the malicious sample data, determining whether the mail to be processed is a harpoon mail according to the source IP information of the mail to be processed;

Optionally, the determining module is further configured to determine whether the to-be-processed email is a harpoon email according to the source IP information of the to-be-processed email, and includes:

In a third aspect, an embodiment of the present invention provides an electronic device, where the device includes a memory, a processor, and a computer program that is stored in the memory and is executable on the processor, and when the processor executes the computer program, the electronic device implements the method for detecting a harpoon mail according to one or more of the foregoing technical solutions.

In a fourth aspect, a computer storage medium is provided that stores a computer program; the computer program can implement the method for detecting harpoon mails provided by one or more of the above technical solutions after being executed.

The embodiment of the invention discloses a method and a device for detecting a harpoon mail, electronic equipment and a computer storage medium, wherein the method comprises the following steps: acquiring a mail to be processed, and analyzing document data corresponding to the mail to be processed; extracting specific features in the document data, inputting the specific features into a detection model for detection to obtain a detection result, wherein the specific features comprise document features and/or picture features; determining whether the mail to be processed is a harpoon mail according to the detection result; therefore, the embodiment of the invention detects the document data analyzed from the mails based on the detection model, can accurately identify the harpoon mails with malicious behaviors, and effectively improves the detection rate of the harpoon mails.

Drawings

FIG. 1 is a flow chart of a harpoon mail detection method according to an embodiment of the invention;

FIG. 2 is a flowchart of extracting document data in a mail according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the detection using SVM algorithm;

FIG. 4 is a schematic diagram of a process for characterizing document data;

FIG. 5 is a diagram illustrating a process of training a detection model according to document data according to an embodiment of the present invention;

FIG. 6 is a flow chart illustrating a harpoon mail detection process according to an embodiment of the invention;

FIG. 7 is a flowchart illustrating the method for determining malicious Office attachments using a detection model according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of a harpoon mail detection system according to an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

It should be understood that the examples provided herein are merely illustrative of the present invention and are not intended to limit the present invention. In addition, the following embodiments are provided as partial embodiments for implementing the present invention, not all embodiments for implementing the present invention, and the technical solutions described in the embodiments of the present invention may be implemented in any combination without conflict.

It should be noted that, in the embodiments of the present invention, the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, so that a method or apparatus including a series of elements includes not only the explicitly recited elements but also other elements not explicitly listed or inherent to the method or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other related elements (e.g., steps in a method or elements in a device, such as portions of circuitry, processors, programs, software, etc.) in the method or device that includes the element.

The term "and/or" herein merely describes an association between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, including at least one of A, B, and C may mean including any one or more elements selected from the group consisting of A, B, and C.

For example, the data processing method provided by the embodiment of the present invention includes a series of steps, but the data processing method provided by the embodiment of the present invention is not limited to the described steps, and similarly, the data processing apparatus provided by the embodiment of the present invention includes a series of modules, but the data processing apparatus provided by the embodiment of the present invention is not limited to include the explicitly described modules, and may also include modules that are required to be configured to acquire related information or perform processing based on the information.

Embodiments of the present invention may be implemented based on electronic devices, where the electronic devices may be thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, and the like.

The electronic device may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.

A harpoon mail generally delivers a Trojan program to a target computer as an e-mail attachment and induces the computer user to open the attachment, thereby infecting the computer with the Trojan.

Currently, because e-mail is used so heavily in people's daily work communication, the harpoon mail attack has become the most common approach for social engineering attacks. By collecting a large amount of information about the attack target, an attacker hijacks the mailbox of a friend of the target user or imitates the notification mailbox of a well-known website, carefully crafts mail content that matches the interests and hobbies of the recipient, and induces the user to click a phishing link in the mail or download a malicious attachment, so as to intrude into the target host. Because harpoon mail attacks have a high success rate and are not easily discovered by traditional intrusion detection and defense systems, they have become the preferred method of attackers and cause great harm to users' daily network activities.

In the related art, the detection method for the harpoon mail mainly includes two modes, namely a dynamic execution mode and a static scanning mode:

1) Dynamic execution: devices with a sandbox function, such as firewalls, intrusion detection devices, and mail gateways, are deployed to virtually execute Office attachments, and whether a sample is malicious is judged according to the behavioral characteristics of its system calls.

2) Static scanning: conventional antivirus software scans a sample, and a malicious sample is identified when document features of the scanned sample match a malicious feature library.

However, both of the above approaches share the disadvantage that harpoon mails cannot be identified completely and effectively:

1) Traditional devices with a sandbox function, such as firewalls, intrusion detection devices, and mail gateways, are inefficient at detecting and removing malicious Office documents, and because some advanced threat samples have anti-antivirus and anti-sandbox capabilities, a 100% detection rate cannot be achieved.

2) The related art has a weak capability to detect and remove newly prevalent viruses; its rules may be bypassed, and false alarms are easily caused.

3) Some users often use macro code to improve work efficiency, which easily causes false alarms in both static scanning and dynamic execution detection.

In view of the above technical problems, some embodiments of the present invention provide a method for detecting a harpoon mail, which can be applied to any scene requiring mail detection.

Fig. 1 is a flowchart of a method for detecting a harpoon mail according to an embodiment of the present invention, and as shown in fig. 1, the flowchart may include:

Step 100: acquire the mail to be processed, and parse the document data corresponding to the mail to be processed.

In the embodiment of the invention, the mail to be processed can represent the electronic mail in the electronic mailbox, wherein the electronic mailbox is an electronic information space for network communication provided for a network client by a network electronic post office. The electronic mailbox has the functions of storing and transmitting and receiving electronic information and is a tool for information communication in a mail mode on a network.

In the embodiment of the invention, a mail generally consists of a recipient, a subject, a body, and an attachment; the recipient indicates the target of the information delivery or task assignment, the subject highlights the topic of the mail, the body carries the information content of the mail, and the attachment carries additional information or material.

For the implementation manner of obtaining the to-be-processed mail, the embodiment of the present invention is not limited, and the implementation manner may be, for example, an original mail which is obtained by a user from an electronic mailbox in a targeted manner, or may be a new mail which is obtained by a network device in an operating state.

The embodiment of parsing the document data corresponding to the mail to be processed may include: analyzing the mail to be processed by adopting an electronic mail protocol to obtain an attachment document of the mail to be processed; and screening the document data corresponding to the mail to be processed from the attached document.

Here, the document data may be an Office type document or a WPS type document, and the document format of the document data is not limited in the embodiment of the present invention.

Illustratively, when the mail to be processed is obtained, the mail to be processed may be analyzed by using an electronic mail protocol to obtain corresponding mail data, an attachment document in the mail to be processed may be restored by performing deep analysis on the mail data, and document data corresponding to the mail to be processed may be determined by screening the attachment document.

Here, common email protocols include the Simple Mail Transfer Protocol (SMTP), the Post Office Protocol version 3 (POP3), and the Internet Message Access Protocol (IMAP); these protocols all belong to the Transmission Control Protocol/Internet Protocol (TCP/IP) protocol suite.

For example, in the embodiment of the present invention, document data corresponding to a to-be-processed email may be determined by using an email protocol such as POP3, SMTP, or the like, and may also be determined in other manners, which is not limited in the embodiment of the present invention.
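As a non-limiting illustration, once the raw mail data has been obtained over POP3, SMTP, or another channel, restoring the attachment documents can be sketched with the Python standard library `email` package. This is a minimal sketch under stated assumptions: the function names `extract_attachments` and `build_sample_mail`, the sample addresses, and the sample attachment are hypothetical and are not part of the embodiment.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def extract_attachments(raw_mail: bytes) -> list[tuple[str, bytes]]:
    """Return (filename, payload) pairs for each attachment of a raw mail."""
    msg = message_from_bytes(raw_mail, policy=policy.default)
    attachments = []
    # iter_attachments() skips the body parts and yields the remaining parts.
    for part in msg.iter_attachments():
        name = part.get_filename() or "unnamed"
        attachments.append((name, part.get_payload(decode=True)))
    return attachments


def build_sample_mail() -> bytes:
    """Build a hypothetical mail with one attachment, for demonstration only."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "Quarterly report"
    msg.set_content("Please see the attached document.")
    msg.add_attachment(
        b"fake document bytes",
        maintype="application",
        subtype="msword",
        filename="report.doc",
    )
    return msg.as_bytes()


if __name__ == "__main__":
    for filename, payload in extract_attachments(build_sample_mail()):
        print(filename, len(payload))
```

The returned list can then be passed to the screening step, which keeps only the document data to be detected.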

In the embodiment of the present invention, the screening of document data corresponding to a mail to be processed from an attached document may include: when the type of the attachment document is a compressed packet, decompressing the compressed packet; and screening the document data corresponding to the mail to be processed from the decompressed attachment document.

It should be noted that if the attachment document is a compressed package, such as a RAR or ZIP archive created with WinRAR, harpoon mail detection can still be performed on it as long as decompression yields an Office type or WPS type document.

Fig. 2 is a flowchart of extracting document data in a mail according to an embodiment of the present invention, and as shown in fig. 2, the flowchart may include:

acquiring network traffic; identifying the mail data corresponding to the mail to be processed by filtering for POP3 and SMTP; restoring the attachment document in the mail to be processed by file restoration; and judging the document format of the attachment document. When the attachment document is an Office type document, it is determined to be a document to be detected. When the attachment document is a compressed package such as a RAR or ZIP archive, the package is decompressed and the type of each decompressed document is determined: a decompressed document of the Office type is determined to be a document to be detected, and a decompressed document of any other type is discarded, ending the flow.
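For illustration only, the format judgment and decompression branch of the flow above can be sketched as follows. The sketch assumes that the document type is judged by file extension and that only ZIP archives are unpacked, since the Python standard library has no RAR support; the names `OFFICE_SUFFIXES`, `is_office`, and `screen_documents` are hypothetical.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumed list of Office extensions; a real deployment may differ.
OFFICE_SUFFIXES = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}


def is_office(name: str) -> bool:
    """Judge the document format by its file extension (an assumption)."""
    return PurePosixPath(name).suffix.lower() in OFFICE_SUFFIXES


def screen_documents(attachments):
    """Keep Office documents; unpack ZIP archives one level; discard the rest."""
    to_detect = []
    for name, data in attachments:
        if is_office(name):
            to_detect.append((name, data))
        elif zipfile.is_zipfile(io.BytesIO(data)):
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                for member in archive.namelist():
                    if is_office(member):
                        to_detect.append((member, archive.read(member)))
        # Any other attachment type is discarded.
    return to_detect
```

Checking the extension before testing for a ZIP container matters here because modern Office formats such as `.docx` are themselves ZIP containers and should be kept whole rather than unpacked.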

Step 101: extract specific features in the document data, and input the specific features into a detection model for detection to obtain a detection result, wherein the specific features comprise document class features and/or picture class features.

Here, the detection model may be a machine learning model, such as a support vector machine, a decision tree, a neural network, or may be a judgment model, for example, a specific feature is compared with a preset harpoon mail detection standard, and whether the mail to be processed is a harpoon mail is determined according to a comparison result.

For example, in the case that the detection model is a judgment model, when any one of the extracted specific features meets the detection standard of the harpoon mail in the judgment model, the mail to be processed can be judged to be the harpoon mail; or, when a plurality of extracted specific features simultaneously meet the detection standard of the harpoon mail in the judgment model, the mail to be processed may be judged to be the harpoon mail, and may be specifically set according to an actual application scenario, which is not limited in the embodiment of the present invention.
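As a hedged illustration of the judgment model described above, the two alternatives (any single feature meeting the standard, or several features meeting it simultaneously) can be sketched as below. The criteria in `CRITERIA`, including the page-count threshold, are hypothetical placeholders and are not detection standards disclosed by the embodiment.

```python
from typing import Callable, Dict

# Hypothetical detection criteria: feature name -> predicate on its value.
CRITERIA: Dict[str, Callable[[object], bool]] = {
    "has_click_inducing_text": lambda value: value is True,
    "has_macro_code": lambda value: value is True,
    "page_count": lambda value: value <= 1,
}


def judge(features: dict, mode: str = "any") -> bool:
    """Return True if the features meet the criteria under the given mode."""
    hits = [
        check(features[name])
        for name, check in CRITERIA.items()
        if name in features
    ]
    if mode == "any":
        return any(hits)
    if mode == "all":
        # An empty feature set never counts as meeting all criteria.
        return bool(hits) and all(hits)
    raise ValueError("mode must be 'any' or 'all'")


if __name__ == "__main__":
    doc = {"has_click_inducing_text": True, "has_macro_code": False, "page_count": 12}
    print(judge(doc, "any"), judge(doc, "all"))
```

The `mode` parameter corresponds to the choice that the text leaves to the actual application scenario.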

In one embodiment, the detection model may include a support vector machine (SVM) model. An SVM is a binary classification model that can be defined as the linear classifier with the maximum margin in the feature space; its learning strategy is margin maximization, which ultimately reduces the classification problem to solving a convex quadratic programming problem.

FIG. 3 is a schematic diagram of the principle of SVM algorithm detection. As shown in FIG. 3, the two dotted lines with arrows represent the abscissa and ordinate axes, the two dashed lines $w^T x + b = -1$ and $w^T x + b = +1$ represent the margin (interval) boundaries, and the solid line $w^T x + b = 0$ represents the decision boundary. If a hyperplane exists in the feature space of the input data that serves as a decision boundary separating the learning targets into a positive class and a negative class, and the distance from any sample point to the plane is greater than or equal to 1, the classification problem is said to be linearly separable.

Here, a decision boundary satisfying this condition in effect constructs two parallel hyperplanes as margin (interval) boundaries to discriminate the class of a sample:

$$w^T x_i + b \ge +1 \Rightarrow y_i = +1, \qquad w^T x_i + b \le -1 \Rightarrow y_i = -1$$

where the parameters w and b are the normal vector and intercept of the hyperplane, respectively, x represents the sample space formed by the features contained in each sample of the input data, and y ∈ {-1, 1} denotes the negative and positive classes. All samples above the upper margin boundary belong to the positive class, and all samples below the lower margin boundary belong to the negative class. The distance between the two margin boundaries, $d = 2 / \|w\|$, is defined as the margin, and the positive and negative class samples located on the margin boundaries are the support vectors.
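To make the geometry concrete, the following minimal sketch evaluates the decision value w^T x + b, the resulting class, and the distance 2/||w|| between the two boundary hyperplanes for one hand-picked hyperplane. The values of `w`, `b`, and the sample points are arbitrary assumptions chosen so that the arithmetic is easy to check; they are not taken from the embodiment.

```python
import math


def decision_value(w, b, x):
    """Compute w^T x + b for one sample x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 (positive class) or -1 (negative class) by the sign."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Distance between the hyperplanes w^T x + b = +1 and w^T x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


if __name__ == "__main__":
    w, b = (3.0, 4.0), -5.0              # assumed hyperplane, ||w|| = 5
    print(margin_width(w))                # 2 / 5
    print(decision_value(w, b, (1.0, 0.75)))  # lies on the upper boundary
    print(classify(w, b, (2.0, 1.0)), classify(w, b, (0.0, 0.0)))
```

A sample whose decision value is exactly +1 or -1, such as (1.0, 0.75) here, lies on a boundary hyperplane and would be a support vector if it belonged to the training set.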

For example, normal mail may represent normal mail that has not been attacked by a virus; malicious mail may represent abnormal mail that is attacked by a high-level threat virus, such as trojan horse.

In one embodiment, specific features of normal sample data are extracted to obtain first data; specific features of malicious sample data are extracted to obtain second data; the normal sample data represents document data in a normal mail, and the malicious sample data represents document data in a harpoon mail; and the detection model is established according to the first data and the second data.

Illustratively, the document class features include at least one of: the size of the document, the number of pages of the document, the number of rows or columns of the document content, whether the document content contains click-inducing text, language information of the document content, and program code information in the document content; the picture class features include at least one of: the number of pictures in the document content, whether the pictures contain text, the number of characters in the pictures, and whether the text in the pictures contains click-inducing text.

For example, whether the document content or the text in a picture contains click-inducing text may be determined as follows: feature detection is performed on the text contained in the document content or in the picture, and the feature detection result is matched against the features of preset click-inducing text; when the matching result reaches a set value, the document content or the text in the picture is considered to contain click-inducing text; otherwise, when the matching result does not reach the set value, it is considered not to contain click-inducing text.
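One simple way to realize such matching, shown purely as an assumption-laden sketch, is to score the text by the fraction of preset phrases it contains and compare the score with a set value. The phrase list `LURE_PHRASES`, the scoring rule, and the default threshold are all hypothetical; the embodiment does not specify how the feature detection or matching is performed.

```python
# Hypothetical preset click-inducing phrases (not disclosed by the embodiment).
LURE_PHRASES = ("enable content", "enable editing", "enable macros", "click here")


def lure_score(text: str) -> float:
    """Fraction of preset phrases found in the whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    matched = sum(1 for phrase in LURE_PHRASES if phrase in normalized)
    return matched / len(LURE_PHRASES)


def contains_lure(text: str, threshold: float = 0.25) -> bool:
    """True when the matching result reaches the set value (threshold)."""
    return lure_score(text) >= threshold


if __name__ == "__main__":
    print(contains_lure("Please click   Enable Content to view this document"))
    print(contains_lure("Meeting minutes attached"))
```

The same function could be applied to text recovered from pictures, provided a separate recognition step has already converted the picture content to text.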

Here, Optical Character Recognition (OCR) may be used to obtain the picture class features, that is, to identify the number of pictures in the document content, whether the pictures contain text, the number of characters in the pictures, and whether the text in the pictures contains click-inducing text; with OCR, the document content in the mail to be processed can be identified more comprehensively, which further improves the reliability of harpoon mail detection.

In one embodiment, specific features of the document to be detected may be extracted and described along multiple dimensions, wherein the specific features comprise document class features and/or picture class features of the document data. Fig. 4 is a schematic diagram of a process of performing feature description on document data. As shown in fig. 4, after the document data of the mail to be processed is parsed, the document to be detected is determined; feature engineering is constructed for the document to be detected, wherein the feature engineering comprises specific features of multiple dimensions for describing the document data of the document to be detected; and multi-feature description is performed on the document data according to the specific features included in the feature engineering, to obtain the feature description fields corresponding to the document data.
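As a sketch of what a feature description might look like in code, the record below collects the document class and picture class features named earlier in this description and flattens them into a numeric vector. The field names, the `LANGUAGE_CODES` mapping, and the numeric encoding are assumptions made for illustration; they are not the feature description fields defined by the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentFeatures:
    # Document class features
    size_bytes: int
    page_count: int
    row_count: int
    has_click_inducing_text: bool
    language: str
    has_program_code: bool
    # Picture class features
    picture_count: int
    pictures_contain_text: bool
    picture_char_count: int
    picture_has_click_inducing_text: bool


# Hypothetical numeric codes for the language field.
LANGUAGE_CODES = {"en": 0.0, "zh": 1.0}
UNKNOWN_LANGUAGE = -1.0


def to_vector(f: DocumentFeatures) -> list[float]:
    """Flatten the feature record into a fixed-order numeric vector."""
    return [
        float(f.size_bytes),
        float(f.page_count),
        float(f.row_count),
        float(f.has_click_inducing_text),
        LANGUAGE_CODES.get(f.language, UNKNOWN_LANGUAGE),
        float(f.has_program_code),
        float(f.picture_count),
        float(f.pictures_contain_text),
        float(f.picture_char_count),
        float(f.picture_has_click_inducing_text),
    ]
```

Encoding the language as a single number is a simplification; a one-hot encoding and feature scaling would usually be preferable before training a margin-based classifier.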

Here, specific features for 10 dimensions in the document to be detected and contents of multi-feature description are shown in table 1:

TABLE 1

It should be noted that the embodiment of the present invention is not limited to the specific features of 10 dimensions described in table 1 for the document to be detected, and may also include features of other dimensions.

Illustratively, after the specific features of the multiple dimensions of the document to be detected and the corresponding feature description fields are determined, the specific features of the document data of the document to be detected are input into the detection model for detection, and a detection result is obtained.

Fig. 5 is a schematic diagram of a process of training a detection model from document data according to an embodiment of the present invention. As shown in fig. 5, a large amount of normal Word document data is collected in advance as white samples, and a large amount of malicious Word document data is collected as black samples, where the white samples represent sample data of normal mails and the black samples represent sample data of malicious mails (harpoon mails). Following the feature description process of fig. 4, multi-dimensional feature engineering is constructed for the white samples and the black samples respectively, and the sample space formed by the specific features contained in each white sample and black sample is extracted through the multi-dimensional feature description. The feature description fields corresponding to the white samples and the black samples are then input into the SVM model for training, and a detection model for black and white samples is obtained from the training result.

In one implementation mode, a machine learning SVM algorithm is used for training specific features corresponding to each of a white sample and a black sample to obtain a detection model of the black and white sample, and the detection model is trained through specific features of multiple dimensions, so that mails with malicious documents can be identified more accurately.
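The training step can be illustrated with a small, dependency-free stand-in. The sketch below trains a linear classifier by sub-gradient descent on the regularized hinge loss, which is one common way to fit a linear SVM; it is not claimed to be the solver used by the embodiment, and a production system would more likely rely on an established SVM library. The two-dimensional, already standardized feature vectors and the hyperparameters are hypothetical; black samples are labeled +1 and white samples -1.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Fit w and b by sub-gradient descent on the regularized hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score < 1:
                # Sample violates the margin: move the hyperplane toward it.
                w = [wi + lr * (y * xi - reg * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Margin satisfied: only the regularization term acts on w.
                w = [wi - lr * reg * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 (black / malicious) or -1 (white / normal)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy standardized feature vectors (hypothetical).
BLACK = [[1.0, 1.0], [1.0, 0.5]]
WHITE = [[-1.0, -1.0], [-1.0, -0.5]]
X = BLACK + WHITE
Y = [1, 1, -1, -1]

if __name__ == "__main__":
    w, b = train_linear_svm(X, Y)
    print([predict(w, b, x) for x in X])
```

On linearly separable toy data such as this, the learned hyperplane separates the black samples from the white samples; real document features would first need scaling, and non-separable data may call for a kernel or a different regularization strength.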

Step 102: determine whether the mail to be processed is a harpoon mail according to the detection result.

In one implementation, specific features are extracted from a large amount of normal sample data and malicious sample data, feature engineering is constructed for these specific features, and the corresponding feature description fields are input into a model for training to obtain the detection model for document data; the document data is then detected based on the detection model to obtain a detection result, which indicates whether the document data is similar to the normal sample data or to the malicious sample data.

Exemplarily, when the detection result shows that the document data is similar to the normal sample data, the document data is judged to correspond to a non-harpoon mail, the document data input into the detection model is discarded, and the detection process ends; when the detection result shows that the document data is similar to the malicious sample data, whether the mail to be processed is a harpoon mail is determined according to the source IP information of the mail to be processed.

Further, determining whether the mail to be processed is a harpoon mail according to its source IP information may include: determining, according to the source IP information, that a mail to be processed whose source IP address is in the set area is a harpoon mail; and determining that a mail to be processed whose source IP address is in the non-set area is a highly suspicious mail, where a highly suspicious mail is one for which it is not yet certain whether it is a harpoon mail, and which is subsequently judged further in combination with other parameters.

Illustratively, according to the IP address of the mail sender, the document data is detected by combining the detection model to obtain a detection result, and whether the mail is a fishfork mail or not can be determined according to the detection result.

Illustratively, when the detection result of the detection model indicates that the document data is similar to malicious sample data, the mail to be processed is a harpoon mail if the IP address of the mail sender is in the set area, and is a highly suspicious mail if the IP address of the mail sender is in the non-set area.

Here, a highly suspicious mail is one for which it has not been determined whether the mail to be processed is a harpoon mail, although it is highly likely to be one. For example, the decision can be made from the probability value that the highly suspicious mail is a harpoon mail: when this probability value reaches a set value, the highly suspicious mail is determined to be a harpoon mail; otherwise, it is not a harpoon mail.
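The threshold rule for highly suspicious mails can be sketched as follows. This is only an illustration: the function name `is_harpoon_by_probability` and the default set value of 0.8 are assumptions, since the text does not state how the probability value is obtained or what the set value is.

```python
def is_harpoon_by_probability(probability: float, set_value: float = 0.8) -> bool:
    """Decide a highly suspicious mail from its harpoon probability.

    Both the probability source and the default set value are hypothetical.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # "Reaches a set value" is read here as greater than or equal to it.
    return probability >= set_value
```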

In one embodiment, the set area may be overseas or may be configured according to actual conditions; correspondingly, the non-set area may be domestic or may be configured according to actual conditions, which is not limited in the embodiment of the present invention.

Illustratively, if the IP address of the mail sender is an overseas IP, the mail to be processed is a harpoon mail, and if the IP address of the mail sender is not an overseas IP, the mail to be processed is a highly suspicious mail.

In the embodiment of the invention, the detection result is obtained by matching the features extracted from the document data against the specific features in the detection model, so that it can be determined whether the document data of the mail to be processed contains malicious data; it is then judged whether the source IP address of the mail is in the set area, for example whether the IP address is an overseas IP. This further improves the accuracy of identifying the mail to be processed as a harpoon mail, and thus improves the detection rate of harpoon mails.

Fig. 6 is a schematic flow chart of a harpoon mail detection according to an embodiment of the present invention, which mainly includes the following steps:

inputting a sample to be detected into the detection model and detecting the sample by feature matching; when the detection result shows that the sample to be detected is similar to a white sample, discarding the mail corresponding to the sample and ending the process; when the detection result shows that the sample to be detected is similar to a black sample, checking the IP address of the mail sender: if the IP address is an overseas IP, the mail is judged to be a harpoon mail; if it is not an overseas IP, the mail is a highly suspicious mail.
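The decision flow of Fig. 6 can be sketched as a small function. This is a minimal sketch under stated assumptions: the names `Verdict`, `SET_AREA_NETWORKS`, `in_set_area` and `classify_mail` are hypothetical, and the set area is represented by two documentation-only address ranges standing in for overseas address blocks, because the text does not say how the region of an IP address is looked up.

```python
import ipaddress
from enum import Enum


class Verdict(Enum):
    NOT_HARPOON = "not harpoon"
    HARPOON = "harpoon"
    HIGHLY_SUSPICIOUS = "highly suspicious"


# Hypothetical "set area": reserved documentation ranges used as placeholders.
SET_AREA_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def in_set_area(source_ip: str) -> bool:
    """Return True if the sender address falls inside any set-area network."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in SET_AREA_NETWORKS)


def classify_mail(similar_to_black_sample: bool, source_ip: str) -> Verdict:
    """Combine the model result with the source IP, following Fig. 6."""
    if not similar_to_black_sample:
        # Similar to a white sample: the mail is discarded as non-harpoon.
        return Verdict.NOT_HARPOON
    if in_set_area(source_ip):
        return Verdict.HARPOON
    return Verdict.HIGHLY_SUSPICIOUS
```

A real deployment would presumably replace the static network list with a geolocation lookup; that choice is outside what the text specifies.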

It should be noted that the method for detecting a harpoon mail according to the embodiment of the present invention may be applied to any network device capable of receiving an e-mail, for example, a mobile electronic device with an internet access function, such as a smart phone, a tablet computer, a notebook computer, and the like, and a fixed electronic terminal with an internet access function, such as a digital TV, a desktop computer, and the like, which is not limited in the embodiment of the present invention.

In order to further embody the object of the present invention, the above embodiments of the present invention are further illustrated.

The method for detecting harpoon mails provided by the embodiment of the invention is mainly implemented in software on network equipment at the network layer, and identifies malicious attack traffic by inspecting all network traffic in the network.

The first step: mail data is identified and parsed by protocol filtering, for example for the common e-mail protocols SMTP and POP3; the attachment documents in the mails are restored by deep parsing of the mail data; and the attachment documents are filtered to screen out the specific types of Office documents to be detected.

The second step: features of the Office document to be detected are extracted from multiple dimensions. The main dimensions are: the document size, the number of pages of the document, the number of lines or columns of the document content, whether the document content contains click-inducing text, the language information of the document content, the program code information in the document content, the number of pictures in the document content, whether the pictures contain text, the number of characters in the pictures, and whether the text in the pictures contains click-inducing text.

The third step: a large number of normal documents (white samples) and a large number of malicious documents (black samples) are collected, features are extracted from multiple dimensions using the method of the second step, and a detection model for black and white samples is built with the support vector machine (SVM) machine learning algorithm from the feature data extracted from the white samples and the black samples.

The fourth step: the feature data extracted in the second step from the sample to be detected is judged with the trained detection model, and whether the sample is a malicious sample is determined according to the returned result.

The fifth step: whether the mail is a targeted harpoon mail is determined according to the source IP information of the mail sender, in combination with the sample judgment result of the fourth step.
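The feature dimensions listed in the second step can be collected into a numeric vector along the following lines. This is a sketch, not the patented extractor: the phrase list `LURE_PHRASES`, the language list `LANGUAGES`, and the names `DocumentFeatures` and `build_features` are all hypothetical, and the text of each picture is assumed to have been recovered beforehand (for example by OCR, which is not shown).

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical click-inducing phrases and language codes, for illustration only.
LURE_PHRASES = ("enable content", "enable editing", "click here")
LANGUAGES = ("zh", "en")


def contains_lure(text: str) -> bool:
    lowered = text.lower()
    return any(phrase in lowered for phrase in LURE_PHRASES)


@dataclass
class DocumentFeatures:
    size_bytes: int
    page_count: int
    line_count: int
    has_lure_text: bool
    language: str
    has_program_code: bool
    picture_count: int
    pictures_contain_text: bool
    picture_char_count: int
    picture_has_lure_text: bool

    def to_vector(self) -> List[float]:
        # One-hot encode the language; unknown languages map to all zeros.
        language_one_hot = [1.0 if self.language == code else 0.0 for code in LANGUAGES]
        return [
            float(self.size_bytes), float(self.page_count), float(self.line_count),
            float(self.has_lure_text), *language_one_hot,
            float(self.has_program_code), float(self.picture_count),
            float(self.pictures_contain_text), float(self.picture_char_count),
            float(self.picture_has_lure_text),
        ]


def build_features(size_bytes: int, page_count: int, body_text: str, language: str,
                   has_program_code: bool, picture_texts: Sequence[str]) -> DocumentFeatures:
    """Derive the document and picture features from already-parsed inputs."""
    return DocumentFeatures(
        size_bytes=size_bytes,
        page_count=page_count,
        line_count=len(body_text.splitlines()),
        has_lure_text=contains_lure(body_text),
        language=language,
        has_program_code=has_program_code,
        picture_count=len(picture_texts),
        pictures_contain_text=any(t.strip() for t in picture_texts),
        picture_char_count=sum(len(t.strip()) for t in picture_texts),
        picture_has_lure_text=any(contains_lure(t) for t in picture_texts),
    )
```

In practice the raw counts would probably be scaled before training, since document size in bytes dominates the other dimensions numerically.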

Fig. 7 is a flowchart of determining a malicious Office attachment by using a detection model in the embodiment of the present invention, which mainly includes the following steps:

acquiring network flow; identifying mail data corresponding to the mail to be processed in a protocol filtering mode of POP3, SMTP and the like; and restoring the mail data and carrying out format screening to obtain Office attachment documents in the mails to be processed, carrying out multi-dimensional feature extraction on the Office attachment documents and determining feature contents.
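The restoration and format screening of attachments can be illustrated with Python's standard `email` package. This is a sketch under assumptions: it starts from the raw bytes of one already-reassembled message rather than from network traffic, screens by file extension only, and the names `OFFICE_EXTENSIONS` and `extract_office_attachments` are hypothetical.

```python
from email import message_from_bytes, policy
from pathlib import PurePath
from typing import List, Tuple

# Hypothetical list of Office file extensions to screen for.
OFFICE_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}


def extract_office_attachments(raw_mail: bytes) -> List[Tuple[str, bytes]]:
    """Return (filename, content) pairs for Office attachments in one mail."""
    # policy.default makes the parser return an EmailMessage, which
    # provides iter_attachments().
    message = message_from_bytes(raw_mail, policy=policy.default)
    attachments = []
    for part in message.iter_attachments():
        filename = part.get_filename()
        if not filename:
            continue
        if PurePath(filename).suffix.lower() in OFFICE_EXTENSIONS:
            # decode=True undoes the transfer encoding (for example base64).
            content = part.get_payload(decode=True) or b""
            attachments.append((filename, content))
    return attachments
```

Screening by extension is the simplest possible filter; checking the file's magic bytes would be more robust against renamed attachments, but the text does not say which approach is used.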

Collecting a large amount of normal Office document data (white samples) and malicious Word document data (black samples) in advance; performing multi-dimensional feature extraction on the white samples and the black samples respectively to determine the features of the black and white samples; training on these features with the SVM machine learning algorithm; and obtaining a detection model for the black and white samples from the training result.
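To make the training step concrete, the following is a minimal linear soft-margin SVM trained by stochastic sub-gradient descent on the regularized hinge loss, written with the standard library only. It is a sketch, not the patented model: the text names SVM but gives no kernel or hyperparameters, so the linear form, the values of `epochs`, `learning_rate` and `reg`, and the function names are assumptions. Black samples are labelled +1, white samples -1, and the features are assumed to be standardized beforehand.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(samples: Sequence[Sequence[float]], labels: Sequence[int],
                     epochs: int = 50, learning_rate: float = 0.1,
                     reg: float = 0.01) -> Tuple[List[float], float]:
    """Minimize reg/2 * ||w||^2 + max(0, 1 - y * (w.x + b)) sample by sample."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
            if margin < 1:
                # Hinge loss is active: sub-gradient is reg*w - y*x (and -y for the bias).
                weights = [w - learning_rate * (reg * w - y * xi)
                           for w, xi in zip(weights, x)]
                bias += learning_rate * y
            else:
                # Only the regularization term contributes.
                weights = [w - learning_rate * reg * w for w in weights]
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return +1 (similar to black samples) or -1 (similar to white samples)."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1
```

A production system would more likely rely on an established SVM library and a validated kernel choice; the loop above only shows the shape of the computation.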

Judging the feature data in the feature content with the detection model: when the detection result shows that the feature data is similar to the black samples, the document is a malicious Office attachment; when the detection result shows that the feature data is similar to the white samples, the document is a normal Office attachment.

After the Office attachment is determined to be a malicious Office attachment, whether the mail is a harpoon mail is further determined according to the source IP information of the mail sender.

Fig. 8 is a schematic structural diagram of a harpoon mail detection apparatus according to an embodiment of the present invention, and as shown in fig. 8, the apparatus includes: a parsing module 800, a detection module 801 and a determination module 802, wherein:

the parsing module 800 is configured to acquire a mail to be processed, and parse out document data corresponding to the mail to be processed;

a detection module 801, configured to extract a specific feature in the document data, input the specific feature to a detection model for detection, and obtain a detection result, where the specific feature includes a document class feature and/or a picture class feature;

a determining module 802, configured to determine whether the mail to be processed is a harpoon mail according to the detection result.

In an embodiment, the parsing module 800 is further configured to:

In one embodiment, the detecting module 801 is further configured to:

extracting specific characteristics of normal sample data to obtain first data;

In one embodiment, the document class characteristics include at least one of: the document size, the number of pages of the document, the number of lines or columns of the document content, whether the document content contains click-inducing text, the language information of the document content, and the program code information in the document content; the picture class characteristics include at least one of: the number of pictures in the document content, whether the pictures contain text, the number of characters in the pictures, and whether the text in the pictures contains click-inducing text.

In one embodiment, the determining module 802 is further configured to:

In an embodiment, the determining module is further configured to determine whether the to-be-processed email is a harpoon email according to source IP information of the to-be-processed email, and includes:

In practical applications, the parsing module 800, the detecting module 801 and the determining module 802 may be implemented by a processor located in an electronic device, and the processor may be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller and a microprocessor.

In addition, each functional module in this embodiment may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware or a form of a software functional module.

Based on such understanding, the technical solution of the present embodiment, in essence or in the part contributing to the related art, or in whole or in part, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the method of the present embodiment. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Specifically, the computer program instructions corresponding to a method for detecting a harpoon mail in the present embodiment may be stored on a storage medium such as an optical disc, a hard disc, or a U-disc, and when the computer program instructions corresponding to a method for detecting a harpoon mail in the storage medium are read or executed by an electronic device, any one of the methods for detecting a harpoon mail in the foregoing embodiments is implemented.

Based on the same technical concept as the foregoing embodiments, referring to fig. 9, an electronic device 900 provided by an embodiment of the present invention may include: a memory 901 and a processor 902; wherein:

a memory 901 for storing computer programs and data;

a processor 902 for executing a computer program stored in a memory to implement any of the foregoing embodiments of the harpoon mail detection method.

In practical applications, the memory 901 may be a volatile memory (RAM); or a non-volatile memory (non-volatile memory) such as a ROM, a flash memory (flash memory), a Hard Disk (HDD), or a Solid-State Drive (SSD); or a combination of the above types of memories and provides instructions and data to the processor 902.

The processor 902 may be at least one of ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller, and microprocessor. It is to be understood that, for different augmented reality cloud platforms, other electronic devices may be used to implement the above-described processor function, and the embodiment of the present invention is not particularly limited.

In some embodiments, the functions of the apparatus provided in the embodiments of the present invention, or the modules included in the apparatus, may be used to execute the method described in the above method embodiments; for specific implementation, reference may be made to the description of the above method embodiments, and for brevity, details are not repeated here.

The foregoing description of the various embodiments is intended to highlight the differences between the embodiments; for the same or similar parts, the embodiments may be referred to each other, and for brevity these parts are not repeated here.

The methods disclosed in the method embodiments provided by the present invention can be combined arbitrarily without conflict to obtain a new method embodiment.

Features disclosed in each product embodiment provided by the invention can be combined arbitrarily to obtain a new product embodiment without conflict.

The features disclosed in the method or device embodiments of the invention may be combined in any combination to arrive at new method or device embodiments without conflict.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

Claims

1. A method for harpoon mail detection, the method comprising:

2. The method according to claim 1, wherein the parsing out the document data corresponding to the mail to be processed comprises:

3. The method according to claim 2, wherein the screening of document data corresponding to the mail to be processed from the attached document comprises:

4. The method of claim 1, further comprising:

extracting specific characteristics of normal sample data to obtain first data;

5. The method of claim 1, wherein the document class characteristics comprise at least one of: the document size, the number of pages of the document, the number of lines or columns of the document content, whether the document content contains click-inducing text, the language information of the document content, and the program code information in the document content; the picture class characteristics include at least one of: the number of pictures in the document content, whether the pictures contain text, the number of characters in the pictures, and whether the text in the pictures contains click-inducing text.

6. The method of claim 4, further comprising:

when the detection result shows that the document data is similar to the malicious sample data, determining whether the mail to be processed is a harpoon mail according to source Internet Protocol (IP) information of the mail to be processed;

7. The method of claim 6, wherein determining whether the mail to be processed is a harpoon mail according to the source Internet Protocol (IP) information of the mail to be processed comprises:

8. A harpoon mail detection device, characterized in that the device comprises:

9. An electronic device, characterized in that the device comprises a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method according to any of claims 1 to 7 when executing the program.

10. A computer storage medium on which a computer program is stored, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 7 when executed by a processor.