CN114760119A - Phishing mail attack detection method, device and system - Google Patents

Phishing mail attack detection method, device and system

Info

Publication number
CN114760119A
Authority
CN
China
Prior art keywords
mail
detection
phishing
library
flow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210353048.5A
Other languages
Chinese (zh)
Other versions
CN114760119B (en
Inventor
柯明明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Anbotong Jin'an Technology Co ltd
Original Assignee
Beijing Anbotong Jin'an Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Anbotong Jin'an Technology Co ltd filed Critical Beijing Anbotong Jin'an Technology Co ltd
Priority to CN202210353048.5A priority Critical patent/CN114760119B/en
Publication of CN114760119A publication Critical patent/CN114760119A/en
Application granted granted Critical
Publication of CN114760119B publication Critical patent/CN114760119B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1483 Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to a phishing mail attack detection method, device and system. The method comprises the following steps: acquiring email traffic; performing vector matching in a feature vector library and/or configuration item matching in configured monitoring items according to metadata information of the email traffic, to determine a threat weight; and judging whether the email traffic is a phishing mail according to the threat weight, and updating the feature vector library. The invention makes full use of multi-dimensional detection items and can be flexibly configured: the detection items and detection levels in the configured monitoring items form a two-dimensional matrix. This solves the problem that phishing mail detection cannot adapt its detection strategy to the configuration set by a company's security administrator, and allows the administrator to balance detection rate against processing efficiency.

Description

Phishing mail attack detection method, device and system
Technical Field
The invention belongs to the technical field of computer security, and particularly relates to a phishing mail attack detection method, device and system.
Background
Email is one of the most common Internet applications and an indispensable part of people's work and life. However, email security currently faces serious challenges, among which phishing attacks are the most prominent. A phishing mail uses a disguised email address to trick the recipient into replying to a specified address with information such as account numbers and passwords, or induces the recipient to open an attachment containing a malicious program or to click a link pointing to a malicious program, so that the target host is controlled and sensitive information is stolen. In recent years the number of phishing mails has been rising, which poses a serious challenge to network security.
Most existing phishing mail detection methods detect the content features of phishing mails against a built-in feature library. This approach depends heavily on updates to the content feature library, and the detection algorithm strategy is fixed. When an attacker disguises the mail content in a targeted way, a large number of false alarms can be generated that require manual confirmation, which consumes the energy of the company's security administrators, reduces processing efficiency, and increases the company's operating costs.
Therefore, there is a need for an efficient and accurate phishing attack detection method to solve the above problems.
Disclosure of Invention
Accordingly, there is a need for a phishing mail attack detection method, apparatus and system to overcome the problem of the prior art that phishing mails cannot be identified efficiently and accurately.
In order to solve the technical problem, the invention provides a phishing mail attack detection method, which comprises the following steps:
acquiring email traffic;
performing vector matching in a feature vector library and/or configuration item matching in configured monitoring items according to metadata information of the email traffic, to determine a threat weight;
and judging whether the email traffic is a phishing mail according to the threat weight, and updating the feature vector library.
Further, performing vector matching in a feature vector library and/or configuration item matching in configured monitoring items according to the metadata information of the email traffic to determine a threat weight includes:
performing vector matching in the feature vector library according to the metadata information of the email traffic, wherein the feature vector library comprises a reputation feature library, a forwarding relation library and a traffic feature library;
if a match is hit in the feature vector library, determining a matching result;
and if no match is hit in the feature vector library, performing configuration item matching in the configured monitoring items to determine the threat weight.
Further, the configured monitoring items include mail detection items, sensitive word detection items and a detection level, and performing configuration item matching in the configured monitoring items to determine the threat weight when no match is hit in the feature vector library includes:
if no match is hit in the feature vector library, performing detection according to the configured mail detection items, sensitive word detection items and detection level on the basis of the metadata information, and determining a detection result for the metadata information;
and determining the threat weight from the detection result of the metadata information, based on the two-dimensional matrix of mail detection items and detection levels.
Further, the metadata information includes mail header information, text information, linkage information, attachment file information and sensitive word information, and determining the threat weight from the detection result of the metadata information based on the two-dimensional matrix of mail detection items and detection levels includes:
checking the mail header information, the text information, the linkage information, the attachment file information and the sensitive word information for hits against the configured sensitive word detection items;
if there is a hit, determining the corresponding weight according to the two-dimensional matrix of mail detection items and detection levels;
and determining the threat weight according to the weight corresponding to each hit item of metadata information.
Further, determining the threat weight according to the weight corresponding to each hit item of metadata information includes:
determining a weight score as the product of the weight corresponding to each hit item of metadata information and a preset threshold;
and determining the threat weight from the sum of the weight scores of the hit items of metadata information.
Further, judging whether the email traffic is a phishing mail according to the threat weight includes:
determining a preset threshold according to the detection level;
if the threat weight is greater than or equal to the preset threshold corresponding to the detection level, determining that the mail is a phishing mail and triggering an alarm report;
and if the threat weight is smaller than the preset threshold corresponding to the detection level, determining that the mail is not a phishing mail.
Further, the feature vector library includes a reputation feature library, a forwarding relation library and a traffic feature library, and updating the feature vector library includes:
updating the feature vector library according to the vector features of the email traffic judged to be phishing mail;
wherein the vector features comprise reputation features, according to which the reputation feature library is updated; forwarding relation features, according to which the forwarding relation library is updated; and traffic features, according to which the traffic feature library is updated.
Further, the method further comprises: judging whether the email traffic is a phishing mail according to the vector matching result.
The invention also provides a phishing mail attack detection device, which comprises:
an acquiring unit, configured to acquire email traffic;
a processing unit, configured to perform vector matching in a feature vector library and/or configuration item matching in configured monitoring items according to metadata information of the email traffic, and to determine a threat weight;
and a judging unit, configured to judge whether the email traffic is a phishing mail according to the threat weight and to update the feature vector library.
The invention also provides a phishing mail attack detection system comprising a processor, wherein the processor is configured to execute the phishing mail attack detection method described above.
Compared with the prior art, the invention has the following beneficial effects. First, email traffic is acquired. Then, metadata information of the email traffic is extracted, and the corresponding threat weight is determined through feature vector library matching and/or configuration item matching; the diversity of matching means helps ensure the accuracy of the matching result. Finally, phishing mails in the email traffic are identified based on the threat weight, and the feature vector library is updated according to the identified phishing mails, which realizes closed-loop learning and feedback and improves the recognition accuracy and richness of the feature vector library. In summary, the invention makes full use of multi-dimensional detection items and can be flexibly configured: the detection items and detection levels in the configured monitoring items form a two-dimensional matrix. This solves the problem that phishing mail detection cannot adapt its detection strategy to the configuration set by a company's security administrator, and allows the administrator to balance detection rate against processing efficiency.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating a phishing mail attack detection method according to an embodiment of the invention;
FIG. 2 is a schematic flowchart of one embodiment of the step S102 in FIG. 1 according to the present invention;
FIG. 3 is a flowchart illustrating an embodiment of the step S203 in FIG. 2 according to the present invention;
FIG. 4 is a flowchart illustrating an embodiment of step S302 in FIG. 3 according to the present invention;
FIG. 5 is a flowchart illustrating an embodiment of step S403 in FIG. 4 according to the present invention;
FIG. 6 is a flowchart illustrating an embodiment of step S103 shown in FIG. 1 according to the present invention;
FIG. 7 is a schematic structural diagram of an embodiment of a phishing mail attack detection apparatus provided by the present invention;
FIG. 8 is a schematic structural diagram of an embodiment of an electronic device provided in the present invention.
Detailed Description
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate preferred embodiments of the invention and together with the description, serve to explain the principles of the invention and not to limit the scope of the invention.
In the description of the present invention, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. Further, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by a person skilled in the art that the described embodiments can be combined with other embodiments.
The invention provides a phishing mail attack detection method, device and system that identify the metadata information of email traffic in multiple dimensions through feature vector library matching and configured monitoring item matching. This ensures flexibility in phishing mail identification and provides a new approach for further improving its efficiency.
Before the embodiments are described, the relevant term is explained:
Phishing mail: a mail that uses a disguised email address to trick the recipient into replying to a specified address with information such as account numbers and passwords, or that induces the recipient to open an attachment containing a malicious program or to click a link pointing to a malicious program, so that the target host is controlled and sensitive information is stolen.
As this explanation suggests, the prior art usually detects the content features of phishing mails against a built-in feature library; this approach depends heavily on updates to the content feature library, and its detection rate is low. Moreover, existing phishing mail detection algorithms cannot adapt their detection strategy to the configuration set by the company's security administrator, so they cannot balance detection rate against processing efficiency, improve processing efficiency, or reduce the company's operating costs. The invention therefore aims to provide a method, device and system for detecting phishing mail attacks efficiently and accurately.
Specific embodiments are described in detail below:
An embodiment of the present invention provides a phishing mail attack detection method. Referring to FIG. 1, which is a schematic flowchart of an embodiment of the phishing mail attack detection method provided by the present invention, the method includes steps S101 to S103:
in step S101, email traffic is acquired;
in step S102, vector matching in a feature vector library and/or configuration item matching in configured monitoring items is performed according to metadata information of the email traffic, and a threat weight is determined;
in step S103, whether the email traffic is a phishing mail is judged according to the threat weight, and the feature vector library is updated.
In the embodiment of the invention, email traffic is first acquired. Then, metadata information of the email traffic is extracted, and the corresponding threat weight is determined through feature vector library matching and/or configuration item matching; the diversity of matching means helps ensure the accuracy of the matching result. Finally, phishing mails in the email traffic are identified based on the threat weight, and the feature vector library is updated according to the identified phishing mails, which realizes closed-loop learning and feedback and improves the recognition accuracy and richness of the feature vector library.
As a preferred embodiment, referring to FIG. 2, which is a schematic flowchart of an embodiment of step S102 in FIG. 1, step S102 includes steps S201 to S203:
in step S201, vector matching is performed in the feature vector library according to the metadata information of the email traffic, where the feature vector library includes a reputation feature library, a forwarding relation library and a traffic feature library;
in step S202, if a match is hit in the feature vector library, a matching result is determined;
in step S203, if no match is hit in the feature vector library, configuration item matching is performed in the configured monitoring items, and a threat weight is determined.
In the embodiment of the invention, the system first performs fast model matching on the mail traffic to be detected. If there is no hit, the system continues in step S203 to generate the corresponding threat weights according to the configured detection items and automatically calculates the sum of the threat weights; if the sum of the weights is higher than the weight corresponding to the detection level, the system judges the current mail to be a phishing mail.
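The two-stage flow of steps S201 to S203 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the in-memory dictionaries, the lookup keys (sender IP, sender and recipient pair, sender address) and the names `FeatureVectorLibrary`, `match` and `stage_one` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FeatureVectorLibrary:
    """Hypothetical in-memory stand-in for the three sub-libraries."""
    reputation: dict = field(default_factory=dict)  # sender IP -> verdict
    forwarding: dict = field(default_factory=dict)  # (sender, recipient) -> verdict
    traffic: dict = field(default_factory=dict)     # sender address -> verdict

    def match(self, meta: dict) -> Optional[str]:
        """Return a stored verdict on a hit, or None when nothing matches."""
        lookups = (
            (self.reputation, meta.get("sender_ip")),
            (self.forwarding, (meta.get("sender"), meta.get("recipient"))),
            (self.traffic, meta.get("sender")),
        )
        for table, key in lookups:
            if key in table:
                return table[key]
        return None


def stage_one(meta: dict, library: FeatureVectorLibrary) -> str:
    """S201 to S203: use the library verdict on a hit, else defer to configured items."""
    verdict = library.match(meta)
    if verdict is not None:
        return verdict                    # S202: matching result determined
    return "needs_configured_detection"   # S203: fall through to item matching
```

A caller would run the configured-item scoring only when `stage_one` returns the fall-through marker, which keeps the fast path cheap for senders that are already known.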
As a preferred embodiment, referring to FIG. 3, which is a schematic flowchart of an embodiment of step S203 in FIG. 2, step S203 includes steps S301 to S302:
in step S301, if no match is hit in the feature vector library, detection is performed according to the configured mail detection items, sensitive word detection items and detection level on the basis of the metadata information, and a detection result for the metadata information is determined;
in step S302, the threat weight is determined from the detection result of the metadata information, based on the two-dimensional matrix of mail detection items and detection levels.
In the embodiment of the invention, the mail is detected based on multi-dimensional detection items, flexible configuration of the detection level is supported, and the detection items and detection levels form a two-dimensional matrix, so that the threat weight is determined efficiently.
As a preferred embodiment, the metadata information includes mail header information, text information, linkage information, attachment file information and sensitive word information. Referring to FIG. 4, which is a schematic flowchart of an embodiment of step S302 in FIG. 3, step S302 includes steps S401 to S403:
in step S401, the mail header information, the text information, the linkage information, the attachment file information and the sensitive word information are each checked for hits against the configured sensitive word detection items;
in step S402, if there is a hit, the corresponding weight is determined according to the two-dimensional matrix of mail detection items and detection levels;
in step S403, the threat weight is determined according to the weight corresponding to each hit item of metadata information.
In the embodiment of the invention, a two-dimensional matrix lookup is performed for the mail header information, text information, linkage information, attachment file information and sensitive word information respectively, and the threat weight is determined.
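The matrix lookup of steps S401 and S402 can be pictured as a nested mapping from detection item to detection level to weight. The sketch below is an assumption-laden illustration: the level names `loose`, `normal` and `strict` follow the wording of the worked example in the description, but every number in `WEIGHT_MATRIX` is a placeholder, since the actual values of Tables 1 and 2 are not available in the text. The names `WEIGHT_MATRIX` and `lookup_weights` are hypothetical.

```python
# Rows: detection items; columns: detection levels.
# All numbers are illustrative placeholders, not the values of Table 1 or Table 2.
WEIGHT_MATRIX = {
    "mail_header":     {"loose": 0.10, "normal": 0.15, "strict": 0.20},
    "mail_text":       {"loose": 0.10, "normal": 0.20, "strict": 0.25},
    "linkage":         {"loose": 0.20, "normal": 0.30, "strict": 0.40},
    "attachment":      {"loose": 0.15, "normal": 0.25, "strict": 0.30},
    "sensitive_words": {"loose": 0.10, "normal": 0.15, "strict": 0.20},
}


def lookup_weights(hits: dict, level: str) -> dict:
    """S401 and S402: for every detection item that hit, read its weight for the level."""
    return {
        item: WEIGHT_MATRIX[item][level]
        for item, was_hit in hits.items()
        if was_hit
    }
```

Keeping the matrix as plain data means an administrator could change weights or add a level without touching the detection logic, which matches the configurability the description emphasizes.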
It should be noted that, when email traffic is processed, the extracted email metadata comprises the following three parts:
1. Mail header information:
Sender mailbox and IP address (From): indicates the author or authors of the mail and is displayed as the sender of the message. It is edited by the sender; for example, spam may set this field to a nonexistent address, and a fraudulent mail may set it to a spoofed address.
Recipient mailbox and IP address (To).
Carbon-copy address (Cc).
Blind-carbon-copy address (Bcc).
Mail subject (Subject).
Actual deliverer of the mail (Sender): there can be only one, and it is generally added by the receiving side. After receiving the mail, the mail service provider compares the actual deliverer in the mail session with the sender identified by the From header field; if they are inconsistent, a Sender field is added below the header to identify the actual deliverer. However, this field can also be set by the sender.
Reply address (Reply-To): edited by the sender, who expects the recipient to reply to the specified address. In general, if no Reply-To field is added, a reply goes to the address identified by the From field of the original mail.
Return address (Return-Path): normally no Return-Path field is added, and returned mail defaults to the address identified by Sender; when Sender is consistent with From, it is returned to the address identified by From.
Mail transmission path (Received): added by each relay station to help trace errors that occur in transmission. The field contents include the sending and receiving hosts and the time of receipt.
2. Mail text information: the text content of the mail; URL links in the content; IP addresses in the content; phone numbers in the content; bank account numbers in the content; payment account numbers in the content; two-dimensional codes in the content.
3. Mail attachment information: the attachment name; the attachment file; and file metadata further extracted from the attachment file, specifically the file type, file size and file MD5.
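The three parts above can be extracted from a raw message with Python's standard `email` package, as in the following sketch. It is only an illustration of the idea: the function name `extract_metadata`, the shape of the returned dictionary and the simple URL pattern are assumptions, and the extraction of IP addresses, phone numbers, account numbers and two-dimensional codes from the body is left out.

```python
import hashlib
import re
from email import message_from_bytes, policy

# Deliberately simple URL pattern, for illustration only.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")
HEADER_FIELDS = ("From", "To", "Cc", "Bcc", "Subject", "Sender", "Reply-To", "Return-Path")


def extract_metadata(raw: bytes) -> dict:
    """Parse a raw RFC 5322 message into header, text and attachment metadata."""
    msg = message_from_bytes(raw, policy=policy.default)

    # Part 1: header fields; Received may occur once per relay hop.
    headers = {name: msg.get(name) for name in HEADER_FIELDS}
    headers["Received"] = msg.get_all("Received", [])

    # Part 2: body text and the URLs found in it.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    text = body_part.get_content() if body_part is not None else ""

    # Part 3: attachment name plus file type, size and MD5.
    attachments = []
    for part in msg.iter_attachments():
        payload = part.get_payload(decode=True) or b""
        attachments.append({
            "name": part.get_filename(),
            "type": part.get_content_type(),
            "size": len(payload),
            "md5": hashlib.md5(payload).hexdigest(),
        })

    return {
        "headers": headers,
        "text": text,
        "urls": URL_RE.findall(text),
        "attachments": attachments,
    }
```

In the described system this role is played by a protocol analysis engine working on captured traffic; the sketch assumes the message bytes have already been reassembled.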
The mail feature vector library is established based on the above information and includes, but is not limited to, reputation features, forwarding relations and traffic features. The invention supports detecting mails based on multi-dimensional detection items; the system then automatically calculates a total weight from the weight corresponding to each detection item. If the total weight is higher than the weight corresponding to the detection level, the system judges the current mail to be a phishing mail.
The detection items include, but are not limited to: mail header, mail text, linkage information, mail attachment file and sensitive words.
As a preferred embodiment, referring to FIG. 5, which is a schematic flowchart of an embodiment of step S403 in FIG. 4, step S403 includes steps S501 to S502:
in step S501, a weight score is determined as the product of the weight corresponding to each hit item of metadata information and a preset threshold;
in step S502, the threat weight is determined from the sum of the weight scores of the hit items of metadata information.
In the embodiment of the invention, the final threat weight is determined from the sum of the weight scores of the hit items of metadata information.
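Steps S501 and S502 amount to a weighted sum. The sketch below assumes that the multiplicand the text calls a "preset threshold" is the base value 100 used in the worked example of the description; that reading is an interpretation, and the names `BASE_SCORE` and `threat_weight` are hypothetical. `Decimal` is used only to keep the arithmetic exact.

```python
from decimal import Decimal

# Assumed base value: the worked example multiplies each weight by 100.
BASE_SCORE = Decimal("100")


def threat_weight(hit_weights: dict) -> Decimal:
    """S501 and S502: score each hit as weight x base value, then sum the scores."""
    scores = {
        item: Decimal(str(weight)) * BASE_SCORE
        for item, weight in hit_weights.items()
    }
    return sum(scores.values(), Decimal("0"))
```

With the three weights from the worked example (0.15, 0.4 and 0.25), this yields a threat weight of 80.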
As a preferred embodiment, referring to FIG. 6, which is a schematic flowchart of an embodiment of step S103 in FIG. 1, step S103 includes steps S601 to S603:
in step S601, a preset threshold is determined according to the detection level;
in step S602, if the threat weight is greater than or equal to the preset threshold corresponding to the detection level, the mail is judged to be a phishing mail and an alarm report is triggered;
in step S603, if the threat weight is smaller than the preset threshold corresponding to the detection level, the mail is judged not to be a phishing mail.
In the embodiment of the invention, different preset thresholds are determined according to the detection level, which makes it easy to switch flexibly between application scenarios.
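The decision of steps S601 to S603 can be sketched as a per-level threshold lookup followed by a comparison. Only the value 90 for the strict level is suggested by the worked example in the description; the other two thresholds, and the names `LEVEL_THRESHOLDS` and `is_phishing`, are assumptions made for illustration.

```python
# Only the strict threshold (90) is suggested by the worked example;
# the loose and normal values are assumed for illustration.
LEVEL_THRESHOLDS = {"loose": 60, "normal": 75, "strict": 90}


def is_phishing(threat_weight: float, level: str) -> bool:
    """S601 to S603: True (phishing, raise an alarm) when the weight reaches the threshold."""
    threshold = LEVEL_THRESHOLDS[level]   # S601: threshold chosen by detection level
    return threat_weight >= threshold     # S602 (True) or S603 (False)
```

Under these assumed thresholds, a threat weight of 80 is reported at the loose and normal levels but not at the strict level.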
In a specific embodiment of the present invention, the two-dimensional matrix of mail detection items and detection levels is shown in Tables 1 and 2 below (Tables 1 and 2 are two examples; the specific arrangement of the two-dimensional matrix varies with actual application requirements and is not limited herein):
TABLE 1
[Figure BDA0003579336430000091: Table 1, provided as an image in the original publication]
TABLE 2
[Figure BDA0003579336430000101: Table 2, provided as an image in the original publication]
When a mail is detected, if the mail header matches a sensitive word, the score is 100 × 0.15 = 15; if threat information is then matched, the accumulated score is 15 + 100 × 0.4 = 55; if the mail attachment text is also matched, the accumulated score is 55 + 100 × 0.25 = 80. If the algorithm is configured as normal or loose, the mail is judged to be a phishing mail; if it is configured as strict, the mail is judged not to be a phishing mail, because the score is below the threshold of 90.
As a preferred embodiment, the feature vector library includes a reputation feature library, a forwarding relation library and a traffic feature library, and updating the feature vector library includes:
updating the feature vector library according to the vector features of the email traffic judged to be phishing mail;
wherein the vector features comprise reputation features, according to which the reputation feature library is updated; forwarding relation features, according to which the forwarding relation library is updated; and traffic features, according to which the traffic feature library is updated.
In the embodiment of the invention, a mail feature vector library model is established and refined based on the metadata information of mails and mail attachments and on the judgment results; as model data accumulates, subsequent phishing mail detection becomes more intelligent and efficient. The models in the feature vector library are each updated according to the metadata of the mail and its attachments. For example:
for the reputation features, the evaluated condition of the mail (including whether it is a phishing mail, whether it was sent to unknown users, whether the sender is blacklisted, and the like) is updated into the reputation feature library;
for the forwarding relations, the forwarding relation information of the mail is updated, so that the forwarding relation features and other models can use it in later judgments;
for the traffic features, the traffic statistics of the mail are updated, so that the traffic features and other models can use them in later judgments.
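The three updates described above can be sketched with simple in-memory containers. This is a hypothetical illustration only: the description does not specify how the libraries are stored, so the container types, the metadata keys (`sender_ip`, `sender`, `recipients`) and the function name `update_libraries` are all assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory stand-ins for the three sub-libraries.
reputation_library = defaultdict(list)  # sender IP -> list of past verdicts
forwarding_library = defaultdict(set)   # sender -> recipients seen so far
traffic_library = Counter()             # sender -> number of mails seen


def update_libraries(meta: dict, is_phishing: bool) -> None:
    """Feed one judged mail back into the reputation, forwarding and traffic libraries."""
    verdict = "phishing" if is_phishing else "clean"
    reputation_library[meta["sender_ip"]].append(verdict)
    forwarding_library[meta["sender"]].update(meta.get("recipients", []))
    traffic_library[meta["sender"]] += 1
```

Each judged mail leaves a trace in all three libraries, so a later mail from the same sender or IP address can be resolved by the fast matching stage.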
As a preferred embodiment, the method further comprises: judging whether the email traffic is a phishing mail according to the vector matching result.
In the embodiment of the invention, if the vector matching produces a hit, identification and judgment are carried out directly according to the vector matching result, which ensures flexibility and speed.
The technical solution of the present invention is illustrated more clearly by the specific example below:
first, an application identification engine performs application identification on the network traffic and screens out the email traffic;
second, a protocol analysis engine extracts the mail metadata information of the email traffic and stores it in storage; if the mail carries an attachment, the metadata information of the mail attachment is also extracted and stored in the same way;
third, the metadata of the mail and the attachment is matched against the samples in the feature vector library, whose models include but are not limited to: reputation (credit) features, forwarding relations and traffic features; if there is no match, the fourth step is entered; otherwise, the judgment is output directly according to the matching result;
fourth, the mail header, the mail body, the linkage information and the mail attachment file are each detected according to the configured phishing mail detection items, sensitive words and detection level;
fifth, a threat weight is output for each detection result according to Table 1 or Table 2;
sixth, the judgment output module judges whether the sum of the threat weights is greater than the weight corresponding to the detection level; if not, the mail is judged not to be a phishing mail; if so, the mail is judged to be a phishing mail and alarm reporting is triggered;
seventh, the judgment output module outputs a log according to the judgment result and updates the mail information into the feature vector library models, so as to speed up subsequent detection and form a feedback loop.
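The third to seventh steps above can be sketched in code as follows. This is only a minimal illustration under stated assumptions, not the implementation of the invention: the weights in `ITEM_WEIGHTS` and the thresholds in `LEVEL_THRESHOLDS` are hypothetical placeholders rather than the values of Table 1 or Table 2, the feature vector library is reduced to a set of sender IP addresses, and the names `FeatureVectorLibrary` and `judge_mail` are introduced purely for the example.

```python
from dataclasses import dataclass, field

# Hypothetical per-item threat weights (NOT the values of Table 1 / Table 2).
ITEM_WEIGHTS = {"header": 30, "body": 20, "linkage": 30, "attachment": 20}
# Hypothetical weight thresholds for each detection level.
LEVEL_THRESHOLDS = {"low": 70, "medium": 50, "high": 30}


@dataclass
class FeatureVectorLibrary:
    """Toy stand-in for the feature vector library: a set of sender IPs."""
    known_phishing_ips: set = field(default_factory=set)

    def match(self, sender_ip):
        return sender_ip in self.known_phishing_ips

    def update(self, sender_ip, is_phishing):
        if is_phishing:
            self.known_phishing_ips.add(sender_ip)


def judge_mail(sender_ip, hit_items, level, library):
    """Return True if the mail is judged to be a phishing mail."""
    # Third step: a hit in the library is output directly.
    if library.match(sender_ip):
        return True
    # Fourth and fifth steps: one threat weight per detection item that hit.
    total = sum(ITEM_WEIGHTS[item] for item in hit_items)
    # Sixth step: compare the sum with the weight for the detection level.
    is_phishing = total > LEVEL_THRESHOLDS[level]
    # Seventh step: feed the result back into the library.
    library.update(sender_ip, is_phishing)
    return is_phishing
```

In this sketch, once a sender has been judged to send phishing mail, later mails from the same address are caught at the third step without repeating the deeper detection, which is the speed-up the feedback loop is meant to provide.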
It should be further noted that the models of the feature vector library are as follows:
Reputation (credit) features: the IP address from which each email is sent is part of the email header and is not visible to the recipient; however, the invention can collect the sending-habit data of the sender and establish a reputation associated with the IP address; the habit data includes, but is not limited to: spam complaints (from phishing mail judgment results), delivery volume (obtained from the traffic features), deliveries to unknown users (obtained from the forwarding relations), industry blacklists (obtained from threat intelligence), and the like; the reputation feature score corresponding to the IP address is updated according to the habit data.
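As an illustration of how habit data might be turned into a credit (reputation) score for an IP address, the following sketch can be considered. The text does not specify a scoring formula, so the 0 to 100 scale, the penalty sizes and the names `SendingHabits` and `reputation_score` are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class SendingHabits:
    spam_complaints: int          # from phishing mail judgment results
    delivery_volume: int          # from the traffic features
    unknown_user_deliveries: int  # from the forwarding relations
    on_industry_blacklist: bool   # from threat intelligence


def reputation_score(habits: SendingHabits) -> float:
    """Hypothetical score: start at 100 and subtract capped penalties."""
    score = 100.0
    if habits.delivery_volume > 0:
        complaint_rate = habits.spam_complaints / habits.delivery_volume
        unknown_rate = habits.unknown_user_deliveries / habits.delivery_volume
        score -= 50.0 * min(1.0, complaint_rate)
        score -= 30.0 * min(1.0, unknown_rate)
    if habits.on_industry_blacklist:
        score -= 20.0
    return max(0.0, score)


# Scores are kept per sending IP address and recomputed as habit data changes.
reputation_by_ip = {
    "192.0.2.10": reputation_score(SendingHabits(0, 100, 0, False)),
    "203.0.113.7": reputation_score(SendingHabits(50, 100, 0, True)),
}
```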
Forwarding relations: from the perspective of the sender, a multi-dimensional forwarding relation table of the email sender, recipients, carbon-copy (CC) recipients and blind-carbon-copy (BCC) recipients is established; taking one sender as an example, the forwarding relation includes, but is not limited to: a recipient information list, a CC information list, a BCC information list, and the like; the recipient, CC and BCC information includes, but is not limited to: mailbox address, IP address, received counts (number of mails and number of attachments, specifically the total and the counts for the last 1 day, last 1 month, last 3 months, last 6 months, last 1 year, last 2 years and last 3 years), last active time, and the like.
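One possible in-memory shape for such a per-sender forwarding relation table is sketched below. The class and field names (`ForwardingRelation`, `PeerInfo`, `mail_counts`) are hypothetical, months and years are approximated as 30 and 365 days, and attachment counts are omitted for brevity; none of these details come from the text.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Look-back windows named in the text; month and year lengths are approximations.
WINDOWS = {
    "last_1_day": timedelta(days=1),
    "last_1_month": timedelta(days=30),
    "last_3_months": timedelta(days=90),
    "last_6_months": timedelta(days=180),
    "last_1_year": timedelta(days=365),
    "last_2_years": timedelta(days=730),
    "last_3_years": timedelta(days=1095),
}


@dataclass
class PeerInfo:
    """Information kept for one recipient, CC or BCC address."""
    mailbox: str
    ip: str
    received_at: list = field(default_factory=list)  # one timestamp per mail

    def record(self, when: datetime) -> None:
        self.received_at.append(when)

    def last_active(self):
        return max(self.received_at) if self.received_at else None

    def mail_counts(self, now: datetime) -> dict:
        counts = {"total": len(self.received_at)}
        for name, span in WINDOWS.items():
            counts[name] = sum(1 for t in self.received_at if now - t <= span)
        return counts


@dataclass
class ForwardingRelation:
    """Forwarding relation table for a single sender."""
    sender: str
    recipients: dict = field(default_factory=dict)  # mailbox -> PeerInfo
    cc: dict = field(default_factory=dict)
    bcc: dict = field(default_factory=dict)
```

A new mail would be recorded by looking up (or creating) the `PeerInfo` for each address in the matching list and calling `record` with the mail's timestamp.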
Traffic features: from the perspective of the sender, a traffic feature database of the email sender is established, specifically including but not limited to: traffic size, traffic rate, segmented statistics of traffic size (including less than 64 bytes, 65-128 bytes, 129-256 bytes, 257-512 bytes, 513-1024 bytes, 1025-1514 bytes, greater than 1514 bytes, etc.), peak traffic size, peak traffic rate, and the like.
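The segmented size statistics can be computed with a simple bucket lookup, as in the sketch below. The listed ranges do not say where a size of exactly 64 bytes belongs, so placing it in the first bucket is an assumption of this example, as are the names `size_bucket` and `size_histogram`.

```python
from bisect import bisect_left
from collections import Counter

# Inclusive upper bound of each bucket except the last, in bytes.
UPPER_BOUNDS = [64, 128, 256, 512, 1024, 1514]
LABELS = ["<=64", "65-128", "129-256", "257-512",
          "513-1024", "1025-1514", ">1514"]


def size_bucket(size_bytes: int) -> str:
    """Return the label of the size segment that size_bytes falls into."""
    return LABELS[bisect_left(UPPER_BOUNDS, size_bytes)]


def size_histogram(sizes) -> Counter:
    """Count how many observed sizes fall into each segment."""
    return Counter(size_bucket(s) for s in sizes)
```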
For matching the mail header, the mail body and the mail attachment file against the sensitive words, the following model can be provided: the match is judged successful if (number of matched sensitive words) / (total number of sensitive words) is greater than 30% (this model is configurable and is the recommended choice).
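The 30% rule can be expressed as a short function. In this sketch, case-insensitive substring search is used as the matching method, which is an assumption; the text does not say how a single sensitive word is matched. The function name and the word list in the usage example are likewise hypothetical.

```python
def sensitive_word_match(text, sensitive_words, threshold=0.30):
    """Return True if the share of sensitive words found in text exceeds threshold."""
    if not sensitive_words:
        return False
    lowered = text.lower()
    matched = sum(1 for word in sensitive_words if word.lower() in lowered)
    return matched / len(sensitive_words) > threshold


# Hypothetical configured word list.
WORDS = ["password", "verify", "urgent", "invoice", "account",
         "bank", "login", "click", "suspend", "wire"]

# 4 of 10 words match (40%), so the rule fires.
hit = sensitive_word_match("Urgent: verify your account password now", WORDS)
```

Because the comparison is strict, a mail that matches exactly 30% of the configured words is not treated as a successful match.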
An embodiment of the present invention further provides a phishing mail attack detection apparatus. Fig. 7 is a schematic structural diagram of an embodiment of the phishing mail attack detection apparatus provided by the present invention. As shown in Fig. 7, the phishing mail attack detection apparatus 700 includes:
an obtaining unit 701, configured to obtain email traffic;
a processing unit 702, configured to perform vector matching in a feature vector library and/or configuration item matching in configuration monitoring items according to metadata information of the email traffic, and determine a threat weight;
a determining unit 703, configured to determine whether the email traffic is a phishing mail according to the threat weight, and update the feature vector library.
For a more specific implementation of each unit of the phishing mail attack detection apparatus, refer to the description of the phishing mail attack detection method above; the apparatus has similar beneficial effects, which are not described in detail here.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the phishing mail attack detection method as described above is implemented.
Generally, computer instructions for carrying out the methods of the present invention may be carried using any combination of one or more computer-readable storage media. Non-transitory computer-readable storage media may include any computer-readable medium except for a transitory propagating signal itself.
A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages; in particular, it may use the Python language, which is suitable for neural network computing, and platform frameworks based on TensorFlow or PyTorch. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
Fig. 8 is a schematic structural diagram of an embodiment of the electronic device provided by the present invention. The electronic device 800 includes a processor 801, a memory 802, and a computer program stored in the memory 802 and executable on the processor 801; when the processor 801 executes the computer program, the phishing mail attack detection method described above is implemented.
As a preferred embodiment, the electronic device 800 further comprises a display 803 for displaying the data processing result after the processor 801 executes the phishing mail attack detection method.
Illustratively, the computer program may be partitioned into one or more modules/units, which are stored in the memory 802 and executed by the processor 801 to implement the present invention. One or more modules/units may be a series of computer program instruction segments capable of performing certain functions, which are used to describe the execution of a computer program in the electronic device 800. For example, the computer program may be divided into a plurality of units, and the specific functions of each unit are described in the foregoing sub-steps, which are not described herein in detail.
The electronic device 800 may be a desktop computer, a notebook, a palm top computer, or a smart phone with an adjustable camera module.
The processor 801 may be an integrated circuit chip with signal processing capability. The processor 801 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 802 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 802 is used for storing a program, and the processor 801 executes the program after receiving an execution instruction; the method defined by the flow disclosed in any of the foregoing embodiments of the present invention may be applied to, or implemented by, the processor 801.
The display 803 may be an LCD display or an LED display, such as the display screen of a mobile phone.
It is understood that the configuration shown in fig. 8 is only a schematic configuration of the electronic device 800, and that the electronic device 800 may include more or fewer components than those shown in fig. 8. The components shown in fig. 8 may be implemented in hardware, software, or a combination thereof.
For the computer-readable storage medium and the electronic device provided by the above embodiments of the present invention, refer to the content specifically described for the phishing mail attack detection method according to the present invention; they have beneficial effects similar to those of the method described above, which are not repeated here.
The invention discloses a phishing mail attack detection method, device and system. First, email traffic is effectively obtained; then, metadata information of the email traffic is effectively extracted, and the corresponding threat weight is determined through feature vector library matching and/or configuration item matching, where the diversity of matching means ensures the accuracy of the matching result; finally, phishing mails in the email traffic are identified based on the threat weight, and the feature vector library is updated according to the identified phishing mails, realizing closed-loop learning and feedback and improving the identification accuracy and richness of the feature vector library.
According to the technical solution, the multi-dimensional detection items are fully used and can be flexibly configured, forming a two-dimensional matrix of the detection items and the detection levels in the configuration monitoring items. This solves the problem that phishing mail detection cannot adapt its detection strategy to the configuration set by a company's security administrator, and allows the security administrator to balance detection rate against processing efficiency. The invention can upgrade an existing phishing mail detection engine without changing the company's existing network architecture, addresses the company's low phishing mail detection rate and insufficiently flexible algorithm configuration, can greatly improve the processing efficiency of the security administrator, and reduces the company's operating cost. It establishes a mail feature vector library based on the metadata information of mails and mail attachments, performs deep detection on mails based on the multi-dimensional detection items, and, combined with flexible configuration of the detection levels, improves the phishing mail detection rate and processing performance to the greatest extent possible.
While the invention has been described with reference to specific preferred embodiments, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the following claims.

Claims (10)

1. A phishing mail attack detection method, comprising:
acquiring email traffic;
performing, according to metadata information of the email traffic, vector matching in a feature vector library and/or configuration item matching in configuration monitoring items, and determining a threat weight;
and judging whether the email traffic is a phishing mail according to the threat weight, and updating the feature vector library.
2. A phishing mail attack detection method according to claim 1, wherein the determining of the threat weight by performing vector matching in a feature vector library and/or configuration item matching in a configuration monitoring item according to metadata information of the email traffic comprises:
performing vector matching in the feature vector library according to the metadata information of the email traffic, wherein the feature vector library comprises a reputation feature library, a forwarding relation library and a traffic feature library;
if the matching hits in the feature vector library, determining a matching result;
and if the matching does not hit in the feature vector library, performing configuration item matching in the configuration monitoring items, and determining the threat weight.
3. A phishing mail attack detection method according to claim 2, wherein the configuration monitoring items include mail detection items, sensitive word detection items and detection level items, and the performing configuration item matching in the configuration monitoring items and determining the threat weight if the matching does not hit in the feature vector library comprises:
if the matching does not hit in the feature vector library, performing configured detection of the mail detection items, the sensitive word detection items and the detection level on the metadata information, and determining a detection result of the metadata information;
and determining the threat weight according to the detection result of the metadata information, based on the two-dimensional matrix of the mail detection items and the detection level.
4. A phishing mail attack detection method according to claim 3, wherein the metadata information includes mail header information, body information, linkage information, attachment file information, and sensitive word information, and the determining the threat weight according to the detection result of the metadata information based on the two-dimensional matrix of the mail detection items and the detection level includes:
performing hit detection on the mail header information, the body information, the linkage information, the attachment file information and the sensitive word information against the configured sensitive word detection items;
if there is a hit, determining the corresponding weight according to the two-dimensional matrix of the mail detection items and the detection level;
and determining the threat weight according to the weight corresponding to each hit item of metadata information.
5. A phishing mail attack detection method as claimed in claim 4 wherein said determining said threat weight based on a weight corresponding to said metadata information for each hit comprises:
determining a corresponding weight score according to the product of the weight corresponding to each hit metadata information and a preset threshold;
determining the threat weight based on a sum of weight scores of the metadata information for each hit.
6. A phishing mail attack detection method as claimed in claim 5 wherein said determining whether said e-mail traffic is phishing according to said threat weights comprises:
determining a preset threshold value according to the detection level;
if the threat weight is greater than or equal to the preset threshold value corresponding to the detection level, determining that the mail is a phishing mail, and triggering alarm reporting;
and if the threat weight is smaller than the preset threshold value corresponding to the detection level, determining that the mail is not a phishing mail.
7. A phishing mail attack detection method as claimed in claim 1 wherein said feature vector library comprises a reputation feature library, a forwarding relation library, a traffic feature library, said updating said feature vector library comprising:
updating the feature vector library according to the vector features of the email traffic judged to be phishing mail;
wherein the vector features comprise reputation features, and the reputation feature library is updated according to the reputation features; the vector features comprise forwarding relation features, and the forwarding relation library is updated according to the forwarding relation features; the vector features comprise traffic features, and the traffic feature library is updated according to the traffic features.
8. A phishing mail attack detection method according to any one of claims 1 to 7, characterized in that the method further comprises: judging whether the email traffic is a phishing mail according to the vector matching result.
9. A phishing mail attack detection device, comprising:
an acquiring unit, configured to acquire email traffic;
a processing unit, configured to perform vector matching in a feature vector library and/or configuration item matching in configuration monitoring items according to metadata information of the email traffic, and determine a threat weight;
and a judging unit, configured to judge whether the email traffic is a phishing mail according to the threat weight, and update the feature vector library.
10. A phishing mail attack detection system comprising a processor for executing the phishing mail attack detection method according to any one of claims 1 to 8.
CN202210353048.5A 2022-04-02 2022-04-02 Phishing mail attack detection method, device and system Active CN114760119B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210353048.5A CN114760119B (en) 2022-04-02 2022-04-02 Phishing mail attack detection method, device and system

Publications (2)

Publication Number Publication Date
CN114760119A true CN114760119A (en) 2022-07-15
CN114760119B CN114760119B (en) 2023-12-12

Family

ID=82329841

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210353048.5A Active CN114760119B (en) 2022-04-02 2022-04-02 Phishing mail attack detection method, device and system

Country Status (1)

Country Link
CN (1) CN114760119B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant