CN114760119B - Phishing mail attack detection method, device and system - Google Patents

Phishing mail attack detection method, device and system

Info

Publication number
CN114760119B
Authority
CN
China
Prior art keywords
mail
detection
library
matching
phishing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210353048.5A
Other languages
Chinese (zh)
Other versions
CN114760119A (en
Inventor
柯明明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Anbotong Jin'an Technology Co ltd
Original Assignee
Beijing Anbotong Jin'an Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Anbotong Jin'an Technology Co ltd filed Critical Beijing Anbotong Jin'an Technology Co ltd
Priority to CN202210353048.5A priority Critical patent/CN114760119B/en
Publication of CN114760119A publication Critical patent/CN114760119A/en
Application granted granted Critical
Publication of CN114760119B publication Critical patent/CN114760119B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1483 Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application relates to a method, a device and a system for detecting phishing mail attacks. The method comprises the following steps: acquiring email traffic; performing vector matching in a feature vector library and/or configuration item matching in configuration monitoring items according to metadata information of the email traffic, and determining a threat weight; and judging whether the email traffic is phishing mail according to the threat weight, and updating the feature vector library. The application makes full use of multi-dimensional detection items and supports flexible configuration of detection levels, forming a two-dimensional matrix of detection items and detection levels. This solves the problem that phishing mail detection cannot adapt to the configuration set by a company's security manager, and allows the security manager to balance detection rate against processing efficiency.

Description

Phishing mail attack detection method, device and system
Technical Field
The application belongs to the technical field of computer security, and particularly relates to a method, a device and a system for detecting phishing mail attack.
Background
Email, one of the most common applications of the internet, is an integral part of people's work and life. However, email faces security challenges that cannot be ignored, the most prominent being the phishing mail attack. Phishing mail uses a disguised mailbox to deceive the recipient into replying with information such as account numbers and passwords to a designated receiver, or induces the recipient to open an attachment containing a malicious program or click a malicious link pointing to one, thereby taking control of the target host and stealing sensitive information. In recent years the number of phishing mails has been rising, posing a serious challenge to network security.
Most existing phishing mail detection methods rely on a built-in feature library to detect the content features of phishing mails. This approach depends heavily on updates to the content feature library, and the strategy of the detection algorithm is fixed, so an attacker can disguise the mail content in a targeted way. A large number of false positives are also generated and must be confirmed manually one by one, which consumes the effort of the company's security manager, reduces processing efficiency and increases the company's operating cost.
Therefore, there is a need for an efficient and accurate phishing mail attack detection method to solve the above problems.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a method, apparatus and system for detecting phishing mail attack, which are used for overcoming the problem that the phishing mail cannot be identified efficiently and accurately in the prior art.
In order to solve the technical problems, the application provides a phishing mail attack detection method, which comprises the following steps:
acquiring email traffic;
performing vector matching in a feature vector library and/or configuration item matching in configuration monitoring items according to metadata information of the email traffic, and determining a threat weight;
and judging whether the email traffic is phishing mail according to the threat weight, and updating the feature vector library.
Further, the performing vector matching in a feature vector library and/or configuration item matching in configuration monitoring items according to metadata information of the email traffic, and determining a threat weight, includes:
performing vector matching in the feature vector library according to the metadata information of the email traffic, wherein the feature vector library comprises a reputation feature library, a forwarding relation library and a traffic feature library;
if the matching hits the feature vector library, determining a matching result;
if the matching does not hit the feature vector library, performing configuration item matching in the configuration monitoring items, and determining the threat weight.
Further, the configuration monitoring items include mail detection items, sensitive word detection items and detection levels, and the performing configuration item matching in the configuration monitoring items and determining the threat weight if the matching does not hit the feature vector library includes:
if the matching does not hit the feature vector library, performing configuration detection on the mail detection items, the sensitive word detection items and the detection levels according to the metadata information, and determining a detection result of the metadata information;
and determining the threat weight according to the detection result of the metadata information, based on the two-dimensional matrix of mail detection items and detection levels.
Further, the metadata information includes mail header information, text information, linkage information, attachment file information and sensitive word information, and the determining the threat weight according to the detection result of the metadata information, based on the two-dimensional matrix of mail detection items and detection levels, includes:
detecting, for each of the mail header information, the text information, the linkage information, the attachment file information and the sensitive word information, whether it hits the configured sensitive word detection items;
if it hits, determining the corresponding weight according to the two-dimensional matrix of mail detection items and detection levels;
and determining the threat weight according to the weight corresponding to each hit item of metadata information.
Further, the determining the threat weight according to the weight corresponding to each hit item of metadata information includes:
determining a corresponding weight score as the product of the weight corresponding to each hit item of metadata information and a preset threshold value;
determining the threat weight as the sum of the weight scores of the hit items of metadata information.
Further, the judging whether the email traffic is phishing mail according to the threat weight includes:
determining a preset threshold according to the detection level;
if the threat weight is greater than or equal to the preset threshold corresponding to the detection level, judging that the email traffic is phishing mail, and triggering an alarm report;
and if the threat weight is smaller than the preset threshold corresponding to the detection level, judging that the email traffic is not phishing mail.
Further, the feature vector library comprises a reputation feature library, a forwarding relation library and a traffic feature library, and the updating the feature vector library includes:
updating the feature vector library according to the vector features of the email traffic judged to be phishing mail;
wherein, when the vector features include reputation features, the reputation feature library is updated according to the reputation features; when the vector features include forwarding relation features, the forwarding relation library is updated according to the forwarding relation features; and when the vector features include traffic features, the traffic feature library is updated according to the traffic features.
Further, the method further comprises: and judging whether the E-mail flow is phishing mail or not according to the vector matching result.
The application also provides a phishing mail attack detection device, which comprises:
an acquiring unit for acquiring email traffic;
a processing unit for performing vector matching in a feature vector library and/or configuration item matching in configuration monitoring items according to the metadata information of the email traffic, and determining a threat weight;
and a judging unit for judging whether the email traffic is phishing mail according to the threat weight, and updating the feature vector library.
The application also provides a phishing mail attack detection system, which comprises a processor, wherein the processor is used for executing the phishing mail attack detection method.
Compared with the prior art, the application has the following beneficial effects. First, the email traffic is acquired. Then, metadata information of the email traffic is extracted, and the corresponding threat weight is determined through feature vector library matching and/or configuration item matching; the variety of matching means helps ensure the accuracy of the matching result. Finally, phishing mails are identified based on the threat weight, and the feature vector library is updated according to the identified phishing mails, which realizes closed-loop learning and feedback and improves the identification accuracy and richness of the feature vector library. In summary, the application makes full use of multi-dimensional detection items and supports flexible configuration of detection levels, forming a two-dimensional matrix of detection items and detection levels. This solves the problem that phishing mail detection cannot adapt to the configuration set by a company's security manager, and allows the security manager to balance detection rate against processing efficiency.
Drawings
FIG. 1 is a schematic flow chart of an embodiment of a method for detecting phishing mail attack according to the present application;
FIG. 2 is a flowchart illustrating an embodiment of step S102 in FIG. 1 according to the present application;
FIG. 3 is a flowchart illustrating an embodiment of step S203 in FIG. 2 according to the present application;
FIG. 4 is a flowchart illustrating an embodiment of step S302 in FIG. 3 according to the present application;
FIG. 5 is a flowchart illustrating an embodiment of step S403 in FIG. 4 according to the present application;
FIG. 6 is a flowchart illustrating an embodiment of step S103 in FIG. 1 according to the present application;
FIG. 7 is a schematic diagram of an embodiment of a phishing mail attack detection device according to the present application;
FIG. 8 is a schematic structural diagram of an embodiment of an electronic device according to the present application.
Detailed Description
The following detailed description of preferred embodiments of the application is made in connection with the accompanying drawings, which form a part hereof, and together with the description of the embodiments of the application, are used to explain the principles of the application and are not intended to limit the scope of the application.
In the description of the present application, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. Furthermore, the meaning of "a plurality of" means at least two, such as two, three, etc., unless specifically defined otherwise.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly understand that the described embodiments may be combined with other embodiments.
The application provides a phishing mail attack detection method, device and system, which are used for identifying metadata information of the flow of an electronic mail in a multi-dimensional manner through matching of feature vector libraries and matching of configuration monitoring items, so that the flexibility of identifying the phishing mail is ensured, and a new idea is provided for further improving the high efficiency of identifying the phishing mail.
Before the description of the embodiments, the related words are defined:
Phishing mail: mail that uses a disguised mailbox to deceive the recipient into replying with information such as account numbers and passwords to a designated receiver, or that induces the recipient to open an attachment containing a malicious program or click a malicious link pointing to one, thereby taking control of the target host and stealing sensitive information.
Based on the above definition, the prior art mostly detects the content features of phishing mails using a built-in feature library; this approach depends heavily on updates to the content feature library and therefore suffers from a low detection rate. Moreover, the phishing mail detection algorithm cannot adapt its detection strategy to the configuration set by a company's security manager, so it cannot balance detection rate against processing efficiency, improve processing efficiency, or reduce the company's operating cost. Therefore, the application aims to provide an efficient and accurate phishing mail attack detection method, device and system.
Specific embodiments are described in detail below:
The embodiment of the application provides a phishing mail attack detection method. Referring to FIG. 1, which is a schematic flow chart of an embodiment of the phishing mail attack detection method provided by the application, the method includes steps S101 to S103:
in step S101, email traffic is acquired;
in step S102, vector matching is performed in a feature vector library and/or configuration item matching is performed in configuration monitoring items according to metadata information of the email traffic, so as to determine a threat weight;
in step S103, whether the email traffic is phishing mail is judged according to the threat weight, and the feature vector library is updated.
In the embodiment of the application, the email traffic is first acquired. Then, metadata information of the email traffic is extracted, and the corresponding threat weight is determined through feature vector library matching and/or configuration item matching; the variety of matching means helps ensure the accuracy of the matching result. Finally, phishing mails are identified based on the threat weight, and the feature vector library is updated according to the identified phishing mails, which realizes closed-loop learning and feedback and improves the identification accuracy and richness of the feature vector library.
As a preferred embodiment, referring to FIG. 2, which is a flow chart of an embodiment of step S102 in FIG. 1 provided by the present application, step S102 includes steps S201 to S203:
in step S201, vector matching is performed in the feature vector library according to metadata information of the email traffic, where the feature vector library includes a reputation feature library, a forwarding relation library and a traffic feature library;
in step S202, if the matching hits the feature vector library, a matching result is determined;
in step S203, if the matching does not hit the feature vector library, configuration item matching is performed in the configuration monitoring items, and the threat weight is determined.
In the embodiment of the application, the system first performs fast model matching on the mail traffic to be detected. If the feature vector library is not hit, the system continues in step S203 to generate the corresponding threat weight for each configured detection item and automatically calculates the threat weight sum; if the weight sum is higher than the weight corresponding to the detection level, the system determines the current mail to be a phishing mail.
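The two-stage flow of steps S201 to S203 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the feature vector library is modelled as a plain dictionary keyed by sender IP, and the names `Verdict`, `match_library`, `detect` and `detect_configured_items` are hypothetical and do not appear in the original text.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    is_phishing: bool
    source: str  # "library" for a library hit, "configured" for the fallback path


def match_library(library: dict, metadata: dict) -> Optional[bool]:
    """S201/S202: return the stored judgment on a hit, or None on a miss."""
    return library.get(metadata.get("sender_ip"))


def detect(library: dict, metadata: dict,
           detect_configured_items: Callable[[dict], float],
           threshold: float) -> Verdict:
    hit = match_library(library, metadata)
    if hit is not None:
        # Library hit: judge directly from the matching result.
        return Verdict(is_phishing=hit, source="library")
    # S203: library miss, so fall back to the configured detection items.
    threat_weight = detect_configured_items(metadata)
    return Verdict(is_phishing=threat_weight >= threshold, source="configured")
```

The fallback detector is passed in as a callable so that the fast path never has to evaluate the configured detection items.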
As a preferred embodiment, referring to FIG. 3, which is a flow chart of an embodiment of step S203 in FIG. 2 provided by the present application, step S203 includes steps S301 to S302:
in step S301, if the matching does not hit the feature vector library, configuration detection is performed on the mail detection items, the sensitive word detection items and the detection levels according to the metadata information, and a detection result of the metadata information is determined;
in step S302, the threat weight is determined according to the detection result of the metadata information, based on the two-dimensional matrix of mail detection items and detection levels.
In the embodiment of the application, the mail is detected based on the multi-dimensional detection items, the flexible configuration of the detection levels is supported, and the two-dimensional matrix of the detection items and the detection levels is formed, so that the threat weight is determined efficiently.
As a preferred embodiment, the metadata information includes mail header information, text information, linkage information, attachment file information and sensitive word information. Referring to FIG. 4, which is a schematic flow chart of an embodiment of step S302 in FIG. 3 provided in the present application, step S302 includes steps S401 to S403:
in step S401, for each of the mail header information, the text information, the linkage information, the attachment file information and the sensitive word information, it is detected whether the configured sensitive word detection items are hit;
in step S402, if there is a hit, the corresponding weight is determined according to the two-dimensional matrix of mail detection items and detection levels;
in step S403, the threat weight is determined according to the weight corresponding to each hit item of metadata information.
In the embodiment of the application, the two-dimensional matrix is queried separately for the mail header information, text information, linkage information, attachment file information and sensitive word information, and the threat weight is determined from the results.
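The matrix query of steps S401 to S402 might look like the sketch below. The layout (detection items as rows, detection levels as columns) is an assumption, and every number is a placeholder chosen for illustration; the actual values are defined by Tables 1 and 2 of the application. The names `WEIGHT_MATRIX`, `lookup_weight` and `weights_for_hits` are hypothetical.

```python
# Columns of the matrix: detection levels (names taken from the worked example).
DETECTION_LEVELS = ("loose", "regular", "strict")

# Rows of the matrix: detection items. All weights are placeholders.
WEIGHT_MATRIX = {
    "mail_header":     (0.10, 0.15, 0.20),
    "mail_body":       (0.10, 0.15, 0.20),
    "linkage_info":    (0.30, 0.40, 0.50),
    "attachment_file": (0.20, 0.25, 0.30),
    "sensitive_word":  (0.10, 0.15, 0.20),
}


def lookup_weight(item: str, level: str) -> float:
    """Return the weight stored at (detection item, detection level)."""
    return WEIGHT_MATRIX[item][DETECTION_LEVELS.index(level)]


def weights_for_hits(hit_items, level: str) -> dict:
    """S402: map each hit detection item to its weight at the given level."""
    return {item: lookup_weight(item, level) for item in hit_items}
```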
When email traffic is processed, the email metadata is extracted, which specifically includes the following three parts:
1. Mail header information:
From: the sender mailbox and IP address, representing the author or authors of the mail and displayed as the sender of the message. This field is edited by the sender; for example, a spam sender may set it to a nonexistent address, and a fraudulent mail may set it to a spoofed address.
To: the recipient mailboxes and IP addresses.
Cc: the carbon copy addresses.
Bcc: the blind carbon copy addresses.
Subject: the mail subject.
Sender: the actual sender of the mail, of which there is only one. It is generally added by the receiving side: after receiving the mail, the mail service provider compares the actual sender in the mail session with the sender identified by the From field of the header, and if they are inconsistent, adds a Sender field to the header to identify the actual sender. This field can, however, also be set by the sender.
Reply-to: the reply address of the mail, edited by the sender, used when the sender wants replies to go to a designated address. Normally, if no Reply-to field is added, the recipient's reply goes to the address identified by the From field of the original mail.
Return-path: the designated return address of the mail. Typically, if no Return-path field is added, returned mail goes by default to the address identified by Sender; when Sender and From are consistent, it goes by default to the address identified by From.
Received: the mail transmission path, added by each relay station and used to help trace errors that occur in transmission. The field content includes the sending host, the receiving host and the reception time.
2. Mail body information: mail text content; URL links in the content; IP addresses in the content; telephone numbers in the content; bank account numbers in the content; Alipay account numbers in the content; two-dimensional (QR) codes in the content.
3. Mail attachment information: mail attachment names; mail attachment files. For each mail attachment file, the method further extracts file metadata, specifically: file type, file size and file MD5.
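As an illustration of the three parts listed above, the following sketch extracts a subset of this metadata with Python's standard `email` package. It is a minimal example under assumptions rather than the application's protocol analysis engine: it works on an already reassembled message string, covers only header fields, body URLs and attachment file metadata, and the function name `extract_metadata` and the URL pattern are hypothetical.

```python
import email
import hashlib
import re
from email import policy

# Simplified URL pattern, used only for illustration.
URL_RE = re.compile(r"https?://[^\s<>\"']+")
HEADER_FIELDS = ("From", "To", "Cc", "Bcc", "Subject",
                 "Sender", "Reply-To", "Return-Path")


def extract_metadata(raw_message: str) -> dict:
    msg = email.message_from_string(raw_message, policy=policy.default)

    # Part 1: mail header information (None when a field is absent).
    headers = {}
    for name in HEADER_FIELDS:
        value = msg.get(name)
        headers[name] = str(value) if value is not None else None
    headers["Received"] = [str(v) for v in msg.get_all("Received", [])]

    # Part 2: mail body information (plain-text body and the URLs in it).
    body_part = msg.get_body(preferencelist=("plain",))
    body_text = body_part.get_content() if body_part is not None else ""

    # Part 3: mail attachment information plus file metadata.
    attachments = []
    for part in msg.iter_attachments():
        data = part.get_payload(decode=True) or b""
        attachments.append({
            "name": part.get_filename(),
            "type": part.get_content_type(),
            "size": len(data),
            "md5": hashlib.md5(data).hexdigest(),
        })

    return {
        "headers": headers,
        "body": {"text": body_text, "urls": URL_RE.findall(body_text)},
        "attachments": attachments,
    }
```

Telephone numbers, account numbers and QR codes in the body would need additional extractors that are not shown here.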
Based on this information, a mail feature vector library is established, including but not limited to: reputation features, forwarding relations and traffic features. The application supports the detection of mails based on multi-dimensional detection items; the system then automatically calculates a weight sum from the weight corresponding to each detection item. If the weight sum is higher than the weight corresponding to the detection level, the system determines the current mail to be a phishing mail.
The detection items specifically include but are not limited to: mail header, mail body, linkage information, mail attachment file and sensitive word.
As a preferred embodiment, referring to FIG. 5, which is a flowchart of an embodiment of step S403 in FIG. 4 provided by the present application, step S403 includes steps S501 to S502:
in step S501, a corresponding weight score is determined as the product of the weight corresponding to each hit item of metadata information and a preset threshold;
in step S502, the threat weight is determined as the sum of the weight scores of the hit items of metadata information.
In the embodiment of the application, the final threat weight is determined according to the sum of the weight scores of the metadata information of each hit.
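Steps S501 and S502 amount to a weighted sum. Writing $H$ for the set of hit items of metadata information, $w_i$ for the weight of item $i$ taken from the two-dimensional matrix, $B$ for the preset value each weight is multiplied by, $s_i$ for the resulting weight score and $T$ for the threat weight (these symbols are introduced here for illustration and do not appear in the original text), the calculation can be written as:

```latex
s_i = w_i \cdot B, \qquad T = \sum_{i \in H} s_i = B \sum_{i \in H} w_i
```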
As a preferred embodiment, referring to FIG. 6, which is a flow chart of an embodiment of step S103 in FIG. 1 provided by the present application, step S103 includes steps S601 to S603:
in step S601, a preset threshold is determined according to the detection level;
in step S602, if the threat weight is greater than or equal to the preset threshold corresponding to the detection level, the email traffic is judged to be phishing mail, and an alarm report is triggered;
in step S603, if the threat weight is smaller than the preset threshold corresponding to the detection level, the email traffic is judged not to be phishing mail.
In the embodiment of the application, different preset thresholds are determined according to the detection level, so that flexible conversion of different application scenes is facilitated.
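A possible shape for steps S601 to S603 is sketched below. The level names follow the worked example in this description, which states a threshold of 90 for the strict level; the thresholds for the other two levels, and the names `LEVEL_THRESHOLDS`, `judge` and `on_alarm`, are assumptions made for illustration.

```python
from typing import Callable, Optional

# Hypothetical thresholds per detection level. Only the "strict" value (90)
# comes from the text; the other two are assumed.
LEVEL_THRESHOLDS = {"loose": 60.0, "regular": 80.0, "strict": 90.0}


def judge(threat_weight: float, level: str,
          on_alarm: Optional[Callable[[float, str], None]] = None) -> bool:
    """Return True when the mail is judged to be phishing mail."""
    threshold = LEVEL_THRESHOLDS[level]      # S601: threshold from the level
    if threat_weight >= threshold:           # S602: phishing, raise an alarm
        if on_alarm is not None:
            on_alarm(threat_weight, level)
        return True
    return False                             # S603: not phishing
```

Keeping the thresholds in one table means a security manager can switch scenarios by changing the configured level rather than the detection logic.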
In a specific embodiment of the present application, the two-dimensional matrix of mail detection items and detection levels described above is represented by the following tables 1 to 2 (it will be understood that tables 1 and 2 are two examples thereof, and the setting of the specific two-dimensional matrix is changed according to the actual application requirements, and is not limited herein):
TABLE 1
TABLE 2
When a mail is detected and a sensitive word is matched in the mail header, the score is 100 × 0.15 = 15 points; if threat information is then matched, the cumulative score is 15 + 100 × 0.4 = 55 points; if the mail attachment text is then matched, the cumulative score is 55 + 100 × 0.25 = 80 points. If the detection level is configured as regular or loose, the mail is judged to be phishing mail; if it is configured as strict, the mail is judged not to be phishing mail, because the score is smaller than the threshold of 90.
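The arithmetic of this example can be reproduced with a short script. The weights (0.15, 0.4, 0.25), the multiplier 100 and the strict threshold 90 are taken from the example itself; the variable names are illustrative, and the thresholds for the regular and loose levels are not stated in the text, so they are left out.

```python
from itertools import accumulate

BASE_SCORE = 100        # value each weight is multiplied by in the example
STRICT_THRESHOLD = 90   # threshold stated for the strict level

# Hits in the order they occur in the example, with their weights.
hits = [
    ("sensitive word in mail header", 0.15),
    ("threat information", 0.40),
    ("mail attachment text", 0.25),
]

scores = [BASE_SCORE * weight for _, weight in hits]
running_totals = list(accumulate(scores))   # cumulative score after each hit
total = running_totals[-1]

# Under the strict level the mail is not judged to be phishing mail.
is_phishing_strict = total >= STRICT_THRESHOLD
```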
As a preferred embodiment, the feature vector library includes a reputation feature library, a forwarding relation library and a traffic feature library, and the updating the feature vector library includes:
updating the feature vector library according to the vector features of the email traffic judged to be phishing mail;
wherein, when the vector features include reputation features, the reputation feature library is updated according to the reputation features; when the vector features include forwarding relation features, the forwarding relation library is updated according to the forwarding relation features; and when the vector features include traffic features, the traffic feature library is updated according to the traffic features.
In the embodiment of the application, the mail feature vector library model is built and refined based on the metadata information of mails and mail attachments together with the judgment results; as model data accumulates, subsequent phishing mail detection becomes more intelligent and efficient. The feature vector library model is updated separately for the metadata of the email and of the attachment. For example:
For reputation features, the evaluated mail status (including whether the mail is phishing mail, whether it was sent to unknown users, whether the sender is blacklisted, etc.) is updated into the reputation feature library;
For forwarding relations, the forwarding relation information of the mail is updated, so that models such as the forwarding relation feature can use it for judgment;
For traffic features, the traffic statistics of the mail are updated, so that models such as the traffic feature can use them for judgment.
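The per-library update described above could be organised as in the sketch below. The storage layout is an assumption: the application does not specify how the three libraries are stored, so plain in-memory containers are used, and the class name `FeatureVectorLibrary` and the keys of the `features` dictionary are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FeatureVectorLibrary:
    reputation: dict = field(default_factory=dict)     # sender IP -> status label
    forwarding: dict = field(default_factory=dict)     # sender -> set of recipients
    traffic: Counter = field(default_factory=Counter)  # sender -> number of mails

    def update(self, features: dict, is_phishing: bool) -> None:
        """Update only the sub-libraries for which features are present."""
        if "reputation" in features:
            ip = features["reputation"]["sender_ip"]
            self.reputation[ip] = "phishing" if is_phishing else "clean"
        if "forwarding" in features:
            fwd = features["forwarding"]
            self.forwarding.setdefault(fwd["sender"], set()).update(fwd["recipients"])
        if "traffic" in features:
            self.traffic[features["traffic"]["sender"]] += 1
```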
As a preferred embodiment, the above method further comprises: and judging whether the E-mail flow is phishing mail or not according to the vector matching result.
In the embodiment of the application, if the vector matching hits, the identification judgment is made directly from the vector matching result, which ensures flexibility and speed.
The technical solution of the present application is illustrated more clearly by the following specific example:
step one, the application recognition engine performs application recognition on the network traffic to screen out the email traffic;
step two, the protocol analysis engine extracts the mail metadata information of the email traffic and stores it; if the mail carries an attachment, the metadata information of the mail attachment is also extracted, and the mail attachment is stored as well;
step three, the metadata of the mail and the attachment is matched against the feature vector library samples, whose models include but are not limited to: reputation characteristics, forwarding relationships, and flow characteristics; if there is no match, the method proceeds to step four; otherwise, the judgment is made and output directly according to the matching result;
step four, the mail header, the mail body, the link information, and the mail attachment file are detected according to the configured phishing mail detection items, the configured sensitive words, and the configured detection level;
step five, a threat weight is output for each detection result according to Table 1 or Table 2;
step six, the judgment output module judges whether the sum of the threat weights is larger than the weight corresponding to the detection level; if not, the mail is judged not to be a phishing mail; if yes, the mail is judged to be a phishing mail and an alarm report is triggered;
step seven, the judgment output module outputs a log of the judgment result and updates the mail information into the feature vector library model, so as to speed up subsequent detection and form a feedback loop.
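The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `FeatureVectorLibrary`, `Verdict`, `detect`, the `fired_items` key, and keying the library by sender IP are hypothetical names and simplifications introduced here. Steps one and two (traffic screening and metadata extraction) are assumed to have already produced the `metadata` dictionary, and the comparison in step six uses a strict "larger than", as worded in that step.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    is_phishing: bool
    source: str            # "vector_library" or "configured_items"
    threat_weight: float = 0.0


@dataclass
class FeatureVectorLibrary:
    # Hypothetical store: sender IP -> earlier decision (True = phishing).
    known: dict = field(default_factory=dict)

    def match(self, metadata: dict):
        """Step three: return a stored decision, or None when nothing matches."""
        return self.known.get(metadata["sender_ip"])

    def update(self, metadata: dict, is_phishing: bool) -> None:
        """Step seven: feed the decision back so later mails match quickly."""
        self.known[metadata["sender_ip"]] = is_phishing


def detect(metadata, library, item_weights, level_threshold):
    """Run steps three to seven for one mail."""
    hit = library.match(metadata)
    if hit is not None:
        # Step three: a library hit decides directly.
        return Verdict(is_phishing=hit, source="vector_library")

    # Steps four and five: sum the weights of the detection items that fired.
    threat_weight = sum(
        weight
        for item, weight in item_weights.items()
        if item in metadata["fired_items"]
    )
    # Step six: compare the sum with the weight configured for the level.
    verdict = Verdict(threat_weight > level_threshold, "configured_items", threat_weight)
    # Step seven: close the feedback loop.
    library.update(metadata, verdict.is_phishing)
    return verdict
```

In this sketch the second mail from the same sender skips steps four to six entirely, which is the speed-up the feedback loop is meant to provide.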
It should be further noted that the feature vector library model is described as follows:
Reputation feature: every email that is sent carries the sending IP address in its mail header, which the recipient does not normally see; the application, however, can collect data on the sender's sending habits and establish a reputation associated with the IP address. The habit data includes, but is not limited to: spam complaints (derived from phishing mail decision results), transmission volumes (obtained from the flow characteristics), transmission to unknown users (obtained from the forwarding relationships), industry blacklists (obtained from threat intelligence), etc.; the reputation feature score corresponding to the IP address is updated according to the habit data.
Forwarding relation: a multidimensional forwarding relation table is established from the sender's perspective, covering the email sender, the recipients, the CC recipients, and the BCC recipients. Taking a sender as an example, the forwarding relation includes, but is not limited to: a recipient information list, a CC recipient information list, a BCC recipient information list, etc.; the recipient, CC recipient, and BCC recipient information includes, but is not limited to: mailbox address, IP address, received counts (number of mails and number of attachments, each broken down into: total, last 1 day, last 1 month, last 3 months, last 6 months, last 1 year, last 2 years, and last 3 years), last active time, and the like.
Flow characteristics: from the sender's perspective, a flow characteristic database of the email sender is established, which includes, but is not limited to: traffic size, traffic rate, traffic size segment statistics (including: less than 64 bytes, 65-128 bytes, 129-256 bytes, 257-512 bytes, 513-1024 bytes, 1025-1514 bytes, greater than 1514 bytes, etc.), traffic size peak, traffic rate peak, etc.
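One way to picture the three sub-libraries described above is a single per-sender record. The sketch below is illustrative only: `SenderProfile`, `PeerRecord`, `size_segment`, and the reputation formula (a penalty of 60 points per unit complaint rate) are assumptions introduced here, since the text does not give a scoring formula. The text also leaves a mail of exactly 64 bytes unassigned; the sketch places it in the first segment. The roles `to`, `cc`, and `bcc` stand for the recipient, CC recipient, and BCC recipient lists.

```python
from bisect import bisect_left
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Inclusive upper bounds of the size segments listed in the text.
SEGMENT_BOUNDS = [64, 128, 256, 512, 1024, 1514]
SEGMENT_LABELS = ["<=64", "65-128", "129-256", "257-512",
                  "513-1024", "1025-1514", ">1514"]


def size_segment(size_bytes: int) -> str:
    """Map a traffic size to its segment label."""
    return SEGMENT_LABELS[bisect_left(SEGMENT_BOUNDS, size_bytes)]


@dataclass
class PeerRecord:
    """One recipient, CC recipient, or BCC recipient of a given sender."""
    address: str
    ip: str
    attachment_count: int = 0
    received_at: list = field(default_factory=list)

    def count_within(self, days: int, now: datetime) -> int:
        """Number of mails received in the last `days` days."""
        cutoff = now - timedelta(days=days)
        return sum(1 for t in self.received_at if t >= cutoff)

    @property
    def last_active(self):
        return max(self.received_at) if self.received_at else None


@dataclass
class SenderProfile:
    sender_ip: str
    # Reputation inputs.
    spam_complaints: int = 0
    blacklisted: bool = False
    # Forwarding relation: role -> {address: PeerRecord}.
    peers: dict = field(default_factory=lambda: {"to": {}, "cc": {}, "bcc": {}})
    # Flow characteristics.
    total_bytes: int = 0
    peak_size: int = 0
    segments: Counter = field(default_factory=Counter)

    def record_mail(self, size_bytes, recipients, attachments, when):
        """Update flow and forwarding data; recipients are (role, address, ip)."""
        self.total_bytes += size_bytes
        self.peak_size = max(self.peak_size, size_bytes)
        self.segments[size_segment(size_bytes)] += 1
        for role, address, ip in recipients:
            peer = self.peers[role].setdefault(address, PeerRecord(address, ip))
            peer.received_at.append(when)
            peer.attachment_count += attachments

    def reputation(self) -> float:
        """Score in [0, 100]; higher is more trustworthy (assumed formula)."""
        if self.blacklisted:
            return 0.0
        sent = max(sum(self.segments.values()), 1)  # avoid division by zero
        return max(100.0 - 60.0 * min(self.spam_complaints / sent, 1.0), 0.0)
```

Keeping raw receive timestamps makes every time window (1 day up to 3 years) a query rather than a separate counter; a production system would more likely keep rolling counters to bound memory.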
Matching of the mail header, mail body, and mail attachment file against the sensitive words is based on the following rule: if the number of matched sensitive words divided by the total number of sensitive words is greater than 30% (the threshold is configurable; 30% is the recommended value), the match is judged successful.
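The ratio rule can be expressed in a few lines. This is a sketch under assumptions: the function names are hypothetical, matching is done by case-insensitive substring search (the text does not say how a word is matched), and duplicate entries in the configured word list are counted once.

```python
def sensitive_word_ratio(text, sensitive_words):
    """Fraction of the configured sensitive words that occur in the text."""
    words = {w.lower() for w in sensitive_words}
    if not words:
        return 0.0
    lowered = text.lower()
    matched = sum(1 for w in words if w in lowered)
    return matched / len(words)


def sensitive_words_match(text, sensitive_words, threshold=0.30):
    """Match succeeds only when the ratio is strictly greater than the threshold."""
    return sensitive_word_ratio(text, sensitive_words) > threshold
```

With a list of ten words, a text containing four of them gives a ratio of 0.4 and matches, while a text containing exactly three gives 0.3 and does not, because the rule requires the ratio to exceed the threshold.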
An embodiment of the application also provides a phishing mail attack detection device. Referring to fig. 7, which is a schematic structural diagram of an embodiment of the phishing mail attack detection device provided by the application, the phishing mail attack detection device 700 includes:
an acquiring unit 701, configured to acquire an email traffic;
the processing unit 702 is configured to perform vector matching in a feature vector library and/or perform configuration item matching in a configuration monitoring item according to metadata information of the email traffic, and determine a threat weight;
a judging unit 703, configured to judge, according to the threat weight, whether the email traffic is a phishing mail, and to update the feature vector library.
For a more specific implementation of each unit of the phishing mail attack detection device, reference may be made to the description of the above phishing mail attack detection method; similar advantageous effects are obtained and are not described again here.
The embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the phishing mail attack detection method as described above.
In general, the computer instructions for carrying out the methods of the present application may be carried in any combination of one or more computer-readable storage media. A non-transitory computer-readable storage medium may include any computer-readable medium except a transitory propagating signal itself.
The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Computer program code for carrying out operations of the present application may be written in one or more programming languages, or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages; in particular, the Python language, which is suited to neural network computing, and platform frameworks based on TensorFlow or PyTorch may be used. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The embodiment of the present application further provides an electronic device, and as shown in fig. 8, fig. 8 is a schematic structural diagram of an embodiment of the electronic device provided by the present application, where the electronic device 800 includes a processor 801, a memory 802, and a computer program stored in the memory 802 and capable of running on the processor 801, and when the processor 801 executes the program, the method for detecting a phishing mail attack as described above is implemented.
As a preferred embodiment, the electronic device 800 further includes a display 803 for displaying the data processing result after the processor 801 performs the phishing mail attack detection method.
By way of example, a computer program may be partitioned into one or more modules/units that are stored in memory 802 and executed by processor 801 to perform the present application. One or more of the modules/units may be a series of computer program instruction segments capable of performing particular functions to describe the execution of the computer program in the electronic device 800. For example, the computer program may be divided into a plurality of units, and the specific functions of each unit are described in the above-mentioned respective sub-steps, which are not described in detail herein.
The electronic device 800 may be a desktop computer, notebook, palm top computer, or smart phone device with an adjustable camera module.
The processor 801 may be an integrated circuit chip with signal processing capabilities. The processor 801 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 802 may be, but is not limited to, a random access memory (Random Access Memory, RAM), a read-only memory (Read Only Memory, ROM), a programmable read-only memory (Programmable Read-Only Memory, PROM), an erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), an electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), etc. The memory 802 is configured to store a program, and the processor 801 executes the program after receiving an execution instruction; the method defined by the flow disclosed in any of the foregoing embodiments of the present application may be applied to the processor 801 or implemented by the processor 801.
The display 803 may be an LCD display screen or an LED display screen, for example the display screen of a mobile phone.
It is to be appreciated that the configuration shown in fig. 8 is merely a schematic diagram of one configuration of the electronic device 800, and that the electronic device 800 may include more or fewer components than shown in fig. 8. The components shown in fig. 8 may be implemented in hardware, software, or a combination thereof.
The computer readable storage medium and the electronic device according to the above embodiments of the present application may be implemented with reference to the content specifically described for implementing the phishing mail attack detection method according to the present application, and have similar advantageous effects as the phishing mail attack detection method according to the present application, and will not be described herein.
The application discloses a phishing mail attack detection method, device, and system. First, the email traffic is acquired. Then, metadata information of the email traffic is extracted, and the corresponding threat weight is determined through feature vector library matching and/or configuration item matching; using several matching means supports the accuracy of the matching result. Finally, phishing mails are identified based on the threat weight, and the feature vector library is updated according to the identified phishing mails, which realizes closed-loop learning and feedback and improves both the identification accuracy and the richness of the feature vector library.
The technical solution makes full use of multidimensional detection items that can be flexibly configured, and forms a two-dimensional matrix of the detection items and the detection levels within them. This addresses the problem that phishing mail detection cannot adapt to the configuration set by a company's security administrator, and lets the administrator balance detection rate against processing efficiency. The application can upgrade an existing phishing mail detection engine without changing the company's existing network architecture, addresses the company's low phishing mail detection rate and the insufficient flexibility of algorithm configuration, greatly improves the processing efficiency of the company's security administrators, and reduces the company's operating cost. By establishing a mail feature vector library and flexibly configuring detection levels over multidimensional detection items, mails can be inspected in depth based on the metadata information of the mail and its attachments, which improves the detection rate and the processing performance for phishing mails.
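The two-dimensional matrix of detection items and detection levels can be pictured as a lookup table. The sketch below is a hypothetical illustration: the item names, level names, and every number in `WEIGHT_MATRIX` and `LEVEL_THRESHOLD` are invented for the example and are not taken from the patent's tables. The comparison uses "greater than or equal to", following the wording of claim 4.

```python
# Hypothetical matrix: detection item (row) x detection level (column) -> weight.
WEIGHT_MATRIX = {
    "mail_header": {"low": 1, "medium": 2, "high": 3},
    "mail_body":   {"low": 1, "medium": 2, "high": 3},
    "link":        {"low": 2, "medium": 3, "high": 4},
    "attachment":  {"low": 2, "medium": 4, "high": 5},
}

# Hypothetical preset threshold for each detection level.
LEVEL_THRESHOLD = {"low": 6, "medium": 5, "high": 4}


def threat_weight(hit_items, level):
    """Sum the matrix weights of all detection items that were hit."""
    return sum(WEIGHT_MATRIX[item][level] for item in hit_items)


def is_phishing(hit_items, level):
    """Judge phishing when the summed weight reaches the level's threshold."""
    return threat_weight(hit_items, level) >= LEVEL_THRESHOLD[level]
```

Because both the weights and the threshold depend on the level, an administrator can make the same set of hits trip the alarm at a stricter level while passing at a more lenient one, which is the detection-rate versus efficiency trade-off the text describes.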
The present application is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present application are intended to be included in the scope of the present application.

Claims (8)

1. A phishing mail attack detection method, comprising:
acquiring the email traffic;
performing vector matching in a feature vector library and/or performing configuration item matching in configuration monitoring items according to metadata information of the email traffic, and determining a threat weight;
judging whether the email traffic is a phishing mail according to the threat weight, and updating the feature vector library;
wherein the performing vector matching in a feature vector library and/or performing configuration item matching in configuration monitoring items according to the metadata information of the email traffic, and determining a threat weight, comprises:
performing vector matching in the feature vector library according to the metadata information of the email traffic, wherein the feature vector library comprises a reputation feature library, a forwarding relation library, and a flow feature library;
if the matching hits the feature vector library, determining a matching result;
if the matching does not hit the feature vector library, performing configuration item matching in the configuration monitoring items, and determining the threat weight;
wherein the configuration monitoring items comprise mail detection items, sensitive word detection items, and detection level detection items, and the performing configuration item matching in the configuration monitoring items and determining the threat weight if the matching does not hit the feature vector library comprises:
if the matching does not hit the feature vector library, performing configured detection of the mail detection items, the sensitive word detection items, and the detection levels according to the metadata information, and determining a detection result of the metadata information;
and determining the threat weight according to the detection result of the metadata information, based on a two-dimensional matrix of the mail detection items and the detection levels.
2. The phishing mail attack detection method of claim 1, wherein the metadata information includes mail header information, body information, link information, attachment file information, and sensitive word information, and the determining the threat weight according to the detection result of the metadata information, based on the two-dimensional matrix of the mail detection items and the detection levels, includes:
detecting whether the mail header information, the body information, the link information, the attachment file information, and the sensitive word information hit the configured sensitive word detection items;
if there is a hit, determining the corresponding weight according to the two-dimensional matrix of the mail detection items and the detection levels;
and determining the threat weight according to the weight corresponding to the metadata information of each hit.
3. The phishing mail attack detection method of claim 2, wherein the determining the threat weight according to the weight corresponding to the metadata information for each hit includes:
determining a corresponding weight score according to the product of the weight corresponding to the metadata information of each hit and a preset threshold value;
and determining the threat weight according to the sum of the weight scores of the metadata information of each hit.
4. The phishing mail attack detection method of claim 3, wherein the judging whether the email traffic is a phishing mail according to the threat weight comprises:
determining a preset threshold according to the detection level;
if the threat weight is greater than or equal to the preset threshold corresponding to the detection level, judging that the email traffic is a phishing mail, and triggering an alarm report;
and if the threat weight is smaller than the preset threshold corresponding to the detection level, judging that the email traffic is not a phishing mail.
5. The phishing mail attack detection method of claim 1, wherein the feature vector library comprises a reputation feature library, a forwarding relation library, and a flow feature library, and the updating the feature vector library comprises:
updating the feature vector library according to the vector features of the email traffic of the phishing mail;
wherein, when the vector features include reputation features, the reputation feature library is updated according to the reputation features; when the vector features include forwarding relation features, the forwarding relation library is updated according to the forwarding relation features; and when the vector features include flow features, the flow feature library is updated according to the flow features.
6. A phishing mail attack detection method according to any of claims 1 to 5, wherein the method further comprises: judging whether the email traffic is a phishing mail according to the vector matching result.
7. A phishing mail attack detection apparatus, comprising:
an acquiring unit for acquiring the email traffic;
a processing unit, configured to perform vector matching in a feature vector library and/or perform configuration item matching in configuration monitoring items according to metadata information of the email traffic, and determine a threat weight;
a judging unit, configured to judge whether the email traffic is a phishing mail according to the threat weight, and to update the feature vector library;
wherein the performing vector matching in a feature vector library and/or performing configuration item matching in configuration monitoring items according to the metadata information of the email traffic, and determining a threat weight, comprises:
performing vector matching in the feature vector library according to the metadata information of the email traffic, wherein the feature vector library comprises a reputation feature library, a forwarding relation library, and a flow feature library;
if the matching hits the feature vector library, determining a matching result;
if the matching does not hit the feature vector library, performing configuration item matching in the configuration monitoring items, and determining the threat weight;
wherein the configuration monitoring items comprise mail detection items, sensitive word detection items, and detection level detection items, and the performing configuration item matching in the configuration monitoring items and determining the threat weight if the matching does not hit the feature vector library comprises:
if the matching does not hit the feature vector library, performing configured detection of the mail detection items, the sensitive word detection items, and the detection levels according to the metadata information, and determining a detection result of the metadata information;
and determining the threat weight according to the detection result of the metadata information, based on a two-dimensional matrix of the mail detection items and the detection levels.
8. A phishing mail attack detection system, characterized by comprising a processor for executing the phishing mail attack detection method according to any of claims 1 to 6.
CN202210353048.5A 2022-04-02 2022-04-02 Phishing mail attack detection method, device and system Active CN114760119B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210353048.5A CN114760119B (en) 2022-04-02 2022-04-02 Phishing mail attack detection method, device and system

Publications (2)

Publication Number Publication Date
CN114760119A CN114760119A (en) 2022-07-15
CN114760119B true CN114760119B (en) 2023-12-12

Family

ID=82329841

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210353048.5A Active CN114760119B (en) 2022-04-02 2022-04-02 Phishing mail attack detection method, device and system

Country Status (1)

Country Link
CN (1) CN114760119B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117201190B (en) * 2023-11-03 2024-02-02 北京微步在线科技有限公司 Mail attack detection method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105072137A (en) * 2015-09-15 2015-11-18 蔡丝英 Spear phishing mail detection method and device
CN108200105A (en) * 2018-03-30 2018-06-22 杭州迪普科技股份有限公司 A kind of method and device for detecting fishing mail
CN108418777A (en) * 2017-02-09 2018-08-17 中国移动通信有限公司研究院 A kind of fishing mail detection method, apparatus and system
US10158677B1 (en) * 2017-10-02 2018-12-18 Servicenow, Inc. Automated mitigation of electronic message based security threats
CN113489734A (en) * 2021-07-13 2021-10-08 杭州安恒信息技术股份有限公司 Phishing mail detection method and device and electronic device
WO2021242687A1 (en) * 2020-05-28 2021-12-02 GreatHorn, Inc. Computer-implemented methods and systems for pre-analysis of emails for threat detection

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10404745B2 (en) * 2013-08-30 2019-09-03 Rakesh Verma Automatic phishing email detection based on natural language processing techniques
US20150067833A1 (en) * 2013-08-30 2015-03-05 Narasimha Shashidhar Automatic phishing email detection based on natural language processing techniques

Also Published As

Publication number Publication date
CN114760119A (en) 2022-07-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant