CN112822168A - Abnormal mail detection method and device - Google Patents

Info

Publication number
CN112822168A
Authority
CN
China
Prior art keywords
mail
mailbox
sending
common
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011614644.1A
Other languages
Chinese (zh)
Other versions
CN112822168B (en)
Inventor
郝传洲
黄�俊
潘登
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nsfocus Technologies Inc
Nsfocus Technologies Group Co Ltd
Original Assignee
Nsfocus Technologies Inc
Nsfocus Technologies Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nsfocus Technologies Inc, Nsfocus Technologies Group Co Ltd filed Critical Nsfocus Technologies Inc
Priority to CN202011614644.1A priority Critical patent/CN112822168B/en
Publication of CN112822168A publication Critical patent/CN112822168A/en
Application granted granted Critical
Publication of CN112822168B publication Critical patent/CN112822168B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416Event detection, e.g. attack signature detection
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/42Mailbox-related aspects, e.g. synchronisation of mailboxes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425Traffic logging, e.g. anomaly detection

Abstract

The application relates to the technical field of network security, and in particular to an abnormal mail detection method and device. The method comprises: obtaining behavior information of a mail to be detected, a sending mailbox that sends the mail to be detected, and a receiving mailbox that receives the mail to be detected; determining a common contact person set of the receiving mailbox, and judging whether the sending mailbox is contained in the common contact person set; if the sending mailbox is determined to be contained in the common contact person set, determining the mail type of the mail to be detected by matching the behavior information with pre-stored standard behavior information, and otherwise determining the mail type of the mail to be detected by determining mailbox similarity between the sending mailbox and each common mailbox contained in the common contact person set; and determining whether the mail to be detected is an abnormal mail according to the mail type, thereby realizing the detection of abnormal mails.

Description

Abnormal mail detection method and device
Technical Field
The present application relates to the field of network security technologies, and in particular, to a method and an apparatus for detecting an abnormal email.
Background
With the development of network technology, network attackers are becoming more numerous. An attacker may send abnormal mails to users, carry malicious links in those mails, and induce the users to click the links or enter account passwords. Once a user clicks the link or enters the account password, the related information can be stolen, and a hacker may even use the opportunity to install malicious programs such as Trojan horses, so that the target computer is continuously damaged. Therefore, how to detect abnormal mails has become a problem to be solved urgently.
In the prior art, when detecting abnormal mails, a classifier model may be constructed based on a machine learning algorithm to identify the text content of the mail to be detected or a Uniform Resource Locator (URL) link that it carries, so as to determine whether the mail to be detected is an abnormal mail. However, the prior-art method needs to inspect the content of the mail to be detected, and therefore has the problem of revealing the privacy of the user.
Disclosure of Invention
The embodiment of the application provides an abnormal mail detection method and device, which can realize detection of abnormal mails on the premise of ensuring that privacy of users is not revealed.
The embodiment of the application provides the following specific technical scheme:
An abnormal mail detection method comprises: obtaining behavior information of a mail to be detected, a sending mailbox that sends the mail to be detected, and a receiving mailbox that receives the mail to be detected, wherein the behavior information represents information of the sending behavior of the sending mailbox when the mail to be detected is sent;
determining a common contact person set of the receiving mailbox, and judging whether the sending mailbox is contained in the common contact person set, wherein the common contact person set represents a set of common mailboxes whose quantity of incoming and outgoing mails exchanged with the receiving mailbox exceeds a preset quantity threshold value within a preset time range;
if the sending mailbox is determined to be contained in the common contact person set, determining the mail type of the mail to be detected by matching the behavior information with pre-stored standard behavior information, and otherwise determining the mail type of the mail to be detected by determining mailbox similarity between the sending mailbox and each common mailbox contained in the common contact person set, wherein the standard behavior information represents information of the sending behavior of the sending mailbox when a non-abnormal mail is sent;
and determining whether the mail to be detected is an abnormal mail or not according to the mail type.
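The branching described in the steps above can be sketched as follows. This is only an illustrative sketch: the function names, the string labels (including the `"normal"` label), and the two callback parameters are assumptions introduced here and do not appear in the application.

```python
from typing import Callable, Set


def classify_mail(
    sender: str,
    common_contacts: Set[str],
    behavior_is_standard: Callable[[], bool],
    resembles_common_contact: Callable[[str], bool],
) -> str:
    """Return a hypothetical mail-type label for the mail to be detected."""
    if sender in common_contacts:
        # Sender is a common contact: compare the sending behavior with the
        # pre-stored standard behavior information.
        return "normal" if behavior_is_standard() else "behavior_abnormal"
    # Sender is not a common contact: check mailbox similarity against the
    # common mailboxes to spot addresses that imitate a common contact.
    return "similar" if resembles_common_contact(sender) else "normal"


def is_abnormal(mail_type: str) -> bool:
    """A mail is abnormal if it is a similar mail or a behavior-abnormal mail."""
    return mail_type in {"similar", "behavior_abnormal"}
```

Passing the two checks as callbacks keeps the sketch independent of how behavior matching and mailbox similarity are actually implemented.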
Optionally, the obtaining manner of the common contact set is as follows:
acquiring all received mails received by the receiving mailbox in a preset sampling range, a source sending mailbox for sending all received mails, and all sent mails sent to the receiving mailbox by all the source sending mailboxes;
for each source sending mailbox, determining the average quantity of incoming and outgoing mails between that source sending mailbox and the receiving mailbox according to each received mail and each sent mail;
for each source sending mailbox, if the average quantity of incoming and outgoing mails of that source sending mailbox is determined to be larger than a preset average quantity threshold value, determining that the source sending mailbox is a common mailbox;
and generating a common contact person set containing all common mailboxes.
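As a minimal sketch of the last two steps, assuming the average quantity of incoming and outgoing mails has already been computed for every source sending mailbox, the common contact set can be built with a simple threshold filter. The function name and the dictionary layout are hypothetical.

```python
from typing import Dict, Set


def build_common_contacts(avg_exchange: Dict[str, float], threshold: float) -> Set[str]:
    """Keep every source sending mailbox whose average quantity of incoming
    and outgoing mails is larger than the preset average quantity threshold."""
    return {mailbox for mailbox, score in avg_exchange.items() if score > threshold}
```

For instance, `build_common_contacts({"alice@example.com": 6.0, "bob@example.com": 0.5}, 2.0)` keeps only `alice@example.com`.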
Optionally, determining an average quantity of the incoming and outgoing mails between any source sending mailbox and the receiving mailbox according to the received mails and the sent mails, specifically including:
counting, for each day, the daily receiving number of mails that the source sending mailbox receives from the receiving mailbox, and the daily sending number of mails that the source sending mailbox sends to the receiving mailbox;
determining the average receiving number of each day according to the ratio of the sum of the receiving numbers of each day to the preset number of days, and determining the average sending number of each day according to the ratio of the sum of the sending numbers of each day to the preset number of days;
and determining the average quantity of the incoming and outgoing mails between any source sending mailbox and the receiving mailbox according to the average daily receiving quantity, the average daily sending quantity and the standard deviation of the average daily receiving quantity.
Optionally, determining the mail type of the mail to be detected by determining mailbox similarity between the sender mailbox and each common mailbox included in the common contact set specifically includes:
for each common mailbox contained in the common contact person set, aligning the user name information of the common mailbox with the user name information of the sending mailbox, aligning the domain name information of the common mailbox with the domain name information of the sending mailbox, converting each character of the common mailbox into a characteristic value, and determining a characteristic vector value of the common mailbox according to the characteristic values;
respectively calculating cosine similarity between the characteristic vector value of the sender mailbox and the characteristic vector value of any one common mailbox aiming at each common mailbox, and if the cosine similarity is larger than a preset similarity threshold, determining that the mail type of the mail to be detected is a similar mail.
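The similarity check can be illustrated with the sketch below. Several details are assumptions made for illustration and are not specified in the text above: user names and domain names are aligned by right-padding the shorter one with NUL characters, each character is converted to its Unicode code point as the characteristic value, and the function names are hypothetical.

```python
import math
from typing import Iterable, List, Tuple


def mailbox_vectors(a: str, b: str) -> Tuple[List[int], List[int]]:
    """Align user names and domain names separately, then encode characters."""
    user_a, _, dom_a = a.partition("@")
    user_b, _, dom_b = b.partition("@")
    user_len = max(len(user_a), len(user_b))
    dom_len = max(len(dom_a), len(dom_b))

    def encode(user: str, dom: str) -> List[int]:
        # Assumed alignment: pad the shorter part with NUL (code point 0).
        padded = user.ljust(user_len, "\0") + dom.ljust(dom_len, "\0")
        return [ord(c) for c in padded]

    return encode(user_a, dom_a), encode(user_b, dom_b)


def cosine_similarity(u: List[int], v: List[int]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def is_similar_mail(sender: str, common_mailboxes: Iterable[str], threshold: float) -> bool:
    """True if the sender exceeds the similarity threshold for any common mailbox."""
    for mailbox in common_mailboxes:
        u, v = mailbox_vectors(sender, mailbox)
        if cosine_similarity(u, v) > threshold:
            return True
    return False
```

With raw code points as characteristic values all vector components are non-negative, so the cosine similarity of two addresses is generally high; under this encoding the preset threshold would need to sit close to 1.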
Optionally, if the behavior information is sending time and an IP address, determining the mail type of the mail to be detected by matching the behavior information with pre-stored standard behavior information, specifically including:
and if the mail sending time is determined not to be within the preset standard mail sending time and/or the IP address is not the preset standard IP address, determining that the mail type of the mail to be detected is a behavior abnormal mail, wherein the standard mail sending time represents the mail sending time when the mail sending mailbox sends a non-abnormal mail, and the standard IP address represents the IP address when the mail sending mailbox sends the non-abnormal mail.
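A hedged sketch of this behavior check follows. Representing the standard mail sending time as a list of `(start, end)` periods and the standard IP address as a set of strings is an assumption for illustration, as are the function and parameter names.

```python
from datetime import time
from typing import Iterable, Set, Tuple


def is_behavior_abnormal(
    send_time: time,
    ip_address: str,
    standard_periods: Iterable[Tuple[time, time]],
    standard_ips: Set[str],
) -> bool:
    """Flag the mail if the sending time falls outside every standard period
    and/or the IP address is not a standard IP address."""
    in_standard_time = any(start <= send_time < end for start, end in standard_periods)
    return (not in_standard_time) or (ip_address not in standard_ips)
```

Either deviation alone is enough to mark the mail as a behavior-abnormal mail, which mirrors the "and/or" condition in the text.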
Optionally, the obtaining manner of the standard sending time is as follows:
respectively counting the number of the sent mails of any one common mailbox in each preset time period for each common mailbox in the common contact person set;
for each common mailbox, determining the standard score of each time period according to the number of mails sent by that common mailbox in each time period every day and the corresponding standard deviation, and taking the time periods whose standard score exceeds a preset standard score threshold as the standard mail sending time, wherein the standard score indicates whether a time period corresponds to the standard mail sending time.
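One possible reading of the standard score is a z-score of each time period's sent-mail count relative to the mean and standard deviation over all time periods. The sketch below follows that reading; the exact formula is not given in the text, so the z-score definition, the use of the population standard deviation, and all names are assumptions.

```python
import statistics
from typing import Dict, List


def standard_sending_periods(counts_by_period: Dict[int, int], z_threshold: float) -> List[int]:
    """Return the time periods whose assumed z-score exceeds the threshold."""
    values = list(counts_by_period.values())
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # All periods have the same count, so no period stands out.
        return []
    return sorted(
        period
        for period, count in counts_by_period.items()
        if (count - mean) / std > z_threshold
    )
```

Here each key of `counts_by_period` is a hypothetical period index, such as an hour of the day, and each value is the number of mails the common mailbox sent in that period.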
Optionally, determining whether the mail to be detected is an abnormal mail according to the mail type specifically includes:
and if the mail type is similar mail or abnormal behavior mail, determining that the mail to be detected is abnormal mail.
An abnormal mail detecting apparatus comprising:
a first acquisition module, configured to obtain behavior information of a mail to be detected, a sending mailbox that sends the mail to be detected, and a receiving mailbox that receives the mail to be detected, wherein the behavior information represents information of the sending behavior of the sending mailbox when the mail to be detected is sent;
a judging module, configured to determine a common contact person set of the receiving mailbox and judge whether the sending mailbox is contained in the common contact person set, wherein the common contact person set represents a set of common mailboxes whose quantity of incoming and outgoing mails exchanged with the receiving mailbox exceeds a preset quantity threshold value within a preset time range;
a detection module, configured to determine the mail type of the mail to be detected by matching the behavior information with pre-stored standard behavior information if the sending mailbox is determined to be contained in the common contact person set, and otherwise to determine the mail type of the mail to be detected by determining mailbox similarity between the sending mailbox and each common mailbox contained in the common contact person set, wherein the standard behavior information represents information of the sending behavior of the sending mailbox when a non-abnormal mail is sent;
and the first determining module is used for determining whether the mail to be detected is an abnormal mail or not according to the mail type.
Optionally, when obtaining the set of common contacts, the method further includes:
the second acquisition module is used for acquiring all received mails received by the receiving mailbox in a preset sampling range, a source sending mailbox for sending all received mails and all sent mails sent by all source sending mailboxes to the receiving mailbox;
a second determining module, configured to determine, for each source sending mailbox, the average quantity of incoming and outgoing mails between that source sending mailbox and the receiving mailbox according to each received mail and each sent mail;
a third determining module, configured to determine, for each source sending mailbox, that the source sending mailbox is a common mailbox if the average quantity of incoming and outgoing mails of that source sending mailbox is determined to be greater than a preset average quantity threshold;
and the generating module is used for generating a common contact person set containing all common mailboxes.
Optionally, when determining the average quantity of the incoming and outgoing mails between any source sending mailbox and the receiving mailbox according to the received mails and the sent mails, the second determining module is specifically configured to:
counting, for each day, the daily receiving number of mails that the source sending mailbox receives from the receiving mailbox, and the daily sending number of mails that the source sending mailbox sends to the receiving mailbox;
determining the average receiving number of each day according to the ratio of the sum of the receiving numbers of each day to the preset number of days, and determining the average sending number of each day according to the ratio of the sum of the sending numbers of each day to the preset number of days;
and determining the average quantity of the incoming and outgoing mails between any source sending mailbox and the receiving mailbox according to the average daily receiving quantity, the average daily sending quantity and the standard deviation of the average daily receiving quantity.
Optionally, when determining the mail type of the mail to be detected by determining mailbox similarity between the sender mailbox and each common mailbox included in the common contact set, the detection module is specifically configured to:
for each common mailbox contained in the common contact person set, aligning the user name information of the common mailbox with the user name information of the sending mailbox, aligning the domain name information of the common mailbox with the domain name information of the sending mailbox, converting each character of the common mailbox into a characteristic value, and determining a characteristic vector value of the common mailbox according to the characteristic values;
respectively calculating cosine similarity between the characteristic vector value of the sender mailbox and the characteristic vector value of any one common mailbox aiming at each common mailbox, and if the cosine similarity is larger than a preset similarity threshold, determining that the mail type of the mail to be detected is a similar mail.
Optionally, if the behavior information is a sending time and an IP address, when the behavior information is matched with pre-stored standard behavior information, the detection module is specifically configured to:
and if the mail sending time is determined not to be within the preset standard mail sending time and/or the IP address is not the preset standard IP address, determining that the mail type of the mail to be detected is a behavior abnormal mail, wherein the standard mail sending time represents the mail sending time when the mail sending mailbox sends a non-abnormal mail, and the standard IP address represents the IP address when the mail sending mailbox sends the non-abnormal mail.
Optionally, when obtaining the standard sending time, the method further includes:
the counting module is used for counting the number of the sent mails of any one common mailbox in each preset time period for each common mailbox in the common contact person set;
and the processing module is used for determining the standard score of each time period according to the number of the sent pieces of any one common mailbox in each time period every day and the corresponding standard deviation, and counting the time period exceeding a preset standard score threshold as the standard sending time, wherein the standard score represents whether the time period is the score corresponding to the standard sending time.
Optionally, the third determining module is specifically configured to:
and if the mail type is similar mail or abnormal behavior mail, determining that the mail to be detected is abnormal mail.
An electronic device comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of the abnormal mail detection method.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned abnormal mail detecting method.
In the embodiment of the application, behavior information of a mail to be detected, a sending mailbox that sends the mail to be detected, and a receiving mailbox that receives the mail to be detected are obtained; a common contact person set of the receiving mailbox is determined, and whether the sending mailbox is contained in the common contact person set is judged; if the sending mailbox is determined to be contained in the common contact person set, the mail type of the mail to be detected is determined by matching the behavior information with pre-stored standard behavior information, and otherwise the mail type of the mail to be detected is determined by determining mailbox similarity between the sending mailbox and each common mailbox contained in the common contact person set; and whether the mail to be detected is an abnormal mail is determined according to the mail type. Therefore, the mail type of the mail to be detected can be determined based only on the behavior information of the mail to be detected, which has a small number of characteristic dimensions, or on the sending mailbox of the mail to be detected, so that abnormal mails are detected without disclosing the privacy of the user. In addition, when the sending mailbox is not contained in the common contact person set, the mail to be detected is checked for abnormality through mailbox similarity, so that mails disguised as coming from a common contact can be identified and more unknown abnormal mail behaviors can be found.
Drawings
FIG. 1 is a flowchart of an abnormal mail detection method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of mailbox alignment in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an abnormal mail detection system according to an embodiment of the present application;
FIG. 4 is another flowchart of a method for detecting an abnormal mail in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an abnormal mail detection apparatus in the embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
With the development of network technology, network attackers are becoming more numerous. An attacker may send abnormal mails to users, carry malicious links in those mails, and induce the users to click the links or enter account passwords. Once a user clicks the link or enters the account password, the related information can be stolen, and a hacker may even use the opportunity to install malicious programs such as Trojan horses, so that the target computer is continuously damaged. Therefore, how to detect abnormal mails has become a problem to be solved urgently.
In the prior art, when detecting an abnormal email, the detection can be realized by the following two ways:
The first mode is as follows: a classifier model can be established based on a machine learning algorithm to identify the text content of the mail to be detected or a Uniform Resource Locator (URL) link that it carries, so that whether the mail to be detected is an abnormal mail is judged.
The second mode is as follows: whether an attachment carried by the mail to be detected exhibits malicious behavior can be judged based on sandbox technology, so that whether the mail to be detected is an abnormal mail is determined.
When abnormal mails are detected in the first mode, some novel phishing techniques, such as "scan the code to follow" or "join the group to learn", cannot be detected. When abnormal mails are detected in the second mode, only known malicious behaviors can be detected, and some brand-new malicious code cannot be detected. In addition, both modes need to inspect the content of the mail to be detected, and therefore have the problem of revealing the privacy of the user.
In the embodiment of the application, behavior information of a mail to be detected, a sending mailbox that sends the mail to be detected, and a receiving mailbox that receives the mail to be detected are obtained; a common contact person set of the receiving mailbox is determined, and whether the sending mailbox is contained in the common contact person set is judged; if the sending mailbox is determined to be contained in the common contact person set, the mail type of the mail to be detected is determined by matching the behavior information with pre-stored standard behavior information, and otherwise the mail type of the mail to be detected is determined by determining mailbox similarity between the sending mailbox and each common mailbox contained in the common contact person set; and whether the mail to be detected is an abnormal mail is determined according to the mail type. Therefore, from the perspective of behavior analysis, abnormal mails can be detected using only the behavior information and the sending mailbox of the mail to be detected, which have a small number of characteristic dimensions. The attachment or text content carried by the mail to be detected does not need to be inspected, so mails in any form can be checked while the privacy of the user is prevented from being leaked.
Based on the foregoing embodiment, referring to FIG. 1, which is a flowchart of an abnormal mail detection method in the embodiment of the present application, the method specifically includes:
Step 100: obtaining behavior information of the mail to be detected, a sending mailbox that sends the mail to be detected, and a receiving mailbox that receives the mail to be detected.
The behavior information represents the information of the sending behavior of the sending mailbox when the mail to be detected is sent.
In the embodiment of the application, after the user receives the mail to be detected, the behavior information of the mail to be detected, the sending mailbox that sent the mail to be detected, and the receiving mailbox that received the mail to be detected are obtained.
The behavior information represents information of a sending behavior of the sending mailbox when the to-be-detected mail is sent, for example, sending time, an IP address, and the like of sending the to-be-detected mail, which is not limited in the embodiment of the present application.
It should be noted that the sending mailbox represents a mailbox for sending a to-be-detected mail, and the receiving mailbox represents a mailbox for receiving the to-be-detected mail.
Step 110: determining a common contact person set of the receiving mailbox, and judging whether the sending mailbox is contained in the common contact person set.
The common contact person set represents a set of common mailboxes, wherein the number of the incoming and outgoing mails between the common contact person set and the receiving mailbox exceeds a preset number threshold value within a preset time range.
In the embodiment of the application, the common contact person set associated with the inbox is determined according to the association relationship between the inbox and the common contact person set, and then whether the sending mailbox of the mail to be detected is contained in the determined common contact person set is judged.
The common contact person set comprises common mailboxes corresponding to the common contact persons of the user.
It should be noted that the common contact set represents a set of mailboxes containing various common contacts of the user, and the common mailboxes represent mailboxes of the common contacts of the user.
In the embodiment of the present application, common mailboxes are measured along three dimensions: the number of received mails, the number of sent mails, and the stability of the number of received mails. The manner of obtaining the common contact set in the embodiment of the present application is set forth in detail below, and specifically includes:
S1: acquiring each received mail received by the receiving mailbox in a preset sampling range, the source sending mailbox that sent each received mail, and each sent mail sent by each source sending mailbox to the receiving mailbox.
In the embodiment of the application, each received mail received by the user's receiving mailbox within a preset sampling range and the source sending mailbox that sent each received mail are obtained, and each sent mail sent by each source sending mailbox to the receiving mailbox is obtained.
For example, assume that the user's receiving mailbox is $M_r$. Then all mails received by $M_r$ within the 10 days before the mail to be detected was received, together with the source sending mailboxes that sent those received mails, are acquired, and all sent mails sent by those source sending mailboxes to the receiving mailbox are acquired.
It should be noted that the preset sampling range may be preset by an operator, for example, may be 10 days, and this is not limited in the embodiment of the present application.
The preset sampling range represents the time range over which the historical data are acquired, the received mails represent the mails received by the user's receiving mailbox, the source sending mailbox represents a mailbox that sends mails to the user's receiving mailbox, and the sent mails represent the mails sent by the source sending mailbox to the receiving mailbox.
S2: for each source sending mailbox, determining the average quantity of incoming and outgoing mails between that source sending mailbox and the receiving mailbox according to each received mail and each sent mail.
In this embodiment of the application, when the step S2 is executed, the method specifically includes:
A1: counting, for each day, the daily receiving number of mails that the source sending mailbox receives from the receiving mailbox, and the daily sending number of mails that the source sending mailbox sends to the receiving mailbox.
In the embodiment of the application, for any source sending mailbox, the number of mails that the source sending mailbox receives from the receiving mailbox on each day within a preset time range is counted to obtain the daily receiving number, and the number of mails that the source sending mailbox sends to the receiving mailbox on each day within the same time range is counted to obtain the daily sending number.
For example, assume the source sending mailbox is $M_{S_i}$ and the receiving mailbox is $M_r$. Then, for each day, the daily sending number of mails that $M_{S_i}$ sends to $M_r$ is counted, and the daily receiving number of mails that $M_{S_i}$ receives from $M_r$ is counted.
It should be noted that the daily receiving number represents the number of mails sent by the receiving mailbox received by the source sending mailbox on the same day, and the daily sending number represents the number of mails sent to the receiving mailbox by the source sending mailbox on the same day.
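The per-day counting in step A1 can be sketched as below. The record layout `(day, from_mailbox, to_mailbox)` and the function name are hypothetical; the application does not state how mail records are stored.

```python
from collections import Counter
from datetime import date
from typing import Iterable, List, Tuple

# Hypothetical mail record: (day, from_mailbox, to_mailbox).
Mail = Tuple[date, str, str]


def daily_counts(mails: Iterable[Mail], sender: str, recipient: str, days: List[date]) -> List[int]:
    """Count, for each listed day, the mails sent from `sender` to `recipient`."""
    per_day = Counter(day for day, src, dst in mails if src == sender and dst == recipient)
    # A Counter returns 0 for days on which no matching mail was seen.
    return [per_day[d] for d in days]
```

Calling the function once per direction yields the daily sending numbers and the daily receiving numbers over the same list of days.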
A2: determining the average daily receiving number according to the ratio of the sum of the daily receiving numbers to the preset number of days, and determining the average daily sending number according to the ratio of the sum of the daily sending numbers to the preset number of days.
In the embodiment of the application, the daily receiving number of each day in a preset time range is added to obtain the sum of the daily receiving number, the ratio of the sum of the daily receiving number to the preset number of days is calculated, the determined ratio is used as the daily average receiving number, the daily sending number of each day in the same time range is added to obtain the sum of the daily sending number, the ratio of the sum of the daily sending number to the preset number of days is calculated, and the determined ratio is used as the daily average sending number.
It should be noted that the average number of received mails per day represents the average number of mails sent by the source sending mailbox in the preset time range, and the average number of sent mails per day represents the average number of mails sent by the source sending mailbox to the receiving mailbox in the preset time range.
A3: determining the average quantity of incoming and outgoing mails between the source sending mailbox and the receiving mailbox according to the average daily receiving number, the average daily sending number, and the standard deviation of the average daily receiving number.
In the embodiment of the application, the average receiving number per day and the average sending number per day are multiplied, and the average number of the incoming and outgoing mails between any source sending mailbox and any receiving mailbox is determined according to the ratio of the determined product to the standard deviation of the average receiving number per day.
For example, assume the source sending mailbox is M_Si, the receiving mailbox is M_r, and the preset number of days is 10. Over these 10 days, denote the daily average receiving number between M_Si and M_r by $\bar{R}$, the standard deviation of the corresponding daily receiving numbers by $\sigma_R$, and the daily average sending number between M_Si and M_r by $\bar{S}$.
Thus, the average number of incoming and outgoing mails may be expressed as:
$\dfrac{\bar{R} \cdot \bar{S}}{\sigma_R}$
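Steps A1 to A3 can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of the application: the function name `average_round_trip`, the use of the population standard deviation, and the handling of a zero standard deviation are all assumptions made here, since the text does not specify them.

```python
from statistics import pstdev


def average_round_trip(received_per_day, sent_per_day):
    """Average number of incoming and outgoing mails for one mailbox pair.

    received_per_day[k] is the daily receiving number on day k and
    sent_per_day[k] is the daily sending number on the same day.
    """
    if not received_per_day or len(received_per_day) != len(sent_per_day):
        raise ValueError("both series must cover the same, non-empty set of days")

    days = len(received_per_day)
    avg_received = sum(received_per_day) / days   # A2: daily average receiving number
    avg_sent = sum(sent_per_day) / days           # A2: daily average sending number
    std_received = pstdev(received_per_day)       # assumption: population std. deviation

    product = avg_received * avg_sent             # A3: numerator
    if std_received == 0:
        # Assumption: a perfectly regular exchange scores "infinitely" high,
        # and no exchange at all scores zero. The text leaves this case open.
        return float("inf") if product > 0 else 0.0
    return product / std_received                 # A3: ratio to the standard deviation


if __name__ == "__main__":
    # Hypothetical four-day history for one source sending mailbox.
    print(average_round_trip([2, 4, 2, 4], [1, 3, 1, 3]))
```

Dividing by the standard deviation rewards steady day-to-day correspondence over a single burst of mail with the same total volume.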
S3: For each source sending mailbox, if the average number of incoming and outgoing mails of the source sending mailbox is greater than a preset average number threshold, determine that the source sending mailbox is a common mailbox.
In the embodiment of the application, for each source sending mailbox, whether its average number of incoming and outgoing mails is greater than the preset average number threshold is judged, so as to determine whether the source sending mailbox is a common mailbox. There are two cases:
In the first case: the average number of incoming and outgoing mails of the source sending mailbox is greater than the preset average number threshold.
Specifically: if the average number of incoming and outgoing mails of the source sending mailbox is greater than the preset average number threshold, determine that the source sending mailbox is a common mailbox.
For example, assume the source sending mailbox is M_Si. If the average number of incoming and outgoing mails corresponding to M_Si is determined to be greater than the preset average number threshold, the corresponding contact is recorded as a common contact, and M_Si is a common mailbox.
The average number threshold can be set as needed based on prior knowledge or related methods.
In the second case: the average number of incoming and outgoing mails of the source sending mailbox is less than or equal to the preset average number threshold.
Specifically: if the average number of incoming and outgoing mails of the source sending mailbox is less than or equal to the preset average number threshold, determine that the source sending mailbox is a non-common mailbox.
S4: Generate a common contact set containing all common mailboxes.
In the embodiment of the application, a common contact set containing each common mailbox is generated from the common mailboxes determined above.
Furthermore, system-level source sending mailboxes, such as an attendance mailbox, a personnel mailbox, or a payroll mailbox, are directly marked as common mailboxes regardless of whether their average number of incoming and outgoing mails exceeds the average number threshold. Which source sending mailboxes belong to this type can be configured as needed.
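Steps S3 and S4, together with the system-level exception, amount to a thresholded set construction. The sketch below assumes the per-mailbox averages have already been computed; the function name, the mailbox addresses, and the threshold value are hypothetical.

```python
def build_common_contacts(avg_round_trip_by_mailbox, threshold, system_mailboxes=()):
    """Return the common contact set for one receiving mailbox.

    avg_round_trip_by_mailbox maps each source sending mailbox to its
    average number of incoming and outgoing mails.
    """
    # S3: strictly greater than the threshold means "common mailbox".
    common = {
        mailbox
        for mailbox, average in avg_round_trip_by_mailbox.items()
        if average > threshold
    }
    # System-level mailboxes are always common, whatever their average.
    common.update(system_mailboxes)
    return common  # S4: the common contact set


if __name__ == "__main__":
    averages = {"li@a.com": 6.0, "wang@a.com": 0.4}   # hypothetical values
    print(build_common_contacts(averages, 1.0, ["payroll@a.com"]))
```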
Step 120: If it is determined that the sending mailbox is contained in the common contact set, determine the mail type of the mail to be detected by matching the behavior information with pre-stored standard behavior information; otherwise, determine the mail type of the mail to be detected by determining the mailbox similarity between the sending mailbox and each common mailbox contained in the common contact set.
The standard behavior information represents information on the sending behavior of the sending mailbox when it sends non-abnormal mails.
In the embodiment of the application, when the mail type of the mail to be detected is determined, whether the sending mailbox is contained in the common contact set is judged first, and the mail type is then determined in one of the following two modes:
The first mode is as follows: the sending mailbox is contained in the common contact set.
Specifically: determine the mail type of the mail to be detected by matching the behavior information with the pre-stored standard behavior information.
In this mode, the key point is to judge whether the sending behavior of the common mailbox is normal. The behavior information of the mail to be detected is matched with the pre-stored standard behavior information. If the matching succeeds, the mail type of the mail to be detected is determined to be a normal mail; if the matching fails, the mail type is determined to be a behavior abnormal mail, and the sending mailbox is at risk.
In the embodiment of the present application, the behavior information may be, for example, the sending time and the IP address, so the judgment is made along two dimensions, namely the sending time of the mail and the IP address of the sending mailbox. Specifically:
If it is determined that the sending time is not within the preset standard sending time and/or the IP address is not the preset standard IP address, determine that the mail type of the mail to be detected is a behavior abnormal mail.
The standard sending time represents the sending time at which the sending mailbox sends non-abnormal mails, and the standard IP address represents the IP address from which the sending mailbox sends non-abnormal mails.
Combining the two dimensions, the judgment for a sending mailbox that is a common mailbox can be divided into the following four cases:
In the first case: if the sending time is within the preset standard sending time and the IP address is the preset standard IP address, determine that the mail type of the mail to be detected is a normal mail.
For example, user A receives a mail from common contact B; mailbox a of user A is the receiving mailbox and mailbox b of user B is the sending mailbox. The standard behavior baseline of sending mailbox b is: the usual sending time is from 8 am to 5 pm, and the usual login IP address is 1.1.1.1. If sending mailbox b logs in from 1.1.1.1 and sends the mail at 9 am, the mail type of the mail to be detected is determined to be a normal mail.
In the second case: if the sending time is within the preset standard sending time but the IP address is not the preset standard IP address, determine that the mail type of the mail to be detected is an address abnormal mail.
For example, with the same users, mailboxes, and standard behavior baseline as above (usual sending time from 8 am to 5 pm, usual login IP address 1.1.1.1), if sending mailbox b logs in from 1.1.1.2 and sends the mail at 9 am, the mail type of the mail to be detected is determined to be an address abnormal mail.
Further, in this case, the abnormal point of the mail to be detected is that only the IP address of the common contact is abnormal, and the system can prompt the user: according to the system's judgment, the IP address of this mail is inconsistent with the historical login IP address, and the account may have been stolen; do not readily trust the mail content, and be careful when clicking any links or attachments it contains.
In the third case: if the sending time is not within the preset standard sending time but the IP address is the preset standard IP address, determine that the mail type of the mail to be detected is a time abnormal mail.
For example, with the same users, mailboxes, and standard behavior baseline as above, if sending mailbox b logs in from 1.1.1.1 and sends the mail at 7 am, the mail type of the mail to be detected is determined to be a time abnormal mail.
Further, in this case, the abnormal point of the mail to be detected is that only the sending time of the common contact's mail is abnormal, and the system prompts the user: according to the system's judgment, the sending time of this mail is inconsistent with the historical sending time, and the account may have been stolen; do not readily trust the mail content, and be careful when clicking any links or attachments it contains.
In the fourth case: if the sending time is not within the preset standard sending time and the IP address is not the preset standard IP address, determine that the mail type of the mail to be detected is an abnormal mail.
For example, with the same users, mailboxes, and standard behavior baseline as above, if sending mailbox b logs in from 1.1.1.2 and sends the mail at 6 am, the mail type of the mail to be detected is determined to be an abnormal mail.
Further, in this case, the abnormal point of the mail to be detected is that both the sending time and the IP address of the common contact are abnormal, and the system can prompt the user: according to the system's judgment, the sending time and the IP address of this mail are inconsistent with historical behavior, and the account may have been stolen; do not readily trust the mail content, and be careful when clicking any links or attachments it contains.
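The four cases reduce to two independent boolean checks. The following sketch reproduces the worked examples above; the function name, the string labels for the mail types, and the representation of the baseline as a set of hours plus a set of IP addresses are assumptions made for illustration.

```python
def classify_common_sender(send_hour, ip, standard_hours, standard_ips):
    """Classify a mail from a common mailbox by sending time and IP address.

    send_hour is the hour (0-23) at which the mail was sent; standard_hours
    and standard_ips form the sender's standard behavior baseline.
    """
    time_ok = send_hour in standard_hours
    ip_ok = ip in standard_ips

    if time_ok and ip_ok:
        return "normal"              # first case
    if time_ok:
        return "address_abnormal"    # second case: only the IP deviates
    if ip_ok:
        return "time_abnormal"       # third case: only the time deviates
    return "abnormal"                # fourth case: both deviate


if __name__ == "__main__":
    # Baseline from the examples: 8 am to 5 pm, login IP 1.1.1.1.
    hours, ips = set(range(8, 17)), {"1.1.1.1"}
    for hour, ip in [(9, "1.1.1.1"), (9, "1.1.1.2"), (7, "1.1.1.1"), (6, "1.1.1.2")]:
        print(hour, ip, classify_common_sender(hour, ip, hours, ips))
```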
The following describes in detail how the preset standard sending time in the embodiment of the present application is obtained. Specifically:
S1: For each common mailbox in the common contact set, count the number of mails sent by the common mailbox in each preset time period of each day.
In the embodiment of the application, for each common mailbox in the common contact set, the number of mails that the common mailbox sends to the receiving mailbox in each preset time period is counted.
S2: For each common mailbox, determine the standard score of each time period according to the number of mails sent by the common mailbox in that time period on each day and the corresponding standard deviation, and take the time periods whose standard score exceeds a preset standard score threshold as the standard sending time.
The standard score is the score used to decide whether a time period belongs to the standard sending time.
In the embodiment of the application, a day is first divided into a plurality of preset time periods. For each common mailbox, the number of mails that the common mailbox sends to the receiving mailbox in each time period of each day is counted, giving one group of data per time period, and the corresponding standard deviation is determined from these per-day numbers. Then, for each time period, the per-day numbers are summed and the sum is divided by the preset number of days to obtain the mean number of mails sent in that time period. The standard score of the time period is determined as the ratio of this mean to the corresponding standard deviation.
The time periods are preset; for example, a day can be divided into 24 time periods.
Step S2 is described in detail below using a specific example.
First, a day is divided into 24 time periods, denoted as "0-1", "1-2", ..., "23-24".
Then, historical mail data is obtained, and the number of mails that each common mailbox sends to the receiving mailbox in each time period of each day is counted, giving one group of data per time period.
Then, for each common mailbox, the standard score of each time period $t_j$ is calculated. This can be expressed, for example, as:
$z_{t_j} = \dfrac{\mu_{t_j}}{\sigma_{t_j}}$
where $\mu_{t_j}$ represents the average number of times the mail sending time falls within the time period $t_j$ over the past preset time, and $\sigma_{t_j}$ represents the corresponding standard deviation.
It should be noted that the larger the standard score, the more likely the time period is to be part of the standard sending time.
Finally, the time periods whose standard score exceeds the standard score threshold are recorded as the standard sending time.
The standard score threshold can be set based on prior knowledge or related methods.
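The standard sending time computation can be sketched as follows, using 24 one-hour periods as in the example. The function name, the use of the population standard deviation, and the treatment of a zero standard deviation are assumptions made here; the text does not say how a time period with identical counts on every day should be scored.

```python
from statistics import pstdev


def standard_sending_hours(daily_counts, score_threshold):
    """Return the hours (0-23) that qualify as standard sending time.

    daily_counts is a list with one entry per day; each entry is a list of
    24 integers giving the mails sent in each one-hour period of that day.
    """
    days = len(daily_counts)
    standard_hours = []
    for hour in range(24):
        series = [day[hour] for day in daily_counts]  # one group of data per period
        mean = sum(series) / days
        std = pstdev(series)                          # assumption: population std. deviation
        if std == 0:
            # Assumption: perfectly regular non-zero activity always qualifies.
            score = float("inf") if mean > 0 else 0.0
        else:
            score = mean / std                        # standard score of the period
        if score > score_threshold:
            standard_hours.append(hour)
    return standard_hours


if __name__ == "__main__":
    # Hypothetical three-day history: steady mail at 9 am, one burst at 10 pm.
    history = []
    for nine_am, ten_pm in [(4, 0), (5, 0), (6, 3)]:
        day = [0] * 24
        day[9], day[22] = nine_am, ten_pm
        history.append(day)
    print(standard_sending_hours(history, score_threshold=2.0))
```

In the sample history, the steady 9 am traffic clears the threshold while the single late-evening burst does not, which is the behavior the ratio of mean to standard deviation is meant to capture.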
The second mode is as follows: the sending mailbox is not contained in the common contact set.
Specifically: determine the mail type of the mail to be detected by determining the mailbox similarity between the sending mailbox and each common mailbox contained in the common contact set.
In the embodiment of the application, the sending mailbox may be disguised as a common mailbox, giving the user the illusion that it is a common mailbox, so that on seeing the mail the user may assume it was sent by a common contact. Mailboxes disguised as common contacts therefore need to be further identified. Calculating the mailbox similarity between a common mailbox and the sending mailbox specifically includes:
S1: For each common mailbox contained in the common contact set, align the user name information of the common mailbox with the user name information of the sending mailbox, align the domain name information of the common mailbox with the domain name information of the sending mailbox, convert each character of the common mailbox into a characteristic value, and determine a characteristic vector value of the common mailbox according to the characteristic values.
In the embodiment of the application, the following operations are performed for each common mailbox:
First, the easily confused characters contained in the common mailbox are converted into standard characters, and the easily confused characters contained in the sending mailbox are converted into standard characters.
The easily confused characters are preset, for example the letter o and the digit 0, and the letters i and l and the digit 1. These are converted into the English letters o and l respectively; that is, the digit 0 is converted into the letter o, and the digit 1 and the letter i are converted into the letter l.
Then, the user name information of the common mailbox is aligned with the user name information of the sending mailbox based on their longest common character string, and the domain name information of the common mailbox is aligned with the domain name information of the sending mailbox based on their longest common character string.
It should be noted that if the length of the longest common character string is below the respective set threshold, left alignment is used by default, vacant positions are filled with 0, and the "@" symbols are always kept aligned.
At the same time, each character of the common mailbox is converted into a characteristic value.
For example, each character is converted into its ASCII code value, forming a numerical vector that is used for the mailbox similarity calculation.
Finally, the characteristic vector value of the common mailbox is determined according to the characteristic values.
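The normalization, alignment, and vectorization of step S1 can be sketched as below. Several details are assumptions made for illustration: the helper names, the minimum common-substring length `min_common`, the use of `difflib.SequenceMatcher` to find the longest common character string, and padding with the character "0" (ASCII 48), which is one reading of "filled with 0". The mailbox addresses in the usage example are hypothetical.

```python
from difflib import SequenceMatcher

# Easily confused characters mapped to standard characters: 0 -> o, 1 -> l, i -> l.
CONFUSABLE = str.maketrans({"0": "o", "1": "l", "i": "l"})
PAD = ord("0")  # assumption: vacant positions hold the character "0" (ASCII 48)


def normalize(mailbox):
    """Replace easily confused characters with their standard characters."""
    return mailbox.translate(CONFUSABLE)


def align(a, b, min_common=2):
    """Align two strings on their longest common substring and pad them."""
    match = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(
        0, len(a), 0, len(b)
    )
    # Fall back to left alignment when the common substring is too short.
    shift = match.a - match.b if match.size >= min_common else 0
    va = [PAD] * max(-shift, 0) + [ord(c) for c in a]
    vb = [PAD] * max(shift, 0) + [ord(c) for c in b]
    width = max(len(va), len(vb))
    va += [PAD] * (width - len(va))
    vb += [PAD] * (width - len(vb))
    return va, vb


def feature_vectors(common_mailbox, sending_mailbox):
    """Return equal-length ASCII vectors with the "@" symbols aligned."""
    common_user, common_domain = normalize(common_mailbox).split("@", 1)
    sender_user, sender_domain = normalize(sending_mailbox).split("@", 1)
    user_c, user_s = align(common_user, sender_user)
    domain_c, domain_s = align(common_domain, sender_domain)
    at = [ord("@")]
    return user_c + at + domain_c, user_s + at + domain_s


if __name__ == "__main__":
    common_vec, sender_vec = feature_vectors("zhangsan@a.com", "zhangsan1@a.com")
    print(common_vec)
    print(sender_vec)
```

Aligning the user name and domain name separately keeps a one-character insertion in the user name from shifting every later position, which would otherwise depress the similarity of near-identical addresses.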
S2: For each common mailbox, calculate the cosine similarity between the characteristic vector value of the sending mailbox and the characteristic vector value of the common mailbox, and if the cosine similarity is determined to be greater than a preset similarity threshold, determine that the mail type of the mail to be detected is a similar mail.
In the embodiment of the application, the following operations are performed for each common mailbox:
First, the cosine similarity between the characteristic vector value of the sending mailbox and the characteristic vector value of the common mailbox is calculated.
Then, whether the cosine similarity between the common mailbox and the sending mailbox is greater than the preset similarity threshold is judged.
Finally, if the cosine similarity is determined to be greater than the preset similarity threshold, the mail type of the mail to be detected is determined to be a similar mail; if the cosine similarity is determined to be less than or equal to the preset similarity threshold, the mail type of the mail to be detected is determined to be a non-common contact mail.
A specific example is used below to describe the second detection mode in this embodiment in detail. Assume that a certain common mailbox is "zhangsan @ a.com" and the sending mailbox is "zhangsan @ a.com". The process specifically includes:
First, each character contained in the common mailbox and the sending mailbox is traversed to judge whether either contains easily confused characters; if so, the easily confused characters are converted into standard characters.
The common mailbox "zhangsan @ a.com" contains none of the easily confused characters: the digits 0 and 1 and the letters i and l do not occur, so the common mailbox and the sending mailbox are kept unchanged.
Then, the user name information and the domain name information of the common mailbox and the sending mailbox are aligned based on their respective longest common character strings. Fig. 2 is a schematic diagram of mailbox alignment in the embodiment of the present application, showing the result after alignment and 0-padding: the dashed frame marks the longest common character string of the user name part, the thin solid frame marks the longest common character string of the domain name part, and the thick solid frame marks the 0-padded part.
Then, after the common mailbox and the sender mailbox are converted into ASCII codes, the corresponding vectors are respectively [122,104,97,110,103,115,97,110,48,64,97,46,99,111,109], [122,104,97,110,103,115,97,110, 64,97,46,99,111,109 ].
Finally, the cosine similarity is calculated. Its value is about 0.988, which can be considered extremely similar, and the result accords with intuitive expectation.
It should be noted that, if the cosine similarity is less than or equal to the preset similarity threshold, the abnormal point of the mail to be detected is that the sender is merely a non-common contact, and the system prompts the user: according to the system's judgment, the sender of this mail is not one of your common contacts; do not readily trust the mail content, and be careful when clicking any links or attachments it contains.
If the cosine similarity is greater than the preset similarity threshold, the abnormal point of the mail to be detected is that the sender is disguised as a common contact, and the system prompts the user: according to the system's judgment, this mail is disguised as coming from your common contact XXX; do not readily trust the mail content, and be careful when clicking any links or attachments it contains.
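The similarity test of step S2 can be sketched as follows. The function names and the threshold are assumptions, and the two vectors in the usage example are hypothetical inputs chosen to differ in a single position; they are not the vectors of the example above. Returning "similar" as soon as any common mailbox exceeds the threshold is also one interpretation of the per-mailbox loop.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_u * norm_v)


def classify_by_similarity(sender_vs_common_pairs, threshold):
    """Return "similar" if any aligned vector pair exceeds the threshold.

    sender_vs_common_pairs holds one (sender_vector, common_vector) pair per
    common mailbox, each pair already aligned to equal length.
    """
    for sender_vec, common_vec in sender_vs_common_pairs:
        if cosine_similarity(sender_vec, common_vec) > threshold:
            return "similar"
    return "non_common_contact"


if __name__ == "__main__":
    # Hypothetical aligned vectors that differ only at index 8.
    common = [122, 104, 97, 110, 103, 115, 97, 110, 48, 64, 97, 46, 99, 111, 109]
    sender = [122, 104, 97, 110, 103, 115, 97, 110, 108, 64, 97, 46, 99, 111, 109]
    print(cosine_similarity(sender, common))
    print(classify_by_similarity([(sender, common)], threshold=0.95))
```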
Step 130: Determine whether the mail to be detected is an abnormal mail according to the mail type.
In this embodiment, step 130 specifically includes:
If the mail type is a similar mail or a behavior abnormal mail, determine that the mail to be detected is an abnormal mail.
For example, (1) Abnormal point: the sender is merely a non-common contact. The system prompts: according to the system's judgment, the sender of this mail is not one of your common contacts; do not readily trust the mail content, and be careful when clicking any links or attachments it contains.
(2) Abnormal point: the sender is disguised as a common contact. The system prompts: according to the system's judgment, this mail is disguised as coming from your common contact XXX; do not readily trust the mail content, and be careful when clicking any links or attachments it contains.
(3) Abnormal point: only the sending time of the common contact's mail is abnormal. The system prompts: according to the system's judgment, the sending time of this mail is inconsistent with the historical sending time, and the account may have been stolen; do not readily trust the mail content, and be careful when clicking any links or attachments it contains.
(4) Abnormal point: only the IP address of the common contact is abnormal. The system prompts: according to the system's judgment, the IP address of this mail is inconsistent with the historical login IP address, and the account may have been stolen; do not readily trust the mail content, and be careful when clicking any links or attachments it contains.
(5) Abnormal point: both the sending time and the IP address of the common contact are abnormal. The system prompts: according to the system's judgment, the sending time and the IP address of this mail are inconsistent with historical behavior, and the account may have been stolen; do not readily trust the mail content, and be careful when clicking any links or attachments it contains.
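The mapping from mail type to prompt can be kept in a small lookup table, sketched below. The type labels, the table layout, and the function names are assumptions. In particular, the rule above names only similar mails and behavior abnormal mails as abnormal, while a prompt is also listed for non-common contacts; this sketch follows that literally, so a non-common contact mail gets a prompt without being counted as abnormal.

```python
ADVICE = (
    "Do not readily trust the mail content, and be careful when clicking "
    "any links or attachments it contains."
)

# One entry per abnormal point (1) to (5); the keys are hypothetical labels.
PROMPTS = {
    "non_common_contact": "The sender of this mail is not one of your common contacts.",
    "similar": "This mail is disguised as coming from your common contact {contact}.",
    "time_abnormal": "The sending time of this mail is inconsistent with the "
                     "historical sending time; the account may have been stolen.",
    "address_abnormal": "The IP address of this mail is inconsistent with the "
                        "historical login IP address; the account may have been stolen.",
    "abnormal": "The sending time and IP address of this mail are inconsistent "
                "with historical behavior; the account may have been stolen.",
}

# Similar mails plus the three behavior abnormal variants.
ABNORMAL_TYPES = {"similar", "time_abnormal", "address_abnormal", "abnormal"}


def is_abnormal(mail_type):
    """Step 130: decide whether the mail is an abnormal mail."""
    return mail_type in ABNORMAL_TYPES


def risk_prompt(mail_type, contact="XXX"):
    """Return the prompt for a mail type, or None for a normal mail."""
    template = PROMPTS.get(mail_type)
    if template is None:
        return None
    return template.format(contact=contact) + " " + ADVICE


if __name__ == "__main__":
    print(is_abnormal("similar"), risk_prompt("similar", contact="zhangsan@a.com"))
```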
In the embodiment of the application, the behavior information of a mail to be detected, the sending mailbox that sends the mail to be detected, and the receiving mailbox that receives the mail to be detected are acquired; the common contact set of the receiving mailbox is determined, and whether the sending mailbox is contained in the common contact set is judged. If the sending mailbox is contained in the common contact set, the mail type of the mail to be detected is determined by matching the behavior information with pre-stored standard behavior information; otherwise, the mail type is determined by determining the mailbox similarity between the sending mailbox and each common mailbox contained in the common contact set. Whether the mail to be detected is an abnormal mail is then determined according to the mail type. The method is therefore based on mail behavior and does not depend on the text content, attachments, and the like of the mail. It can detect anomalies in the mail to be detected using only a small number of common characteristics, avoids disclosing user privacy, is applicable to mails of any form, and can discover some previously unknown malicious mail behaviors.
Based on the above embodiment, fig. 3 is a schematic structural diagram of an abnormal mail detection system in the embodiment of the present application, which specifically includes:
1. A common mailbox judgment module.
Based on historical mail data, a common contact set is established using a statistical learning method, and whether the sending mailbox is contained in the common contact set is then judged.
2. A risk detection module.
When the sending mailbox is not contained in the common contact set, mailbox disguise judgment is performed on the sending mailbox; if the sending mailbox is judged to be disguised as a common mailbox in the common contact set, the incoming mail from the sending mailbox is determined to be at risk. When the sending mailbox is contained in the common contact set, the mail behavior is matched; if the matching fails in any characteristic dimension, the sending mailbox is presumed to be at risk.
The risk detection module thus comprises two risk point detection methods: mailbox disguise judgment and mail behavior characteristic matching.
It should be noted that, based on behavior characteristic analysis, the present invention redefines what makes a malicious mail "bad", as follows: (a) sender A and receiver B rarely exchange mails, so if receiver B receives a mail from sender A on a certain day, the mail carries a certain risk; (b) sender A and receiver B often exchange mails, so if an incoming mail from sender A on a certain day differs from the usual pattern, for example sender A normally sends mails during working hours but this mail was sent outside working hours, the mail carries a certain risk.
Therefore, from the perspective of behavior analysis, malicious mail detection can be performed using only a small number of characteristics shared by all mail data, such as the sending mailbox, the sending time, and the IP address of the sender, and mails of any form can be detected while avoiding the disclosure of user privacy.
3. A risk prompt module.
The judgment results of the above two modules are combined into a risk prompt, which reminds the user to treat the relevant content of the incoming mail, including attachments and links, with caution.
In the embodiment of the application, detection starts from mail behavior alone and does not depend on the specific content of the mail, which avoids disclosing user privacy. In addition, a detected abnormal mail can be clearly explained by pointing out its abnormal points, so that the user can trust the detection result to a certain extent, pay attention to it, and reduce risk.
Based on the foregoing embodiment, fig. 4 is another flowchart of an abnormal mail detection method in the embodiment of the present application, which specifically includes:
Step 400: Acquire the behavior information of the mail to be detected, the sending mailbox that sends the mail to be detected, and the receiving mailbox that receives the mail to be detected.
Step 401: Determine the common contact set of the receiving mailbox.
Step 402: Judge whether the sending mailbox is contained in the common contact set; if so, execute step 403, and if not, execute step 408.
Step 403: Judge whether the sending time is within the preset standard sending time, and judge whether the IP address is the preset standard IP address.
In this embodiment, if the sending time is within the preset standard sending time and the IP address is the preset standard IP address, step 404 is executed; if the sending time is not within the preset standard sending time and the IP address is the preset standard IP address, step 405 is executed; if the sending time is within the preset standard sending time and the IP address is not the preset standard IP address, step 406 is executed; and if the sending time is not within the preset standard sending time and the IP address is not the preset standard IP address, step 407 is executed.
Step 404: Judge that the mail to be detected is a normal mail.
Step 405: Judge that the sending time of the mail to be detected is abnormal.
Step 406: Judge that the IP address of the mail to be detected is abnormal.
Step 407: Judge that both the sending time and the IP address of the mail to be detected are abnormal.
Step 408: Convert the easily confused characters in the sending mailbox and in each common mailbox contained in the common contact set into standard characters.
Step 409: Align the user name information of the common mailbox with the user name information of the sending mailbox, and align the domain name information of the common mailbox with the domain name information of the sending mailbox.
Step 410: Convert each character of the common mailbox into a characteristic value, and determine the characteristic vector value of the common mailbox according to the characteristic values.
Step 411: Calculate the cosine similarity between the characteristic vector value of the sending mailbox and the characteristic vector value of the common mailbox.
Step 412: Judge whether the cosine similarity is greater than the preset similarity threshold; if so, execute step 413, and if not, execute step 414.
Step 413: Judge that the mail to be detected is a similar mail.
Step 414: Judge that the mail to be detected is a non-common contact mail.
It should be noted that steps 408 to 414 are executed in a loop for each common mailbox contained in the common contact set.
In the embodiment of the application, the behavior information of a mail to be detected, the sending mailbox that sends the mail to be detected, and the receiving mailbox that receives the mail to be detected are acquired; the common contact set of the receiving mailbox is determined, and whether the sending mailbox is contained in the common contact set is judged. If the sending mailbox is contained in the common contact set, the mail type of the mail to be detected is determined by matching the behavior information with pre-stored standard behavior information; otherwise, the mail type is determined by determining the mailbox similarity between the sending mailbox and each common mailbox contained in the common contact set. Whether the mail to be detected is abnormal is then determined according to the mail type. The mail can thus be detected based on a small number of common characteristics without depending on the mail content, so the method is applicable to mails of any form while avoiding the disclosure of user privacy. It supplements existing technical solutions to a certain extent and can discover some previously unknown malicious mail behaviors.
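The overall flow of steps 400 to 414 can be summarized in one dispatch function. This is a sketch under stated assumptions: the function and label names are hypothetical, the per-sender baseline is modeled as a pair of sets, and the similarity measure is passed in as a parameter. The usage example plugs in `difflib.SequenceMatcher.ratio` purely as a stand-in; the embodiment itself uses cosine similarity over aligned ASCII vectors.

```python
from difflib import SequenceMatcher


def detect(sender, send_hour, ip, common_contacts, baselines, similarity, sim_threshold):
    """Return the mail type for one mail to be detected.

    baselines maps each common mailbox to (standard_hours, standard_ips);
    similarity(sender, common) returns a score where higher means more alike.
    """
    if sender in common_contacts:                    # step 402
        standard_hours, standard_ips = baselines[sender]
        time_ok = send_hour in standard_hours        # step 403
        ip_ok = ip in standard_ips
        if time_ok and ip_ok:
            return "normal"                          # step 404
        if ip_ok:
            return "time_abnormal"                   # step 405
        if time_ok:
            return "address_abnormal"                # step 406
        return "abnormal"                            # step 407

    # Steps 408 to 414, repeated for each common mailbox. Assumption: a single
    # match above the threshold is enough to flag the mail as similar.
    for common in common_contacts:
        if similarity(sender, common) > sim_threshold:
            return "similar"                         # step 413
    return "non_common_contact"                      # step 414


def stand_in_similarity(a, b):
    """Stand-in measure for the demo only; not the cosine similarity of the text."""
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    contacts = {"zhangsan@a.com"}
    baselines = {"zhangsan@a.com": (set(range(8, 17)), {"1.1.1.1"})}
    for sender in ["zhangsan@a.com", "zhangsan1@a.com", "bob@x.org"]:
        print(sender, detect(sender, 9, "1.1.1.1", contacts, baselines,
                             stand_in_similarity, 0.9))
```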
Based on the same inventive concept, the embodiment of the application also provides an abnormal mail detection device, which can be a hardware structure, a software module, or a combination of a hardware structure and a software module. Based on the above embodiments, fig. 5 is a schematic structural diagram of an abnormal mail detection device in the embodiment of the present application, which specifically includes:
the first obtaining module 500 is configured to obtain behavior information of a to-be-detected mail, a sending mailbox for sending the to-be-detected mail, and a receiving mailbox for receiving the to-be-detected mail, where the behavior information represents information of a sending behavior of the sending mailbox when the to-be-detected mail is sent;
a determining module 510, configured to determine a common contact set of the receiving mailbox, and determine whether the sending mailbox is included in the common contact set, where the common contact set represents a set of common mailboxes for which the number of incoming and outgoing mails exchanged with the receiving mailbox within a preset time range exceeds a preset number threshold;
the detection module 520 is configured to determine the mail type of the to-be-detected mail by matching the behavior information with pre-stored standard behavior information if it is determined that the sending mailbox is included in the common contact set, and otherwise, determine the mail type of the to-be-detected mail by determining mailbox similarity between the sending mailbox and each common mailbox included in the common contact set, where the standard behavior information represents information of a sending behavior of the sending mailbox when a non-abnormal mail is sent;
the first determining module 530 is configured to determine whether the mail to be detected is an abnormal mail according to the mail type.
Optionally, for obtaining the common contact set, the apparatus further includes:
a second obtaining module 540, configured to obtain each received mail received by the receiving mailbox within a preset sampling range, a source sending mailbox for sending each received mail, and each sent mail sent by each source sending mailbox to the receiving mailbox;
a second determining module 550, configured to determine, for each source sending mailbox, an average quantity of incoming and outgoing mails between that source sending mailbox and the receiving mailbox according to each received mail and each sent mail;
a third determining module 560, configured to determine, for each source sending mailbox, that the source sending mailbox is a common mailbox if the average quantity of incoming and outgoing mails of that source sending mailbox is greater than a preset average quantity threshold;
the generating module 570 is configured to generate a set of common contacts including the common mailboxes.
Optionally, when determining the average quantity of the incoming and outgoing mails between any source sending mailbox and the receiving mailbox according to the received mails and the sent mails, the second determining module 550 is specifically configured to:
counting, for each day, the number of mails the source sending mailbox receives from the receiving mailbox (the daily receiving number) and the number of mails the source sending mailbox sends to the receiving mailbox (the daily sending number);
determining the average receiving number of each day according to the ratio of the sum of the receiving numbers of each day to the preset number of days, and determining the average sending number of each day according to the ratio of the sum of the sending numbers of each day to the preset number of days;
and determining the average quantity of the incoming and outgoing mails between any source sending mailbox and the receiving mailbox according to the average daily receiving quantity, the average daily sending quantity and the standard deviation of the average daily receiving quantity.
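The daily statistics described above can be sketched as follows. This section does not give the formula that combines the average daily receiving number, the average daily sending number, and the standard deviation, so the `combine` parameter below is a placeholder and its default (the sum of the two averages, ignoring the deviation) is purely an assumption for illustration. All names are hypothetical.

```python
from statistics import pstdev
from typing import Callable, Sequence, Tuple


def traffic_stats(received_per_day: Sequence[int],
                  sent_per_day: Sequence[int],
                  preset_days: int) -> Tuple[float, float, float]:
    """Return (average daily receiving number, average daily sending number, std dev)."""
    if preset_days <= 0:
        raise ValueError("preset_days must be positive")
    avg_received = sum(received_per_day) / preset_days
    avg_sent = sum(sent_per_day) / preset_days
    # Population standard deviation of the daily receiving numbers (assumed reading).
    spread = pstdev(received_per_day) if received_per_day else 0.0
    return avg_received, avg_sent, spread


def is_common_mailbox(received_per_day: Sequence[int],
                      sent_per_day: Sequence[int],
                      preset_days: int,
                      threshold: float,
                      combine: Callable[[float, float, float], float]
                      = lambda r, s, sd: r + s) -> bool:
    """Apply a caller-supplied combination rule and compare with the threshold."""
    stats = traffic_stats(received_per_day, sent_per_day, preset_days)
    return combine(*stats) > threshold
```

A deployment that has the full combination rule would pass it in as `combine` rather than rely on the placeholder default.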
Optionally, when determining the mail type of the mail to be detected by determining mailbox similarity between the sender mailbox and each common mailbox included in the common contact set, the detecting module 520 is specifically configured to:
for each common mailbox contained in the common contact set, aligning the user name information of the common mailbox with the user name information of the sending mailbox, aligning the domain name information of the common mailbox with the domain name information of the sending mailbox, converting each character of the common mailbox into a characteristic value, and determining a characteristic vector value of the common mailbox according to the characteristic values;
for each common mailbox, calculating the cosine similarity between the characteristic vector value of the sending mailbox and the characteristic vector value of the common mailbox, and if the cosine similarity is greater than a preset similarity threshold, determining that the mail type of the mail to be detected is a similar mail.
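The alignment and cosine comparison can be illustrated with the sketch below. Several details are assumptions rather than statements from the text: characters are mapped to characteristic values with `ord`, the shorter user name or domain is padded with NUL characters (value 0) so the two parts line up, and addresses are lower-cased first. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def split_mailbox(addr: str) -> Tuple[str, str]:
    """Split an address into (user name, domain name) at the last '@'."""
    user, _, domain = addr.lower().rpartition("@")
    return user, domain


def aligned_vectors(a: str, b: str) -> Tuple[List[int], List[int]]:
    """Pad user names and domains to equal length, then map characters to values."""
    user_a, dom_a = split_mailbox(a)
    user_b, dom_b = split_mailbox(b)
    user_len = max(len(user_a), len(user_b))
    dom_len = max(len(dom_a), len(dom_b))

    def encode(user: str, domain: str) -> List[int]:
        padded = user.ljust(user_len, "\0") + domain.ljust(dom_len, "\0")
        return [ord(ch) for ch in padded]

    return encode(user_a, dom_a), encode(user_b, dom_b)


def cosine_similarity(x: Sequence[int], y: Sequence[int]) -> float:
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0


def is_similar_mailbox(sender: str, common: str, threshold: float) -> bool:
    """True when the sender's address vector is close to the common mailbox's."""
    return cosine_similarity(*aligned_vectors(sender, common)) > threshold
```

Because raw character codes are all positive, cosine values under this encoding tend to sit close to 1 even for unrelated addresses, so the preset similarity threshold would need to be tuned against real address pairs.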
Optionally, if the behavior information is a sending time and an IP address, when the behavior information is matched with pre-stored standard behavior information, the detecting module 520 is specifically configured to:
and if the mail sending time is determined not to be within the preset standard mail sending time and/or the IP address is not the preset standard IP address, determining that the mail type of the mail to be detected is a behavior abnormal mail, wherein the standard mail sending time represents the mail sending time when the mail sending mailbox sends a non-abnormal mail, and the standard IP address represents the IP address when the mail sending mailbox sends the non-abnormal mail.
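The behavior check above amounts to an "outside the usual time window and/or from an unusual IP" test. The sketch below assumes the standard sending time is stored as a list of half-open `(start, end)` windows within one day and that the standard IP address may be a set of addresses; both representations, and the function name, are assumptions for illustration.

```python
from datetime import time
from typing import Iterable, Set, Tuple


def is_behavior_abnormal(send_time: time,
                         ip_address: str,
                         standard_windows: Iterable[Tuple[time, time]],
                         standard_ips: Set[str]) -> bool:
    """True if the sending time or the IP address deviates from the standard behavior."""
    in_window = any(start <= send_time < end for start, end in standard_windows)
    known_ip = ip_address in standard_ips
    # Either deviation alone is enough ("and/or" in the description).
    return (not in_window) or (not known_ip)
```

For example, with a single window from 09:00 to 18:00 and one known IP, a mail sent at 23:00 from that IP is flagged, as is a mail sent at 10:30 from any other IP.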
Optionally, for obtaining the standard sending time, the apparatus further includes:
the counting module 580 is configured to count, for each common mailbox in the common contact set, the number of mails sent by that common mailbox in each preset time period of each day;
the processing module 590 is configured to determine, for each common mailbox, a standard score of each time period according to the number of mails sent by that common mailbox in each time period of each day and the corresponding standard deviation, and to take each time period whose standard score exceeds a preset standard score threshold as standard sending time, where the standard score indicates whether the time period qualifies as standard sending time.
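One plausible reading of the standard-score step is a z-score per time period, computed across all periods of the day. The sketch below follows that reading; the exact formula is not given in this section, so the use of per-period totals, the population standard deviation, and the function and parameter names are all assumptions.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def standard_send_periods(daily_counts: Dict[int, Sequence[int]],
                          score_threshold: float) -> List[int]:
    """daily_counts maps a period index (e.g. hour 0-23) to per-day send counts.

    Returns the periods whose z-score exceeds score_threshold.
    """
    if not daily_counts:
        return []
    totals = {period: sum(counts) for period, counts in daily_counts.items()}
    mu = mean(totals.values())
    sigma = pstdev(totals.values())
    if sigma == 0:
        # Every period is equally busy, so no period stands out.
        return []
    return sorted(p for p, total in totals.items()
                  if (total - mu) / sigma > score_threshold)
```

With counts concentrated in the 09:00 and 10:00 periods and almost none at night, only the daytime periods exceed a moderate threshold and would be kept as standard sending time.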
Optionally, the first determining module 530 is specifically configured to:
and if the mail type is similar mail or abnormal behavior mail, determining that the mail to be detected is abnormal mail.
Based on the above embodiments, referring to fig. 6, a schematic structural diagram of an electronic device in an embodiment of the present application is shown.
An embodiment of the present application provides an electronic device, which may include a processor 610 (CPU), a memory 620, an input device 630, an output device 640, and the like, wherein the input device 630 may include a keyboard, a mouse, a touch screen, and the like, and the output device 640 may include a Display device, such as a Liquid Crystal Display (LCD), a Cathode Ray Tube (CRT), and the like.
Memory 620 may include Read Only Memory (ROM) and Random Access Memory (RAM), and provides processor 610 with program instructions and data stored in memory 620. In the embodiment of the present application, the memory 620 may be used to store a program of any one of the abnormal mail detection methods in the embodiment of the present application.
The processor 610 is configured to call the program instructions stored in the memory 620 and to execute any one of the abnormal mail detection methods in the embodiments of the present application according to the obtained program instructions.
Based on the above embodiments, in the embodiments of the present application, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when executed by a processor, implements the abnormal mail detection method in any of the above method embodiments.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. An abnormal mail detection method, comprising:
obtaining behavior information of a mail to be detected, a sending mailbox for sending the mail to be detected and a receiving mailbox for receiving the mail to be detected, wherein the behavior information represents information of sending behavior of the sending mailbox when the mail to be detected is sent;
determining a common contact person set of the receiving mailbox, and judging whether the sending mailbox is contained in the common contact person set, wherein the common contact person set represents a set of common mailboxes of which the quantity of the incoming and outgoing mails between the sending mailbox and the receiving mailbox exceeds a preset quantity threshold value within a preset time range;
if the mail sending mailbox is determined to be contained in the common contact person set, the mail type of the mail to be detected is determined by matching the behavior information with pre-stored standard behavior information, otherwise, the mail type of the mail to be detected is determined by determining mailbox similarity between the mail sending mailbox and each common mailbox contained in the common contact person set, wherein the standard behavior information represents information of sending behavior of the mail sending mailbox when a non-abnormal mail is sent;
and determining whether the mail to be detected is an abnormal mail or not according to the mail type.
2. The method of claim 1, wherein the set of common contacts is obtained by:
acquiring all received mails received by the receiving mailbox in a preset sampling range, a source sending mailbox for sending all received mails, and all sent mails sent to the receiving mailbox by all the source sending mailboxes;
for each source sending mailbox, determining the average quantity of incoming and outgoing mails between that source sending mailbox and the receiving mailbox according to each received mail and each sent mail;
for each source sending mailbox, if the average quantity of incoming and outgoing mails of that source sending mailbox is greater than a preset average quantity threshold, determining that the source sending mailbox is a common mailbox;
and generating a common contact person set containing all common mailboxes.
3. The method of claim 2, wherein determining an average quantity of incoming and outgoing mails between any one source sender mailbox and the recipient mailbox according to the received mails and the sent mails, specifically comprises:
counting, for each day, the number of mails the source sending mailbox receives from the receiving mailbox (the daily receiving number) and the number of mails the source sending mailbox sends to the receiving mailbox (the daily sending number);
determining the average receiving number of each day according to the ratio of the sum of the receiving numbers of each day to the preset number of days, and determining the average sending number of each day according to the ratio of the sum of the sending numbers of each day to the preset number of days;
and determining the average quantity of the incoming and outgoing mails between any source sending mailbox and the receiving mailbox according to the average daily receiving quantity, the average daily sending quantity and the standard deviation of the average daily receiving quantity.
4. The method according to claim 1, wherein determining the mail type of the mail to be detected by determining mailbox similarity between the sender mailbox and each common mailbox included in the common contact set specifically includes:
for each common mailbox contained in the common contact set, aligning the user name information of the common mailbox with the user name information of the sending mailbox, aligning the domain name information of the common mailbox with the domain name information of the sending mailbox, converting each character of the common mailbox into a characteristic value, and determining a characteristic vector value of the common mailbox according to the characteristic values;
for each common mailbox, calculating the cosine similarity between the characteristic vector value of the sending mailbox and the characteristic vector value of the common mailbox, and if the cosine similarity is greater than a preset similarity threshold, determining that the mail type of the mail to be detected is a similar mail.
5. The method according to claim 1, wherein if the behavior information is a sending time and an IP address, determining the mail type of the mail to be detected by matching the behavior information with standard behavior information stored in advance, specifically comprising:
and if the mail sending time is determined not to be within the preset standard mail sending time and/or the IP address is not the preset standard IP address, determining that the mail type of the mail to be detected is a behavior abnormal mail, wherein the standard mail sending time represents the mail sending time when the mail sending mailbox sends a non-abnormal mail, and the standard IP address represents the IP address when the mail sending mailbox sends the non-abnormal mail.
6. The method of claim 5, wherein the standard issue time is obtained by:
respectively counting the number of the sent mails of any one common mailbox in each preset time period for each common mailbox in the common contact person set;
and for each common mailbox, determining the standard score of each time period according to the number of mails sent by that common mailbox in each time period of each day and the corresponding standard deviation, and taking each time period whose standard score exceeds a preset standard score threshold as the standard sending time, wherein the standard score indicates whether the time period qualifies as standard sending time.
7. The method according to claim 4 or 5, wherein determining whether the mail to be detected is an abnormal mail according to the mail type specifically comprises:
and if the mail type is similar mail or abnormal behavior mail, determining that the mail to be detected is abnormal mail.
8. An abnormal mail detecting apparatus, comprising:
a first obtaining module, configured to obtain behavior information of a mail to be detected, a sending mailbox for sending the mail to be detected, and a receiving mailbox for receiving the mail to be detected, wherein the behavior information represents information of the sending behavior of the sending mailbox when the mail to be detected is sent;
the judging module is used for determining a common contact person set of the receiving mailbox and judging whether the sending mailbox is contained in the common contact person set, wherein the common contact person set represents a set of common mailboxes of which the quantity of the incoming and outgoing mails between the sending mailbox and the receiving mailbox exceeds a preset quantity threshold value within a preset time range;
a detection module, configured to determine the mail type of the mail to be detected by matching the behavior information with pre-stored standard behavior information if it is determined that the sending mailbox is contained in the common contact set, and otherwise to determine the mail type of the mail to be detected by determining mailbox similarity between the sending mailbox and each common mailbox contained in the common contact set, wherein the standard behavior information represents information of the sending behavior of the sending mailbox when a non-abnormal mail is sent;
and the first determining module is used for determining whether the mail to be detected is an abnormal mail or not according to the mail type.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method of any of claims 1-7 are implemented when the program is executed by the processor.
10. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program when executed by a processor implements the steps of the method of any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011614644.1A CN112822168B (en) 2020-12-30 2020-12-30 Abnormal mail detection method and device

Publications (2)

Publication Number Publication Date
CN112822168A true CN112822168A (en) 2021-05-18
CN112822168B CN112822168B (en) 2022-09-23

Family

ID=75855455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011614644.1A Active CN112822168B (en) 2020-12-30 2020-12-30 Abnormal mail detection method and device

Country Status (1)

Country Link
CN (1) CN112822168B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105072137A (en) * 2015-09-15 2015-11-18 蔡丝英 Spear phishing mail detection method and device
CN107196844A (en) * 2016-11-28 2017-09-22 北京神州泰岳信息安全技术有限公司 Exception mail recognition methods and device
CN108347370A (en) * 2017-10-19 2018-07-31 北京安天网络安全技术有限公司 A kind of detection method and system of targeted attacks mail
WO2019141091A1 (en) * 2018-01-19 2019-07-25 论客科技(广州)有限公司 Method, system, and device for mail monitoring
US20190306102A1 (en) * 2018-03-29 2019-10-03 Cellopoint International Corporation Reminding method of unfamiliar emails
CN111147489A (en) * 2019-12-26 2020-05-12 中国科学院信息工程研究所 Link camouflage-oriented fishfork attack mail discovery method and device
CN111404805A (en) * 2020-03-12 2020-07-10 深信服科技股份有限公司 Junk mail detection method and device, electronic equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408281A (en) * 2021-07-14 2021-09-17 北京天融信网络安全技术有限公司 Mailbox account abnormity detection method and device, electronic equipment and storage medium
CN113408281B (en) * 2021-07-14 2024-02-09 北京天融信网络安全技术有限公司 Mailbox account anomaly detection method and device, electronic equipment and storage medium
CN114520797A (en) * 2022-02-14 2022-05-20 广州拓波软件科技有限公司 Intelligent control method and device for mails
CN114520797B (en) * 2022-02-14 2024-02-09 广州拓波软件科技有限公司 Intelligent mail management and control method and device

Also Published As

Publication number Publication date
CN112822168B (en) 2022-09-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant