CN113595994A

CN113595994A - Abnormal mail detection method and device, electronic equipment and storage medium

Info

Publication number: CN113595994A
Application number: CN202110785507.2A
Authority: CN
Inventors: 宁阳; 闫凡; 郜振锋; 郑景中; 王雄; 许云中
Original assignee: Sangfor Technologies Co Ltd
Current assignee: Sangfor Technologies Co Ltd
Priority date: 2021-07-12
Filing date: 2021-07-12
Publication date: 2021-11-02
Anticipated expiration: 2041-07-12
Also published as: CN113595994B

Abstract

The embodiments of the invention are applicable to the field of computer technologies and provide an abnormal mail detection method and apparatus, an electronic device, and a storage medium. The abnormal mail detection method includes: acquiring at least two pieces of first data of each of at least two emails, the first data representing mail information of the corresponding email; constructing a knowledge graph about the at least two emails based on the acquired first data, the knowledge graph representing the association relationship between the at least two emails; identifying an abnormal subgraph among at least two subgraphs of the knowledge graph; and determining abnormal mails among the at least two emails based on the identified abnormal subgraph.

Description

Abnormal mail detection method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for detecting an abnormal email, an electronic device, and a storage medium.

Background

Phishing mail is one kind of abnormal mail: a disguised email that induces the recipient to reply with information such as an account number and a password to a specified recipient, causing property loss to the recipient. In the related art, phishing mails are detected with methods based on rules, machine learning, and deep learning. These methods miss many phishing mails, so the detection accuracy for phishing mails is low.

Disclosure of Invention

In order to solve the above problem, embodiments of the present invention provide a method, an apparatus, an electronic device, and a storage medium for detecting an abnormal email, so as to at least solve the problem of low accuracy in detecting an abnormal email in a related technology.

The technical scheme of the invention is realized as follows:

in a first aspect, an embodiment of the present invention provides an abnormal mail detection method, where the method includes:

acquiring at least two pieces of first data of each electronic mail in at least two electronic mails; the first data represent mail information of a corresponding electronic mail;

constructing a knowledge graph about the at least two emails based on the acquired first data; the knowledge graph represents the association relationship between the at least two emails;

identifying an abnormal subgraph of the at least two subgraphs of the knowledge-graph;

and determining abnormal mails in the at least two emails based on the identified abnormal subgraph.

In the above scheme, the identifying an abnormal subgraph in at least two subgraphs of the knowledge graph includes:

determining a vector of a corresponding sub-graph based on the feature parameters of each of the at least two sub-graphs;

inputting the vector into a set classification model to obtain a predicted value output by the set classification model; the set classification model is used for outputting the predicted value based on the corresponding hypersphere; and the predicted value represents whether the subgraph corresponding to the vector is an abnormal subgraph.

In the above scheme, the set classification model determines whether the vector is within the hypersphere;

in the case that the vector is within the hypersphere, the set classification model outputs a first predicted value; the first predicted value represents that the subgraph corresponding to the vector is a normal subgraph;

in the case that the vector is not within the hypersphere, the set classification model outputs a second predicted value; and the second predicted value represents that the subgraph corresponding to the vector is an abnormal subgraph.

In the above solution, the determining the abnormal email in the at least two emails based on the identified abnormal subgraph includes:

and determining all the e-mails corresponding to the abnormal subgraph as abnormal mails.

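In the above scheme, the identifying an abnormal subgraph in at least two subgraphs of the knowledge graph includes:

determining a number of first nodes of each of at least two subgraphs of the knowledge graph; the first node represents a sender identity field of the email;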

and under the condition that the number of the first nodes is more than 1, identifying the corresponding subgraph as an abnormal subgraph.

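In the above scheme, the determining the abnormal email in the at least two emails based on the identified abnormal subgraph includes:

determining a third node from all second nodes of the identified abnormal subgraph; the second node represents the ID of the email; the third node represents a second node connected with a first node that is not in a set whitelist;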

and determining the E-mail corresponding to the determined third node as an abnormal E-mail.

In the above scheme, the identifying an abnormal subgraph in at least two subgraphs of the knowledge graph includes:

determining whether the number of fifth nodes connected with a fourth node is greater than 1 in the case that the number of second nodes connected with the fourth node in the knowledge graph is greater than 1; the second node represents the ID of the email; the fourth node represents a first-level domain name in the URL of the email; the fifth node represents a source IP address of the email or a first-level domain name in a sender address of the email;

and under the condition that the number of the fifth nodes is more than 1, identifying the corresponding subgraph as an abnormal subgraph.

In the above scheme, the determining the abnormal email in the at least two emails based on the abnormal subgraph includes:

and determining the E-mail corresponding to the second node connected with the fourth node in the abnormal subgraph as an abnormal E-mail.

In the foregoing solution, when the knowledge graph about the at least two emails is constructed based on the acquired first data, the method includes:

generating nodes of the knowledge-graph based on at least two first data of each of the at least two emails; wherein the same first data corresponds to the same node in the knowledge-graph;

and connecting nodes corresponding to the first data belonging to the same e-mail to form the knowledge graph.

In the foregoing solution, the at least two first data at least include any two of the following:

a first-level domain name in a sender address of the email;

an identification ID of the e-mail;

a source Internet Protocol (IP) address of the email;

a first-level domain name in a Uniform Resource Locator (URL) of the email;

the sender identity field of the email.

In a second aspect, an embodiment of the present invention provides an abnormal mail detection apparatus, including:

an acquisition module, configured to acquire at least two pieces of first data of each email in at least two emails; the first data represent mail information of the corresponding email;

a construction module, configured to construct a knowledge graph about the at least two emails based on the acquired first data; the knowledge graph represents the association relationship between the at least two emails;

an identification module, configured to identify an abnormal subgraph among at least two subgraphs of the knowledge graph;

and a determining module, configured to determine abnormal mails in the at least two emails based on the identified abnormal subgraph.

In a third aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, where the processor and the memory are connected to each other, where the memory is used to store a computer program, and the computer program includes program instructions, and the processor is configured to call the program instructions to execute the steps of the method for detecting an abnormal mail provided in the first aspect of the embodiment of the present invention.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium that stores a computer program. The computer program, when executed by a processor, implements the steps of the method for detecting an abnormal mail as provided by the first aspect of the embodiment of the present invention.

The embodiment of the invention acquires at least two pieces of first data of each email in at least two emails, where the first data represent the mail information of the corresponding email; constructs a knowledge graph about the at least two emails based on the acquired first data, where the knowledge graph represents the association relationship between the at least two emails; identifies abnormal subgraphs among at least two subgraphs of the knowledge graph; and determines the abnormal mails in the at least two emails based on the identified abnormal subgraphs. Because the constructed knowledge graph reflects the association relationships between emails, and the abnormal mails are determined through the abnormal subgraphs of the knowledge graph, the accuracy of detecting abnormal mails can be improved.

Drawings

Fig. 1 is a schematic flow chart illustrating an implementation of an abnormal mail detection method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of a knowledge graph provided by an embodiment of the present invention;

Fig. 3 is a schematic flow chart illustrating an implementation of another abnormal mail detection method according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of a knowledge graph provided by an embodiment of the present invention;

Fig. 5 is a schematic diagram of a subgraph in a knowledge graph provided by an embodiment of the present invention;

Fig. 6 is a schematic flow chart illustrating an implementation of another abnormal mail detection method according to an embodiment of the present invention;

Fig. 7 is a schematic flow chart illustrating an implementation of another abnormal mail detection method according to an embodiment of the present invention;

Fig. 8 is a schematic diagram of another knowledge graph provided by an embodiment of the present invention;

Fig. 9 is a schematic flow chart illustrating an implementation of another abnormal mail detection method according to an embodiment of the present invention;

Fig. 10 is a schematic flow chart illustrating an implementation of another abnormal mail detection method according to an embodiment of the present invention;

Fig. 11 is a schematic flow chart illustrating an implementation of another abnormal mail detection method according to an embodiment of the present invention;

Fig. 12 is a schematic diagram of hypersphere classification according to an embodiment of the present invention;

Fig. 13 is a schematic diagram of an abnormal mail detection flow according to an application embodiment of the present invention;

Fig. 14 is a schematic diagram of an abnormal mail detection apparatus according to an embodiment of the present invention;

Fig. 15 is a schematic diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Phishing mail is one kind of abnormal mail. It uses a disguised email to induce the recipient to reply with information such as an account number and a password to a specified recipient, or to direct the recipient to a specific web page. That page is usually disguised as a real website, such as a bank or financial page, so that the recipient believes it is genuine and enters a credit card or bank card number and password, resulting in property loss.

In the related art, phishing mail detection technologies are all based on methods such as rules, machine learning, and deep learning. These methods cannot use the association information among mail data, so many phishing mails go undetected.

Phishing mails are associated with one another, and normal mails are also associated with phishing mails. Using these association relationships can improve the detection accuracy for phishing mails.

In view of the above disadvantages of the related art, embodiments of the present invention provide an abnormal mail detection method that can at least improve the accuracy of detecting abnormal mails. In order to explain the technical means of the present invention, the following description is given by way of specific examples.

Fig. 1 is a schematic flow chart illustrating an implementation of an abnormal mail detection method according to an embodiment of the present invention. The method is executed by an electronic device, such as a desktop computer, a notebook computer, or a server. Referring to Fig. 1, the abnormal mail detection method includes:

S101, acquiring at least two pieces of first data of each email in at least two emails; the first data represents mail information of the corresponding email.

In an embodiment, the at least two first data comprise at least any two of:

a first-level domain name in a sender address of the email;

an Identification (ID) of the email;

a source Internet Protocol (IP) address of the email;

a primary domain name in a Uniform Resource Locator (URL) of an email;

the sender identity field of the email.

The sender address (Mailfrom) of the email may be a personal mailbox address, such as 783939xx@qq.com or 2016xxx@126.com, or an enterprise mailbox address; the first-level domain name (fld) is taken from this address.

Each e-mail has a unique ID for distinguishing it from other e-mails. For example, the ID of the email may be a string of numbers or codes.

A URL in the email body, such as http://example.com/xxxx.html, yields the first-level domain name example.com.

The sender identity field of the email may be the Helo field, which the sender uses to identify itself. For example, HELO mail.alpha.com.cn may be read as "Hi, I am mail.alpha.com.cn". The sender may of course lie here: no mechanism prevents the sender mail.alpha.com.cn from saying "Hi, I am mail.xxx.com" or "Hi, I am mail.yyy.com". In most cases, however, the recipient has some way to confirm the true identity of the sender.

Here, the more of the above items the at least two pieces of first data include, the better the resulting knowledge graph reflects the association relationships between mails, and the higher the accuracy when detecting abnormal mails.
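The extraction of the five kinds of first data can be sketched in Python as follows. This is only an illustration under stated assumptions: the record keys (`id`, `source_ip`, `helo`, `mailfrom`, `url`), the function names, and the tiny `TWO_LABEL_SUFFIXES` set are hypothetical and not part of the described method, and the domain heuristic is a simplification of what a full public-suffix lookup would do.

```python
from urllib.parse import urlparse

# Hypothetical sample of two-label public suffixes (an assumption for this
# sketch); a real system would consult the full Public Suffix List.
TWO_LABEL_SUFFIXES = {"com.cn", "co.uk", "com.au"}


def first_level_domain(host: str) -> str:
    """Return the registrable domain of a host name (naive heuristic)."""
    labels = host.lower().strip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def fld_of_address(address: str) -> str:
    """First-level domain of a sender address such as user@qq.com."""
    return first_level_domain(address.rsplit("@", 1)[1])


def fld_of_url(url: str) -> str:
    """First-level domain of a URL found in the mail body."""
    return first_level_domain(urlparse(url).hostname or "")


def extract_first_data(mail: dict) -> dict:
    """Map one raw mail record to the five kinds of first data."""
    return {
        "id": mail["id"],
        "source_ip": mail["source_ip"],
        "helo": mail["helo"],
        "mailfrom_fld": fld_of_address(mail["mailfrom"]),
        "url_fld": fld_of_url(mail["url"]),
    }
```

Reducing addresses and URLs to their first-level domain means that mails from different mailboxes or pages of the same domain map to the same value, which is what later lets them share a node.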

S102, constructing a knowledge graph about the at least two emails based on the acquired first data; the knowledge graph represents the association relationship between the at least two emails.

A knowledge graph is a knowledge base that represents entities in the objective world and the relations between them in the form of a graph, where the entities can be real-world objects or abstract concepts.

The knowledge-graph is composed of nodes and edges, for example, referring to fig. 2, fig. 2 is a schematic diagram of a knowledge-graph provided by an embodiment of the present invention, the left side of fig. 2 contains only one type of nodes and edges, and the right side of fig. 2 contains multiple types of nodes and edges.

In the embodiment of the invention, the first data are used as nodes and the association relationships between the first data are used as edges to construct the knowledge graph. The association relationship here refers to whether two pieces of first data belong to the same mail.

Referring to Fig. 3, in an embodiment, the constructing a knowledge graph about the at least two emails based on the acquired first data includes:

S301, generating nodes of the knowledge graph based on at least two pieces of first data of each email in the at least two emails; wherein the same first data corresponds to the same node in the knowledge graph.

The same first data only generates one node, for example, assuming that the source IP addresses of email 1 and email 2 are the same, the source IP addresses of email 1 and email 2 both correspond to the same node in the knowledge graph.

Here, since the IDs of each email are different, the number of nodes corresponding to the IDs in the knowledge-graph is the same as the number of IDs.

In practical applications, different colors and/or sizes of nodes may be used in the knowledge-graph to represent different types of first data, such as the source IP address represented by a green node; email IDs are represented by gray nodes; red nodes represent the primary domain name in the URL; the blue node represents Helo; the yellow node represents the primary domain name in the address of the sender of the email.

S302, connecting nodes corresponding to the first data belonging to the same e-mail to form the knowledge graph.

The first data belonging to the same mail have an association relationship, and nodes corresponding to the first data of the same mail are connected by edges. In the knowledge-graph, each two connected nodes may indicate that the corresponding first data is in the same email.
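Steps S301 and S302 can be sketched as follows. This is a minimal illustration, assuming each node is a hypothetical `(type, value)` tuple and each mail is a dictionary with the hypothetical keys `id`, `source_ip`, `helo`, `mailfrom_fld`, and `url_fld`. It also assumes that every field node is linked to the mail ID node of its email, which is one way of connecting the first data of the same email; the adjacency-dictionary representation is a choice made for this sketch.

```python
from collections import defaultdict

# Hypothetical field names for the non-ID first data of a mail.
FIELD_TYPES = ("source_ip", "helo", "mailfrom_fld", "url_fld")


def build_knowledge_graph(mails):
    """Build an undirected graph as {node: set(neighbour nodes)}.

    Identical first data map to the same (type, value) node (S301), and the
    nodes of one email are connected through its mail ID node (S302).
    """
    graph = defaultdict(set)
    for mail in mails:
        id_node = ("id", mail["id"])
        graph.setdefault(id_node, set())  # keep the ID node even if isolated
        for field in FIELD_TYPES:
            value = mail.get(field)
            if value is None:
                continue  # this sketch simply skips missing fields
            node = (field, value)
            graph[id_node].add(node)
            graph[node].add(id_node)
    return dict(graph)
```

Because a node is keyed by its type and value, two emails with the same source IP address automatically end up attached to one shared node.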

Since some nodes in the knowledge graph have no relationship with one another, the knowledge graph may contain multiple independent subgraphs. Referring to Fig. 4, Fig. 4 is a schematic diagram of a knowledge graph according to an embodiment of the present invention. A total of 5 subgraphs appear in Fig. 4, and the emails corresponding to each subgraph are associated with one another.

Fig. 5 is a subgraph of the knowledge graph shown in Fig. 4. The subgraph shown in Fig. 5 contains 5 types of nodes, corresponding respectively to the source IP, the mail ID, the first-level domain name in the URL, Helo, and the first-level domain name in Mailfrom. The subgraph shown in Fig. 5 is a normal subgraph, and the mails corresponding to it are all normal mails. The nodes corresponding to the source IP, the first-level domain name in the URL, Helo, and the first-level domain name in Mailfrom are all connected to the nodes corresponding to the mail IDs; the node of each mail ID is connected to 4 other nodes, namely the nodes of those four fields. Because the subgraph shown in Fig. 5 is a normal subgraph, the nodes corresponding to the 5 mail IDs are all connected to the same Helo node, the same URL first-level domain name node, and the same Mailfrom first-level domain name node. In an abnormal subgraph, multiple URL first-level domain name nodes or multiple Helo nodes may appear.
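The independent subgraphs of the knowledge graph correspond to what graph theory calls connected components. A minimal sketch of splitting a graph into them with breadth-first search is given below; the adjacency-dictionary input `{node: set(neighbours)}` and the function name `split_subgraphs` are assumptions made for illustration.

```python
from collections import deque


def split_subgraphs(graph):
    """Return the connected components of an undirected graph.

    `graph` maps every node to the set of its neighbours; each returned
    component is the set of nodes of one independent subgraph.
    """
    seen = set()
    subgraphs = []
    for start in graph:
        if start in seen:
            continue
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        subgraphs.append(component)
    return subgraphs
```

Each returned set can then be analysed on its own, since no edge links it to any other subgraph.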

S103, determining abnormal mails in the at least two emails based on the knowledge graph.

Graph analysis is performed on the constructed knowledge graph, and abnormal mails among the at least two emails are detected using the association relationships between the emails.

Referring to FIG. 6, in one embodiment, the determining the abnormal mail in the at least two emails based on the knowledge-graph includes:

S601, identifying abnormal subgraphs of at least two subgraphs of the knowledge graph.

The knowledge graph comprises at least two sub-graphs, and all the e-mails corresponding to each sub-graph have an association relationship, such as e-mails sent by the same sender or e-mails sent at the same source IP address.

Here, an abnormal subgraph refers to a subgraph that includes abnormal mails. Since normal mails may be associated with abnormal mails, the mails in an abnormal subgraph are not necessarily all abnormal; some normal mails may be included.

Referring to fig. 7, in an embodiment, the identifying an abnormal sub-graph of the at least two sub-graphs of the knowledge-graph includes:

S701, determining the number of first nodes of each subgraph in at least two subgraphs of the knowledge graph; the first node represents a sender identity field of the email.

Here, taking Helo as an example of the sender identity field of the email, the number of first nodes (Helo nodes) in each subgraph of the knowledge graph is determined.

S702, under the condition that the number of the first nodes is larger than 1, identifying the corresponding subgraph as an abnormal subgraph.

In a normal subgraph, only one node corresponding to Helo appears. When more than 1 node corresponding to Helo appears, the subgraph contains abnormal mails and is determined to be an abnormal subgraph.

For example, referring to Fig. 8, Fig. 8 is a schematic diagram of another knowledge graph provided in the embodiment of the present invention, where the number of nodes corresponding to Helo in the subgraph on the right side of Fig. 8 is greater than 1, which indicates that this subgraph is an abnormal subgraph and contains abnormal mails.
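The rule of S701 and S702 can be sketched as below, assuming a subgraph is given as a set of hypothetical `(type, value)` node tuples in which Helo nodes carry the type `"helo"`. The function names are illustrative only.

```python
def count_nodes(subgraph, node_type):
    """Count the nodes of one type in a subgraph of (type, value) tuples."""
    return sum(1 for kind, _ in subgraph if kind == node_type)


def is_abnormal_by_helo(subgraph):
    """S702: a subgraph with more than one Helo node is abnormal."""
    return count_nodes(subgraph, "helo") > 1
```

Applied to every subgraph of the knowledge graph, this yields the candidate abnormal subgraphs for the subsequent mail-level determination.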

S602, determining abnormal mails in the at least two emails based on the identified abnormal subgraph.

Abnormal mails are detected on the basis of the identified abnormal subgraph. Referring to Fig. 9, in an embodiment, the determining abnormal mails in the at least two emails based on the identified abnormal subgraph includes:

S901, determining a third node from all second nodes of the identified abnormal subgraph; the second node represents the ID of the email; and the third node represents a second node connected with a first node that is not in the set whitelist.

The embodiment of the invention detects abnormal mails based on the abnormal subgraph detected by the embodiment shown in Fig. 7. If the abnormal subgraph includes multiple nodes corresponding to Helo, one of them is the node corresponding to the normal Helo, and the rest are nodes corresponding to abnormal Helo values. The node corresponding to the normal Helo therefore needs to be excluded to avoid false detection.

In the embodiment of the invention, a set whitelist is created, and normal Helo fields are stored in the set whitelist. If the Helo field corresponding to a first node of the abnormal subgraph is not in the set whitelist, that first node is an abnormal node, and the second nodes connected with the abnormal node are determined to be third nodes. That is, a third node is a second node connected with a Helo node that is not in the set whitelist. Here, the second node refers to the ID of the email.

S902, determining the email corresponding to the determined third node as an abnormal email.

The corresponding emails are found according to the mail IDs; the mail IDs corresponding to the third nodes are the IDs of the abnormal mails.
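Steps S901 and S902 can be sketched as follows. The sketch assumes the graph is an adjacency dictionary `{node: set(neighbours)}` over hypothetical `(type, value)` tuples, that the subgraph is a set of such nodes, and that the whitelist is a plain set of Helo strings; all names are illustrative.

```python
def abnormal_mail_ids(graph, subgraph, helo_whitelist):
    """Return the mail IDs attached to Helo nodes outside the whitelist.

    A Helo node whose value is not whitelisted is treated as abnormal, and
    every mail ID node connected to it is a "third node" (S901); the IDs of
    those nodes identify the abnormal mails (S902).
    """
    abnormal = set()
    for node in subgraph:
        kind, value = node
        if kind == "helo" and value not in helo_whitelist:
            abnormal.update(v for k, v in graph[node] if k == "id")
    return abnormal
```

Mails attached only to a whitelisted Helo node are left out of the result, which is how the normal sender in an abnormal subgraph avoids being flagged.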

Referring to FIG. 10, in another embodiment, the identifying an anomalous subgraph in the at least two subgraphs of the knowledge-graph comprises:

S1001, determining whether the number of fifth nodes connected with a fourth node is greater than 1 in the case that the number of second nodes connected with the fourth node in the knowledge graph is greater than 1; the second node represents the ID of the email; the fourth node represents a first-level domain name in the URL of the email; and the fifth node represents a source IP address of the email or a first-level domain name in a sender address of the email.

If more than one second node in a subgraph is connected with the same fourth node, the emails corresponding to those second nodes all use the same URL. On this basis, the number of fifth nodes connected with those second nodes (that is, all the second nodes connected with the fourth node) is determined; a fifth node may correspond to the source IP address of the email or to the first-level domain name in the sender address of the email.

S1002, under the condition that the number of the fifth nodes is larger than 1, identifying the corresponding subgraph as an abnormal subgraph.

In a subgraph, under the condition that the number of second nodes connected with fourth nodes is more than 1 and the number of fifth nodes connected with the second nodes is more than 1, the subgraph is determined to be an abnormal subgraph.

In practical application, if a plurality of emails share the same URL and the source IP addresses of the emails or the first-level domain names in the sender addresses of the emails are different, the corresponding subgraph is an abnormal subgraph, and the emails are all abnormal emails.

On the basis of the embodiment shown in fig. 10, the determining the abnormal email in the at least two emails based on the abnormal subgraph includes:

determining an email corresponding to a second node connected with the fourth node in the abnormal subgraph as an abnormal email; the second node represents the ID of the E-mail; the fourth node characterizes a first level domain name in the URL of the email.

Because a plurality of e-mails all share the same URL and the source IP addresses of the e-mails or the first-level domain names in the sender addresses of the e-mails are different, the e-mails are all abnormal mails.
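The rule of S1001 and S1002, together with the mail-level determination, can be sketched as below. The sketch follows the description in which the fifth nodes are reached through the mail ID nodes that share a URL node; it assumes an adjacency dictionary over hypothetical `(type, value)` tuples, and the type names `url_fld`, `source_ip`, and `mailfrom_fld` are illustrative.

```python
def mails_sharing_url_from_many_origins(graph, origin_type="source_ip"):
    """Flag mails that share one URL domain but differ in origin.

    For each URL first-level-domain node (fourth node) linked to more than
    one mail ID (second node), collect the origin nodes (fifth nodes) of
    those mails; more than one distinct origin marks all of them abnormal.
    """
    abnormal = set()
    for node, neighbours in graph.items():
        if node[0] != "url_fld":
            continue
        id_nodes = [n for n in neighbours if n[0] == "id"]
        if len(id_nodes) <= 1:
            continue
        origins = {
            other
            for id_node in id_nodes
            for other in graph[id_node]
            if other[0] == origin_type
        }
        if len(origins) > 1:
            abnormal.update(value for _, value in id_nodes)
    return abnormal
```

Passing `origin_type="mailfrom_fld"` applies the same check to the first-level domain name of the sender address instead of the source IP address.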

In practical application, a normal sender does not arbitrarily switch the source IP address, the Helo field, or the Mailfrom when sending emails. Therefore, if multiple nodes corresponding to Helo appear in a generated subgraph, or multiple emails share the same URL while their source IP addresses or Mailfrom differ, the corresponding subgraph can be determined to be abnormal.

In addition to the method provided by the above embodiment, an abnormal sub-graph may also be detected by means of a black list, for example, a URL is written in the black list, and as long as the URL appearing in the sub-graph is in the black list, the sub-graph is considered to be an abnormal sub-graph.
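The blacklist check can be sketched in a few lines. The sketch assumes the subgraph is a set of hypothetical `(type, value)` tuples whose URL nodes carry the type `"url_fld"`, and that the blacklist holds values in the same form as those nodes (here, first-level domain names); both are assumptions made for illustration.

```python
def is_abnormal_by_blacklist(subgraph, url_blacklist):
    """A subgraph is abnormal if any of its URL nodes is blacklisted."""
    return any(
        kind == "url_fld" and value in url_blacklist
        for kind, value in subgraph
    )
```

Such a check is cheap and can run before or alongside the structural rules.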

Referring to FIG. 11, in another embodiment, the identifying an anomalous subgraph in the at least two subgraphs of the knowledge-graph comprises:

S1101, determining a vector of a corresponding subgraph based on the feature parameters of each subgraph in the at least two subgraphs.

Here, the feature parameters of the subgraph may include: the depth of the graph, the number and the ratio of each type of nodes in the graph, the number of the extranet IPs in the graph, the number of the intranet IPs in the graph, the number and the ratio of domain names in the white list and other characteristic parameters.

A vector is generated from the extracted subgraph feature parameters; the generated vector may be multidimensional.

S1102, inputting the vector into a set classification model to obtain a predicted value output by the set classification model; the set classification model is used for outputting the predicted value based on the corresponding hypersphere; and the predicted value represents whether the subgraph corresponding to the vector is an abnormal subgraph.
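A possible way of turning some of these feature parameters into a vector is sketched below. It covers only a subset of the listed features (per-type node counts and ratios, intranet and extranet IP counts, whitelisted-domain count and ratio) and omits the graph depth. The node representation as `(type, value)` tuples, the type names, the feature order, and the use of "private address" as the meaning of intranet IP are all assumptions of this sketch.

```python
import ipaddress

# Hypothetical node types; the order fixes the layout of the vector.
NODE_TYPES = ("id", "source_ip", "helo", "mailfrom_fld", "url_fld")


def subgraph_vector(subgraph, domain_whitelist):
    """Turn a subgraph (set of (type, value) nodes) into a feature vector."""
    total = len(subgraph)
    vector = []
    # Number and ratio of each type of node.
    for node_type in NODE_TYPES:
        count = sum(1 for kind, _ in subgraph if kind == node_type)
        vector += [float(count), count / total if total else 0.0]
    # Number of intranet (private) and extranet IP addresses.
    ips = [ipaddress.ip_address(v) for k, v in subgraph if k == "source_ip"]
    intranet = sum(1 for ip in ips if ip.is_private)
    vector += [float(intranet), float(len(ips) - intranet)]
    # Number and ratio of domain names that are in the whitelist.
    domains = [v for k, v in subgraph if k in ("mailfrom_fld", "url_fld")]
    listed = sum(1 for d in domains if d in domain_whitelist)
    vector += [float(listed), listed / len(domains) if domains else 0.0]
    return vector
```

Every subgraph produces a vector of the same length, so the vectors can be fed to a single classification model.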

Here, the set classification model is a one-class support vector machine (One-Class SVM). In the embodiment of the invention, the hypersphere corresponding to the set classification model is trained in advance with a large amount of training data; the hypersphere is described by a number of support vectors and separates the samples into two classes. The training data should consist mostly of normal subgraph vectors, with a small proportion of abnormal subgraph vectors. Training aims to obtain the radius and the center of the hypersphere. Whether a subgraph vector lies within the hypersphere is then judged from the distance between the vector and the center of the hypersphere and the radius of the hypersphere, which determines whether the subgraph is an abnormal subgraph.

In one embodiment, the set classification model determines whether the vector is within the hypersphere and outputs the predicted value accordingly.

For example, if the vector is within the hypersphere, the set classification model outputs 1; if the vector is not within the hypersphere, the set classification model outputs 0. Whether the subgraph is an abnormal subgraph can therefore be judged directly from the value output by the set classification model.

Referring to Fig. 12, Fig. 12 is a schematic diagram of hypersphere classification provided by an embodiment of the present invention, in which the circle represents the hypersphere, the triangles inside the hypersphere represent normal subgraph vectors, and the circles outside the hypersphere represent abnormal subgraph vectors. Here a denotes the center of the hypersphere and R denotes its radius. Whether a subgraph is abnormal can be judged by comparing the distance from the subgraph vector to the center a with the radius R. For example, if the distance from the subgraph vector to the center a is greater than the radius R, the subgraph vector is outside the hypersphere, and the corresponding subgraph is considered abnormal.
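The inside/outside decision can be illustrated with a deliberately simplified stand-in. The sketch below does not train a One-Class SVM: as an assumption made purely for illustration, it places the center at the mean of the training vectors and takes the radius as a quantile of their distances to that center. Only the final comparison of distance against radius, with output 1 for inside and 0 for outside, mirrors the decision described above; the function names and the `quantile` parameter are hypothetical.

```python
import math


def fit_hypersphere(vectors, quantile=0.95):
    """Estimate a center and radius from (mostly normal) training vectors.

    Simplification: center = mean vector, radius = the given quantile of
    the distances to the center. A real model would learn these values.
    """
    dim = len(vectors[0])
    center = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    distances = sorted(math.dist(v, center) for v in vectors)
    index = min(len(distances) - 1, int(quantile * len(distances)))
    return center, distances[index]


def predict(vector, center, radius):
    """Return 1 if the vector lies within the hypersphere, otherwise 0."""
    return 1 if math.dist(vector, center) <= radius else 0
```

With this convention, a subgraph whose vector yields 0 is treated as an abnormal subgraph.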

In one embodiment, the determining the abnormal email of the at least two emails based on the identified abnormal subgraph comprises: determining all the emails corresponding to the abnormal subgraph as abnormal mails.

In the embodiment of the invention, for an abnormal subgraph detected by the set classification model, all the emails corresponding to that subgraph are determined to be abnormal mails.

The hypersphere support vector machine has particular advantages for classifying non-uniformly distributed samples. Most of a user's emails are normal mails and only a few are abnormal, so most subgraphs of the generated knowledge graph are normal and only a few are abnormal; the subgraphs therefore follow a non-uniform distribution. Using the hypersphere to detect abnormal subgraphs can thus improve the accuracy of detecting abnormal mails.

In practical application, in addition to the One-Class SVM algorithm, abnormal subgraph detection can also be implemented with the Isolation Forest algorithm, the Elliptic Envelope algorithm, or a robust covariance algorithm. When the subgraph features are extracted, the distribution of the features can be computed, the z-score of each feature can be calculated, and the features can be sorted by importance and displayed visually.
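The z-score mentioned here is the standard quantity $z = (x - \mu) / \sigma$, where $\mu$ and $\sigma$ are the mean and standard deviation of one feature over all subgraphs. A minimal sketch using only the standard library follows; the function name and the choice of the population standard deviation are assumptions of this sketch.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return the z-score of each value of one feature across subgraphs."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return [0.0] * len(values)  # a constant feature carries no signal
    return [(v - mu) / sigma for v in values]
```

A feature value with a large absolute z-score lies far from what is typical for that feature, which is useful when ranking or visualising the features of a suspicious subgraph.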

Referring to fig. 13, fig. 13 is a schematic diagram of an abnormal mail detection flow provided by an application embodiment of the present invention, where the abnormal mail detection flow includes:

First, historical mail data is loaded, for example the e-mails received by a user within one month.

Key fields are extracted from the mail data; here, five types of fields are extracted for each e-mail: Helo, Mailfrom, source IP, mail ID, and URL. The URL is obtained from the message body by means of a regular expression.

The knowledge graph is constructed based on the first-level domain (FLD) of Mailfrom, Helo, the FLD of the URL, the source IP and the mail ID.

Abnormal subgraph detection is performed on the knowledge graph. If no abnormality is detected in a subgraph, the mails corresponding to that subgraph are normal mails.

If an abnormal subgraph is detected, the abnormal subgraph is output, the abnormal mails in the abnormal subgraph are determined, and an alarm is output to the user. Here, the alarm information may include the specific contents of the abnormal mail.
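The flow above can be sketched end to end as follows. This is a minimal illustration, not the implementation of the embodiment, and it rests on several assumptions: the record keys and sample mails are hypothetical; the FLD is approximated by the last two labels of a host name; each mail ID node is linked to its attribute nodes; a subgraph is taken to be a connected component; and the detector passed to detect is a placeholder standing in for the classification model.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

URL_PATTERN = re.compile(r'https?://[^\s<>"]+', re.IGNORECASE)

def naive_fld(host):
    """Last two labels of a host name; a simplification that mishandles suffixes such as co.uk."""
    return ".".join(host.lower().strip(".").split(".")[-2:])

def extract_fields(mail):
    """Pull the five field types named in the flow out of one raw mail record."""
    urls = [u.rstrip(".,;:!?)") for u in URL_PATTERN.findall(mail["body"])]
    return {
        "mail_id": mail["mail_id"],
        "helo": mail["helo"],
        "source_ip": mail["source_ip"],
        "mailfrom_fld": naive_fld(mail["mailfrom"].rsplit("@", 1)[-1]),
        "url_flds": sorted({naive_fld(urlsplit(u).hostname or "") for u in urls}),
    }

def build_graph(records):
    """Undirected graph: each mail ID node is linked to its attribute nodes (assumed edge rule)."""
    graph = defaultdict(set)
    for r in records:
        mail_node = ("mail_id", r["mail_id"])
        attrs = [("helo", r["helo"]), ("source_ip", r["source_ip"]),
                 ("mailfrom_fld", r["mailfrom_fld"])]
        attrs += [("url_fld", fld) for fld in r["url_flds"]]
        for node in attrs:
            graph[mail_node].add(node)
            graph[node].add(mail_node)
    return graph

def connected_subgraphs(graph):
    """Split the graph into connected components (assumed definition of a subgraph)."""
    seen, components = set(), []
    for start in list(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components

def mails_in(subgraph):
    """Mail IDs contained in a subgraph, sorted for stable output."""
    return sorted(value for kind, value in subgraph if kind == "mail_id")

def detect(mails, is_abnormal_subgraph):
    """Run the flow and return one alert (a list of mail IDs) per abnormal subgraph."""
    graph = build_graph([extract_fields(m) for m in mails])
    return [mails_in(sg) for sg in connected_subgraphs(graph) if is_abnormal_subgraph(sg)]

# Hypothetical mail history; m1 and m2 share a source IP, m3 is unrelated.
history = [
    {"mail_id": "m1", "helo": "mx1.example.com", "mailfrom": "a@pay.example.com",
     "source_ip": "203.0.113.5", "body": "Verify at http://login.example.com/reset."},
    {"mail_id": "m2", "helo": "mx.other.test", "mailfrom": "b@other.test",
     "source_ip": "203.0.113.5", "body": "Invoice attached"},
    {"mail_id": "m3", "helo": "mail.example.org", "mailfrom": "c@example.org",
     "source_ip": "198.51.100.7", "body": "Meeting at 10"},
]
# Placeholder detector (NOT the classification model): flags subgraphs with two or more mails.
alerts = detect(history, lambda sg: len(mails_in(sg)) >= 2)
```

A production system would replace naive_fld with a lookup against a public suffix list and replace the placeholder detector with the trained classification model.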

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The technical means described in the embodiments of the present invention may be arbitrarily combined without conflict.

In addition, in the embodiments of the present invention, "first", "second", and the like are used for distinguishing similar objects, and are not necessarily used for describing a specific order or a sequential order.

Referring to fig. 14, fig. 14 is a schematic diagram of an abnormal mail detection apparatus according to an embodiment of the present invention. As shown in fig. 14, the apparatus includes an acquisition module, a construction module, an identification module and a determination module.

In an embodiment, the identification module, when identifying an abnormal subgraph of the at least two subgraphs of the knowledge-graph, is configured to:

In one embodiment, the determining module, when determining the abnormal email of the at least two emails based on the identified abnormal subgraph, is configured to:

In one embodiment, the construction module, when constructing the knowledge-graph about the at least two emails based on the acquired first data, is configured to:

In an embodiment, the at least two first data comprise at least any two of:

a first-level domain name in a sender address of the email;

an identification ID of the e-mail;

a source internet protocol, IP, address of the email;

the sender identity field of the email.

In one embodiment, the determination module, when determining the abnormal mail of the at least two emails based on the knowledge-graph, is configured to:

In an embodiment, the determining module, when identifying an abnormal subgraph of the at least two subgraphs of the knowledge-graph, is configured to:

In one embodiment, the determining module is configured to, in identifying an abnormal subgraph of the at least two subgraphs of the knowledge-graph:

In one embodiment, the determining module, when determining the abnormal email of the at least two emails based on the abnormal subgraph, is configured to:

In practical applications, the acquisition module, the construction module, the identification module and the determination module may be implemented by a processor in an electronic device, such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), a Microcontroller Unit (MCU), or a Field-Programmable Gate Array (FPGA).

It should be noted that, in the foregoing embodiment, when the abnormal mail detection apparatus is deployed in a container group, the division into the modules described above is only an example; in practical applications, the processing may be allocated to different modules as needed, that is, the internal structure of the apparatus may be divided into different modules to complete all or part of the processing described above. In addition, the abnormal mail detection apparatus provided by the above embodiment and the abnormal mail detection method embodiment belong to the same concept; the specific implementation process is described in the method embodiment and is not repeated here.

Based on the hardware implementation of the program module, in order to implement the method of the embodiment of the present application, an embodiment of the present application further provides an electronic device. Fig. 15 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application, and as shown in fig. 15, the electronic device includes:

a communication interface capable of exchanging information with other devices such as network devices;

a processor, connected to the communication interface to exchange information with other devices, and configured to execute, when running a computer program, the method provided by one or more of the foregoing technical solutions on the electronic device side; the computer program is stored in the memory.

Of course, in practice, the various components in the electronic device are coupled together by a bus system. It will be appreciated that the bus system is used to enable communication among the components. In addition to a data bus, the bus system includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as the bus system in fig. 15.

The memory in the embodiments of the present application is used to store various types of data to support the operation of the electronic device. Examples of such data include: any computer program for operating on an electronic device.

It will be appreciated that the memory can be volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a ferromagnetic random access memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be disk storage or tape storage. The volatile memory can be a Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDRSDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memories described in the embodiments of the present application are intended to include, without being limited to, these and any other suitable types of memory.

The method disclosed in the embodiments of the present application may be applied to a processor, or may be implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software. The processor described above may be a general purpose processor, a DSP, or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. The processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium located in a memory where a processor reads the programs in the memory and in combination with its hardware performs the steps of the method as previously described.

Optionally, when the processor executes the program, the corresponding process implemented by the electronic device in each method of the embodiment of the present application is implemented, and for brevity, no further description is given here.

In an exemplary embodiment, the present application further provides a storage medium, specifically a computer storage medium, for example, a first memory storing a computer program, where the computer program is executable by a processor of an electronic device to perform the steps of the foregoing method. The computer readable storage medium may be Memory such as FRAM, ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface Memory, optical disk, or CD-ROM.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, electronic device and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.

Alternatively, the integrated units described above in the present application may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as independent products. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially implemented or portions thereof contributing to the prior art may be embodied in the form of a software product stored in a storage medium, and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.

The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. An abnormal mail detection method, characterized in that the method comprises:

2. The method of claim 1, wherein the identifying an anomalous subgraph of the at least two subgraphs of the knowledge-graph comprises:

3. The method of claim 2, wherein the set classification model determines whether the vector is within the hypersphere,

4. The method of claim 2, wherein determining the abnormal email of the at least two emails based on the identified abnormal subgraph comprises:

5. The method of claim 1, wherein the identifying an anomalous subgraph of the at least two subgraphs of the knowledge-graph comprises:

6. The method of claim 5, wherein determining the abnormal email of the at least two emails based on the identified abnormal subgraph comprises:

7. The method of claim 1, wherein the identifying an anomalous subgraph of the at least two subgraphs of the knowledge-graph comprises:

8. The method of claim 7, wherein the determining the abnormal email of the at least two emails based on the abnormal subgraph comprises:

9. The method of claim 1, wherein the building of the knowledge-graph of the at least two emails based on the obtained first data comprises:

10. The method of claim 1, wherein the at least two first data comprise at least any two of:

a first-level domain name in a sender address of the email;

an identification ID of the e-mail;

a source internet protocol, IP, address of the email;

the sender identity field of the email.

11. An abnormal mail detecting apparatus, comprising:

12. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of detecting an abnormal mail according to any one of claims 1 to 10 when executing the computer program.

13. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to execute the abnormal mail detecting method according to any one of claims 1 to 10.