CN116566640A - E-mail Trojan horse identification method and system based on E-mail behavior pattern analysis

Info

Publication number: CN116566640A
Application number: CN202310192868.5A
Authority: CN
Inventors: 金波; 张玲
Original assignee: Beijing Saibo Yian Technology Co ltd
Current assignee: Beijing Saibo Yian Technology Co ltd
Priority date: 2023-03-03
Filing date: 2023-03-03
Publication date: 2023-08-08

Abstract

The invention provides a mail Trojan horse identification method and system based on mail behavior pattern analysis, relating to the technical field of intelligent identification. The method comprises: collecting mail sending and receiving information separately for each mail account and clustering it; analyzing the sending and receiving frequency of each resulting mail set to determine the sending and receiving frequency; when a fixed heartbeat frequency exists in the sending and receiving frequency, filtering and extracting the mail sending and receiving information by using the heartbeat frequency feature to obtain the mail data of a specific mail account; performing similarity analysis on the body text and attachment content of that mail data; and, when the similarity of the mail data meets a preset similarity threshold, performing behavior analysis on the mail body and taking mail with abnormal behavior as a highly suspicious mail Trojan horse.

Description

E-mail Trojan horse identification method and system based on E-mail behavior pattern analysis

Technical Field

The invention relates to the technical field of intelligent identification, and in particular to a mail Trojan horse identification method and system based on mail behavior pattern analysis.

Background

With the development of e-mail, e-mail security has become increasingly important. Normal mail is safe however the user handles it; the security risk comes from abnormal mail. At present, mail Trojan horse attacks are combined with social engineering, so that mail carrying a Trojan horse looks much like normal mail on the surface and is hard to distinguish. Such mail often takes the following forms:

The mail is formatted as a web page and can only be opened in HTML format. Such web pages mainly exploit vulnerabilities in software such as IE; when the mail is opened, a Trojan horse program can be downloaded from a designated address and executed in the background.

Trojan horses that exploit application software security vulnerabilities usually come with an attachment, which may be an exe file or a doc, pdf, xls or ppt file. The intruder constructs a specially formatted file that binds the Trojan horse software into a document or program, and when the user opens the file the Trojan horse program is executed directly. A more covert variant replaces the Trojan horse software with a downloader. The downloader fetches the Trojan horse software from a designated site and is not itself a virus: when the user views the file, the downloader runs first and then downloads and executes the Trojan horse software, while antivirus software regards the downloader as normal software and does not detect or remove it. This approach is well concealed, gives the Trojan horse a high survival rate, and poses an extremely high security risk.

Hidden danger of information leakage: BBS forums, blogs and the services of some companies require users to provide information such as e-mail addresses, and some companies and individuals sell e-mail addresses and similar data for profit, which creates a risk of personal information leakage. In other cases, personal registration information is not shielded on the network, so that any user can view and search it and detailed information can be obtained through a search engine such as Google, which is extremely harmful.

In the prior art, mail Trojan horses are insufficiently identified, so mail is easily invaded by Trojan horses.

Disclosure of Invention

The application provides a mail Trojan identification method based on mail behavior pattern analysis, which is used for solving the technical problem that the mail is easy to be invaded by Trojan due to insufficient identification of the mail Trojan in the prior art.

In view of the above problems, the present application provides a method and a system for identifying a Trojan horse based on analysis of a mail behavior pattern.

In a first aspect, the present application provides a mail Trojan horse identification method based on mail behavior pattern analysis, the method comprising: collecting mail sending and receiving information separately for each mail account, and clustering the mail sending and receiving information to obtain a mail clustering result; based on the mail clustering result, analyzing the mail sending and receiving frequency of each mail set and determining the sending and receiving frequency; judging whether the sending and receiving frequency has a fixed heartbeat frequency; when a fixed heartbeat frequency exists, obtaining a heartbeat frequency feature according to the fixed heartbeat frequency, and filtering and extracting the mail sending and receiving information by using the heartbeat frequency feature to obtain mail data of a specific mail account; performing similarity analysis on the body text and attachment content of the mail data of the specific mail account; and when the similarity of the mail data meets a preset similarity threshold, performing behavior analysis on the mail body and taking mail with abnormal behavior as a highly suspicious mail Trojan horse.

In a second aspect, the present application provides a mail Trojan horse identification system based on mail behavior pattern analysis, the system comprising: a clustering module for collecting mail sending and receiving information separately for each mail account and clustering it to obtain a mail clustering result; a frequency analysis module for analyzing the mail sending and receiving frequency of each mail set based on the mail clustering result and determining the sending and receiving frequency; a judging module for judging whether the sending and receiving frequency has a fixed heartbeat frequency; a filtering and extracting module for obtaining, when a fixed heartbeat frequency exists, a heartbeat frequency feature according to the fixed heartbeat frequency, and filtering and extracting the mail sending and receiving information by using the heartbeat frequency feature to obtain mail data of a specific mail account; a similarity analysis module for performing similarity analysis on the body text and attachment content of the mail data of the specific mail account; and a behavior analysis module for performing behavior analysis on the mail body when the similarity of the mail data meets a preset similarity threshold, and taking mail with abnormal behavior as a highly suspicious mail Trojan horse.

One or more technical solutions provided in the present application have at least the following technical effects or advantages:

the application provides a mail Trojan identification method based on mail behavior pattern analysis, relates to the technical field of intelligent identification, and solves the technical problem that mails are easy to be invaded by Trojan due to insufficient identification of the mail Trojan in the prior art, so that reasonable and accurate identification of the mail Trojan is realized, and further the invasion rate of the mail Trojan is reduced.

Drawings

Fig. 1 is a schematic flow chart of a method for identifying a Trojan horse based on analysis of a mail behavior pattern;

fig. 2 is a schematic diagram of a mail Trojan recognition system based on mail behavior pattern analysis.

Reference numerals illustrate: the system comprises a clustering module 1, a frequency analysis module 2, a judging module 3, a filtering and extracting module 4, a similarity analysis module 5 and a behavior analysis module 6.

Detailed Description

The method for identifying the E-mail Trojan based on E-mail behavior pattern analysis is used for solving the technical problem that E-mail is easy to be invaded by Trojan due to insufficient identification of the E-mail Trojan in the prior art.

Example 1

As shown in fig. 1, an embodiment of the present application provides a method for identifying a Trojan horse based on analysis of a mail behavior pattern, where the method includes:

step S100: respectively acquiring mail receiving and sending information according to the mail account numbers, and clustering the mail receiving and sending information to obtain mail clustering results;

Specifically, the mail Trojan horse identification method based on mail behavior pattern analysis is applied to a mail Trojan horse identification system based on mail behavior pattern analysis. For the mail under the current target mail account, sent mail, received mail and deleted mail are counted separately, and the counted mail is classified by access type, so that the target mail can be divided into an HTTP access type, an IMAP access type, a POP3 access type and the like. The mail clustering result of the target mail is obtained accordingly and serves as an important reference for the subsequent identification of mail Trojan horses.

Step S200: based on the mail clustering result, carrying out mail receiving and transmitting frequency analysis on various mail sets, and determining receiving and transmitting frequency;

Specifically, based on the mail clustering result obtained above, the sending and receiving frequency of each category of mail in the clustering result is analyzed. Mail with different attributes has different sending and receiving frequencies; common ones include a per-second connection frequency, a per-minute connection frequency and a per-hour connection frequency. Determining the sending and receiving frequency of each category supports the identification of mail Trojan horses.
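As a minimal sketch of the frequency analysis in step S200, assuming each mail set is represented by a list of send/receive timestamps in seconds, the mean interval between consecutive messages can be mapped to one of the three frequency levels named above. The function name `classify_frequency`, the `sporadic` label and the 60 s and 3600 s cut-offs are illustrative assumptions, not values from the application.

```python
def classify_frequency(timestamps):
    """Map a mail set's timestamps (seconds) to a coarse frequency level.

    The cut-offs below are assumed for illustration only.
    """
    if len(timestamps) < 2:
        return "sporadic"  # too few messages to measure an interval
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    mean_gap = sum(gaps) / len(gaps)
    if mean_gap < 60:
        return "per-second"   # intervals on the order of seconds
    if mean_gap < 3600:
        return "per-minute"   # intervals on the order of minutes
    return "per-hour"         # intervals of an hour or more
```

A mail set whose messages arrive one second apart would be labelled `per-second`, while two messages two hours apart would be labelled `per-hour`.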

Step S300: judging whether the receiving and transmitting frequency has a fixed heartbeat frequency or not;

Specifically, mail sent spontaneously by a person has an uncertain frequency, and sporadic mail poses little security risk, whereas a mail Trojan horse has a definite sending frequency. Some normal mail also has a fixed sending and receiving frequency, so mail with a fixed sending and receiving frequency must be examined to judge whether the counted mail has a fixed heartbeat frequency. The fixed heartbeat frequency range is about 60 to 100 times per minute; that is, when the number of mails sent and received by the current mail account in each minute reaches 60 to 100, the current sending and receiving frequency is regarded as a fixed heartbeat frequency. This lays a solid foundation for the subsequent identification of mail Trojan horses.
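The judgment in step S300 can be sketched as follows, assuming timestamps are given in seconds and that "in each minute" means every minute in which the account was active. The helper names and the choice to require the range in every active minute are assumptions of this sketch; only the 60 to 100 per minute range comes from the description.

```python
from collections import Counter


def per_minute_counts(timestamps):
    """Count messages falling in each whole minute (timestamps in seconds)."""
    return Counter(int(ts // 60) for ts in timestamps)


def has_fixed_heartbeat(timestamps, low=60, high=100):
    """Return True when every active minute holds between low and high messages."""
    counts = per_minute_counts(timestamps)
    if not counts:
        return False  # no traffic, so no heartbeat
    return all(low <= count <= high for count in counts.values())
```

An account sending one message every 0.75 s (80 per minute) would be judged to have a fixed heartbeat frequency, while an account with a handful of messages spread over an hour would not.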

Step S400: when a fixed heartbeat frequency exists, obtaining a heartbeat frequency feature according to the fixed heartbeat frequency, and filtering and extracting the mail sending and receiving information by using the heartbeat frequency feature to obtain mail data of a specific mail account;

Specifically, if a fixed heartbeat frequency is determined to exist, that is, the sending and receiving frequency of the current mail falls within the interval of 60 to 100 times per minute, the heartbeat frequency feature is extracted from that mail according to its fixed heartbeat frequency; the feature captures how the heartbeat frequency of this mail differs from that of other mail with a fixed heartbeat frequency. The mail sending and receiving information is then filtered with the heartbeat frequency feature, and the information that matches the abnormal heartbeat frequency feature is screened out and extracted, yielding the corresponding mail account and the mail data of that target mail account. This narrows the scope for the identification of mail Trojan horses.
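A hedged sketch of the filtering and extraction in step S400 follows. It assumes each record is a hypothetical `(account, timestamp_seconds, payload)` tuple and uses the per-minute count range as the only heartbeat frequency feature; the application does not specify the record layout or any further features.

```python
from collections import Counter, defaultdict


def filter_by_heartbeat(records, low=60, high=100):
    """Keep only the mail data of accounts whose traffic shows the heartbeat.

    records: iterable of (account, timestamp_seconds, payload) tuples
    (an assumed layout for illustration).
    """
    by_account = defaultdict(list)
    for account, timestamp, payload in records:
        by_account[account].append((timestamp, payload))

    selected = {}
    for account, items in by_account.items():
        counts = Counter(int(timestamp // 60) for timestamp, _ in items)
        if all(low <= count <= high for count in counts.values()):
            selected[account] = [payload for _, payload in items]
    return selected
```

The returned dictionary maps each screened account to its mail data, which is the input the similarity analysis of step S500 would then work on.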

Step S500: carrying out similarity analysis on the text and the attachment content of the mail data of the specific mail account;

Specifically, the body text and attachments of the screened specific mail account are first segmented into words, and the corresponding text features and attachment features are extracted. The mail body features are then vectorized on the basis of the text features and attachment features to construct a feature vector matrix, which comprises a text feature vector matrix and an attachment feature vector matrix. Each column of a matrix is a vector, so the matrix can be regarded as a set of vectors and a vector can be regarded as a single-column matrix; a matrix can also be seen as a linear transformation that maps one vector to another. Euclidean distance calculation is then carried out on the text feature vector matrix and the attachment feature vector matrix to obtain the text similarity and the attachment similarity respectively, which supports the subsequent identification of mail Trojan horses.
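The segmentation and vectorization described here could be sketched, under simplifying assumptions, as a bag-of-words count matrix. The regular-expression tokenizer is only a stand-in for a real word segmenter (Chinese text in particular would need one), and the sketch stores one row per mail rather than one column per vector; the function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Very rough word segmentation: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_feature_matrix(documents):
    """Return (vocabulary, matrix) where each row is one document's term counts."""
    vocabulary = sorted({token for doc in documents for token in tokenize(doc)})
    matrix = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        matrix.append([counts[token] for token in vocabulary])
    return vocabulary, matrix
```

The same function can be applied once to the body texts and once to the extracted attachment texts to obtain the two matrices.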

Step S600: when the similarity of the mail data meets a preset similarity threshold, performing behavior analysis on the mail body, and taking the mail with abnormal behaviors as a highly suspicious mail Trojan horse.

Specifically, if the similarity of the mail data meets the preset similarity threshold, behavior analysis is performed on the current target mail body; the preset similarity threshold is set in advance by the relevant technicians according to the amount of similar mail data. The mail sending time, the mail deletion time and the mail sender and recipient of the mail body are each analyzed for anomalies, and the analysis results determine whether abnormal behavior exists. If so, the current mail is recorded as a highly suspicious mail Trojan horse. The highly suspicious mail Trojan horse is then sent to a security analysis channel, a security evaluation is carried out on the mail account, emergency processing information is generated on the basis of the evaluation result, and network isolation and malicious program removal are carried out according to the mail sending log of the abnormal mail account, thereby reducing the intrusion rate of mail Trojan horses.

Further, step S100 of the present application further includes:

step S110: reading mail information data of a mail server, wherein the mail information data comprises deleted mail information;

step S120: clustering the mail information data according to preset element information, wherein the preset element information comprises a mail subject length, a mail text length, an attachment size and a combination element thereof, and the mail clustering result is obtained and comprises a plurality of mail sets.

Specifically, all mail information data held on the mail server for the target mail is read, including received mail information, sent mail information and deleted mail information. The mail information data is clustered according to preset element information, namely the mail subject length, the mail body length, the attachment size and combinations of these elements, so that the data is divided into several classes of mail with similar elements. The mail clustering result obtained in this way contains a plurality of mail sets and provides an important basis for the subsequent identification of mail Trojan horses.
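As a rough illustration of steps S110 and S120, the sketch below groups mail records by coarse buckets of subject length, body length and attachment size. Bucketing is a deliberately simple stand-in for a clustering algorithm, which the application does not name; the `MailRecord` fields and the bucket widths are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MailRecord:
    account: str
    subject: str
    body: str
    attachment_size: int  # bytes; 0 when there is no attachment
    deleted: bool = False  # deleted mail is read from the server as well


def cluster_mails(mails, subject_width=20, body_width=500, attachment_width=100_000):
    """Group mails whose preset elements fall into the same buckets."""
    clusters = defaultdict(list)
    for mail in mails:
        key = (
            len(mail.subject) // subject_width,
            len(mail.body) // body_width,
            mail.attachment_size // attachment_width,
        )
        clusters[key].append(mail)
    return dict(clusters)
```

Each value of the returned dictionary plays the role of one mail set in the mail clustering result.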

Further, step S500 of the present application further includes:

step S510: the text and the attachment of the specific mail account are segmented, and text characteristics and attachment characteristics are extracted;

step S520: carrying out mail body feature vectorization processing by utilizing the text feature and the attachment feature to construct a feature vector matrix, wherein the feature vector matrix comprises a text feature vector matrix and an attachment feature vector matrix;

step S530: and respectively obtaining the text similarity and the attachment similarity through Euclidean distance calculation according to the text feature vector matrix and the attachment feature vector matrix.

Specifically, after the mail data of the specific mail account has been filtered and extracted on the basis of the heartbeat frequency feature, the body text and attachments of that mail data are segmented into words; that is, the phrases in the body content and in the attachment content are extracted. Phrases in normal mail follow one another logically and coherently, whereas phrases in malicious mail do not. After segmentation, the text features and attachment features are extracted and used to vectorize the target mail body, adjusting the calculation framework of the target mail body as required, so as to construct the feature vector matrix, which contains the text feature vector matrix and the attachment feature vector matrix. The Euclidean distance between feature vectors is then calculated for the text feature vector matrix and the attachment feature vector matrix with the Euclidean distance formula

$$\mathrm{DisE}(z_i, z_1) = \sqrt{\sum_{k=1}^{p} \left(z_{i,k} - z_{1,k}\right)^2}$$

where $\mathrm{DisE}(z_i, z_1)$ is the Euclidean distance between mail body $z_i$ and mail body $z_1$, $p$ is the number of feature vectors of mail body $z_1$, and $z_{i,k}$ denotes the $k$-th feature of mail body $z_i$.

The average value of the calculated Euclidean distances is then obtained and taken as the screening threshold. Finally, the similarity of the mail body feature vectors is judged against this threshold, and the text similarity and the attachment similarity are screened separately, which keeps the identification of mail Trojan horses efficient.

Further, step S530 of the present application includes:

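step S531: calculating the Euclidean distance of the feature vectors by the formula $\mathrm{DisE}(z_i, z_1) = \sqrt{\sum_{k=1}^{p} (z_{i,k} - z_{1,k})^2}$, wherein $\mathrm{DisE}(z_i, z_1)$ is the Euclidean distance between mail body $z_i$ and mail body $z_1$, $p$ is the number of feature vectors of mail body $z_1$, and $z_{i,k}$ denotes the $k$-th feature of mail body $z_i$;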

step S532: calculating the average of the Euclidean distances of the feature vectors to obtain a Euclidean distance average value;

step S533: and taking the Euclidean distance average value as a screening threshold value, and judging the similarity of the feature vector of the mail body by using the screening threshold value.

Specifically, for the text feature vector matrix and the attachment feature vector matrix obtained above, the distance between feature vectors is calculated with the Euclidean distance formula $\mathrm{DisE}(z_i, z_1) = \sqrt{\sum_{k=1}^{p} (z_{i,k} - z_{1,k})^2}$, in which $z_{i,k}$ denotes the $k$-th feature of mail body $z_i$ and $p$ is the number of feature vectors of mail body $z_1$. The resulting Euclidean distance of the feature vectors measures the absolute distance between them in the multidimensional space. The average of the Euclidean distances obtained is then calculated to give the Euclidean distance average value, which serves as the screening threshold for judging the similarity of the mail body feature vectors, thereby providing a reference for identifying mail Trojan horses.

The attachment similarity can be judged in the same way.
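Steps S531 to S533 can be sketched as below, assuming each mail body has already been reduced to a numeric feature vector of equal length. Treating a distance at or below the mean as "similar" is an assumption of this sketch, since the description does not state the direction of the comparison; the function names are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Standard Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def screen_by_mean_distance(reference, others):
    """Compare each vector with the reference, using the mean distance as threshold.

    Returns (distances, threshold, flags); a flag is True when the distance
    is at or below the mean (assumed meaning of "similar").
    """
    distances = [euclidean_distance(vector, reference) for vector in others]
    threshold = sum(distances) / len(distances)
    flags = [distance <= threshold for distance in distances]
    return distances, threshold, flags
```

The same routine would be run once on the text feature vectors and once on the attachment feature vectors to obtain the two similarity judgments.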

Further, step S600 of the present application further includes:

step S610: analyzing the mail sending time of the mail body, judging whether the mail sending time is abnormal or not, and obtaining a first behavior analysis result;

step S620: analyzing the mail deleting time of the mail body, judging whether the mail deleting time is abnormal or not, and obtaining a second behavior analysis result;

step S630: analyzing the mail sender and recipient of the mail body, judging whether the mail sender and recipient are abnormal or not, and obtaining a third behavior analysis result;

step S640: and determining whether abnormal behaviors exist according to the analysis results of the first, second and third behaviors.

Specifically, when the similarity of the mail data meets the preset similarity threshold, behavior analysis is performed on the mail body; the preset similarity threshold is set in advance by the relevant technicians according to the amount of similar mail data. First, the mail sending time of the mail body is analyzed to judge whether it is abnormal: the mail transmission time has a limit measured in milliseconds, and the sending time is judged abnormal when it exceeds the limit and normal when it is within the limit, which gives the first behavior analysis result. Second, the mail deletion time of the mail body is analyzed to judge whether it is abnormal: when the mail was deleted by the system rather than by a person, the deletion time is recorded in the system and is judged abnormal, and when the mail was deleted manually rather than by the system, the deletion time is judged normal, which gives the second behavior analysis result. Third, the mail sender and recipient of the mail body are analyzed to judge whether they are abnormal: when the sender or recipient cannot connect to the mail server, the result is judged abnormal, and when they can connect to the mail server normally, it is judged normal, which gives the third behavior analysis result. Finally, whether the mail body exhibits abnormal behavior is determined from the first, second and third behavior analysis results, achieving the technical effect of identifying mail Trojan horses.
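A minimal sketch of steps S610 to S640 follows. The `MailBehavior` fields, the 1000 ms default limit and the rule that any single abnormal result marks the mail as abnormal are all assumptions of this sketch; the description does not state how the three results are combined.

```python
from dataclasses import dataclass


@dataclass
class MailBehavior:
    send_time_ms: float            # measured transmission time (assumed meaning)
    deleted_by_system: bool        # True when a program, not a person, deleted the mail
    correspondent_reachable: bool  # True when sender/recipient can reach the mail server


def analyze_behavior(behavior, send_limit_ms=1000.0):
    """Return the three behavior analysis results and the combined verdict."""
    first = behavior.send_time_ms > send_limit_ms   # S610: over the limit is abnormal
    second = behavior.deleted_by_system             # S620: system deletion is abnormal
    third = not behavior.correspondent_reachable    # S630: unreachable party is abnormal
    return {
        "send_time_abnormal": first,
        "delete_time_abnormal": second,
        "correspondent_abnormal": third,
        "abnormal": first or second or third,       # S640: assumed "any" rule
    }
```

A stricter deployment might instead require two or three abnormal results before flagging a mail; the combination rule is a tuning decision left open by the text.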

Further, step S600 of the present application further includes:

step S650: sending the highly suspicious mail Trojan to a security analysis channel, and carrying out security evaluation on a mail account to obtain security evaluation information;

step S660: generating emergency processing information according to the security evaluation information, wherein the emergency processing information is used for checking and sealing the mail account with the security evaluation information as abnormal;

step S670: and obtaining a mail sending log of the abnormal mail account, and carrying out network isolation and malicious program removal according to the mail sending log.

Specifically, the mail judged to be a highly suspicious mail Trojan horse is extracted and sent to the security analysis channel, where a security evaluation is carried out on the current mail account; that is, the channel determines whether the highly suspicious mail is in fact a mail Trojan horse. If the security analysis channel confirms it as a mail Trojan horse, the current mail account is evaluated as an abnormal account; if the channel determines it to be ordinary mail, the current mail account is evaluated as a safe account. Based on the security evaluation information, emergency processing information is generated for any mail account evaluated as abnormal, and that account is sealed. The mail sending log of the sealed abnormal mail account is then extracted, network isolation is applied to the abnormal mail account according to the log, and the malicious programs in the account are removed, thereby completing the identification and handling of the mail Trojan horse.
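The emergency processing of steps S650 to S670 might be planned as in the sketch below. It assumes the verdict of the security analysis channel is already available as a boolean and that each entry of the mail sending log is a dictionary with hypothetical `account` and `host` keys; the action names are illustrative labels, not real commands.

```python
def plan_emergency_actions(account, confirmed_trojan, send_log):
    """Build the list of emergency actions for one evaluated mail account."""
    if not confirmed_trojan:
        return {"account": account, "status": "safe", "actions": []}

    # Hosts that appear in the abnormal account's mail sending log.
    hosts = sorted({entry["host"] for entry in send_log if entry["account"] == account})

    actions = [("seal_account", account)]
    actions += [("isolate_host", host) for host in hosts]
    actions += [("remove_malware", host) for host in hosts]
    return {"account": account, "status": "abnormal", "actions": actions}
```

Returning a plan rather than executing it keeps the sketch free of side effects; a real system would hand each action to its own account management and network tooling.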

Example 2

Based on the same inventive concept as the mail Trojan recognition method based on the mail behavior pattern analysis in the foregoing embodiment, as shown in fig. 2, the present application provides a mail Trojan recognition system based on the mail behavior pattern analysis, the system includes:

the clustering module 1 is used for respectively acquiring the mail receiving and sending information according to the mail account numbers, and clustering the mail receiving and sending information to obtain mail clustering results;

the frequency analysis module 2 is used for carrying out mail receiving and transmitting frequency analysis on various mail sets based on the mail clustering result and determining receiving and transmitting frequency;

the judging module 3 is used for judging whether the receiving and transmitting frequency has a fixed heartbeat frequency or not;

the filtering and extracting module 4 is used for obtaining heartbeat frequency characteristics according to the fixed heartbeat frequency when a fixed heartbeat frequency exists, and filtering and extracting the mail receiving and sending information by utilizing the heartbeat frequency characteristics to obtain mail data of a specific mail account;

the similarity analysis module 5 is used for performing similarity analysis of text and attachment content on the mail data of the specific mail account;

and the behavior analysis module 6 is used for performing behavior analysis on the mail body when the similarity of the mail data meets a preset similarity threshold value, and taking the mail with abnormal behaviors as a highly suspicious mail-type Trojan horse.

Further, the system further comprises:

the reading module is used for reading mail information data of the mail server, wherein the mail information data comprises deleted mail information;

the mail clustering module is used for clustering the mail information data according to preset element information, wherein the preset element information comprises mail subject length, mail text length, attachment size and combination elements thereof, the mail clustering result is obtained, and the mail clustering result comprises a plurality of mail sets.

Further, the system further comprises:

the word segmentation module is used for segmenting the text and the attachment of the specific mail account and extracting text characteristics and attachment characteristics;

the vectorization module is used for carrying out mail body feature vectorization processing by utilizing the text feature and the attachment feature to construct a feature vector matrix, wherein the feature vector matrix comprises a text feature vector matrix and an attachment feature vector matrix;

and the calculation module is used for respectively obtaining the text similarity and the attachment similarity through Euclidean distance calculation according to the text feature vector matrix and the attachment feature vector matrix.

Further, the system further comprises:

the formula module is used for calculating the Euclidean distance of the feature vectors by the formula $\mathrm{DisE}(z_i, z_1) = \sqrt{\sum_{k=1}^{p} (z_{i,k} - z_{1,k})^2}$, wherein $\mathrm{DisE}(z_i, z_1)$ is the Euclidean distance between mail body $z_i$ and mail body $z_1$, $p$ is the number of feature vectors of mail body $z_1$, and $z_{i,k}$ denotes the $k$-th feature of mail body $z_i$;

the average module is used for calculating the average of the Euclidean distances of the feature vectors to obtain a Euclidean distance average value;

and the screening module is used for taking the Euclidean distance average value as a screening threshold value and judging the similarity of the feature vector of the mail body by utilizing the screening threshold value.

Further, the system further comprises:

the first abnormality judgment module is used for analyzing the mail sending time of the mail body and judging whether the mail sending time is abnormal or not to obtain a first behavior analysis result;

the second abnormality judgment module is used for analyzing the mail deleting time of the mail body and judging whether the mail deleting time is abnormal or not to obtain a second behavior analysis result;

the third abnormality judgment module is used for analyzing the mail sender and recipient of the mail body and judging whether the mail sender and recipient are abnormal or not to obtain a third behavior analysis result;

the abnormal behavior determining module is used for determining whether abnormal behaviors exist according to the first, second and third behavior analysis results.

Further, the system further comprises:

the evaluation module is used for sending the highly suspicious mail Trojan to a security analysis channel, and carrying out security evaluation on the mail account to obtain security evaluation information;

the checking and sealing module is used for generating emergency processing information according to the safety evaluation information, and the emergency processing information is used for checking and sealing the mail account with the safety evaluation information being abnormal;

and the clearing module is used for obtaining a mail sending log of the abnormal mail account and carrying out network isolation and malicious program clearing according to the mail sending log.

Through the foregoing detailed description of the mail Trojan horse identification method based on mail behavior pattern analysis, those skilled in the art can clearly understand the method and the system of this embodiment. Because the system disclosed in the embodiment corresponds to the method disclosed in the embodiment, its description is relatively brief, and the relevant details can be found in the description of the method.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A mail Trojan horse identification method based on mail behavior pattern analysis, the method comprising:

respectively acquiring mail receiving and sending information according to the mail account numbers, and clustering the mail receiving and sending information to obtain mail clustering results;

based on the mail clustering result, carrying out mail receiving and transmitting frequency analysis on various mail sets, and determining receiving and transmitting frequency;

judging whether the receiving and transmitting frequency has a fixed heartbeat frequency or not;

when the fixed heartbeat frequency exists, obtaining a heartbeat frequency characteristic according to the fixed heartbeat frequency, and filtering and extracting the mail receiving and sending information by utilizing the heartbeat frequency characteristic to obtain mail data of a specific mail account;

carrying out similarity analysis on the text and the attachment content of the mail data of the specific mail account;

when the similarity of the mail data meets a preset similarity threshold, performing behavior analysis on the mail body, and taking the mail with abnormal behaviors as a highly suspicious mail Trojan horse.
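A minimal sketch of the heartbeat judgment and filtering steps above. The source does not say how a "fixed heartbeat frequency" is detected, so the criterion used here (the coefficient of variation of the intervals between consecutive sends stays below a tolerance) is an assumption, as are the function names and the parameters `max_cv` and `min_events`.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional


def fixed_heartbeat_interval(timestamps: List[float],
                             max_cv: float = 0.1,
                             min_events: int = 4) -> Optional[float]:
    """Return the mean send interval (seconds) if sends look periodic, else None."""
    if len(timestamps) < min_events:
        return None  # too few mails to judge periodicity
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap <= 0:
        return None
    # Coefficient of variation: spread of the gaps relative to their mean.
    variation = pstdev(gaps) / average_gap
    return average_gap if variation <= max_cv else None


def accounts_with_heartbeat(send_times: Dict[str, List[float]]) -> Dict[str, float]:
    """Keep only the accounts whose send times show a fixed heartbeat."""
    result = {}
    for account, timestamps in send_times.items():
        interval = fixed_heartbeat_interval(timestamps)
        if interval is not None:
            result[account] = interval
    return result
```

Here `send_times` maps a mail account to the send timestamps of one mail set; only the accounts that survive this filter would go on to similarity analysis.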

2. The method of claim 1, wherein respectively acquiring the mail receiving and sending information according to the mail account numbers, and clustering the mail receiving and sending information to obtain the mail clustering result, comprises:

reading mail information data of a mail server, wherein the mail information data comprises deleted mail information;

clustering the mail information data according to preset element information to obtain the mail clustering result, wherein the preset element information comprises a mail subject length, a mail text length, an attachment size, and combinations thereof, and the mail clustering result comprises a plurality of mail sets.
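A minimal sketch of grouping mails by the preset element information. The source does not name a clustering algorithm, so this example substitutes simple fixed-width binning on subject length, text length, and attachment size. The record fields (`subject`, `body`, `attachment_size`) and the bin widths are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Mail = Dict[str, Any]  # hypothetical record: 'subject', 'body', 'attachment_size'


def element_key(mail: Mail,
                subject_bin: int = 10,
                body_bin: int = 100,
                attachment_bin: int = 1024) -> Tuple[int, int, int]:
    """Map a mail to a coarse (subject length, text length, attachment size) bucket."""
    return (
        len(mail["subject"]) // subject_bin,
        len(mail["body"]) // body_bin,
        mail["attachment_size"] // attachment_bin,
    )


def cluster_mails(mails: List[Mail]) -> Dict[Tuple[int, int, int], List[Mail]]:
    """Group mails whose element information falls into the same bucket."""
    mail_sets = defaultdict(list)
    for mail in mails:
        mail_sets[element_key(mail)].append(mail)
    return dict(mail_sets)
```

Each value of the returned dictionary plays the role of one mail set in the clustering result. Mails generated by the same program tend to share near-identical lengths and sizes, so they would land in the same bucket.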

3. The method of claim 1, wherein performing a similarity analysis of text and attachment content on the mail data of the specific mail account includes:

performing word segmentation on the text and the attachment of the mail data of the specific mail account, and extracting text features and attachment features;

carrying out mail body feature vectorization processing by utilizing the text features and the attachment features to construct a feature vector matrix, wherein the feature vector matrix comprises a text feature vector matrix and an attachment feature vector matrix;

and respectively obtaining the text similarity and the attachment similarity through Euclidean distance calculation according to the text feature vector matrix and the attachment feature vector matrix.
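A minimal sketch of the segmentation and vectorization steps above, assuming a bag-of-words representation in which each row of the matrix counts the terms of one mail text (or one attachment's extracted text). The regular-expression tokenizer is a stand-in: the source does not name a word segmenter, and Chinese text would need a dedicated one. The function names are hypothetical.

```python
import re
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase word tokens are an adequate stand-in for word segmentation.
    return re.findall(r"\w+", text.lower())


def build_feature_matrix(documents: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Return (vocabulary, matrix), where matrix[i][j] counts term j in document i."""
    tokenized = [tokenize(document) for document in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    column = {term: j for j, term in enumerate(vocabulary)}
    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for term in tokens:
            row[column[term]] += 1
        matrix.append(row)
    return vocabulary, matrix
```

Calling `build_feature_matrix` once on the mail texts and once on the attachment texts would give the two matrices (text and attachment) that the similarity step compares.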

4. The method of claim 1, wherein obtaining the text similarity from the text feature vector matrix by Euclidean distance calculation comprises:

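calculating the Euclidean distance of the feature vectors by the formula $d(z_i, z_1) = \sqrt{\sum_{k=1}^{p} (z_{ik} - z_{1k})^2}$, wherein $d(z_i, z_1)$ is the Euclidean distance between mail body $z_i$ and mail body $z_1$, $z_{ik}$ and $z_{1k}$ are the $k$-th features of the two mail bodies, and $p$ is the number of features in the feature vector of mail body $z_1$;

according to the Euclidean distances of the feature vectors, obtaining a Euclidean distance average value by the formula $\bar{d} = \frac{1}{n} \sum_{i} d(z_i, z_1)$, wherein the sum runs over the $n$ mail bodies $z_i$ compared with mail body $z_1$;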

and taking the Euclidean distance average value as a screening threshold value, and judging the similarity of the feature vector of the mail body by using the screening threshold value.
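A minimal sketch of the distance and screening steps of this claim. The claim states that the mean Euclidean distance serves as the screening threshold but not which side of the threshold counts as similar; this example assumes that a distance at or below the mean indicates similarity. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def screen_by_mean_distance(
    reference: Sequence[float],
    others: List[Sequence[float]],
) -> Tuple[List[float], float, List[bool]]:
    """Compare each vector with the reference and screen by the mean distance."""
    if not others:
        raise ValueError("at least one vector is needed for comparison")
    distances = [euclidean_distance(reference, vector) for vector in others]
    threshold = sum(distances) / len(distances)  # mean distance as the threshold
    # Assumption: a distance no greater than the mean counts as similar.
    similar = [distance <= threshold for distance in distances]
    return distances, threshold, similar
```

With a reference vector `[1, 1]` and candidates `[1, 1]`, `[4, 5]`, `[1, 2]`, the distances are 0, 5 and 1, the mean is 2, and only the first and third candidates pass the screen.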

5. The method of claim 1, wherein performing behavior analysis on the mail body and taking the mail with abnormal behaviors as a highly suspicious mail Trojan horse comprises:

analyzing the mail sending time of the mail body, judging whether the mail sending time is abnormal or not, and obtaining a first behavior analysis result;

analyzing the mail deleting time of the mail body, judging whether the mail deleting time is abnormal or not, and obtaining a second behavior analysis result;

carrying out mail sender and recipient analysis on the mail body, judging whether the mail sender or recipient is abnormal or not, and obtaining a third behavior analysis result;

and determining whether abnormal behaviors exist according to the first, second and third behavior analysis results.

6. The method of claim 1, wherein the method further comprises:

sending the highly suspicious mail Trojan to a security analysis channel, and carrying out security evaluation on a mail account to obtain security evaluation information;

generating emergency processing information according to the security evaluation information, wherein the emergency processing information is used for sealing off the mail account whose security evaluation information is abnormal;

and obtaining a mail sending log of the abnormal mail account, and carrying out network isolation and malicious program removal according to the mail sending log.

7. A mail Trojan horse identification system based on mail behavior pattern analysis, the system comprising:

the clustering module is used for respectively acquiring the mail receiving and sending information according to the mail account numbers, and clustering the mail receiving and sending information to obtain mail clustering results;

the frequency analysis module is used for carrying out mail receiving and transmitting frequency analysis on various mail sets based on the mail clustering result and determining receiving and transmitting frequency;

the judging module is used for judging whether the receiving and transmitting frequency has a fixed heartbeat frequency or not;

the filtering and extracting module is used for obtaining heartbeat frequency characteristics according to the fixed heartbeat frequency when the fixed heartbeat frequency exists, and filtering and extracting the mail receiving and sending information by utilizing the heartbeat frequency characteristics to obtain mail data of a specific mail account;

the similarity analysis module is used for carrying out similarity analysis on the text and the attachment content on the mail data of the specific mail account;

and the behavior analysis module is used for performing behavior analysis on the mail body when the similarity of the mail data meets a preset similarity threshold value, and taking the mail with abnormal behaviors as a highly suspicious mail Trojan horse.