CN116319654A

CN116319654A - Intelligent type junk mail scanning method

Info

Publication number: CN116319654A
Application number: CN202310385460.XA
Authority: CN
Inventors: 王宇飞; 戚红建; 韩硕; 张洪卫; 秦绪帅; 邓旭楠; 朱梦迪; 秦子杨; 薛松; 孟庆宇
Original assignee: Beijing Bidding Branch Of China Huaneng Group Co ltd; Huaneng Information Technology Co Ltd
Current assignee: Beijing Bidding Branch Of China Huaneng Group Co ltd; Huaneng Information Technology Co Ltd
Priority date: 2023-04-11
Filing date: 2023-04-11
Publication date: 2023-06-23

Abstract

The invention relates to the technical field of mail transmission and discloses an intelligent junk mail scanning method. The method receives a mail to be sent and obtains the IP address and mail information of that mail, then judges from the IP address whether the mail is a suspected junk mail. If so, the suspected junk mail is preprocessed according to its mail subject and mail text, and feature processing is performed on the preprocessed suspected junk mail to obtain its trusted value. Whether the suspected junk mail meets a first preset condition is judged from the relation between the trusted value and a preset trusted value. If it does, the suspected junk mail is sent to a virtual mail receiving server, and it is judged whether the suspected junk mail exhibits malicious attack behavior within a first preset time; when it does, the suspected junk mail is identified as junk mail.

Description

Intelligent type junk mail scanning method

Technical Field

The invention relates to the technical field of mail transmission, in particular to an intelligent junk mail scanning method.

Background

Junk e-mail (referred to simply as spam) is any e-mail forced into a user's mailbox without the user's permission. E-mail is one of the basic applications of today's internet users, and spam is sent primarily through electronic mailboxes. Junk mail is increasingly rampant in the e-mail field: it increases the time normal mail users spend processing mail, wastes precious mail system resources, and hinders users from obtaining useful information. Junk mail is therefore a problem to be solved urgently in the field of network communication.

In the prior art, a blacklist and a whitelist are set in advance. Mail sent by mailbox users on the whitelist passes preferentially and is not rejected as junk mail, while mail sent by mailbox users on the blacklist is intercepted and cannot pass. However, with this junk mail scanning mode, a junk mail sender can hijack the mailboxes of legal users by means of a Trojan or virus program and make those mailboxes slowly send junk mail in large quantities, thereby bypassing blacklist interception and completing the sending of the junk mail.

Therefore, how to provide a method for accurately scanning the junk mail is a technical problem to be solved at present.

Disclosure of Invention

Aiming at the problems in the prior art, the invention aims to provide an intelligent junk mail scanning method, which solves the technical problem that junk mails cannot be accurately scanned, and further avoids the phenomenon that the junk mails influence network transmission and operation speed to cause congestion of mail servers.

In order to achieve the above object, the present invention provides an intelligent spam scanning method, the method comprising:

receiving a mail to be sent, and analyzing the mail to be sent to obtain an IP address and mail information of the mail to be sent, wherein the mail information comprises a mail subject and a mail text;

judging whether the mail to be sent is a suspected junk mail or not according to the IP address, and when the mail to be sent is the suspected junk mail, preprocessing the suspected junk mail according to the mail subject and the mail body;

performing feature processing on the pre-processed suspected junk mail to obtain a trusted value of the suspected junk mail, and judging whether the suspected junk mail meets a first preset condition according to the relation between the trusted value of the suspected junk mail and a preset trusted value;

when the suspected junk mail accords with the first preset condition, the suspected junk mail is sent to a virtual mail receiving server, whether malicious attack acts exist in the suspected junk mail in a first preset time is judged, and when the suspected junk mail has the malicious attack acts, the suspected junk mail is scanned to be junk mail.

In one embodiment, when determining whether the mail to be sent is a suspected spam according to the IP address, the method includes:

determining the mail sending server of the mail to be sent according to the IP address, judging whether the mail sending server is in a blacklist,

if the mail sending server is in the blacklist, judging that the mail to be sent is junk mail;

and if the mail sending server is not in the blacklist, judging that the mail to be sent is suspected junk mail.
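The blacklist check above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `classify_by_ip`, the `ip_to_server` lookup table, and the `IpVerdict` enum are hypothetical names introduced here, and the source does not say how the mail sending server is resolved from the IP address.

```python
from enum import Enum


class IpVerdict(Enum):
    """Outcome of the IP-based check (names are illustrative)."""
    SPAM = "spam"
    SUSPECTED_SPAM = "suspected_spam"


def classify_by_ip(sender_ip: str,
                   ip_to_server: dict[str, str],
                   blacklist: set[str]) -> IpVerdict:
    """Return SPAM if the sending server is blacklisted, else SUSPECTED_SPAM.

    Assumption: the sending server is looked up in a simple table; if the IP
    is unknown, the IP string itself is used as the server identifier.
    """
    server = ip_to_server.get(sender_ip, sender_ip)
    if server in blacklist:
        return IpVerdict.SPAM
    # Every mail whose server is not blacklisted is treated as suspected spam
    # and passed on to the later content-based steps.
    return IpVerdict.SUSPECTED_SPAM
```

Note that in this scheme a non-blacklisted server never yields an immediate "normal" verdict; the content-based steps decide that.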

In one embodiment, when the pre-processing is performed on the suspected spam according to the mail subject and the mail body, the pre-processing includes:

performing word segmentation processing on the mail subject and the mail text to obtain a plurality of segmented words, and determining the occurrence times of the segmented words;

identifying and classifying each segmented word according to the relation between its number of occurrences and a preset number of occurrences:

when the number of occurrences is greater than the preset number of occurrences, identifying the segmented word as a high-frequency word;

and when the number of occurrences is less than or equal to the preset number of occurrences, identifying the segmented word as a low-frequency word.
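The counting and classification step can be sketched as below. The sketch assumes word segmentation has already been done by some external segmenter and starts from the resulting list of segmented words; `classify_tokens` and `preset_count` are hypothetical names introduced for illustration.

```python
from collections import Counter


def classify_tokens(tokens: list[str],
                    preset_count: int) -> tuple[dict[str, int], dict[str, int]]:
    """Split segmented words into high- and low-frequency groups.

    A word is high-frequency when it occurs strictly more often than
    preset_count, and low-frequency otherwise. Each returned dict maps a
    word to its number of occurrences.
    """
    counts = Counter(tokens)
    high = {word: n for word, n in counts.items() if n > preset_count}
    low = {word: n for word, n in counts.items() if n <= preset_count}
    return high, low
```

The comparison is strict, so a word that occurs exactly the preset number of times falls into the low-frequency group.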

In one embodiment, after identifying and classifying the word segment according to the relationship between the occurrence frequency and the preset occurrence frequency, the method further includes:

the weight of the high frequency word is calculated according to the following formula:

where φ is the weight of the high-frequency word, and X_{a,b} is the number of times that the high-frequency word a appears in the suspected spam b.

In one embodiment, when performing feature processing on the pre-processed suspected spam to obtain a trusted value of the suspected spam, the method includes:

judging whether the high-frequency word is in a preset recognition library, and if the high-frequency word is in the preset recognition library, marking the high-frequency word as a spam keyword;

and obtaining the occurrence times of the spam keywords, and calculating the credible value of the suspected spam according to the occurrence times of the spam keywords and the weight of the high-frequency words.

In one embodiment, the trusted value of the suspected spam is calculated according to the following equation:

P = K × φ

where P is the trusted value of the suspected junk mail, K is the number of occurrences of the spam keyword, and φ is the weight of the high-frequency word.
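A minimal sketch of the trusted-value calculation follows. It takes the weights φ as given inputs, because the weight formula itself is not reproduced in this text. The function name `trusted_value` and its parameters are hypothetical, and summing over several spam keywords follows the explanation given in the detailed description.

```python
def trusted_value(high_freq_counts: dict[str, int],
                  weights: dict[str, float],
                  recognition_library: set[str]) -> float:
    """Compute P as the sum of K * phi over all spam keywords.

    high_freq_counts    -- occurrences K of each high-frequency word
    weights             -- weight phi of each high-frequency word (assumed given)
    recognition_library -- preset set of words that mark spam
    """
    total = 0.0
    for word, count in high_freq_counts.items():
        # Only high-frequency words found in the library count as spam keywords.
        if word in recognition_library:
            total += count * weights[word]
    return total
```

A high-frequency word that is not in the recognition library contributes nothing, so a mail with no spam keywords gets a trusted value of 0.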

In one embodiment, when judging whether the suspected spam accords with a first preset condition according to the relation between the trusted value of the suspected spam and a preset trusted value, the method includes:

if the trusted value of the suspected junk mail is greater than or equal to the preset trusted value, judging that the suspected junk mail accords with the first preset condition;

and if the trusted value of the suspected junk mail is smaller than the preset trusted value, judging that the suspected junk mail does not accord with the first preset condition, scanning the suspected junk mail as a normal mail, and sending the normal mail.

In one embodiment, before the suspected spam is sent to the virtual mail receiving server, the method further includes:

and acquiring the target IP address of the suspected junk mail, and correcting the virtual IP address of the virtual mail receiving server based on the target IP address.

In one embodiment, after scanning the suspected spam to be spam, the method further comprises:

acquiring the quantity A of junk mails sent by the mail sending server in a second preset time;

and setting the network speed of the mail sending server according to the quantity A of the junk mails.

In one embodiment, when setting the network speed of the mail sending server according to the number of the junk mails, the method includes:

presetting a junk mail quantity matrix B for the mail sending server, and setting B (B1, B2, B3, B4), wherein B1 is a first preset junk mail quantity, B2 is a second preset junk mail quantity, B3 is a third preset junk mail quantity, B4 is a fourth preset junk mail quantity, and B1 < B2 < B3 < B4;

presetting a network speed matrix C for the mail sending server, and setting C (C1, C2, C3, C4, C5), wherein C1 is a first preset network speed, C2 is a second preset network speed, C3 is a third preset network speed, C4 is a fourth preset network speed, C5 is a fifth preset network speed, and C1 < C2 < C3 < C4 < C5;

setting the network speed of the mail sending server according to the relation between the number A of the junk mails sent by the mail sending server and the number of the preset junk mails:

when A is smaller than B1, selecting the first preset network speed C1 as the network speed of the mail sending server;

when B1 is less than or equal to A and less than B2, selecting the second preset network speed C2 as the network speed of the mail sending server;

when B2 is less than or equal to A and less than B3, selecting the third preset network speed C3 as the network speed of the mail sending server;

when B3 is less than or equal to A and less than B4, selecting the fourth preset network speed C4 as the network speed of the mail sending server;

and when B4 is less than or equal to A, selecting the fifth preset network speed C5 as the network speed of the mail sending server.
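The five cases above amount to a threshold lookup, which can be sketched with a binary search. The function name `select_network_speed` and the concrete numbers in the usage note are hypothetical; the sketch makes no assumption about the ordering of the speeds and only requires the thresholds to be strictly increasing.

```python
from bisect import bisect_right


def select_network_speed(spam_count: int,
                         thresholds: list[int],
                         speeds: list[float]) -> float:
    """Pick the preset speed for spam count A from thresholds B and speeds C.

    thresholds -- [B1, B2, B3, B4], strictly increasing
    speeds     -- [C1, C2, C3, C4, C5], one more entry than thresholds
    """
    if len(speeds) != len(thresholds) + 1:
        raise ValueError("speeds must have exactly one more entry than thresholds")
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # bisect_right counts the thresholds that are <= spam_count, so
    # A < B1 gives index 0 (C1), B1 <= A < B2 gives index 1 (C2), and so on.
    return speeds[bisect_right(thresholds, spam_count)]
```

For example, with thresholds `[10, 50, 100, 500]`, a count of exactly 10 selects the second entry of `speeds`, matching the case B1 ≤ A < B2.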

The invention provides an intelligent junk mail scanning method, which has the following beneficial effects compared with the prior art:

The invention discloses an intelligent junk mail scanning method. The method receives a mail to be sent and analyzes it to obtain the IP address and mail information of that mail, then judges from the IP address whether the mail is a suspected junk mail. When it is, the suspected junk mail is preprocessed according to the mail subject and the mail text, and feature processing is performed on the preprocessed suspected junk mail to obtain its trusted value. Whether the suspected junk mail meets a first preset condition is judged from the relation between its trusted value and the preset trusted value. When the first preset condition is met, the suspected junk mail is sent to a virtual mail receiving server, and it is judged whether the suspected junk mail exhibits malicious attack behavior within the first preset time; when it does, the suspected junk mail is identified as junk mail.

Drawings

FIG. 1 is a flow chart of an intelligent junk mail scanning method according to an embodiment of the invention;

FIG. 2 is a schematic flow chart of preprocessing a suspected junk mail according to the mail subject and the mail body in an embodiment of the invention.

Detailed Description

The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.

In the description of the present application, it should be understood that the terms "center," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, merely to facilitate description of the present application and simplify the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the present application.

The terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present application, unless otherwise indicated, the meaning of "a plurality" is two or more.

In the description of the present application, it should be noted that, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be either fixedly connected, detachably connected, or integrally connected, for example; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the terms in this application will be understood by those of ordinary skill in the art in a specific context.

The following is a description of preferred embodiments of the invention, taken in conjunction with the accompanying drawings.

As shown in fig. 1, an embodiment of the present invention discloses an intelligent spam scanning method, which includes:

S110: receiving the mail to be sent, and analyzing the mail to be sent to obtain the IP address and the mail information of the mail to be sent, wherein the mail information comprises a mail subject and a mail body.

S120: judging whether the mail to be sent is a suspected junk mail or not according to the IP address, and when the mail to be sent is the suspected junk mail, preprocessing the suspected junk mail according to the mail subject and the mail body.

In order to prevent a phenomenon that illegal personnel slowly send a large amount of junk mails by using a mailbox of a legal user, in some embodiments of the present application, when judging whether the mail to be sent is a suspected junk mail according to the IP address, the method includes:

In this embodiment, a blacklist and a whitelist are set in advance. The mail sending server of the mail to be sent is obtained, and it is judged whether that server is in the blacklist. When the mail sending server is in the blacklist, the mail to be sent is directly judged to be junk mail. When the mail sending server is not in the blacklist, the mail to be sent is judged to be suspected junk mail and requires further judgment. This prevents illegal persons from using the mailbox of a legal user to slowly send a large amount of junk mail, and improves the accuracy of junk mail identification.

As shown in fig. 2, in order to improve the recognition efficiency of the spam, in some embodiments of the present application, when preprocessing the suspected spam according to the mail subject and the mail body, the method includes:

S121: performing word segmentation processing on the mail subject and the mail text to obtain a plurality of segmented words, and determining the number of occurrences of each segmented word;

S122: identifying and classifying each segmented word according to the relation between its number of occurrences and a preset number of occurrences.

In this embodiment, the suspected junk mail is preprocessed according to the parsed mail subject and mail text. The preprocessing comprises word segmentation and the recognition and classification of the segmented words.

Word segmentation is performed on the mail subject and the mail text to obtain a plurality of segmented words. For example, suppose the mail subject is "mail standard format" and the mail text is "in order to standardize the mail standard format and create a professional enterprise culture, thereby improving the enterprise image, we need to formulate a unified and standard mail standard format". The mail subject and mail text can then be segmented into: mail standard format, purpose, standard, mail standard format, construction, enterprise culture, specialization, promotion, enterprise image, therefore, we, need, formulation, share, unification, standard, and mail standard format.

The number of occurrences of each segmented word is then determined; here "mail standard format" occurs 3 times and "standard" occurs 2 times. Each segmented word is identified and classified according to the relation between its number of occurrences and the preset number of occurrences: a word that occurs more than the preset number of times is identified as a high-frequency word, and a word that occurs the preset number of times or fewer is identified as a low-frequency word. With a preset number of occurrences of 2, "mail standard format" is a high-frequency word and all other segmented words are low-frequency words. The specific word segmentation rules and the preset number of occurrences can be set according to actual conditions.

In some embodiments of the present application, after identifying and classifying the word segment according to the relationship between the occurrence frequency and the preset occurrence frequency, the method further includes:

In this embodiment, the weight of the high-frequency word is calculated according to the above formula, and by calculating the weight of the high-frequency word, reliable data support can be provided for calculating the trusted value of the suspected spam.

S130: performing feature processing on the pre-processed suspected junk mail to obtain a trusted value of the suspected junk mail, and judging whether the suspected junk mail meets a first preset condition according to the relation between the trusted value of the suspected junk mail and a preset trusted value.

In order to further improve the recognition accuracy of the spam, in some embodiments of the present application, when performing feature processing on the pre-processed suspected spam to obtain a trusted value of the suspected spam, the method includes:

In some embodiments of the present application, the trusted value of the suspected spam is calculated according to the following formula:

P = K × φ

In this embodiment, after the segmented words are divided into high-frequency words and low-frequency words, it is judged whether each high-frequency word appears in the preset recognition library. A high-frequency word that appears in the preset recognition library is a spam keyword. The number of occurrences of the spam keyword is then obtained, and the trusted value of the suspected spam is calculated from the number of occurrences of the spam keyword and the weight of the corresponding high-frequency word. It should be understood that if there is more than one spam keyword, the trusted values corresponding to the individual spam keywords are added together. For example, if the spam keyword "cash" appears 30 times with a weight of 0.8, and a second spam keyword appears 20 times with a weight of 0.75, the trusted value of the suspected spam is 30×0.8+20×0.75=39. The above example is not limiting. By calculating the trusted value of the suspected spam, the method can identify spam accurately and improve the spam recognition rate.
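Written out step by step, the arithmetic of this example is the per-keyword product K × φ summed over the two spam keywords. The index i over spam keywords is notation introduced here for illustration and does not appear in the original formula.

```latex
P = \sum_{i} K_i \,\varphi_i
  = 30 \times 0.8 + 20 \times 0.75
  = 24 + 15
  = 39
```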

In some embodiments of the present application, when determining whether the suspected spam meets the first preset condition according to the relationship between the trusted value of the suspected spam and the preset trusted value, the method includes:

In this embodiment, once the trusted value of the suspected spam has been calculated, whether the suspected spam meets the first preset condition is determined from the relation between that trusted value and the preset trusted value. If the trusted value is greater than or equal to the preset trusted value, the suspected spam meets the first preset condition. If the trusted value is less than the preset trusted value, the suspected spam does not meet the first preset condition; it is identified as normal mail and sent.

S140: when the suspected junk mail accords with the first preset condition, the suspected junk mail is sent to a virtual mail receiving server, whether malicious attack acts exist in the suspected junk mail in a first preset time is judged, and when the suspected junk mail has the malicious attack acts, the suspected junk mail is scanned to be junk mail.

In order to prevent erroneous judgment, in some embodiments of the present application, before the suspected spam is sent to the virtual mail receiving server, the method further includes:

In this embodiment, when the suspected spam meets the first preset condition, the destination IP address of the suspected spam is obtained, and the virtual IP address of the virtual mail receiving server is corrected based on the destination IP address. The IP address of the virtual mail receiving server may be updated according to the actual situation so as to confuse an illegal person. It is then judged whether the suspected spam exhibits malicious attack behavior in the virtual mail receiving server, and when it does, the suspected spam is identified as spam.

In order to prevent the mail sending server from continuously sending the spam, in some embodiments of the present application, after scanning the suspected spam into the spam, the method further includes:

In some embodiments of the present application, when setting the network speed of the mail sending server according to the number of spam, the method includes:

In this embodiment, the number A of junk mails sent by the mail sending server within the second preset time is obtained, and the network speed of the mail sending server is set according to the relation between A and each preset junk mail quantity.

In summary, the embodiment of the invention receives a mail to be sent and analyzes it to obtain the IP address and the mail information of that mail, the mail information comprising a mail subject and a mail text. Whether the mail to be sent is a suspected junk mail is judged from the IP address. When it is, the suspected junk mail is preprocessed according to the mail subject and the mail text, and feature processing is performed on the preprocessed suspected junk mail to obtain its trusted value. Whether the suspected junk mail meets a first preset condition is judged from the relation between its trusted value and the preset trusted value. When the first preset condition is met, the suspected junk mail is sent to a virtual mail receiving server, and it is judged whether the suspected junk mail exhibits malicious attack behavior within the first preset time; when it does, the suspected junk mail is identified as junk mail.
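The overall flow summarized above can be sketched as a single decision function. This is an illustrative outline under stated assumptions, not the patented implementation: `Mail`, `Verdict`, `scan_mail`, and the three callables are hypothetical names, and the blacklist lookup, the trusted-value calculation, and the observation in the virtual mail receiving server are passed in as functions because the text does not fix how they are realized.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    SPAM = "spam"
    NORMAL = "normal"
    # The text does not say what happens when no attack is observed,
    # so this sketch reports that case separately instead of guessing.
    NO_ATTACK_OBSERVED = "no_attack_observed"


@dataclass
class Mail:
    sender_ip: str
    subject: str
    body: str


def scan_mail(mail: Mail,
              is_blacklisted: Callable[[str], bool],
              compute_trusted_value: Callable[[str, str], float],
              preset_trusted_value: float,
              shows_malicious_behavior: Callable[[Mail], bool]) -> Verdict:
    """Run the IP check, the trusted-value check, and the virtual-server check."""
    # IP check: a blacklisted sending server means spam outright.
    if is_blacklisted(mail.sender_ip):
        return Verdict.SPAM
    # Otherwise the mail is suspected spam; compute its trusted value.
    p = compute_trusted_value(mail.subject, mail.body)
    # Below the preset trusted value the mail is treated as normal and sent.
    if p < preset_trusted_value:
        return Verdict.NORMAL
    # At or above it, observe the mail in the virtual mail receiving server
    # (assumed to be wrapped by shows_malicious_behavior, time window included).
    if shows_malicious_behavior(mail):
        return Verdict.SPAM
    return Verdict.NO_ATTACK_OBSERVED
```

Passing the three checks in as callables keeps the control flow visible while leaving each step free to be implemented as the embodiments describe.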

In the description of the above embodiments, particular features, structures, materials, or characteristics may be combined in any suitable manner in any one or more embodiments or examples.

Although the invention has been described hereinabove with reference to embodiments, various modifications may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In particular, the features of the disclosed embodiments may be combined with each other in any manner as long as there is no structural conflict; these combinations are not described exhaustively in the present specification merely for the sake of brevity and saving resources. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Those of ordinary skill in the art will appreciate that: the above is only a preferred embodiment of the present invention, and the present invention is not limited thereto, but it is to be understood that the present invention is described in detail with reference to the foregoing embodiments, and modifications and equivalents of some of the technical features described in the foregoing embodiments may be made by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. An intelligent spam scanning method, the method comprising:

2. The intelligent spam scanning method according to claim 1, wherein when determining whether the mail to be sent is a suspected spam according to the IP address, comprising:

3. The intelligent spam scanning method of claim 1, wherein when the suspected spam is pre-processed according to the mail subject and the mail body, comprising:

4. The intelligent spam scanning method of claim 3, further comprising, after identifying and classifying the tokens according to a relationship between the frequency of occurrence and a predetermined frequency of occurrence:

5. The intelligent spam scanning method according to claim 4, wherein when performing feature processing on the pre-processed suspected spam to obtain a trusted value of the suspected spam, the method comprises:

6. The intelligent spam scanning method of claim 5, wherein the trusted value of the suspected spam is calculated according to the following equation:

P = K × φ

7. The intelligent spam scanning method according to claim 1, wherein when determining whether the suspected spam meets a first preset condition according to a relationship between the trusted value of the suspected spam and a preset trusted value, comprising:

8. The intelligent spam scanning method of claim 1, further comprising, prior to sending the suspected spam to a virtual mail receiving server:

9. The intelligent spam scanning method of claim 1, further comprising, after scanning the suspected spam as spam:

10. The intelligent spam scanning method of claim 9, wherein when setting the network speed of the mail sending server according to the number of spam, comprising: