CN114629870A

CN114629870A - Junk mail filtering method, device, system and storage medium

Info

Publication number: CN114629870A
Application number: CN202011468519.4A
Authority: CN
Inventors: 李天明
Original assignee: Individual
Current assignee: Individual
Priority date: 2020-12-11
Filing date: 2020-12-11
Publication date: 2022-06-14

Abstract

The application discloses a junk mail filtering method comprising the following steps: reading the title and the content of a mail; performing text classification on the title to form title phrases; judging, according to a filtering rule, whether the title phrases contain a sensitive phrase; if yes, marking the mail as junk mail and interrupting the operation; if not, performing text classification on the content to form content phrases; judging, according to the filtering rule, whether the content phrases contain a sensitive phrase; if yes, marking the mail as junk mail and interrupting the operation; if not, marking the mail as normal mail. The junk mail filtering method provided by the application can effectively identify junk mail and reduce the quantity of junk mail received by users.

Description

Junk mail filtering method, device, system and storage medium

Technical Field

The present application relates to the field of internet communications technologies, and in particular, to a method, an apparatus, a system, and a storage medium for filtering spam.

Background

Email is a means of exchanging information electronically and is the most widely used service on the internet. Through a networked e-mail system, a user can contact other network users in any corner of the world very cheaply (only the network fee is paid, wherever the mail is sent) and very quickly (a message reaches any specified destination in the world within a few seconds).

An e-mail may take various forms, such as text, images, and sound. Users can also receive a large amount of free news and special-topic mail and can search for information easily. E-mail greatly facilitates communication between people and promotes the development of society.

However, e-mail often contains spam, such as advertising mail for commercial promotions, phishing mail intended to steal user account information, or reactionary mail that spreads reactionary content. Spam seriously threatens the sharing, interactivity, and openness of network resources and degrades the experience of e-mail users.

Therefore, designing a spam filtering method which can effectively identify spam and reduce the quantity of spam received by a user is a problem to be solved by technical personnel in the field.

Disclosure of Invention

In order to solve the technical problem, the application provides a spam filtering method which can effectively identify spam and reduce the quantity of spam received by a user.

The technical scheme provided by the application is as follows:

a junk mail filtering method comprises the following steps:

reading the title and the content of the mail;

performing text classification on the titles to form a title phrase;

judging whether the title phrases contain sensitive phrases or not according to filtering rules;

if yes, marking the mail as a junk mail, and interrupting the operation;

if not, text classification is carried out on the content to form content word group;

judging whether the content phrases contain sensitive phrases or not according to a filtering rule;

if yes, marking the mail as a junk mail, and interrupting the operation;

if not, the mail is marked as normal mail.

Preferably, the step of marking the mail as a junk mail and interrupting the operation if a sensitive phrase is contained specifically comprises:

if yes, adding one to the record value of the sending mailbox;

and marking the mail as junk mail and interrupting the operation.

Further, before the reading of the title and the content of the mail, the method further includes:

reading the mail sending mailbox of the mail;

judging whether the recorded value of the sending mailbox is greater than a frequency threshold value or not;

if the judgment result is yes, the mail is marked as a junk mail, and the operation is interrupted;

if the judgment result is negative, the next step is carried out.

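Preferably, the step of marking the mail as a junk mail and interrupting the operation if a sensitive phrase is contained specifically comprises:

if yes, acquiring the number of occurrences of the sensitive phrase;

judging whether the number of occurrences is greater than a sensitivity threshold;

if the judgment result is yes, marking the mail as a junk mail and interrupting the operation;

if the judgment result is negative, proceeding to the next step.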

Preferably, before reading the title and content of the mail, the method further includes:

judging whether the version of the server filtering rule is higher than that of the local filtering rule or not;

if the judgment result is yes, the server filtering rule is obtained and used as the updated local filtering rule;

if the judgment result is negative, the local filtering rule is obtained;

and reading the local filtering rule.

Further, the step of marking the mail as a junk mail and interrupting the operation if a sensitive phrase is contained specifically comprises:

if yes, marking the mail as a junk mail;

sending the junk mail to a server, and updating the server filtering rule;

the operation is interrupted.

A spam filtering device comprising:

the reading module is used for reading the title and the content of the mail;

the classification module is used for performing text classification on the titles to form a title phrase;

the filtering module is used for judging whether the title phrases contain sensitive phrases or not according to filtering rules;

the classification module is also used for performing text classification on the content to form a content phrase;

the filtering module is further used for judging whether the content phrases contain sensitive phrases or not according to filtering rules;

and the marking module is connected with the filtering module and used for marking the mails as junk mails or normal mails according to the judgment result of the filtering module.

Further, the device also comprises:

the recording module is used for updating the record value of the sending mailbox of a junk mail;

and the judging module is used for judging whether the record value of the sending mailbox is greater than a frequency threshold value or not.

A spam filtering system comprising a spam filtering device as claimed in any preceding claim, further comprising a server for updating the filtering rules.

A storage medium storing a computer program, wherein the computer program is executed to implement the spam filtering method as described in any one of the above.

The junk mail filtering method provided by the invention reads the title and the content of a mail, performs text classification on the title and the content in sequence, judges according to the filtering rule whether they contain sensitive phrases, and marks the mail as junk mail or normal mail according to the judgment result, thereby filtering junk mail. The method can effectively identify junk mail and reduce the number of junk mails received by the user, which addresses the problem that junk mail seriously threatens the sharing, interactivity, and openness of network resources and degrades the user's experience of e-mail.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a schematic flow chart of a spam filtering method according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a spam filtering apparatus according to an embodiment of the present invention.

Detailed Description

In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be understood that the structures, ratios, sizes, and the like shown in the drawings only accompany the disclosure of the specification so that those skilled in the art can understand and read it; they do not limit the conditions under which the present application can be practiced and therefore carry no substantive technical meaning. Any modification of structure, change of ratio, or adjustment of size that does not affect the efficacy and achievable purpose of the present application still falls within the scope of the technical content disclosed herein.

Embodiments of the present invention are written in a progressive manner.

The embodiment discloses a spam filtering method, as shown in fig. 1, including the following steps:

S1, reading the title and the content of the mail;

S2, performing text classification on the title to form title phrases;

S3, judging whether the title phrases contain sensitive phrases or not according to the filtering rule;

if yes, S4, marking the mail as a junk mail, and interrupting the operation;

if not, S5, performing text classification on the content to form content phrases;

S6, judging whether the content phrases contain sensitive phrases or not according to the filtering rule;

if yes, S7, marking the mail as a junk mail, and interrupting the operation;

if not, S8, marking the mail as a normal mail.

When a new mail is received, step S1 reads its title and content. Step S2 performs text classification on the title according to a preset classification rule to form title phrases, and step S3 judges according to the filtering rule whether they include a sensitive phrase. If the title includes a sensitive phrase, step S4 marks the mail as junk mail. If it does not, step S5 performs text classification on the content to form content phrases, and step S6 judges according to the filtering rule whether they include a sensitive phrase. Step S7 marks the mail as junk mail when the content includes a sensitive phrase, and step S8 marks it as normal mail when it does not.

Judging the title first, and judging the content only when the title contains no sensitive phrase, improves efficiency. In most cases the title has fewer words than the content, so classifying and judging it takes less time than judging the content directly. When the title contains a sensitive phrase, the operation is interrupted and the content is not judged at all, which avoids unnecessary processing time and excessive use of system resources.
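The flow of steps S1 to S8 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the names `segment`, `contains_sensitive`, and `classify_mail`, the word-based splitting that stands in for the text-classification step, and the example phrase set are all introduced here for illustration.

```python
import re

# Hypothetical filtering rule: a set of sensitive phrases (assumption).
SENSITIVE_PHRASES = {"lottery", "free prize"}


def segment(text):
    """Split text into lowercase word tokens (a stand-in for the
    text-classification step that forms phrases)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def contains_sensitive(tokens, rules):
    """Return True if any sensitive phrase occurs as whole words."""
    padded = " " + " ".join(tokens) + " "
    return any(f" {phrase} " in padded for phrase in rules)


def classify_mail(title, content, rules=SENSITIVE_PHRASES):
    # S2-S3: check the (usually shorter) title first.
    if contains_sensitive(segment(title), rules):
        return "junk"      # S4: stop here; the content is never processed
    # S5-S6: only reached when the title is clean.
    if contains_sensitive(segment(content), rules):
        return "junk"      # S7
    return "normal"        # S8
```

The early return after the title check is what saves the cost of processing the content when the title already matches.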

According to the junk mail filtering method provided by the embodiment of the invention, the title and the content of a mail are read, text classification is performed on the title and the content in sequence, whether they contain sensitive phrases is judged according to the filtering rule, and the mail is marked as junk mail or normal mail according to the judgment result, thereby filtering junk mail. The method can effectively identify junk mail and reduce the quantity of junk mail received by users, which addresses the problem that junk mail seriously threatens the sharing, interactivity, and openness of network resources and degrades the experience of e-mail users.

Preferably, the step of marking the mail as a junk mail and interrupting the operation if a sensitive phrase is contained specifically comprises:

if yes, adding one to the record value of the sending mailbox;

and marking the mail as junk mail and interrupting the operation.

It should be noted that this applies to both step S4 and step S7: whether it is the title or the content that contains a sensitive phrase, once the mail is marked as junk mail, the record value of the corresponding sending mailbox is incremented by one. The record value thus counts the junk mails sent from that mailbox, which can be understood as placing the sending mailbox on a blacklist.
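A minimal sketch of the record value, assuming an in-memory counter keyed by sender address. The name `mark_as_junk` and the use of `defaultdict` are illustrative choices, not part of the application.

```python
from collections import defaultdict

# Record value per sending mailbox: the number of mails from that sender
# that have been marked as junk (hypothetical in-memory store).
record_values = defaultdict(int)


def mark_as_junk(sender, records=record_values):
    """Increment the sender's record value, then return the junk label."""
    records[sender] += 1
    return "junk"
```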

Further, before reading the title and content of the mail, the method further comprises the following steps:

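reading the sending mailbox of the mail;

judging whether the record value of the sending mailbox is greater than a frequency threshold;

if the judgment result is yes, marking the mail as a junk mail and interrupting the operation;

if the judgment result is negative, proceeding to the next step.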

By judging whether the record value exceeds the preset frequency threshold, junk mail is filtered directly on the basis of the sending mailbox. If the record value is greater than the frequency threshold, every mail from that sending mailbox is treated as junk mail; its title and content need not be read, and subsequent operations such as text classification and sensitive-phrase judgment are skipped, which further improves processing efficiency.
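The sender check can be sketched as a guard that runs before the title and content are read. The threshold value of 3, the names `sender_is_blocked` and `prefilter`, and the plain dictionary of record values are assumptions made for this example; the application does not fix a threshold value.

```python
FREQUENCY_THRESHOLD = 3  # assumed value; the application does not fix one


def sender_is_blocked(sender, records, threshold=FREQUENCY_THRESHOLD):
    """True when the sender's record value exceeds the frequency threshold."""
    return records.get(sender, 0) > threshold


def prefilter(sender, records):
    """Return "junk" for a blocked sender, or None to continue with
    reading the title and content."""
    if sender_is_blocked(sender, records):
        return "junk"
    return None
```

Note that the comparison is strictly "greater than", so a sender whose record value equals the threshold is still processed normally.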

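Preferably, the step of marking the mail as a junk mail and interrupting the operation if a sensitive phrase is contained specifically comprises:

if yes, acquiring the number of occurrences of the sensitive phrase;

judging whether the number of occurrences is greater than a sensitivity threshold;

if the judgment result is yes, marking the mail as a junk mail and interrupting the operation;

if the judgment result is negative, proceeding to the next step.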

Judging whether the number of occurrences of a sensitive phrase exceeds a preset sensitivity threshold avoids false positives in which the words of normal phrases happen, with small probability, to combine into a sensitive phrase. For example, when the two normal phrases translated here as "constitution" and "carousel" appear next to each other, the characters rendered as "method" and "wheel" at their boundary form a sensitive phrase, although the mail does not substantively belong to the category of junk mail. With a sensitivity threshold set, a sensitive phrase is treated as substantively sensitive, and the mail is marked as junk mail, only when its number of occurrences is greater than the threshold.
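The occurrence check can be illustrated as follows. This is a sketch only: the threshold value of 1 and the names `count_occurrences` and `is_substantively_sensitive` are assumptions, and plain substring counting stands in for whatever matching the filtering rule actually uses.

```python
SENSITIVE_THRESHOLD = 1  # assumed value; the application does not fix one


def count_occurrences(text, phrase):
    """Count non-overlapping occurrences of phrase in text."""
    return text.count(phrase)


def is_substantively_sensitive(text, phrase, threshold=SENSITIVE_THRESHOLD):
    """Treat the phrase as sensitive only if it occurs more than
    `threshold` times, so a single accidental match is ignored."""
    return count_occurrences(text, phrase) > threshold
```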

Preferably, before reading the title and content of the mail, the method further comprises:

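judging whether the version of the server filtering rule is higher than that of the local filtering rule;

if the judgment result is yes, acquiring the server filtering rule as the updated local filtering rule;

if the judgment result is negative, obtaining the local filtering rule;

reading the local filtering rule.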

In this embodiment, the filtering rule is generally stored locally, which improves processing efficiency and reduces the probability that processing is interrupted by network congestion or abnormality. If the server filtering rule has a newer version, however, the local filtering rule needs to be replaced with the server filtering rule, and the updated local filtering rule is then read for the subsequent judgment, so as to improve the filtering effect.
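A sketch of the version comparison, assuming that rule versions are comparable integers and that an unreachable server is represented by `None`. Both assumptions, along with the names `FilterRule` and `select_rule`, are introduced here and are not specified in the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterRule:
    version: int                  # assumed: versions are comparable integers
    sensitive_phrases: frozenset


def select_rule(local, server):
    """Return the rule to use locally. `server` may be None when the
    server cannot be reached (assumption), in which case the local
    rule is kept."""
    if server is not None and server.version > local.version:
        return server             # adopt the server rule as the new local rule
    return local
```

Keeping the local rule as the fallback matches the stated motivation: filtering continues even when the network is congested or unavailable.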

Further, the step of marking the mail as a junk mail and interrupting the operation if a sensitive phrase is contained specifically comprises the following steps:

if yes, marking the mail as a junk mail;

sending the junk mail to a server, and updating a server filtering rule;

the operation is interrupted.

Junk mails identified locally are sent to the server and used to update the server filtering rule, which improves the handling of junk mail.
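The reporting step might look like the sketch below. The application does not specify how the server derives updated rules from reported mail, so this example only queues the reports. `RuleServer` is a hypothetical in-memory stand-in for the server, and `mark_and_report` is an illustrative name.

```python
class RuleServer:
    """In-memory stand-in for the server (hypothetical)."""

    def __init__(self):
        self.reported = []        # junk mails received from clients

    def report(self, title, content):
        self.reported.append((title, content))


def mark_and_report(title, content, server):
    """Mark the mail as junk, send it to the server, then stop."""
    server.report(title, content)
    return "junk"
```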

A spam filtering device, as shown in fig. 2, comprising:

a reading module 1, which is used for reading the title and the content of the mail;

the classification module 2 is used for performing text classification on the titles to form a title phrase;

the filtering module 3 is used for judging whether the title phrases contain sensitive phrases or not according to filtering rules;

the classification module 2 is also used for carrying out text classification on the content to form a content phrase;

the filtering module 3 is also used for judging whether the content phrases contain sensitive phrases or not according to the filtering rules;

and the marking module 4 is connected with the filtering module and is used for marking the mails as junk mails or normal mails according to the judgment result of the filtering module.

The modules of the spam filtering device carry out steps S1 to S8 of the filtering method; the specific data acquisition, processing, and output processes are not repeated here.
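One way to picture the module division is as four small classes wired together by a device class. This is an illustrative sketch only: the class and method names are assumptions, the reading module handles only simple non-multipart messages through Python's standard `email` package, and sensitive phrases are limited to single words for brevity.

```python
from email import message_from_string


class ReadingModule:
    def read(self, raw):
        """Return (title, content) of a simple, non-multipart message."""
        msg = message_from_string(raw)
        return msg["Subject"] or "", msg.get_payload()


class ClassificationModule:
    def phrases(self, text):
        # Whitespace splitting stands in for text classification.
        return text.lower().split()


class FilteringModule:
    def __init__(self, sensitive_phrases):
        self.sensitive = set(sensitive_phrases)

    def has_sensitive(self, phrases):
        return any(p in self.sensitive for p in phrases)


class MarkingModule:
    def mark(self, is_junk):
        return "junk" if is_junk else "normal"


class SpamFilterDevice:
    def __init__(self, sensitive_phrases):
        self.reader = ReadingModule()
        self.classifier = ClassificationModule()
        self.filter = FilteringModule(sensitive_phrases)
        self.marker = MarkingModule()

    def process(self, raw):
        title, content = self.reader.read(raw)
        for text in (title, content):      # title first, then content
            if self.filter.has_sensitive(self.classifier.phrases(text)):
                return self.marker.mark(True)
        return self.marker.mark(False)
```

As the description notes, this division is only one logical arrangement; the modules could equally be merged or split differently.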

Further, as shown in fig. 2, the device further includes:

the recording module 5 is used for updating the record value of the sending mailbox of a junk mail;

and the judging module 6 is used for judging whether the record value of the sending mailbox is greater than the frequency threshold value.

A junk mail filtering system comprises the junk mail filtering device as described in any one of the above, and is characterized by further comprising a server for updating the filtering rules, so that the same technical effects can be achieved.

A storage medium storing a computer program, wherein the computer program, when executed, implements the spam filtering method as described above, and achieves the same technical effects.

In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. The above-described device embodiments are merely illustrative, for example, the division of the modules is only one logical functional division, and other division manners may be implemented in practice, such as: multiple modules or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or modules may be electrical, mechanical or other.

In addition, all functional modules in the embodiments of the present invention may be integrated into one processor, or each module may be separately used as one device, or two or more modules may be integrated into one device; each functional module in each embodiment of the present invention may be implemented in a form of hardware, or may be implemented in a form of hardware plus a software functional unit.

Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by program instructions and related hardware, where the program instructions may be stored in a computer-readable storage medium, and when executed, the program instructions perform the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as a removable Memory device, a Read Only Memory (ROM), a magnetic disk, or an optical disk.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A spam filtering method is characterized by comprising the following steps:

reading the title and the content of the mail;

performing text classification on the titles to form a title phrase;

judging whether the title phrases contain sensitive phrases or not according to a filtering rule;

if yes, marking the mail as a junk mail, and interrupting the operation;

if not, the mail is marked as normal mail.

2. The spam filtering method according to claim 1, wherein the step of marking the mail as a junk mail and interrupting the operation if a sensitive phrase is contained specifically comprises:

if yes, adding one to the record value of the sending mailbox;

and marking the mail as junk mail and interrupting the operation.

3. The spam filtering method according to claim 2, further comprising, before the reading of the title and content of the mail:

reading the mail sending mailbox of the mail;

if the judgment result is negative, the next step is carried out.

4. The spam filtering method according to claim 1, wherein the step of marking the mail as a junk mail and interrupting the operation if a sensitive phrase is contained specifically comprises:

if yes, acquiring the occurrence times of the sensitive phrases;

if the judgment result is negative, the next step is carried out.

5. The spam filtering method according to claim 1, further comprising, before the reading of the title and content of the mail:

if the judgment result is negative, the local filtering rule is obtained;

and reading the local filtering rule.

6. The spam filtering method according to claim 5, wherein the step of marking the mail as a junk mail and interrupting the operation if a sensitive phrase is contained specifically comprises:

if yes, marking the mail as a junk mail;

sending the junk mail to a server, and updating the server filtering rule;

the operation is interrupted.

7. A spam filtering device, comprising:

the reading module is used for reading the title and the content of the mail;

8. The spam filtering device of claim 7, further comprising:

the recording module is used for updating the recorded value of the junk mail;

and the judging module is used for judging whether the record value of the sending mailbox is greater than the frequency threshold value.

9. A spam filtering system comprising a spam filtering device according to any of claims 7 to 8, further comprising a server for updating the filtering rules.

10. A storage medium storing a computer program, wherein the computer program, when executed, implements the spam filtering method of any of claims 1-6.